[v4,5/5] arm64: mte: add compression support to mteswap.c

Message ID 20230720173956.3674987-6-glider@google.com
State New
Series Implement MTE tag compression for swapped pages

Commit Message

Alexander Potapenko July 20, 2023, 5:39 p.m. UTC
Define the internal mteswap.h interface:
 - _mte_alloc_and_save_tags()
 - _mte_free_saved_tags()
 - _mte_restore_tags()

which encapsulates saving the tags for a struct page (together with memory
allocation), restoring the tags, and freeing the storage allocated for them.

These functions accept opaque pointers, which may point to 128-byte
tag buffers or to smaller buffers containing compressed tags, or may
hold the compressed tags directly.

The existing code from mteswap.c operating with uncompressed tags is split
away into mteswap_nocomp.c, and the newly introduced mteswap_comp.c
provides compression support. The latter implementation
is picked if CONFIG_ARM64_MTE_COMP=y.

Soon after booting Android, tag compression saves ~2.5x memory previously
spent by mteswap.c on tag allocations. With the growing uptime, the
savings reach 20x and even more.

Signed-off-by: Alexander Potapenko <glider@google.com>

---
 v4:
  - minor code simplifications suggested by Andy Shevchenko, added
    missing header dependencies
  - changed compression API names to reflect modifications made to
    memcomp.h (as suggested by Yury Norov)

 v3:
  - Addressed comments by Andy Shevchenko in another patch:
   - fixed includes order
   - replaced u64 with unsigned long
   - added MODULE_IMPORT_NS(MTECOMP)
---
 arch/arm64/mm/Makefile         |  5 ++++
 arch/arm64/mm/mteswap.c        | 20 ++++++-------
 arch/arm64/mm/mteswap.h        | 12 ++++++++
 arch/arm64/mm/mteswap_comp.c   | 54 ++++++++++++++++++++++++++++++++++
 arch/arm64/mm/mteswap_nocomp.c | 38 ++++++++++++++++++++++++
 5 files changed, 118 insertions(+), 11 deletions(-)
 create mode 100644 arch/arm64/mm/mteswap.h
 create mode 100644 arch/arm64/mm/mteswap_comp.c
 create mode 100644 arch/arm64/mm/mteswap_nocomp.c
  

Comments

Catalin Marinas Aug. 18, 2023, 6:18 p.m. UTC | #1
On Thu, Jul 20, 2023 at 07:39:56PM +0200, Alexander Potapenko wrote:
> Soon after booting Android, tag compression saves ~2.5x memory previously
> spent by mteswap.c on tag allocations. With the growing uptime, the
> savings reach 20x and even more.

This sounds like a marketing claim ;). The way I read your statement is
that the memory used for tag storage is 2.5x less with your patches and
that's great. It means a 2.5x compression on average. How does the
compression get so much better to 20x with more uptime? The number of
tag storage allocations should be proportional to the pages swapped out
(not equal as not all would be mapped as PROT_MTE). So you can indeed
have a lot more pages swapped out than available in RAM and the tag
storage can take space but not sure which numbers you divided to get
20x.

Anyway, it would be nice to see the full picture of what the savings
relative to the total RAM are. Given that the swap in this instance is
zram, you have an upper bound of how many pages it can store. I'm just
trying to assess whether the complexity added here is worth it.

It may not be as good as the RLE algorithm here, but I was wondering
whether we could use zswap to save the tags together with the page. I
looked at it some time ago and it seemed slightly easier for zswap than
zram. Another
option is to make the swap format more generic to support metadata
storage. Yet another option is for the zram to flag that it can compress
the metadata together with the data (no swap format change needed; when
decompressing the page, it populates the tags as well).
  
Alexander Potapenko Sept. 20, 2023, 1:26 p.m. UTC | #2
On Fri, Aug 18, 2023 at 8:18 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Thu, Jul 20, 2023 at 07:39:56PM +0200, Alexander Potapenko wrote:
> > Soon after booting Android, tag compression saves ~2.5x memory previously
> > spent by mteswap.c on tag allocations. With the growing uptime, the
> > savings reach 20x and even more.
>
> This sounds like a marketing claim ;). The way I read your statement is
> that the memory used for tag storage is 2.5x less with your patches and
> that's great. It means a 2.5x compression on average. How does the
> compression get so much better to 20x with more uptime?

I am currently looking at this, and I think this happens because some
userspace apps assign zero tags to userspace pages, and those tags are
effectively zero-cost, because they can be compressed into the 8 bytes of
the XArray pointer.
As the uptime grows, the share of such pages also grows.
I agree though that this is a marginal use case, and the average
compression rate is more representative.

> The number of
> tag storage allocations should be proportional to the pages swapped out
> (not equal as not all would be mapped as PROT_MTE).

We think a reasonable upper bound for PROT_MTE pages is 40%, but
currently it is probably more like 20%.


> So you can indeed
> have a lot more pages swapped out than available in RAM and the tag
> storage can take space but not sure which numbers you divided to get
> 20x.

Right now (in kernel 6.5) the amount of memory spent to store the tags
can be calculated as 128 * (number of mte_allocate_tag_storage() calls
- number of mte_free_tag_storage() calls).
In my patchset I calculate the total amount of memory
allocated/deallocated from the mte-tags-N caches and compare that with
128 * (total number of live objects in those caches).
E.g. the stats after booting up the device (~120s uptime) look as follows:

8 bytes: 14007 allocations, 256 deallocations
16 bytes: 1583 allocations, 179 deallocations
32 bytes: 1634 allocations, 205 deallocations
64 bytes: 1538 allocations, 142 deallocations
128 bytes: 10881 allocations, 1340 deallocations
uncompressed tag storage size: 3522688
compressed tag storage size: 1488792

(note 8-byte allocations contribute to uncompressed storage, but not
to compressed storage).
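
The accounting described above can be reproduced with a short userspace sketch. The struct and function names are illustrative only and are not part of the patch series; the counters are the boot-time sample quoted above. With these inputs, the reported compressed figure is matched when live 8-byte objects are charged 8 bytes each; leaving them out gives a smaller number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-cache counters; not a structure from the patch series. */
struct cache_stat {
	unsigned long obj_size;	/* object size in bytes */
	unsigned long allocs;	/* number of allocations */
	unsigned long frees;	/* number of deallocations */
};

/* Boot-time sample (~120s uptime) quoted above. */
static const struct cache_stat boot_stats[] = {
	{   8, 14007,  256 },
	{  16,  1583,  179 },
	{  32,  1634,  205 },
	{  64,  1538,  142 },
	{ 128, 10881, 1340 },
};

#define TAG_BUF_SIZE 128UL

/* Without compression every live object would cost a 128-byte buffer. */
static unsigned long uncompressed_size(const struct cache_stat *s, size_t n)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++) {
		assert(s[i].allocs >= s[i].frees);
		total += TAG_BUF_SIZE * (s[i].allocs - s[i].frees);
	}
	return total;
}

/*
 * With compression each live object costs its cache's object size.
 * Objects no larger than inline_max are skipped, on the assumption
 * that they are stored inline and need no separate allocation.
 */
static unsigned long compressed_size(const struct cache_stat *s, size_t n,
				     unsigned long inline_max)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++) {
		if (s[i].obj_size <= inline_max)
			continue;
		total += s[i].obj_size * (s[i].allocs - s[i].frees);
	}
	return total;
}
```

Calling compressed_size() with inline_max set to 0 charges every size class; passing 8 treats the 8-byte class as free.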

After running various apps, I made the device use almost 19Mb swap space:
8 bytes: 71352 allocations, 8093 deallocations
16 bytes: 5102 allocations, 2598 deallocations
32 bytes: 8206 allocations, 4536 deallocations
64 bytes: 9489 allocations, 5377 deallocations
128 bytes: 43258 allocations, 23364 deallocations
uncompressed tag storage size: 11960192
compressed tag storage size: 2967104

(Note the share of allocations compressed into 8 bytes is slowly growing)

In this case the compression ratio is 4x. I must admit I could not
reproduce the 20x compression this time, and after thinking a little I
anticipate the absolute values to be lower in that case (maybe that
was observed at some point where a lot of uncompressed data was
evicted from swap).
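
For reference, the ratio and the share of 8-byte allocations can be derived from the two samples above with a small userspace sketch (the helper names are made up for illustration). With the quoted storage sizes it gives roughly 2.4x for the boot sample and roughly 4x for the later one, and the 8-byte share of all allocations goes from about 47% to about 52%.

```c
#include <assert.h>
#include <stddef.h>

/* Ratio of uncompressed to compressed tag storage; both sizes in bytes. */
static double compression_ratio(unsigned long uncompressed,
				unsigned long compressed)
{
	assert(compressed > 0);
	return (double)uncompressed / (double)compressed;
}

/*
 * Fraction of all allocations served from the smallest size class.
 * allocs[0] must hold the count for the 8-byte class.
 */
static double share_of_smallest(const unsigned long *allocs, size_t n)
{
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++)
		total += allocs[i];
	assert(total > 0);
	return (double)allocs[0] / (double)total;
}
```

Feeding it the allocation counts in the order 8, 16, 32, 64, 128 bytes reproduces the trend noted above.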

I therefore think I'd better make a more modest claim in the docs/patch
description.



>
> Anyway, it would be nice to see the full picture of what the savings
> relative to the total RAM is. Given that the swap in this instance is
> zram, you have an upper bound of how many pages it can store. I'm just
> trying to assess whether the complexity added here is worth it.
>
> Maybe not as good as the RLE algorithm here, I was wondering whether we
> could use zswap to save the tags together with the page. I looked some
> time ago at it seemed slightly easier for zswap than zram. Another
> option is to make the swap format more generic to support metadata
> storage. Yet another option is for the zram to flag that it can compress
> the metadata together with the data (no swap format change needed; when
> decompressing the page, it populates the tags as well).
>
> --
> Catalin



--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
  
Alexander Potapenko Sept. 20, 2023, 2:22 p.m. UTC | #3
>
> Anyway, it would be nice to see the full picture of what the savings
> relative to the total RAM is. Given that the swap in this instance is
> zram, you have an upper bound of how many pages it can store. I'm just
> trying to assess whether the complexity added here is worth it.
Assuming an average compression rate of 2.5x and a 20% share of tagged
pages, we'll be saving 0.375% of the swapped memory (uncompressed tags
take 128 bytes per 4K page, i.e. 1/32 of its size):
  20% / 32 * (2.5-1)/2.5
With a compression rate of 4x and 40% of PROT_MTE pages, that would
be 0.9375%, which is tens of megabytes (though that is probably still
not much).
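
The estimate above, spelled out as a tiny userspace sketch (the function name and constants are mine, for illustration; it assumes 4K pages and 128 bytes of uncompressed tags per page):

```c
#include <assert.h>

/* Uncompressed tags take 128 bytes per 4096-byte page, i.e. 1/32. */
#define TAG_BYTES_PER_PAGE 128.0
#define PAGE_BYTES 4096.0

/*
 * Fraction of swapped memory saved by tag compression.
 * tagged_share: fraction of swapped pages that carry MTE tags (0..1)
 * ratio:        average compression ratio of the tag storage (>= 1)
 */
static double swap_savings_fraction(double tagged_share, double ratio)
{
	assert(ratio >= 1.0);

	/* Tag storage needed without compression, relative to page data. */
	double tag_overhead = tagged_share * TAG_BYTES_PER_PAGE / PAGE_BYTES;

	/* Compression removes (ratio - 1) / ratio of that overhead. */
	return tag_overhead * (ratio - 1.0) / ratio;
}
```

swap_savings_fraction(0.20, 2.5) evaluates to 0.00375 and swap_savings_fraction(0.40, 4.0) to 0.009375, matching the two percentages above.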

>
> Maybe not as good as the RLE algorithm here, I was wondering whether we
> could use zswap to save the tags together with the page. I looked some
> time ago at it seemed slightly easier for zswap than zram.

Android uses zram, so unfortunately we'll need to handle both.

> Another
> option is to make the swap format more generic to support metadata
> storage. Yet another option is for the zram to flag that it can compress
> the metadata together with the data (no swap format change needed; when
> decompressing the page, it populates the tags as well).

I haven't looked into this, but it sounds promising. We'll have
to copy the page contents to a temporary buffer holding both the page
data and the metadata, but it might be ok.

Another idea you've mentioned in one of the other patches is to only
compress the data if it fits into 63 bits.
Looks like this alone could yield 2x+ compression for 4K pages, and
will spare us the kmem cache juggling in this patch series.
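
The 63-bit limit comes from how the XArray stores integers in place of pointers: the value is shifted left by one and the low bit is set, so one bit of the word is lost. A userspace sketch mirroring what xa_mk_value(), xa_to_value() and xa_is_value() do (the sketch_ names and fits_inline() are made up for illustration, and a 64-bit pointer size is assumed):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

_Static_assert(sizeof(uintptr_t) == 8, "sketch assumes 64-bit pointers");

/* Only values with the top bit clear survive the left shift. */
static bool fits_inline(uint64_t v)
{
	return (v >> 63) == 0;
}

/* Encode a 63-bit value as a tagged pseudo-pointer (low bit set). */
static void *sketch_mk_value(uint64_t v)
{
	assert(fits_inline(v));
	return (void *)(uintptr_t)((v << 1) | 1);
}

/* Recover the value from a tagged entry. */
static uint64_t sketch_to_value(const void *entry)
{
	return (uint64_t)(uintptr_t)entry >> 1;
}

/* Real pointers to aligned objects have the low bit clear. */
static bool sketch_is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;
}
```

A caller could use the low bit to tell a compressed value stored directly in the entry from a pointer to a separately allocated 128-byte buffer.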


> --
> Catalin



  
Alexander Potapenko Sept. 20, 2023, 4:18 p.m. UTC | #4
On Wed, Sep 20, 2023 at 3:26 PM Alexander Potapenko <glider@google.com> wrote:
>
> On Fri, Aug 18, 2023 at 8:18 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> >
> > On Thu, Jul 20, 2023 at 07:39:56PM +0200, Alexander Potapenko wrote:
> > > Soon after booting Android, tag compression saves ~2.5x memory previously
> > > spent by mteswap.c on tag allocations. With the growing uptime, the
> > > savings reach 20x and even more.
> >
> > This sounds like a marketing claim ;). The way I read your statement is
> > that the memory used for tag storage is 2.5x less with your patches and
> > that's great. It means a 2.5x compression on average. How does the
> > compression get so much better to 20x with more uptime?
>
> I am currently looking at this, and I think this happens because some
> userspace apps assign zero tags to userspace pages, and those tags are
> effectively zero-cost, because they can be compressed into 8 bytes of
> the Xarray pointer.
> As the uptime grows, the share of such pages also grows.
> I agree though that this is a marginal use case, and the average
> compression rate is more representative.
>
> > The number of
> > tag storage allocations should be proportional to the pages swapped out
> > (not equal as not all would be mapped as PROT_MTE).
>
> We think a reasonable upper bound for PROT_MTE pages is 40%, but
> currently it is probably more like 20%.
>
>
> > So you can indeed
> > have a lot more pages swapped out than available in RAM and the tag
> > storage can take space but not sure which numbers you divided to get
> > 20x.
>
> Right now (in kernel 6.5) the amount of memory spent to store the tags
> can be calculated as 128 * (number of mte_allocate_tag_storage() calls
> - number of mte_free_tag_storage() calls).
> In my patchset I calculate the total amount of memory
> allocated/deallocated from the mte-tags-N caches and compare that with
> 128 * (total number of live objects in those caches).
> E.g. the stats after booting up the device (~120s uptime) look as follows:
>
> 8 bytes: 14007 allocations, 256 deallocations
> 16 bytes: 1583 allocations, 179 deallocations
> 32 bytes: 1634 allocations, 205 deallocations
> 64 bytes: 1538 allocations, 142 deallocations
> 128 bytes: 10881 allocations, 1340 deallocations
> uncompressed tag storage size: 3522688
> compressed tag storage size: 1488792
>
> (note 8-byte allocations contribute to uncompressed storage, but not
> to compressed storage).
>
> After running various apps, I made the device use almost 19Mb swap space:

Sorry, this is by no means 19Mb

> 8 bytes: 71352 allocations, 8093 deallocations
> 16 bytes: 5102 allocations, 2598 deallocations
> 32 bytes: 8206 allocations, 4536 deallocations
> 64 bytes: 9489 allocations, 5377 deallocations
> 128 bytes: 43258 allocations, 23364 deallocations
> uncompressed tag storage size: 11960192
> compressed tag storage size: 2967104
  

Patch

diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index 170dc62b010b9..46a798e2b67cb 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -11,6 +11,11 @@  obj-$(CONFIG_TRANS_TABLE)	+= trans_pgd-asm.o
 obj-$(CONFIG_DEBUG_VIRTUAL)	+= physaddr.o
 obj-$(CONFIG_ARM64_MTE)		+= mteswap.o
 obj-$(CONFIG_ARM64_MTE_COMP)	+= mtecomp.o
+ifdef CONFIG_ARM64_MTE_COMP
+obj-$(CONFIG_ARM64_MTE)		+= mteswap_comp.o
+else
+obj-$(CONFIG_ARM64_MTE)		+= mteswap_nocomp.o
+endif
 obj-$(CONFIG_ARM64_MTE_COMP_KUNIT_TEST) += test_mtecomp.o
 KASAN_SANITIZE_physaddr.o	+= n
 
diff --git a/arch/arm64/mm/mteswap.c b/arch/arm64/mm/mteswap.c
index cd508ba80ab1b..9d8f87fd191a2 100644
--- a/arch/arm64/mm/mteswap.c
+++ b/arch/arm64/mm/mteswap.c
@@ -5,8 +5,11 @@ 
 #include <linux/slab.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
+
 #include <asm/mte.h>
 
+#include "mteswap.h"
+
 static DEFINE_XARRAY(mte_pages);
 
 void *mte_allocate_tag_storage(void)
@@ -27,20 +30,18 @@  int mte_save_tags(struct page *page)
 	if (!page_mte_tagged(page))
 		return 0;
 
-	tag_storage = mte_allocate_tag_storage();
+	tag_storage = _mte_alloc_and_save_tags(page);
 	if (!tag_storage)
 		return -ENOMEM;
 
-	mte_save_page_tags(page_address(page), tag_storage);
-
 	/* page_private contains the swap entry.val set in do_swap_page */
 	ret = xa_store(&mte_pages, page_private(page), tag_storage, GFP_KERNEL);
 	if (WARN(xa_is_err(ret), "Failed to store MTE tags")) {
-		mte_free_tag_storage(tag_storage);
+		_mte_free_saved_tags(tag_storage);
 		return xa_err(ret);
 	} else if (ret) {
 		/* Entry is being replaced, free the old entry */
-		mte_free_tag_storage(ret);
+		_mte_free_saved_tags(ret);
 	}
 
 	return 0;
@@ -53,10 +54,7 @@  void mte_restore_tags(swp_entry_t entry, struct page *page)
 	if (!tags)
 		return;
 
-	if (try_page_mte_tagging(page)) {
-		mte_restore_page_tags(page_address(page), tags);
-		set_page_mte_tagged(page);
-	}
+	_mte_restore_tags(tags, page);
 }
 
 void mte_invalidate_tags(int type, pgoff_t offset)
@@ -64,7 +62,7 @@  void mte_invalidate_tags(int type, pgoff_t offset)
 	swp_entry_t entry = swp_entry(type, offset);
 	void *tags = xa_erase(&mte_pages, entry.val);
 
-	mte_free_tag_storage(tags);
+	_mte_free_saved_tags(tags);
 }
 
 void mte_invalidate_tags_area(int type)
@@ -78,7 +76,7 @@  void mte_invalidate_tags_area(int type)
 	xa_lock(&mte_pages);
 	xas_for_each(&xa_state, tags, last_entry.val - 1) {
 		__xa_erase(&mte_pages, xa_state.xa_index);
-		mte_free_tag_storage(tags);
+		_mte_free_saved_tags(tags);
 	}
 	xa_unlock(&mte_pages);
 }
diff --git a/arch/arm64/mm/mteswap.h b/arch/arm64/mm/mteswap.h
new file mode 100644
index 0000000000000..4c576b76785d1
--- /dev/null
+++ b/arch/arm64/mm/mteswap.h
@@ -0,0 +1,12 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef ARCH_ARM64_MM_MTESWAP_H_
+#define ARCH_ARM64_MM_MTESWAP_H_
+
+struct page;
+
+void *_mte_alloc_and_save_tags(struct page *page);
+void _mte_free_saved_tags(void *tags);
+void _mte_restore_tags(void *tags, struct page *page);
+
+#endif /* ARCH_ARM64_MM_MTESWAP_H_ */
diff --git a/arch/arm64/mm/mteswap_comp.c b/arch/arm64/mm/mteswap_comp.c
new file mode 100644
index 0000000000000..2c4ac5eac9e59
--- /dev/null
+++ b/arch/arm64/mm/mteswap_comp.c
@@ -0,0 +1,54 @@ 
+// SPDX-License-Identifier: GPL-2.0
+
+/* MTE tag storage management with compression. */
+
+#include <linux/module.h>
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
+#include <linux/xarray.h>
+
+#include <asm/mte.h>
+#include <asm/mtecomp.h>
+
+#include "mteswap.h"
+
+void *_mte_alloc_and_save_tags(struct page *page)
+{
+	u8 tags[128];
+	unsigned long handle;
+
+	mte_save_page_tags(page_address(page), tags);
+	handle = mte_compress(tags);
+	return xa_mk_value(handle);
+}
+
+void _mte_free_saved_tags(void *storage)
+{
+	unsigned long handle;
+	int size;
+
+	handle = xa_to_value(storage);
+	if (!handle)
+		return;
+	size = mte_storage_size(handle);
+	mte_release_handle(handle);
+}
+
+void _mte_restore_tags(void *tags, struct page *page)
+{
+	unsigned long handle;
+	u8 tags_decomp[128];
+
+	handle = xa_to_value(tags);
+	if (!handle)
+		return;
+	if (!try_page_mte_tagging(page))
+		return;
+	if (!mte_decompress(handle, tags_decomp))
+		return;
+	mte_restore_page_tags(page_address(page), tags_decomp);
+	set_page_mte_tagged(page);
+}
+MODULE_IMPORT_NS(MTECOMP);
diff --git a/arch/arm64/mm/mteswap_nocomp.c b/arch/arm64/mm/mteswap_nocomp.c
new file mode 100644
index 0000000000000..1e665a4b5f940
--- /dev/null
+++ b/arch/arm64/mm/mteswap_nocomp.c
@@ -0,0 +1,38 @@ 
+// SPDX-License-Identifier: GPL-2.0
+
+/* MTE tag storage management without compression support. */
+
+#include <linux/pagemap.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
+#include <linux/xarray.h>
+
+#include <asm/mte.h>
+
+#include "mteswap.h"
+
+void *_mte_alloc_and_save_tags(struct page *page)
+{
+	void *storage;
+
+	storage = mte_allocate_tag_storage();
+	if (!storage)
+		return NULL;
+
+	mte_save_page_tags(page_address(page), storage);
+	return storage;
+}
+
+void _mte_free_saved_tags(void *storage)
+{
+	mte_free_tag_storage(storage);
+}
+
+void _mte_restore_tags(void *tags, struct page *page)
+{
+	if (!try_page_mte_tagging(page))
+		return;
+	mte_restore_page_tags(page_address(page), tags);
+	set_page_mte_tagged(page);
+}