[v5,0/5] Implement MTE tag compression for swapped pages

Message ID 20230922080848.1261487-1-glider@google.com

Message

Alexander Potapenko Sept. 22, 2023, 8:08 a.m. UTC
  Currently, when MTE pages are swapped out, the tags are kept in
memory, occupying PAGE_SIZE/32 bytes per page. This is especially
problematic for devices that use zram-backed in-memory swap, because
tags stored uncompressed in the heap effectively reduce the available
amount of swap memory.

The RLE-based algorithm suggested by Evgenii Stepanov and implemented in
this patch series efficiently compresses these fixed-size tag buffers,
achieving practical compression ratios between 2.5x and 4x. In most
cases the compressed data fits into a 63-bit Xarray value, requiring no
extra memory allocations.
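As a rough illustration, the run-length idea can be sketched in userspace C (hypothetical helper names and layout, not the in-kernel format, which is described in Documentation/arch/arm64/mte-tag-compression.rst):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of run-length encoding MTE tags.  Each 16-byte
 * granule carries a 4-bit tag, so a 4K page needs PAGE_SIZE / 32 =
 * 128 bytes of tag storage (256 tags).  Tags tend to repeat, so
 * (tag, run length) pairs are usually much smaller.
 */
#define MTE_GRANULES 256

/*
 * Collapse tags[0..n-1] into (tag, run length) pairs; return the
 * number of pairs written.
 */
static size_t rle_encode(const uint8_t *tags, size_t n,
			 uint8_t *out_tag, uint16_t *out_len)
{
	size_t pairs = 0;
	size_t i = 0;

	while (i < n) {
		size_t j = i + 1;

		/* Extend the run while the tag repeats. */
		while (j < n && tags[j] == tags[i])
			j++;
		out_tag[pairs] = tags[i];
		out_len[pairs] = (uint16_t)(j - i);
		pairs++;
		i = j;
	}
	return pairs;
}
```

A page using only a few distinct tags collapses into a few pairs, which is where the 2.5x-4x ratios come from; a pathological page whose tag changes at every granule would expand instead, which is why the compressed data fits inline in a 63-bit Xarray value in most, but not all, cases.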

Our measurements show that the proposed algorithm compresses these
buffers better than the existing kernel compression algorithms (LZ4,
LZO, LZ4HC, ZSTD).

To implement compression/decompression, we also extend <linux/bitmap.h>
with helpers that read/write multi-bit values at arbitrary bit offsets
in the map.
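The semantics of those helpers can be modeled in userspace as follows. This is a simplified sketch with hypothetical *_simple names; the real bitmap_read()/bitmap_write() also handle values that straddle a word boundary, which this sketch deliberately does not:

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/*
 * Read an nbits-wide value at bit offset @start.  Assumes the field
 * fits entirely within one word (start % BITS_PER_LONG + nbits <=
 * BITS_PER_LONG); the kernel helpers lift this restriction.
 */
static unsigned long bitmap_read_simple(const unsigned long *map,
					unsigned long start,
					unsigned long nbits)
{
	unsigned long word = start / BITS_PER_LONG;
	unsigned long off = start % BITS_PER_LONG;
	unsigned long mask = nbits == BITS_PER_LONG ?
			     ~0UL : (1UL << nbits) - 1;

	return (map[word] >> off) & mask;
}

/* Write the low nbits of @value at bit offset @start, same caveat. */
static void bitmap_write_simple(unsigned long *map, unsigned long value,
				unsigned long start, unsigned long nbits)
{
	unsigned long word = start / BITS_PER_LONG;
	unsigned long off = start % BITS_PER_LONG;
	unsigned long mask = nbits == BITS_PER_LONG ?
			     ~0UL : (1UL << nbits) - 1;

	/* Clear the field, then OR in the new value. */
	map[word] = (map[word] & ~(mask << off)) | ((value & mask) << off);
}
```

Accesses like these let the compressor pack variable-width fields (tag values, run lengths) back to back in the output buffer without byte-alignment padding.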

We refactor arch/arm64/mm/mteswap.c to support both the compressed
(CONFIG_ARM64_MTE_COMP) and the non-compressed case. For the former, in
addition to tag compression, we move tag allocation from kmalloc() to
separate kmem caches, providing greater locality and relaxing the
alignment requirements.

v5:
 - fixed comments by Andy Shevchenko, Catalin Marinas, and Yury Norov
 - added support for 16K- and 64K pages
 - more efficient bitmap_write() implementation

v4:
 - fixed a bunch of comments by Andy Shevchenko and Yury Norov
 - added Documentation/arch/arm64/mte-tag-compression.rst

v3:
 - as suggested by Andy Shevchenko, use
   bitmap_get_value()/bitmap_set_value() written by Syed Nayyar Waris
 - switched to unsigned long to reduce typecasts
 - simplified the compression code

v2:
 - as suggested by Yury Norov, replace the poorly implemented struct
   bitq with <linux/bitmap.h>



Alexander Potapenko (4):
  lib/test_bitmap: add tests for bitmap_{read,write}()
  arm64: mte: implement CONFIG_ARM64_MTE_COMP
  arm64: mte: add a test for MTE tags compression
  arm64: mte: add compression support to mteswap.c

Syed Nayyar Waris (1):
  lib/bitmap: add bitmap_{read,write}()

 Documentation/arch/arm64/index.rst            |   1 +
 .../arch/arm64/mte-tag-compression.rst        | 245 +++++++++
 arch/arm64/Kconfig                            |  21 +
 arch/arm64/include/asm/mtecomp.h              |  13 +
 arch/arm64/mm/Makefile                        |   7 +
 arch/arm64/mm/mtecomp.c                       | 467 ++++++++++++++++++
 arch/arm64/mm/mtecomp.h                       |  12 +
 arch/arm64/mm/mteswap.c                       |  20 +-
 arch/arm64/mm/mteswap.h                       |  12 +
 arch/arm64/mm/mteswap_comp.c                  |  60 +++
 arch/arm64/mm/mteswap_nocomp.c                |  38 ++
 arch/arm64/mm/test_mtecomp.c                  | 287 +++++++++++
 include/linux/bitmap.h                        |  68 +++
 lib/test_bitmap.c                             | 115 +++++
 14 files changed, 1355 insertions(+), 11 deletions(-)
 create mode 100644 Documentation/arch/arm64/mte-tag-compression.rst
 create mode 100644 arch/arm64/include/asm/mtecomp.h
 create mode 100644 arch/arm64/mm/mtecomp.c
 create mode 100644 arch/arm64/mm/mtecomp.h
 create mode 100644 arch/arm64/mm/mteswap.h
 create mode 100644 arch/arm64/mm/mteswap_comp.c
 create mode 100644 arch/arm64/mm/mteswap_nocomp.c
 create mode 100644 arch/arm64/mm/test_mtecomp.c
  

Comments

Andy Shevchenko Sept. 22, 2023, 2:35 p.m. UTC | #1
+Cc: Olek, who is internally developing something similar to your first
patch here.

On Fri, Sep 22, 2023 at 10:08:42AM +0200, Alexander Potapenko wrote:
> Currently, when MTE pages are swapped out, the tags are kept in the
> memory, occupying PAGE_SIZE/32 bytes per page. This is especially
> problematic for devices that use zram-backed in-memory swap, because
> tags stored uncompressed in the heap effectively reduce the available
> amount of swap memory.
> 
> The RLE-based algorithm suggested by Evgenii Stepanov and implemented in
> this patch series is able to efficiently compress fixed-size tag buffers,
> resulting in practical compression ratio between 2.5x and 4x. In most
> cases it is possible to store the compressed data in 63-bit Xarray values,
> resulting in no extra memory allocations.
> 
> Our measurements show that the proposed algorithm provides better
> compression than existing kernel compression algorithms (LZ4, LZO,
> LZ4HC, ZSTD) can offer.

[...]
  
Alexander Lobakin Sept. 22, 2023, 2:40 p.m. UTC | #2
From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date: Fri, 22 Sep 2023 17:35:19 +0300

> +Cc: Olek, who internally is being developed something similar to your first
> patch here.

Oh, thanks.
The patch you mentioned properly implements cross-boundary accesses,
mine does not :D
But I guess we want to keep them both to keep the latter as optimized as
the current bitmap_{get,set}_value8()?

> 
> On Fri, Sep 22, 2023 at 10:08:42AM +0200, Alexander Potapenko wrote:
>> Currently, when MTE pages are swapped out, the tags are kept in the
>> memory, occupying PAGE_SIZE/32 bytes per page. This is especially
>> problematic for devices that use zram-backed in-memory swap, because
>> tags stored uncompressed in the heap effectively reduce the available
>> amount of swap memory.
>>
>> The RLE-based algorithm suggested by Evgenii Stepanov and implemented in
>> this patch series is able to efficiently compress fixed-size tag buffers,
>> resulting in practical compression ratio between 2.5x and 4x. In most
>> cases it is possible to store the compressed data in 63-bit Xarray values,
>> resulting in no extra memory allocations.
>>
>> Our measurements show that the proposed algorithm provides better
>> compression than existing kernel compression algorithms (LZ4, LZO,
>> LZ4HC, ZSTD) can offer.

[...]

Thanks,
Olek