[RFC,v1,0/4] Introduce merge identical pages mechanism

Message ID 20221121190020.66548-1-avromanov@sberdevices.ru
Headers
Series Introduce merge identical pages mechanism |

Message

Alexey Romanov Nov. 21, 2022, 7 p.m. UTC
  Hello!

This RFC series adds feature which allows merge identical
compressed pages into a single one. The main idea is that
zram only stores object references, which store the compressed
content of the pages. Thus, the contents of the zsmalloc objects
don't change in any way.

For simplicity, let's imagine that 3 pages with the same content
got into zram:

+----------------+   +----------------+   +----------------+
|zram_table_entry|   |zram_table_entry|   |zram_table_entry|
+-------+--------+   +-------+--------+   +--------+-------+
        |                    |                     |
        | handle (1)         | handle (2)          | handle (3)
+-------v--------+   +-------v---------+  +--------v-------+
|zsmalloc  object|   |zsmalloc  object |  |zsmalloc  object|
++--------------++   +-+-------------+-+  ++--------------++
 +--------------+      +-------------+     +--------------+
 | buffer: "abc"|      |buffer: "abc"|     | buffer: "abc"|
 +--------------+      +-------------+     +--------------+

As you can see, the data is duplicated. Merge mechanism saves
(after scanning objects) only one zsmalloc object. Here's
what happens ater the scan and merge:

+----------------+   +----------------+   +----------------+
|zram_table_entry|   |zram_table_entry|   |zram_tabl _entry|
+-------+--------+   +-------+--------+   +--------+-------+
        |                    |                     |
        | handle (1)         | handle (1)          | handle (1)
        |           +--------v---------+           |
        +-----------> zsmalloc  object <-----------+
                    +--+-------------+-+
                       +-------------+
                       |buffer: "abc"|
                       +-------------+

Thus, we reduced the amount of memory occupied by 3 times.

This mechanism doesn't affect the perf of the zram itself in
any way (maybe just a little bit on the zram_free_page function).
In order to describe each such identical object, we (constantly)
need sizeof(zram_rbtree_node) bytes. So, for example, if the system
has 20 identical buffers with a size of 1024, the memory gain will be
(20 * 1024) - (1 * 1024 + sizeof(zram_rbtree_node)) = 19456 -
sizeof(zram_rbtree_node) bytes. But, it should be understood, these are
counts without zsmalloc data structures overhead.

Testing on my system (8GB ram + 1 gb zram swap) showed that at high 
loads, on average, when calling the merge mechanism, we can save 
up to 15-20% of the memory usage.

This patch serices adds a new sysfs node (trigger merging) and new 
field in mm_stat (how many pages are merged in zram at the moment):

  $ cat /sys/block/zram/mm_stat
    431452160 332984392 339894272 0 339894272 282 0 51374 51374 0

  $ echo 1 > /sys/block/zram/merge

  $ cat /sys/block/zram/mm_stat
    431452160 270376848 287301504 0 339894272 282 0 51374 51374 6593

Alexey Romanov (4):
  zram: introduce merge identical pages mechanism
  zram: add merge sysfs knob
  zram: add pages_merged counter to mm_stat
  zram: recompression: add ZRAM_MERGED check

 Documentation/admin-guide/blockdev/zram.rst |   2 +
 drivers/block/zram/zram_drv.c               | 315 +++++++++++++++++++-
 drivers/block/zram/zram_drv.h               |   7 +
 3 files changed, 320 insertions(+), 4 deletions(-)
  

Comments

Johannes Weiner Nov. 21, 2022, 8:44 p.m. UTC | #1
On Mon, Nov 21, 2022 at 10:00:16PM +0300, Alexey Romanov wrote:
> Hello!
> 
> This RFC series adds feature which allows merge identical
> compressed pages into a single one. The main idea is that
> zram only stores object references, which store the compressed
> content of the pages. Thus, the contents of the zsmalloc objects
> don't change in any way.
> 
> For simplicity, let's imagine that 3 pages with the same content
> got into zram:
> 
> +----------------+   +----------------+   +----------------+
> |zram_table_entry|   |zram_table_entry|   |zram_table_entry|
> +-------+--------+   +-------+--------+   +--------+-------+
>         |                    |                     |
>         | handle (1)         | handle (2)          | handle (3)
> +-------v--------+   +-------v---------+  +--------v-------+
> |zsmalloc  object|   |zsmalloc  object |  |zsmalloc  object|
> ++--------------++   +-+-------------+-+  ++--------------++
>  +--------------+      +-------------+     +--------------+
>  | buffer: "abc"|      |buffer: "abc"|     | buffer: "abc"|
>  +--------------+      +-------------+     +--------------+
> 
> As you can see, the data is duplicated. Merge mechanism saves
> (after scanning objects) only one zsmalloc object. Here's
> what happens ater the scan and merge:
> 
> +----------------+   +----------------+   +----------------+
> |zram_table_entry|   |zram_table_entry|   |zram_tabl _entry|
> +-------+--------+   +-------+--------+   +--------+-------+
>         |                    |                     |
>         | handle (1)         | handle (1)          | handle (1)
>         |           +--------v---------+           |
>         +-----------> zsmalloc  object <-----------+
>                     +--+-------------+-+
>                        +-------------+
>                        |buffer: "abc"|
>                        +-------------+
> 
> Thus, we reduced the amount of memory occupied by 3 times.
> 
> This mechanism doesn't affect the perf of the zram itself in
> any way (maybe just a little bit on the zram_free_page function).
> In order to describe each such identical object, we (constantly)
> need sizeof(zram_rbtree_node) bytes. So, for example, if the system
> has 20 identical buffers with a size of 1024, the memory gain will be
> (20 * 1024) - (1 * 1024 + sizeof(zram_rbtree_node)) = 19456 -
> sizeof(zram_rbtree_node) bytes. But, it should be understood, these are
> counts without zsmalloc data structures overhead.
> 
> Testing on my system (8GB ram + 1 gb zram swap) showed that at high 
> loads, on average, when calling the merge mechanism, we can save 
> up to 15-20% of the memory usage.

This looks pretty great.

However, I'm curious why it's specific to zram, and not part of
zsmalloc? That way zswap would benefit as well, without having to
duplicate the implementation. This happened for example with
page_same_filled() and zswap_is_page_same_filled().

It's zsmalloc's job to store content efficiently, so couldn't this
feature (just like the page_same_filled one) be an optimization that
zsmalloc does transparently for all its users?

> This patch serices adds a new sysfs node (trigger merging) and new 
> field in mm_stat (how many pages are merged in zram at the moment):
> 
>   $ cat /sys/block/zram/mm_stat
>     431452160 332984392 339894272 0 339894272 282 0 51374 51374 0
> 
>   $ echo 1 > /sys/block/zram/merge
> 
>   $ cat /sys/block/zram/mm_stat
>     431452160 270376848 287301504 0 339894272 282 0 51374 51374 6593

The optimal frequency for calling this is probably tied to prevalent
memory pressure, which is somewhat tricky to do from userspace.

Would it make sense to hook this up to a shrinker?
  
Sergey Senozhatsky Nov. 22, 2022, 3 a.m. UTC | #2
On (22/11/21 15:44), Johannes Weiner wrote:
> This looks pretty great.
> 
> However, I'm curious why it's specific to zram, and not part of
> zsmalloc? That way zswap would benefit as well, without having to
> duplicate the implementation. This happened for example with
> page_same_filled() and zswap_is_page_same_filled().
> 
> It's zsmalloc's job to store content efficiently, so couldn't this
> feature (just like the page_same_filled one) be an optimization that
> zsmalloc does transparently for all its users?

Yea, that's a much needed functionality, but things may be "complicated".
We had that KSM-ish thing in the past in zram. Very briefly as we quickly
found out that the idea was patented by some company in China and we couldn't
figure our if it was safe to land that code upstream. So we ended up dropping
the patches.

https://lore.kernel.org/lkml/1494556204-25796-1-git-send-email-iamjoonsoo.kim@lge.com/

> Would it make sense to hook this up to a shrinker?

Hmm, ratelimited perhaps. We most likely don't want to scan the whole pool
every time a shrinker calls us (which can be quite often).
  
Sergey Senozhatsky Nov. 22, 2022, 3:07 a.m. UTC | #3
On (22/11/22 12:00), Sergey Senozhatsky wrote:
> On (22/11/21 15:44), Johannes Weiner wrote:
> > This looks pretty great.
> > 
> > However, I'm curious why it's specific to zram, and not part of
> > zsmalloc? That way zswap would benefit as well, without having to
> > duplicate the implementation. This happened for example with
> > page_same_filled() and zswap_is_page_same_filled().
> > 
> > It's zsmalloc's job to store content efficiently, so couldn't this
> > feature (just like the page_same_filled one) be an optimization that
> > zsmalloc does transparently for all its users?
> 
> Yea, that's a much needed functionality, but things may be "complicated".
> We had that KSM-ish thing in the past in zram. Very briefly as we quickly
> found out that the idea was patented by some company in China and we couldn't
> figure our if it was safe to land that code upstream. So we ended up dropping
> the patches.
> 
> https://lore.kernel.org/lkml/1494556204-25796-1-git-send-email-iamjoonsoo.kim@lge.com/

IIRC that was patent in question:

https://patentimages.storage.googleapis.com/e2/66/9e/0ddbfae5c182ac/US9977598.pdf
  
Alexey Romanov Nov. 22, 2022, 12:14 p.m. UTC | #4
Hello!

On Tue, Nov 22, 2022 at 12:07:42PM +0900, Sergey Senozhatsky wrote:
> On (22/11/22 12:00), Sergey Senozhatsky wrote:
> > On (22/11/21 15:44), Johannes Weiner wrote:
> > > This looks pretty great.
> > > 
> > > However, I'm curious why it's specific to zram, and not part of
> > > zsmalloc? That way zswap would benefit as well, without having to
> > > duplicate the implementation. This happened for example with
> > > page_same_filled() and zswap_is_page_same_filled().
> > > 
> > > It's zsmalloc's job to store content efficiently, so couldn't this
> > > feature (just like the page_same_filled one) be an optimization that
> > > zsmalloc does transparently for all its users?
> > 
> > Yea, that's a much needed functionality, but things may be "complicated".
> > We had that KSM-ish thing in the past in zram. Very briefly as we quickly
> > found out that the idea was patented by some company in China and we couldn't
> > figure our if it was safe to land that code upstream. So we ended up dropping
> > the patches.
> > 
> > https://lore.kernel.org/lkml/1494556204-25796-1-git-send-email-iamjoonsoo.kim@lge.com/
> 
> IIRC that was patent in question:
> 
> https://patentimages.storage.googleapis.com/e2/66/9e/0ddbfae5c182ac/US9977598.pdf

I think the patent is talking about "mapping the virtual address" (like
in KSM). But zram works with the "handle" abstraction, which is a boxed
pointer to the required object. I think my implementation and the patent
is slightly different. 

Also, the patent speaks of "compressing" pages. In this case, we can add
zs_merge() function (like zs_compact()), that is, remove the merge logic
at the allocator level. zsmalloc doesn't say anything about what objects
it can work with. Implementation at the zsmalloc level is possible,
though more complicated that at the zram level. 

I believe that we can implement at least one of the options I proposed.

What do you think?
  
Sergey Senozhatsky Nov. 23, 2022, 4:13 a.m. UTC | #5
On (22/11/22 12:14), Aleksey Romanov wrote:
> > IIRC that was patent in question:
> > 
> > https://patentimages.storage.googleapis.com/e2/66/9e/0ddbfae5c182ac/US9977598.pdf
> 
> I think the patent is talking about "mapping the virtual address" (like
> in KSM). But zram works with the "handle" abstraction, which is a boxed
> pointer to the required object. I think my implementation and the patent
> is slightly different. 
> 
> Also, the patent speaks of "compressing" pages. In this case, we can add
> zs_merge() function (like zs_compact()), that is, remove the merge logic
> at the allocator level. zsmalloc doesn't say anything about what objects
> it can work with. Implementation at the zsmalloc level is possible,
> though more complicated that at the zram level. 
> 
> I believe that we can implement at least one of the options I proposed.
> 
> What do you think?

Oh, yeah, I'm not saying that we cannot have something like that
in zram/zsmalloc, just wanted to give some historical retrospective
on this and point at some implementation details that should be
considered.
  
Dmitry Rokosov Nov. 23, 2022, 8:53 a.m. UTC | #6
Hello Sergey,

Thank you for your quick and detailed support! Here is my two cents
below.

On Wed, Nov 23, 2022 at 01:13:55PM +0900, Sergey Senozhatsky wrote:
> On (22/11/22 12:14), Aleksey Romanov wrote:
> > > IIRC that was patent in question:
> > > 
> > > https://patentimages.storage.googleapis.com/e2/66/9e/0ddbfae5c182ac/US9977598.pdf
> > 
> > I think the patent is talking about "mapping the virtual address" (like
> > in KSM). But zram works with the "handle" abstraction, which is a boxed
> > pointer to the required object. I think my implementation and the patent
> > is slightly different. 
> > 
> > Also, the patent speaks of "compressing" pages. In this case, we can add
> > zs_merge() function (like zs_compact()), that is, remove the merge logic
> > at the allocator level. zsmalloc doesn't say anything about what objects
> > it can work with. Implementation at the zsmalloc level is possible,
> > though more complicated that at the zram level. 
> > 
> > I believe that we can implement at least one of the options I proposed.
> > 
> > What do you think?
> 
> Oh, yeah, I'm not saying that we cannot have something like that
> in zram/zsmalloc, just wanted to give some historical retrospective
> on this and point at some implementation details that should be
> considered.

It's a very curious situation, I would say. I'm not so familiar with US
patent law, but I suppose it should be based on some keywords and
algorithms.

If we speak in terms of algorithm Alexey patch is different a little bit
from suggested in the patent paper. If we care about keywords, I think by
moving Alexey same page merging algorithm to zsmalloc we lose
"compressing" keyword, because zsmalloc operates with "objects" only,
doesn't matter if they are compressed or not.

Anyway, could you please suggest who can help to understand if it's safe
to use such same page merging algorithm in the upstream or not?
Maybe we can ask Linux Foundation lawyers to help us, just a guess.
I'm sure we shouldn't decline helpful features and optimization without
complete certainty about all restrictions.
  
Alexey Romanov Nov. 23, 2022, 9:07 a.m. UTC | #7
On Wed, Nov 23, 2022 at 01:13:55PM +0900, Sergey Senozhatsky wrote:
> On (22/11/22 12:14), Aleksey Romanov wrote:
> > > IIRC that was patent in question:
> > > 
> > > https://patentimages.storage.googleapis.com/e2/66/9e/0ddbfae5c182ac/US9977598.pdf
> > 
> > I think the patent is talking about "mapping the virtual address" (like
> > in KSM). But zram works with the "handle" abstraction, which is a boxed
> > pointer to the required object. I think my implementation and the patent
> > is slightly different. 
> > 
> > Also, the patent speaks of "compressing" pages. In this case, we can add
> > zs_merge() function (like zs_compact()), that is, remove the merge logic
> > at the allocator level. zsmalloc doesn't say anything about what objects
> > it can work with. Implementation at the zsmalloc level is possible,
> > though more complicated that at the zram level. 
> > 
> > I believe that we can implement at least one of the options I proposed.
> > 
> > What do you think?
> 
> Oh, yeah, I'm not saying that we cannot have something like that
> in zram/zsmalloc, just wanted to give some historical retrospective
> on this and point at some implementation details that should be
> considered.

Okay, in this case, can you review my patches please?
  
Dmitry Rokosov Dec. 1, 2022, 10:14 a.m. UTC | #8
Hello Sergey,

Hope you are doing well. Really sorry for the ping.

Did you get a chance to see the patch series, my questions, and
thoughts?

On Wed, Nov 23, 2022 at 11:53:06AM +0300, Dmitry Rokosov wrote:
> Hello Sergey,
> 
> Thank you for your quick and detailed support! Here is my two cents
> below.
> 
> On Wed, Nov 23, 2022 at 01:13:55PM +0900, Sergey Senozhatsky wrote:
> > On (22/11/22 12:14), Aleksey Romanov wrote:
> > > > IIRC that was patent in question:
> > > > 
> > > > https://patentimages.storage.googleapis.com/e2/66/9e/0ddbfae5c182ac/US9977598.pdf
> > > 
> > > I think the patent is talking about "mapping the virtual address" (like
> > > in KSM). But zram works with the "handle" abstraction, which is a boxed
> > > pointer to the required object. I think my implementation and the patent
> > > is slightly different. 
> > > 
> > > Also, the patent speaks of "compressing" pages. In this case, we can add
> > > zs_merge() function (like zs_compact()), that is, remove the merge logic
> > > at the allocator level. zsmalloc doesn't say anything about what objects
> > > it can work with. Implementation at the zsmalloc level is possible,
> > > though more complicated that at the zram level. 
> > > 
> > > I believe that we can implement at least one of the options I proposed.
> > > 
> > > What do you think?
> > 
> > Oh, yeah, I'm not saying that we cannot have something like that
> > in zram/zsmalloc, just wanted to give some historical retrospective
> > on this and point at some implementation details that should be
> > considered.
> 
> It's a very curious situation, I would say. I'm not so familiar with US
> patent law, but I suppose it should be based on some keywords and
> algorithms.
> 
> If we speak in terms of algorithm Alexey patch is different a little bit
> from suggested in the patent paper. If we care about keywords, I think by
> moving Alexey same page merging algorithm to zsmalloc we lose
> "compressing" keyword, because zsmalloc operates with "objects" only,
> doesn't matter if they are compressed or not.
> 
> Anyway, could you please suggest who can help to understand if it's safe
> to use such same page merging algorithm in the upstream or not?
> Maybe we can ask Linux Foundation lawyers to help us, just a guess.
> I'm sure we shouldn't decline helpful features and optimization without
> complete certainty about all restrictions.
  
Sergey Senozhatsky Dec. 1, 2022, 10:47 a.m. UTC | #9
On (22/12/01 13:14), Dmitry Rokosov wrote:
> Hello Sergey,
> 
> Hope you are doing well. Really sorry for the ping.
> 
> Did you get a chance to see the patch series, my questions, and
> thoughts?

Hey,

Not really, sorry. It's a holidays season + pre-merge window week.
I probably will start looking attentively next week or so.

In the meantime time I'll try to reach out to lawyers to get more
clarifications on that patent thingy.
  
Dmitry Rokosov Dec. 1, 2022, 11:14 a.m. UTC | #10
On Thu, Dec 01, 2022 at 07:47:21PM +0900, Sergey Senozhatsky wrote:
> On (22/12/01 13:14), Dmitry Rokosov wrote:
> > Hello Sergey,
> > 
> > Hope you are doing well. Really sorry for the ping.
> > 
> > Did you get a chance to see the patch series, my questions, and
> > thoughts?
> 
> Hey,
> 
> Not really, sorry. It's a holidays season + pre-merge window week.
> I probably will start looking attentively next week or so.
> 
> In the meantime time I'll try to reach out to lawyers to get more
> clarifications on that patent thingy.

Yeah, got it. Thanks a lot for the support! Appreciate it!

BTW, happy holidays :)
  
Sergey Senozhatsky Dec. 1, 2022, 1:29 p.m. UTC | #11
On (22/12/01 14:14), Dmitry Rokosov wrote:
> 
> BTW, happy holidays :)

Happy holidays!
  
Alexey Romanov Jan. 11, 2023, 2 p.m. UTC | #12
Hello Sergey! 

On Thu, Dec 01, 2022 at 07:47:21PM +0900, Sergey Senozhatsky wrote:
> On (22/12/01 13:14), Dmitry Rokosov wrote:
> > Hello Sergey,
> > 
> > Hope you are doing well. Really sorry for the ping.
> > 
> > Did you get a chance to see the patch series, my questions, and
> > thoughts?
> 
> Hey,
> 
> Not really, sorry. It's a holidays season + pre-merge window week.
> I probably will start looking attentively next week or so.
> 
> In the meantime time I'll try to reach out to lawyers to get more
> clarifications on that patent thingy.

Is there any news about the patent, lawyers and my patchset?
  
Sergey Senozhatsky Feb. 6, 2023, 10:37 a.m. UTC | #13
Hi,

On Wed, Jan 11, 2023 at 11:00 PM Alexey Romanov
<AVRomanov@sberdevices.ru> wrote:
>
> Is there any news about the patent, lawyers and my patchset?

I'll get back to you once I have any updates.