Message ID | 20230619143506.45253-2-avromanov@sberdevices.ru |
---|---|
State | New |
Headers |
From: Alexey Romanov <avromanov@sberdevices.ru>
To: minchan@kernel.org, senozhatsky@chromium.org, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel@sberdevices.ru
Subject: [PATCH v1 1/2] zsmalloc: add allocated objects counter for subpage
Date: Mon, 19 Jun 2023 17:35:05 +0300
Message-ID: <20230619143506.45253-2-avromanov@sberdevices.ru>
In-Reply-To: <20230619143506.45253-1-avromanov@sberdevices.ru>
References: <20230619143506.45253-1-avromanov@sberdevices.ru> |
Series | Add obj allocated counter for subpages |
Commit Message
Alexey Romanov
June 19, 2023, 2:35 p.m. UTC
We use a variable of type unsigned int to store the offset
of the first object in a subpage. In turn, the offset cannot
exceed PAGE_SIZE, which is usually 4096. Thus,
12 bits are enough to store the offset.
We can use the remaining bits to store, for example, the
count of allocated objects in a subpage. If the page size is
4 KB, then no more than 128 (4096 / 32) objects can be allocated
in a subpage, since the minimum zsmalloc class size is 32 bytes.
This means that one byte is enough to store the counter of
allocated objects.
This patch adds support for counting the number of allocated
objects in a subpage in the first byte of the page_type field.
The offset of the first object is now stored in the remaining
bits of this field.
The sum of the allocated counters over all subpages equals
zspage->inuse. We only count objects whose tag (OBJ_ALLOCATED_TAG)
resides in the subpage.
So, for example, in the situation:
subpage 1 subpage 2
[obj1_s - obj1_e, obj2_s - ] -> [obj2_e, obj3_s - obj3_e, free space]
The allocated counter for subpage 1 will be 2, and for subpage 2 it will be 1.
Signed-off-by: Alexey Romanov <avromanov@sberdevices.ru>
---
mm/zsmalloc.c | 38 +++++++++++++++++++++++++++++++++++---
1 file changed, 35 insertions(+), 3 deletions(-)
Comments
On (23/06/19 17:35), Alexey Romanov wrote:
> We use a variable of type unsigned int to store the offset
> of the first object at the subpage In turn, the offset cannot
> exceed the size of PAGE_SIZE, which is usually 4096. Thus,
> 12 bits are enough to store the offset.
[..]
> If the page size is 4Kb

It's certainly not a given. PAGE_SIZE is architecture specific.
PAGE_SIZE_16KB and PAGE_SIZE_64KB would be simple examples, but there
are, apparently, architectures that even can have PAGE_SIZE_256KB.
Hello Sergey! Thank you for the feedback.

On Tue, Jun 20, 2023 at 07:36:29PM +0900, Sergey Senozhatsky wrote:
[..]
> It's certainly not a given. PAGE_SIZE is architecture specific.
> PAGE_SIZE_16KB and PAGE_SIZE_64KB would be simple examples, but there
> are, apparently, architectures that even can have PAGE_SIZE_256KB.

Sure. As far as I understand, at the moment the maximum page size
(according to the Kconfig info in the Linux sources) is 256 KB. In that
case we need at most 18 bits to store the offset, and 2^18 / 32 = 8192
objects at most, so a u16 is enough for the allocated objects counter.
If sizeof(unsigned int) >= 4 bytes (32 bits), this will be enough for
us. Of course, in rare cases this will not hold, but zram and the
kernel already have similar places. For example, if the page size is
256 KB and sizeof(unsigned int) is 2 bytes (16 bits), zram will not
work on such a system, because we can't store the offset. But such a
case is very rare; most systems have an unsigned int of at least 32
bits.

Therefore, I think that my idea is still applicable, we just need to
change the counter type. What do you think?
On (23/06/20 11:16), Alexey Romanov wrote:
[..]
> Therefore, I think that my idea is still applicable, we just need to
> change the counter type. What do you think?

My gut feeling is that we better avoid mixing architecture-specific
magic into generic code. It works fine until it doesn't. Maybe Minchan
will have a different opinion tho.

There can be other ways to avoid a linear scan of empty sub-pages. For
instance, something like below probably covers fewer cases than your
patch 0002, but on the other hand is rather generic, trivial and doesn't
contain any assumptions on the architecture specifics.

(composed/edited in mail client, so likely is broken, but outlines
the idea)

====================================================================

mm/zsmalloc: do not scan empty zspages

We already stop zspage migration when we detect that the target
zspage has no space left for any new objects. There is one more
thing we can do in order to avoid doing useless work: stop scanning
for allocated objects in sub-pages when we have migrated the last
inuse object from the zspage in question.

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 02f7f414aade..2875152e6497 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1263,6 +1263,11 @@ static bool zspage_full(struct size_class *class, struct zspage *zspage)
 	return get_zspage_inuse(zspage) == class->objs_per_zspage;
 }
 
+static bool zspage_empty(struct zspage *zspage)
+{
+	return get_zspage_inuse(zspage) == 0;
+}
+
 /**
  * zs_lookup_class_index() - Returns index of the zsmalloc &size_class
  * that hold objects of the provided size.
@@ -1787,6 +1792,10 @@ static void migrate_zspage(struct zs_pool *pool, struct size_class *class,
 		obj_idx++;
 		record_obj(handle, free_obj);
 		obj_free(class->size, used_obj, NULL);
+
+		/* Stop if there are no more objects to migrate */
+		if (zspage_empty(get_zspage(s_page)))
+			break;
 	}
Hi!

On Wed, Jun 21, 2023 at 10:17:16PM +0900, Sergey Senozhatsky wrote:
[..]
> There can be other ways to avoid linear scan of empty sub-pages. For
> instance, something like below probably can cover less cases than your
> patch 0002, but on the other hand is rather generic, trivial and doesn't
> contain any assumptions on the architecture specifics.
[..]
> +		/* Stop if there are no more objects to migrate */
> +		if (zspage_empty(get_zspage(s_page)))
> +			break;

Yes, it seems my version is not as good as I thought: it looks bad
with an architecture-dependent PAGE_SIZE. Your version sounds good.

In general, I can implement this option. I'll test it and send a patch
this week.
On (23/06/21 13:41), Alexey Romanov wrote:
[..]
> Yes it seems my version is not as good as I thought. Looks bad for an
> architecturally dependent PAGE_SIZE.
[..]

Well, we are looking for a solution that is both reasonable (perf wise)
and maintainable.

> I can implement this option. I'll test this and send patch this week.

Either that or, if Suggested-by: Alexey Romanov <AVRomanov@sberdevices.ru>
is good enough for you, then I can send a series tonight or tomorrow (after
some testing). I have two more patches on top of that one.
On Wed, Jun 21, 2023 at 10:55:18PM +0900, Sergey Senozhatsky wrote:
[..]
> Either that or, if Suggested-by: Alexey Romanov <AVRomanov@sberdevices.ru>
> is good enough for you, then I can send a series tonight or tomorrow (after
> some testing). I have two more patches on top of that one.

Yeah, Suggested-by is OK. Let's send a patch. Thank you.
On (23/06/21 13:59), Alexey Romanov wrote:
[..]
> Yeah, Suggested-by is OK. Let's send a patch. Thank you.

Got it. Let's hear from Minchan first, just to make sure that Minchan
is OK with that.
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index c0d433541636..dd6e2c3429e0 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -20,7 +20,10 @@
  * page->index: links together all component pages of a zspage
  *	For the huge page, this is always 0, so we use this field
  *	to store handle.
- * page->page_type: first object offset in a subpage of zspage
+ * page->page_type:
+ *	First byte: count of allocated objects (OBJ_ALLOCATED_TAG)
+ *	in a subpage of zspage.
+ *	Other bytes: first object offset in a subpage of zspage.
  *
  * Usage of struct page flags:
  * PG_private: identifies the first component page
@@ -126,6 +129,9 @@
 #define OBJ_INDEX_BITS	(BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS)
 #define OBJ_INDEX_MASK	((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
 
+#define OBJ_ALLOCATED_BITS	(sizeof(u8) * BITS_PER_BYTE)
+#define OBJ_ALLOCATED_MASK	((1UL << OBJ_ALLOCATED_BITS) - 1)
+
 #define HUGE_BITS	1
 #define FULLNESS_BITS	4
 #define CLASS_BITS	8
@@ -520,14 +526,37 @@ static inline struct page *get_first_page(struct zspage *zspage)
 	return first_page;
 }
 
+static inline u8 get_obj_allocated(struct page *page)
+{
+	return page->page_type & OBJ_ALLOCATED_MASK;
+}
+
+static inline void set_obj_allocated(struct page *page, u8 value)
+{
+	page->page_type = (page->page_type & ~OBJ_ALLOCATED_MASK) | value;
+}
+
+static inline void mod_obj_allocated(struct page *page, s8 value)
+{
+	u8 inuse = get_obj_allocated(page);
+
+	/*
+	 * Overflow is not possible:
+	 * 1. Maximum number of objects allocated on a subpage is 128.
+	 * 2. We use this function only with value = 1 or -1.
+	 */
+	inuse += value;
+	set_obj_allocated(page, inuse);
+}
+
 static inline unsigned int get_first_obj_offset(struct page *page)
 {
-	return page->page_type;
+	return page->page_type >> OBJ_ALLOCATED_BITS;
 }
 
 static inline void set_first_obj_offset(struct page *page, unsigned int offset)
 {
-	page->page_type = offset;
+	page->page_type = (page->page_type & OBJ_ALLOCATED_MASK) |
+			  (offset << OBJ_ALLOCATED_BITS);
 }
 
 static inline unsigned int get_freeobj(struct zspage *zspage)
@@ -1126,6 +1155,7 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
 		}
 
 		inc_zone_page_state(page, NR_ZSPAGES);
+		set_obj_allocated(page, 0);
 		pages[i] = page;
 	}
 
@@ -1456,6 +1486,7 @@ static unsigned long obj_malloc(struct zs_pool *pool,
 	kunmap_atomic(vaddr);
 	mod_zspage_inuse(zspage, 1);
+	mod_obj_allocated(m_page, 1);
 
 	obj = location_to_obj(m_page, obj);
 
@@ -1576,6 +1607,7 @@ static void obj_free(int class_size, unsigned long obj, unsigned long *handle)
 	kunmap_atomic(vaddr);
 	mod_zspage_inuse(zspage, -1);
+	mod_obj_allocated(f_page, -1);
 }
 
 void zs_free(struct zs_pool *pool, unsigned long handle)