Message ID | 20221117163839.230900-4-nphamcs@gmail.com |
---|---|
State | New |
Headers | show |
Series | Implement writeback for zsmalloc |
Commit Message
Nhat Pham
Nov. 17, 2022, 4:38 p.m. UTC
This helps determine the coldest zspages as candidates for writeback.
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
mm/zsmalloc.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)
--
2.30.2
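
For illustration, the mechanism is simply a pool-level intrusive list: a zspage is moved to the front whenever an object is allocated into it, so the list tail always holds the coldest zspage, i.e. the writeback candidate. Below is a minimal, self-contained user-space sketch of that idiom; the list helpers are simplified stand-ins for the kernel's <linux/list.h>, and the structs are reduced mock-ups, not the real zsmalloc types.

    /*
     * Minimal user-space sketch of the LRU bookkeeping this patch adds:
     * the pool keeps an intrusive list of zspages, move_to_front() is
     * called when an object is allocated into a zspage, and the tail of
     * pool->lru is therefore the coldest zspage. Illustration only.
     */
    #include <stdio.h>
    #include <stddef.h>

    struct list_head { struct list_head *prev, *next; };

    static void INIT_LIST_HEAD(struct list_head *h) { h->prev = h->next = h; }
    static int list_empty(const struct list_head *h) { return h->next == h; }

    static void list_del(struct list_head *e)
    {
        e->prev->next = e->next;
        e->next->prev = e->prev;
        e->prev = e->next = e;          /* mark as unlinked */
    }

    static void list_add(struct list_head *e, struct list_head *head)
    {
        e->prev = head;
        e->next = head->next;
        head->next->prev = e;
        head->next = e;                 /* insert right after the list head */
    }

    struct zspage { int id; struct list_head lru; };
    struct zs_pool { struct list_head lru; };

    /* Mirrors move_to_front() in the patch: unlink if linked, re-add at head */
    static void move_to_front(struct zs_pool *pool, struct zspage *zspage)
    {
        if (!list_empty(&zspage->lru))
            list_del(&zspage->lru);
        list_add(&zspage->lru, &pool->lru);
    }

    int main(void)
    {
        struct zs_pool pool;
        struct zspage pages[3] = { { .id = 0 }, { .id = 1 }, { .id = 2 } };

        INIT_LIST_HEAD(&pool.lru);
        for (int i = 0; i < 3; i++) {
            INIT_LIST_HEAD(&pages[i].lru);
            move_to_front(&pool, &pages[i]);    /* as if zs_malloc() hit it */
        }
        move_to_front(&pool, &pages[0]);        /* page 0 allocated into again */

        /* The coldest candidate for writeback sits at the tail of pool.lru */
        struct zspage *coldest = (struct zspage *)
            ((char *)pool.lru.prev - offsetof(struct zspage, lru));
        printf("coldest zspage: %d\n", coldest->id);    /* prints 1 */
        return 0;
    }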
Comments
On Thu, Nov 17, 2022 at 08:38:37AM -0800, Nhat Pham wrote:
> This helps determine the coldest zspages as candidates for writeback.
>
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>
> [...]
> @@ -1444,6 +1479,10 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
>
>  	/* We completely set up zspage so mark them as movable */
>  	SetZsPageMovable(pool, zspage);
> +#ifdef CONFIG_ZPOOL
> +	/* Move the zspage to front of pool's LRU */
> +	move_to_front(pool, zspage);
> +#endif
>  	spin_unlock(&pool->lock);

Why do we move the zspage in the alloc instead of the accessor?

Isn't zs_map_object a better place, since it carries the clear semantic that the user is starting to access the object? If you are concerned about unnecessary churning, can we do that only for WO access?

Yeah, it is still weird layering, since the allocator can't know what the user will do with this access (keep or discard it), so an LRU in the allocator is not a good design. But I just want it to make a little more sense.
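
For context, here is a hedged sketch of what the suggested alternative could look like: rotating the LRU from the accessor, zs_map_object(), and only for write (ZS_MM_WO) mappings so that read-side users do not churn the list. The handle lookup and mapping code are elided, and the locking shown is an assumption; this is not code from the posted series.

    /*
     * Hypothetical sketch of Minchan's suggestion (not part of the posted
     * patch): update the pool LRU from the accessor instead of the
     * allocator, and only for write mappings.
     */
    void *zs_map_object(struct zs_pool *pool, unsigned long handle,
                        enum zs_mapmode mm)
    {
            struct zspage *zspage;

            /* ... existing code: resolve handle to its zspage ... */

    #ifdef CONFIG_ZPOOL
            if (mm == ZS_MM_WO) {
                    /* a store into the pool is what marks this zspage "hot" */
                    spin_lock(&pool->lock);             /* assumed locking */
                    move_to_front(pool, zspage);
                    spin_unlock(&pool->lock);
            }
    #endif

            /* ... existing code: map and return the object's address ... */
    }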
On Thu, Nov 17, 2022 at 02:15:11PM -0800, Minchan Kim wrote:
> [...]
> Why do we move the zspage in the alloc instead of the accessor?
>
> Isn't zs_map_object a better place, since it carries the clear semantic
> that the user is starting to access the object?

Remember that this is used for swap, and these entries aren't accessed on an ongoing basis while in the pool. An access means swapin. So functionally this is fine.

On cleaner choices, I would actually agree with you that map would be more appropriate. But all I can do is repeat replies from previous questions: we're not reinventing the wheel here. zbud and z3fold do it this way, and this follows their precedent. We've talked about wanting to generalize the LRU, and that's a heck of a lot easier if we have copies of *one* implementation that we can deduplicate, instead of having to merge multiple implementations with arbitrary differences.

So as you review these patches, please compare what they do to zbud and z3fold. Those existing zpool implementations have been a common answer in response to review questions.
On Thu, Nov 17, 2022 at 11:30:11PM -0500, Johannes Weiner wrote:
> On Thu, Nov 17, 2022 at 02:15:11PM -0800, Minchan Kim wrote:
> > [...]
> > Why do we move the zspage in the alloc instead of the accessor?
> >
> > Isn't zs_map_object a better place, since it carries the clear semantic
> > that the user is starting to access the object?
>
> Remember that this is used for swap, and these entries aren't accessed
> on an ongoing basis while in the pool. An access means swapin. So
> functionally this is fine.
>
> On cleaner choices, I would actually agree with you that map would be
> more appropriate. But all I can do is repeat replies from previous
> questions: we're not reinventing the wheel here. zbud and z3fold do it
> this way, and this follows their precedent. We've talked about wanting
> to generalize the LRU, and that's a heck of a lot easier if we have
> copies of *one* implementation that we can deduplicate, instead of
> having to merge multiple implementations with arbitrary differences.

It's already very weird to add LRU logic to an allocator, but since you have said you are working on the general LRU concept that I prefer, and this interim work would be useful for gathering the requirements for it, I wanted to help with it as an interim solution, knowing the general LRU is a bigger job than this. However, if you also agree with the suggestion (adding the LRU update to the mapping function rather than the allocation) and it doesn't change any behavior, why not? If the divergence would make your upcoming LRU work harder, I would totally agree with you, but I don't see how such a one-line change makes your deduplication work harder.

At the same time, I am trying to make zsmalloc *reasonable*, rather than taking the stance of "yeah, I am fine taking smelly code since others are already doing it".

> So as you review these patches, please compare what they do to zbud
> and z3fold. Those existing zpool implementations have been a common
> answer in response to review questions.

I did, and I thought it was weird, and the correction is trivial.
On Fri, Nov 18, 2022 at 11:24:13AM -0800, Minchan Kim wrote:
> [...]
> It's already very weird to add LRU logic to an allocator, but since you
> have said you are working on the general LRU concept that I prefer, and
> this interim work would be useful for gathering the requirements for it,
> I wanted to help with it as an interim solution, knowing the general LRU
> is a bigger job than this. However, if you also agree with the
> suggestion (adding the LRU update to the mapping function rather than
> the allocation) and it doesn't change any behavior, why not?

map is called during writeback and load, where we wouldn't want to update the LRU. Sure, you can filter for RO/WO, but that's also not generic from an interface POV, because it's entirely specific to zswap operations which types of access *should* rotate the LRU.

We'd also be calling LRU code from previously untested contexts, and thereby lose our existing prod testing of these patches.

I think you're asking us to take risks and introduce variance into a code base we're already trying to unify, for something that isn't actually cleaner or more generic. So, respectfully, I disagree with this change request rather strongly.
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 326faa751f0a..2557b55ec767 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -239,6 +239,11 @@ struct zs_pool {
 	/* Compact classes */
 	struct shrinker shrinker;
 
+#ifdef CONFIG_ZPOOL
+	/* List tracking the zspages in LRU order by most recently added object */
+	struct list_head lru;
+#endif
+
 #ifdef CONFIG_ZSMALLOC_STAT
 	struct dentry *stat_dentry;
 #endif
@@ -260,6 +265,12 @@ struct zspage {
 	unsigned int freeobj;
 	struct page *first_page;
 	struct list_head list; /* fullness list */
+
+#ifdef CONFIG_ZPOOL
+	/* links the zspage to the lru list in the pool */
+	struct list_head lru;
+#endif
+
 	struct zs_pool *pool;
 #ifdef CONFIG_COMPACTION
 	rwlock_t lock;
@@ -352,6 +363,18 @@ static void cache_free_zspage(struct zs_pool *pool, struct zspage *zspage)
 	kmem_cache_free(pool->zspage_cachep, zspage);
 }
 
+#ifdef CONFIG_ZPOOL
+/* Moves the zspage to the front of the zspool's LRU */
+static void move_to_front(struct zs_pool *pool, struct zspage *zspage)
+{
+	assert_spin_locked(&pool->lock);
+
+	if (!list_empty(&zspage->lru))
+		list_del(&zspage->lru);
+	list_add(&zspage->lru, &pool->lru);
+}
+#endif
+
 /* pool->lock(which owns the handle) synchronizes races */
 static void record_obj(unsigned long handle, unsigned long obj)
 {
@@ -953,6 +976,9 @@ static void free_zspage(struct zs_pool *pool, struct size_class *class,
 	}
 
 	remove_zspage(class, zspage, ZS_EMPTY);
+#ifdef CONFIG_ZPOOL
+	list_del(&zspage->lru);
+#endif
 	__free_zspage(pool, class, zspage);
 }
 
@@ -998,6 +1024,10 @@ static void init_zspage(struct size_class *class, struct zspage *zspage)
 		off %= PAGE_SIZE;
 	}
 
+#ifdef CONFIG_ZPOOL
+	INIT_LIST_HEAD(&zspage->lru);
+#endif
+
 	set_freeobj(zspage, 0);
 }
 
@@ -1418,6 +1448,11 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 		fix_fullness_group(class, zspage);
 		record_obj(handle, obj);
 		class_stat_inc(class, OBJ_USED, 1);
+
+#ifdef CONFIG_ZPOOL
+		/* Move the zspage to front of pool's LRU */
+		move_to_front(pool, zspage);
+#endif
 		spin_unlock(&pool->lock);
 
 		return handle;
@@ -1444,6 +1479,10 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
 
 	/* We completely set up zspage so mark them as movable */
 	SetZsPageMovable(pool, zspage);
+#ifdef CONFIG_ZPOOL
+	/* Move the zspage to front of pool's LRU */
+	move_to_front(pool, zspage);
+#endif
 	spin_unlock(&pool->lock);
 
 	return handle;
@@ -1967,6 +2006,9 @@ static void async_free_zspage(struct work_struct *work)
 		VM_BUG_ON(fullness != ZS_EMPTY);
 		class = pool->size_class[class_idx];
 		spin_lock(&pool->lock);
+#ifdef CONFIG_ZPOOL
+		list_del(&zspage->lru);
+#endif
 		__free_zspage(pool, class, zspage);
 		spin_unlock(&pool->lock);
 	}
@@ -2278,6 +2320,10 @@ struct zs_pool *zs_create_pool(const char *name)
 	 */
 	zs_register_shrinker(pool);
 
+#ifdef CONFIG_ZPOOL
+	INIT_LIST_HEAD(&pool->lru);
+#endif
+
 	return pool;
 
 err: