[RFC,1/1] mm/swap: queue reclaimable folio to local rotate batch when !folio_test_lru()
Message ID | 20240209115950.3885183-2-chengming.zhou@linux.dev |
---|---|
State | New |
Headers | From: chengming.zhou@linux.dev; To: willy@infradead.org, hannes@cmpxchg.org, yosryahmed@google.com, nphamcs@gmail.com, akpm@linux-foundation.org; Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Chengming Zhou <zhouchengming@bytedance.com>; Date: Fri, 9 Feb 2024 11:59:50 +0000; Message-Id: <20240209115950.3885183-2-chengming.zhou@linux.dev>; In-Reply-To: <20240209115950.3885183-1-chengming.zhou@linux.dev>
Series | mm/zswap: fix LRU reclaim for zswap writeback folios
Commit Message
Chengming Zhou
Feb. 9, 2024, 11:59 a.m. UTC
From: Chengming Zhou <zhouchengming@bytedance.com>

All the LRU move interfaces share a problem: they have no effect if the
folio is isolated from the LRU (sitting in a per-CPU batch or isolated by
the shrinker). Since they can't move or change the LRU status of an
isolated folio, they mostly just clear the folio flag and do nothing in
this case.

In our case, a written-back, reclaimable folio won't be rotated to the
tail of the inactive list, because it is still in the per-CPU lru_add
batch. This can delay the reclaim of that folio and cause other folios to
be evicted instead.

This patch queues the reclaimable folio to the per-CPU rotate batch even
when !folio_test_lru(), in the hope that the rotate batch will be handled
after the lru_add batch has put the folio on the LRU list, so the rotation
to the tail succeeds when the rotate batch is processed.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 mm/swap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
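The failure mode described above can be illustrated with a minimal user-space simulation. This is a hypothetical sketch, not the kernel implementation: the flag bits and the `rotate_reclaimable()` helper only mirror the shape of the pre-patch check in `folio_rotate_reclaimable()`, which bails out when the folio is not yet on an LRU list.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for kernel folio flags (illustrative only). */
enum {
	PG_lru         = 1 << 0,
	PG_reclaim     = 1 << 1,
	PG_unevictable = 1 << 2,
};

struct folio { unsigned int flags; };

/* Pre-patch behaviour: rotation is attempted only when the folio is
 * already on an LRU list, so a folio still sitting in a per-CPU
 * lru_add batch (no PG_lru set yet) is skipped entirely and stays
 * where it is. */
static bool rotate_reclaimable(struct folio *folio)
{
	if (!(folio->flags & PG_unevictable) && (folio->flags & PG_lru)) {
		folio->flags &= ~PG_reclaim;	/* queued for rotation */
		return true;
	}
	return false;	/* no effect: reclaim of this folio is delayed */
}
```

A folio whose writeback completes while it is still in the lru_add batch hits the `return false` path, which is exactly the delayed-reclaim case the patch targets.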
Comments
On Fri, Feb 9, 2024 at 4:00 AM <chengming.zhou@linux.dev> wrote:
>
> From: Chengming Zhou <zhouchengming@bytedance.com>
>
> All LRU move interfaces have a problem that it has no effect if the
> folio is isolated from LRU (in cpu batch or isolated by shrinker).
> Since it can't move/change folio LRU status when it's isolated, mostly
> just clear the folio flag and do nothing in this case.
>
> In our case, a written back and reclaimable folio won't be rotated to
> the tail of inactive list, since it's still in cpu lru_add batch. It
> may cause the delayed reclaim of this folio and evict other folios.
>
> This patch changes to queue the reclaimable folio to cpu rotate batch
> even when !folio_test_lru(), hoping it will likely be handled after
> the lru_add batch which will put folio on the LRU list first, so
> will be rotated to the tail successfully when handle rotate batch.

It seems to me that it is totally up to chance whether the lru_add
batch is handled first, especially that there may be problems if it
isn't.

> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> ---
>  mm/swap.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index cd8f0150ba3a..d304731e47cf 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -236,7 +236,8 @@ static void folio_batch_add_and_move(struct folio_batch *fbatch,
>
>  static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
>  {
> -	if (!folio_test_unevictable(folio)) {
> +	if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
> +	    !folio_test_unevictable(folio) && !folio_test_active(folio)) {

What are these conditions based on? I assume we want to check if the
folio is locked because we no longer check that it is on the LRUs, so
we want to check that no one else is operating on it, but I am not
sure that's enough.

> 		lruvec_del_folio(lruvec, folio);
> 		folio_clear_active(folio);
> 		lruvec_add_folio_tail(lruvec, folio);
> @@ -254,7 +255,7 @@ static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
>  void folio_rotate_reclaimable(struct folio *folio)
>  {
>  	if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
> -	    !folio_test_unevictable(folio) && folio_test_lru(folio)) {
> +	    !folio_test_unevictable(folio) && !folio_test_active(folio)) {

I am not sure it is safe to continue with a folio that is not on the
LRUs. It could be isolated for other purposes, and we end up adding it
to an LRU nonetheless. Also, folio_batch_move_lru() will do
folio_test_clear_lru() and skip such folios anyway.

There may also be messed up accounting, for example lru_move_tail_fn()
calls lruvec_del_folio(), which does some bookkeeping, at least for
the MGLRU case.

> 		struct folio_batch *fbatch;
> 		unsigned long flags;
>
> --
> 2.40.1
>
On 2024/2/14 15:13, Yu Zhao wrote:
> On Fri, Feb 9, 2024 at 6:00 AM <chengming.zhou@linux.dev> wrote:
>>
>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>
>> All LRU move interfaces have a problem that it has no effect if the
>> folio is isolated from LRU (in cpu batch or isolated by shrinker).
>> Since it can't move/change folio LRU status when it's isolated, mostly
>> just clear the folio flag and do nothing in this case.
>>
>> In our case, a written back and reclaimable folio won't be rotated to
>> the tail of inactive list, since it's still in cpu lru_add batch. It
>> may cause the delayed reclaim of this folio and evict other folios.
>>
>> This patch changes to queue the reclaimable folio to cpu rotate batch
>> even when !folio_test_lru(), hoping it will likely be handled after
>> the lru_add batch which will put folio on the LRU list first, so
>> will be rotated to the tail successfully when handle rotate batch.
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>
> I don't think the analysis is correct. IIRC, writeback from non
> reclaim paths doesn't require isolation and the reclaim path doesn't
> use struct folio_batch lru_add.

Ah, my bad, I forgot to mention the important context in the message:

This is not from the normal reclaim context, it's from zswap writeback
reclaim context, which will first set PG_reclaim flag, then submit the
async writeback io.

If the writeback io complete fast enough, folio_rotate_reclaimable()
will be called before that folio put on LRU list (it still in the local
lru_add batch, so it's somewhat like isolated too)

> Did you see any performance improvements with this patch? In general,
> this kind of patches should have performance numbers to show it really
> helps (not just in theory).

Right, there are some improvements, the numbers are put in cover letter.
But this solution is not good enough, just RFC for discussion. :)

                           mm-unstable-hot   zswap-lru-reclaim
real                                 63.34               62.72
user                               1063.20             1060.30
sys                                 272.04              256.14
workingset_refault_anon         2103297.00          1788155.80
workingset_refault_file           28638.20            39249.40
workingset_activate_anon         746134.00           695435.40
workingset_activate_file           4344.60             4255.80
workingset_restore_anon          653163.80           605315.60
workingset_restore_file            1079.00              883.00
workingset_nodereclaim                0.00                0.00
pgscan                         12971305.60         12730331.20
pgscan_kswapd                         0.00                0.00
pgscan_direct                  12971305.60         12730331.20
pgscan_khugepaged                     0.00                0.00

> My guess is that you are hitting this problem [1].
>
> [1] https://lore.kernel.org/linux-mm/20221116013808.3995280-1-yuzhao@google.com/

Right, I just see it, it's the same problem. The only difference is that
in your case the folio is isolated by shrinker, in my case, the folio is
in cpu lru_add batch. Anyway, the result is the same, that folio can't be
rotated successfully when writeback complete.

Thanks.
On 2024/2/15 15:06, Yu Zhao wrote:
> On Wed, Feb 14, 2024 at 4:18 AM Chengming Zhou <chengming.zhou@linux.dev> wrote:
>>
>> On 2024/2/14 15:13, Yu Zhao wrote:
>>> On Fri, Feb 9, 2024 at 6:00 AM <chengming.zhou@linux.dev> wrote:
>>>>
>>>> From: Chengming Zhou <zhouchengming@bytedance.com>
>>>>
>>>> All LRU move interfaces have a problem that it has no effect if the
>>>> folio is isolated from LRU (in cpu batch or isolated by shrinker).
>>>> Since it can't move/change folio LRU status when it's isolated, mostly
>>>> just clear the folio flag and do nothing in this case.
>>>>
>>>> In our case, a written back and reclaimable folio won't be rotated to
>>>> the tail of inactive list, since it's still in cpu lru_add batch. It
>>>> may cause the delayed reclaim of this folio and evict other folios.
>>>>
>>>> This patch changes to queue the reclaimable folio to cpu rotate batch
>>>> even when !folio_test_lru(), hoping it will likely be handled after
>>>> the lru_add batch which will put folio on the LRU list first, so
>>>> will be rotated to the tail successfully when handle rotate batch.
>>>>
>>>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>>>
>>> I don't think the analysis is correct. IIRC, writeback from non
>>> reclaim paths doesn't require isolation and the reclaim path doesn't
>>> use struct folio_batch lru_add.
>>
>> Ah, my bad, I forgot to mention the important context in the message:
>>
>> This is not from the normal reclaim context, it's from zswap writeback
>> reclaim context, which will first set PG_reclaim flag, then submit the
>> async writeback io.
>>
>> If the writeback io complete fast enough, folio_rotate_reclaimable()
>> will be called before that folio put on LRU list (it still in the local
>> lru_add batch, so it's somewhat like isolated too)
>>
>>> Did you see any performance improvements with this patch? In general,
>>> this kind of patches should have performance numbers to show it really
>>> helps (not just in theory).
>>
>> Right, there are some improvements, the numbers are put in cover letter.
>> But this solution is not good enough, just RFC for discussion. :)
>>
>>                            mm-unstable-hot   zswap-lru-reclaim
>> real                                 63.34               62.72
>> user                               1063.20             1060.30
>> sys                                 272.04              256.14
>> workingset_refault_anon         2103297.00          1788155.80
>> workingset_refault_file           28638.20            39249.40
>> workingset_activate_anon         746134.00           695435.40
>> workingset_activate_file           4344.60             4255.80
>> workingset_restore_anon          653163.80           605315.60
>> workingset_restore_file            1079.00              883.00
>> workingset_nodereclaim                0.00                0.00
>> pgscan                         12971305.60         12730331.20
>> pgscan_kswapd                         0.00                0.00
>> pgscan_direct                  12971305.60         12730331.20
>> pgscan_khugepaged                     0.00                0.00
>>
>>> My guess is that you are hitting this problem [1].
>>>
>>> [1] https://lore.kernel.org/linux-mm/20221116013808.3995280-1-yuzhao@google.com/
>>
>> Right, I just see it, it's the same problem. The only difference is that
>> in your case the folio is isolated by shrinker, in my case, the folio is
>> in cpu lru_add batch. Anyway, the result is the same, that folio can't be
>> rotated successfully when writeback complete.
>
> In that case, a better solution would be to make lru_add add
> (_reclaim() && !_dirty() && !_writeback()) folios at the tail.
> (_rotate() needs to leave _reclaim() set if it fails to rotate.)

Right, this is a solution. But PG_readahead is alias of PG_reclaim, I'm
afraid this would rotate readahead folio to the inactive tail.
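Yu Zhao's suggested alternative, placing already-written-back folios at the inactive tail when the lru_add batch is drained, can be sketched as a predicate. This is a hypothetical user-space sketch, not kernel code, and it also makes the objection concrete: because PG_readahead is an alias of PG_reclaim, a readahead folio satisfies the same condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; PG_readahead aliases PG_reclaim just as it
 * does in the kernel, which is the concern raised above. */
enum {
	PG_reclaim   = 1 << 0,
	PG_readahead = 1 << 0,	/* alias of PG_reclaim */
	PG_dirty     = 1 << 1,
	PG_writeback = 1 << 2,
};

struct folio { unsigned int flags; };

/* Suggested lru_add-drain rule: a folio marked PG_reclaim whose
 * writeback has finished (neither dirty nor under writeback) goes to
 * the inactive tail instead of the head. */
static bool lru_add_should_tail(const struct folio *folio)
{
	return (folio->flags & PG_reclaim) &&
	       !(folio->flags & PG_dirty) &&
	       !(folio->flags & PG_writeback);
}
```

Since the predicate cannot distinguish the aliased bits, a clean, non-dirty readahead folio would also be sent to the tail, which is the unwanted side effect Chengming points out.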
diff --git a/mm/swap.c b/mm/swap.c
index cd8f0150ba3a..d304731e47cf 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -236,7 +236,8 @@ static void folio_batch_add_and_move(struct folio_batch *fbatch,
 
 static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
 {
-	if (!folio_test_unevictable(folio)) {
+	if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
+	    !folio_test_unevictable(folio) && !folio_test_active(folio)) {
 		lruvec_del_folio(lruvec, folio);
 		folio_clear_active(folio);
 		lruvec_add_folio_tail(lruvec, folio);
@@ -254,7 +255,7 @@ static void lru_move_tail_fn(struct lruvec *lruvec, struct folio *folio)
 void folio_rotate_reclaimable(struct folio *folio)
 {
 	if (!folio_test_locked(folio) && !folio_test_dirty(folio) &&
-	    !folio_test_unevictable(folio) && folio_test_lru(folio)) {
+	    !folio_test_unevictable(folio) && !folio_test_active(folio)) {
 		struct folio_batch *fbatch;
 		unsigned long flags;
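For illustration, the before/after gating conditions changed by the hunks above can be written as pure predicates. This is a user-space sketch with illustrative flag bits, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the folio flags tested by the patch. */
enum {
	PG_locked      = 1 << 0,
	PG_dirty       = 1 << 1,
	PG_unevictable = 1 << 2,
	PG_lru         = 1 << 3,
	PG_active      = 1 << 4,
};

struct folio { unsigned int flags; };

/* Before the patch: the folio must already be on an LRU list. */
static bool can_rotate_before(const struct folio *folio)
{
	return !(folio->flags & (PG_locked | PG_dirty | PG_unevictable)) &&
	       (folio->flags & PG_lru);
}

/* After the patch: the PG_lru requirement is dropped, so a folio still
 * in a per-CPU lru_add batch qualifies; active folios are excluded. */
static bool can_rotate_after(const struct folio *folio)
{
	return !(folio->flags &
		 (PG_locked | PG_dirty | PG_unevictable | PG_active));
}
```

The difference the patch makes is the first case below: a clean, unlocked folio with no PG_lru set (still batched) is now queued for rotation instead of being skipped.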