From patchwork Fri Jan 26 08:30:14 2024
X-Patchwork-Submitter: Chengming Zhou
X-Patchwork-Id: 192461
From: chengming.zhou@linux.dev
To: hannes@cmpxchg.org, yosryahmed@google.com, nphamcs@gmail.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, chengming.zhou@linux.dev, Chengming Zhou
Subject: [PATCH 1/2] mm/zswap: don't return LRU_SKIP if we have dropped lru lock
Date: Fri, 26 Jan 2024 08:30:14 +0000
Message-Id: <20240126083015.3557006-1-chengming.zhou@linux.dev>

From: Chengming Zhou

LRU_SKIP can only be returned if we have never dropped the lru lock;
otherwise we must return LRU_RETRY to restart from the head of the lru
list.

We may actually need to introduce another status, LRU_STOP, to truly
terminate the ongoing shrinking scan when we encounter a warm page
already in the swap cache. The current list_lru implementation has no
way to break out of __list_lru_walk_one() early.
Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
Signed-off-by: Chengming Zhou
Acked-by: Johannes Weiner
Reviewed-by: Nhat Pham
---
 mm/zswap.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 00e90b9b5417..81cb3790e0dd 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -901,10 +901,8 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
 	 * into the warmer region. We should terminate shrinking (if we're in the dynamic
 	 * shrinker context).
 	 */
-	if (writeback_result == -EEXIST && encountered_page_in_swapcache) {
-		ret = LRU_SKIP;
+	if (writeback_result == -EEXIST && encountered_page_in_swapcache)
 		*encountered_page_in_swapcache = true;
-	}
 
 	goto put_unlock;
 }

From patchwork Fri Jan 26 08:30:15 2024
X-Patchwork-Submitter: Chengming Zhou
X-Patchwork-Id: 192473
From: chengming.zhou@linux.dev
To: hannes@cmpxchg.org, yosryahmed@google.com, nphamcs@gmail.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, chengming.zhou@linux.dev, Chengming Zhou
Subject: [PATCH 2/2] mm/zswap: fix race between lru writeback and swapoff
Date: Fri, 26 Jan 2024 08:30:15 +0000
Message-Id: <20240126083015.3557006-2-chengming.zhou@linux.dev>
In-Reply-To: <20240126083015.3557006-1-chengming.zhou@linux.dev>
References: <20240126083015.3557006-1-chengming.zhou@linux.dev>

From: Chengming Zhou

LRU writeback races with swapoff, as spotted by Yosry [1]:

CPU1                        CPU2
shrink_memcg_cb             swap_off
  list_lru_isolate            zswap_invalidate
                              zswap_swapoff
                                kfree(tree)
  // UAF
  spin_lock(&tree->lock)

The problem is that an entry on the lru list cannot protect the tree
from being freed by swapoff, and the entry itself can be invalidated
and freed concurrently once we unlock the lru lock. We can fix this by
moving the swap cache allocation ahead, before referencing the tree,
then checking for the invalidate race under the tree lock; only after
that can we safely dereference the entry. Note that we cannot
dereference the entry or the tree once we unlock the folio, since we
depend on the folio lock to hold off swapoff.
So this patch moves all tree and entry usage into zswap_writeback_entry():
we use only the swpentry copied to the stack to allocate the swap cache
folio and return with the folio locked, after which it is safe to
reference the tree. Then we check for the invalidate race under the tree
lock; from there, the flow is much the same as zswap_load().

Since we can't dereference the entry after zswap_writeback_entry()
returns, we can no longer use zswap_lru_putback(); instead we rotate the
entry to the tail of the LRU list before unlocking, so concurrent
reclaimers have little chance to see it. It is deleted from the LRU list
if writeback succeeds.

Another confusing part to me is the update of memcg nr_zswap_protected
in zswap_lru_putback(). I'm not sure why it's needed there, since if we
raced with swapin, memcg nr_zswap_protected has already been updated in
zswap_folio_swapin(). So that part is not included for now.

[1] https://lore.kernel.org/all/CAJD7tkasHsRnT_75-TXsEe58V9_OW6m3g6CF7Kmsvz8CKRG_EA@mail.gmail.com/

Signed-off-by: Chengming Zhou
Acked-by: Johannes Weiner
Acked-by: Nhat Pham
---
 mm/zswap.c | 93 ++++++++++++++++++------------------------------------
 1 file changed, 31 insertions(+), 62 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 81cb3790e0dd..fa2bdb7ec1d8 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -277,7 +277,7 @@ static inline struct zswap_tree *swap_zswap_tree(swp_entry_t swp)
 	     zpool_get_type((p)->zpools[0]))
 
 static int zswap_writeback_entry(struct zswap_entry *entry,
-				 struct zswap_tree *tree);
+				 swp_entry_t swpentry);
 static int zswap_pool_get(struct zswap_pool *pool);
 static void zswap_pool_put(struct zswap_pool *pool);
 
@@ -445,27 +445,6 @@ static void zswap_lru_del(struct list_lru *list_lru, struct zswap_entry *entry)
 	rcu_read_unlock();
 }
 
-static void zswap_lru_putback(struct list_lru *list_lru,
-		struct zswap_entry *entry)
-{
-	int nid = entry_to_nid(entry);
-	spinlock_t *lock = &list_lru->node[nid].lock;
-	struct mem_cgroup *memcg;
-	struct lruvec *lruvec;
-
-	rcu_read_lock();
-	memcg = mem_cgroup_from_entry(entry);
-	spin_lock(lock);
-	/* we cannot use list_lru_add here, because it increments node's lru count */
-	list_lru_putback(list_lru, &entry->lru, nid, memcg);
-	spin_unlock(lock);
-
-	lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(entry_to_nid(entry)));
-	/* increment the protection area to account for the LRU rotation. */
-	atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
-	rcu_read_unlock();
-}
-
 /*********************************
 * rbtree functions
 **********************************/
@@ -860,40 +839,34 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
 {
 	struct zswap_entry *entry = container_of(item, struct zswap_entry, lru);
 	bool *encountered_page_in_swapcache = (bool *)arg;
-	struct zswap_tree *tree;
-	pgoff_t swpoffset;
+	swp_entry_t swpentry;
 	enum lru_status ret = LRU_REMOVED_RETRY;
 	int writeback_result;
 
+	/*
+	 * First rotate to the tail of lru list before unlocking lru lock,
+	 * so the concurrent reclaimers have little chance to see it.
+	 * It will be deleted from the lru list if writeback success.
+	 */
+	list_move_tail(item, &l->list);
+
 	/*
 	 * Once the lru lock is dropped, the entry might get freed. The
-	 * swpoffset is copied to the stack, and entry isn't deref'd again
+	 * swpentry is copied to the stack, and entry isn't deref'd again
 	 * until the entry is verified to still be alive in the tree.
 	 */
-	swpoffset = swp_offset(entry->swpentry);
-	tree = swap_zswap_tree(entry->swpentry);
-	list_lru_isolate(l, item);
+	swpentry = entry->swpentry;
+
 	/*
 	 * It's safe to drop the lock here because we return either
 	 * LRU_REMOVED_RETRY or LRU_RETRY.
 	 */
 	spin_unlock(lock);
 
-	/* Check for invalidate() race */
-	spin_lock(&tree->lock);
-	if (entry != zswap_rb_search(&tree->rbroot, swpoffset))
-		goto unlock;
-
-	/* Hold a reference to prevent a free during writeback */
-	zswap_entry_get(entry);
-	spin_unlock(&tree->lock);
-
-	writeback_result = zswap_writeback_entry(entry, tree);
+	writeback_result = zswap_writeback_entry(entry, swpentry);
 
-	spin_lock(&tree->lock);
 	if (writeback_result) {
 		zswap_reject_reclaim_fail++;
-		zswap_lru_putback(&entry->pool->list_lru, entry);
 		ret = LRU_RETRY;
 
 		/*
@@ -903,27 +876,10 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
 		 */
 		if (writeback_result == -EEXIST && encountered_page_in_swapcache)
 			*encountered_page_in_swapcache = true;
-
-		goto put_unlock;
+	} else {
+		zswap_written_back_pages++;
 	}
-	zswap_written_back_pages++;
-
-	if (entry->objcg)
-		count_objcg_event(entry->objcg, ZSWPWB);
-
-	count_vm_event(ZSWPWB);
-	/*
-	 * Writeback started successfully, the page now belongs to the
-	 * swapcache. Drop the entry from zswap - unless invalidate already
-	 * took it out while we had the tree->lock released for IO.
-	 */
-	zswap_invalidate_entry(tree, entry);
-put_unlock:
-	/* Drop local reference */
-	zswap_entry_put(entry);
-unlock:
-	spin_unlock(&tree->lock);
 	spin_lock(lock);
 	return ret;
 }
@@ -1408,9 +1364,9 @@ static void __zswap_load(struct zswap_entry *entry, struct page *page)
  * freed.
  */
 static int zswap_writeback_entry(struct zswap_entry *entry,
-				 struct zswap_tree *tree)
+				 swp_entry_t swpentry)
 {
-	swp_entry_t swpentry = entry->swpentry;
+	struct zswap_tree *tree;
 	struct folio *folio;
 	struct mempolicy *mpol;
 	bool folio_was_allocated;
@@ -1442,18 +1398,31 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	 * backs (our zswap_entry reference doesn't prevent that), to
 	 * avoid overwriting a new swap folio with old compressed data.
 	 */
+	tree = swap_zswap_tree(swpentry);
 	spin_lock(&tree->lock);
-	if (zswap_rb_search(&tree->rbroot, swp_offset(entry->swpentry)) != entry) {
+	if (zswap_rb_search(&tree->rbroot, swp_offset(swpentry)) != entry) {
 		spin_unlock(&tree->lock);
 		delete_from_swap_cache(folio);
 		folio_unlock(folio);
 		folio_put(folio);
 		return -ENOMEM;
 	}
+
+	/* Safe to deref entry after the entry is verified above. */
+	zswap_entry_get(entry);
 	spin_unlock(&tree->lock);
 
 	__zswap_load(entry, &folio->page);
 
+	count_vm_event(ZSWPWB);
+	if (entry->objcg)
+		count_objcg_event(entry->objcg, ZSWPWB);
+
+	spin_lock(&tree->lock);
+	zswap_invalidate_entry(tree, entry);
+	zswap_entry_put(entry);
+	spin_unlock(&tree->lock);
+
 	/* folio is up to date */
 	folio_mark_uptodate(folio);