From patchwork Thu Feb 29 00:37:49 2024
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 208113
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, david@redhat.com, linux-mm@kvack.org,
 ryan.roberts@arm.com, chrisl@kernel.org
Cc: 21cnbao@gmail.com, linux-kernel@vger.kernel.org, mhocko@suse.com,
 shy828301@gmail.com, steven.price@arm.com, surenb@google.com,
 wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org,
 ying.huang@intel.com, yuzhao@google.com, kasong@tencent.com,
 yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev,
 hannes@cmpxchg.org, linux-arm-kernel@lists.infradead.org,
 Barry Song, Catalin Marinas, Will Deacon, Mark Rutland, Kemeng Shi,
 Anshuman Khandual, Peter Collingbourne, Peter Xu, Lorenzo Stoakes,
 "Mike Rapoport (IBM)", Hugh Dickins, "Aneesh Kumar K.V", Rick Edgecombe
Subject: [PATCH RFC v2 1/5] arm64: mm: swap: support THP_SWAP on hardware with MTE
Date: Thu, 29 Feb 2024 13:37:49 +1300
Message-Id: <20240229003753.134193-2-21cnbao@gmail.com>
In-Reply-To: <20240229003753.134193-1-21cnbao@gmail.com>
References: <20240229003753.134193-1-21cnbao@gmail.com>

From: Barry Song

Commit d0637c505f8a1 ("arm64: enable THP_SWAP for arm64") brings up
THP_SWAP on ARM64, but it doesn't enable THP_SWAP on hardware with MTE,
because the MTE code works on the assumption that tags save/restore
always handles a folio with only one page. This limitation should be
removed as more and more ARM64 SoCs have the MTE feature, so the
co-existence of MTE and THP_SWAP becomes increasingly important.
This patch makes MTE tag saving support large folios, so we no longer
need to split large folios into base pages for swap-out on ARM64 SoCs
with MTE.

arch_prepare_to_swap() now takes a folio rather than a page as its
parameter, because we support THP swap-out as a whole: it saves the
tags for all pages in a large folio.

Since we now restore tags based on the folio, arch_swap_restore() adds
some extra loops and early exits when refaulting a large folio that is
still in the swapcache in do_swap_page(). If a large folio has nr
pages, do_swap_page() only sets the PTE of the particular page that
caused the fault, so it runs nr times, and each time
arch_swap_restore() loops over all nr subpages of the folio; the
algorithmic complexity therefore becomes O(nr^2) (a sketch of this
cost follows the diff below). Once we support mapping large folios in
do_swap_page(), these extra loops and early exits will decrease,
though they cannot be removed completely, because a large folio might
be only partially tagged in corner cases such as:

1. a large folio in the swapcache can be partially unmapped, in which
   case the MTE tags for the unmapped pages are invalidated;
2. users might use mprotect() to set MTE on only part of a large folio.

arch_thp_swp_supported() is dropped since ARM64 MTE was its only user.

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Ryan Roberts
Cc: Mark Rutland
Cc: David Hildenbrand
Cc: Kemeng Shi
Cc: "Matthew Wilcox (Oracle)"
Cc: Anshuman Khandual
Cc: Peter Collingbourne
Cc: Steven Price
Cc: Yosry Ahmed
Cc: Peter Xu
Cc: Lorenzo Stoakes
Cc: "Mike Rapoport (IBM)"
Cc: Hugh Dickins
Cc: "Aneesh Kumar K.V"
Cc: Rick Edgecombe
Signed-off-by: Barry Song
Reviewed-by: Steven Price
Acked-by: Chris Li
---
 arch/arm64/include/asm/pgtable.h | 19 ++-------------
 arch/arm64/mm/mteswap.c          | 43 ++++++++++++++++++++++++++++++++
 include/linux/huge_mm.h          | 12 ------------
 include/linux/pgtable.h          |  2 +-
 mm/page_io.c                     |  2 +-
 mm/swap_slots.c                  |  2 +-
 6 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 401087e8a43d..7a54750770b8 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -45,12 +45,6 @@
         __flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1)
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-static inline bool arch_thp_swp_supported(void)
-{
-        return !system_supports_mte();
-}
-#define arch_thp_swp_supported arch_thp_swp_supported
-
 /*
  * Outside of a few very special situations (e.g. hibernation), we always
  * use broadcast TLB invalidation instructions, therefore a spurious page
@@ -1095,12 +1089,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 #ifdef CONFIG_ARM64_MTE
 
 #define __HAVE_ARCH_PREPARE_TO_SWAP
-static inline int arch_prepare_to_swap(struct page *page)
-{
-        if (system_supports_mte())
-                return mte_save_tags(page);
-        return 0;
-}
+extern int arch_prepare_to_swap(struct folio *folio);
 
 #define __HAVE_ARCH_SWAP_INVALIDATE
 static inline void arch_swap_invalidate_page(int type, pgoff_t offset)
@@ -1116,11 +1105,7 @@ static inline void arch_swap_invalidate_area(int type)
 }
 
 #define __HAVE_ARCH_SWAP_RESTORE
-static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
-{
-        if (system_supports_mte())
-                mte_restore_tags(entry, &folio->page);
-}
+extern void arch_swap_restore(swp_entry_t entry, struct folio *folio);
 
 #endif /* CONFIG_ARM64_MTE */
 
diff --git a/arch/arm64/mm/mteswap.c b/arch/arm64/mm/mteswap.c
index a31833e3ddc5..295836fef620 100644
--- a/arch/arm64/mm/mteswap.c
+++ b/arch/arm64/mm/mteswap.c
@@ -68,6 +68,13 @@ void mte_invalidate_tags(int type, pgoff_t offset)
         mte_free_tag_storage(tags);
 }
 
+static inline void __mte_invalidate_tags(struct page *page)
+{
+        swp_entry_t entry = page_swap_entry(page);
+
+        mte_invalidate_tags(swp_type(entry), swp_offset(entry));
+}
+
 void mte_invalidate_tags_area(int type)
 {
         swp_entry_t entry = swp_entry(type, 0);
@@ -83,3 +90,39 @@ void mte_invalidate_tags_area(int type)
         }
         xa_unlock(&mte_pages);
 }
+
+int arch_prepare_to_swap(struct folio *folio)
+{
+        long i, nr;
+        int err;
+
+        if (!system_supports_mte())
+                return 0;
+
+        nr = folio_nr_pages(folio);
+
+        for (i = 0; i < nr; i++) {
+                err = mte_save_tags(folio_page(folio, i));
+                if (err)
+                        goto out;
+        }
+        return 0;
+
+out:
+        while (i--)
+                __mte_invalidate_tags(folio_page(folio, i));
+        return err;
+}
+
+void arch_swap_restore(swp_entry_t entry, struct folio *folio)
+{
+        if (system_supports_mte()) {
+                long i, nr = folio_nr_pages(folio);
+
+                entry.val -= swp_offset(entry) & (nr - 1);
+                for (i = 0; i < nr; i++) {
+                        mte_restore_tags(entry, folio_page(folio, i));
+                        entry.val++;
+                }
+        }
+}
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index de0c89105076..e04b93c43965 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -535,16 +535,4 @@ static inline int split_folio_to_order(struct folio *folio, int new_order)
 #define split_folio_to_list(f, l) split_folio_to_list_to_order(f, l, 0)
 #define split_folio(f) split_folio_to_order(f, 0)
 
-/*
- * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
- * limitations in the implementation like arm64 MTE can override this to
- * false
- */
-#ifndef arch_thp_swp_supported
-static inline bool arch_thp_swp_supported(void)
-{
-        return true;
-}
-#endif
-
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index a36cf4e124b0..ec7efce0f3f0 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1052,7 +1052,7 @@ static inline int arch_unmap_one(struct mm_struct *mm,
  * prototypes must be defined in the arch-specific asm/pgtable.h file.
  */
 #ifndef __HAVE_ARCH_PREPARE_TO_SWAP
-static inline int arch_prepare_to_swap(struct page *page)
+static inline int arch_prepare_to_swap(struct folio *folio)
 {
         return 0;
 }
diff --git a/mm/page_io.c b/mm/page_io.c
index ae2b49055e43..a9a7c236aecc 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -189,7 +189,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
          * Arch code may have to preserve more data than just the page
          * contents, e.g. memory tags.
          */
-        ret = arch_prepare_to_swap(&folio->page);
+        ret = arch_prepare_to_swap(folio);
         if (ret) {
                 folio_mark_dirty(folio);
                 folio_unlock(folio);
diff --git a/mm/swap_slots.c b/mm/swap_slots.c
index 90973ce7881d..53abeaf1371d 100644
--- a/mm/swap_slots.c
+++ b/mm/swap_slots.c
@@ -310,7 +310,7 @@ swp_entry_t folio_alloc_swap(struct folio *folio)
         entry.val = 0;
 
         if (folio_test_large(folio)) {
-                if (IS_ENABLED(CONFIG_THP_SWAP) && arch_thp_swp_supported())
+                if (IS_ENABLED(CONFIG_THP_SWAP))
                         get_swap_pages(1, &entry, folio_nr_pages(folio));
                 goto out;
         }
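To make the O(nr^2) cost described in the commit message concrete, here is a
minimal user-space sketch (illustration only; NR, restore_ops and the function
name are invented for the example, and the real work happens in
arch_swap_restore()/mte_restore_tags()). Until do_swap_page() can map a large
folio in one go, every per-subpage refault triggers a folio-wide tag-restore
walk:

#include <stdio.h>

#define NR 16                           /* nr pages in one large folio */

static int restore_ops;

/* stands in for arch_swap_restore(): walks every subpage of the folio */
static void arch_swap_restore_sketch(void)
{
        for (int i = 0; i < NR; i++)
                restore_ops++;          /* one mte_restore_tags() per subpage */
}

int main(void)
{
        /* one fault per subpage while PTEs are still set one by one */
        for (int fault = 0; fault < NR; fault++)
                arch_swap_restore_sketch();

        printf("%d restore operations for %d faults (nr^2 = %d)\n",
               restore_ops, NR, NR * NR);
        return 0;
}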
From patchwork Thu Feb 29 00:37:50 2024
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 208114
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, david@redhat.com, linux-mm@kvack.org,
 ryan.roberts@arm.com, chrisl@kernel.org
Cc: 21cnbao@gmail.com, linux-kernel@vger.kernel.org, mhocko@suse.com,
 shy828301@gmail.com, steven.price@arm.com, surenb@google.com,
 wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org,
 ying.huang@intel.com, yuzhao@google.com, kasong@tencent.com,
 yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev,
 hannes@cmpxchg.org, linux-arm-kernel@lists.infradead.org,
 Chuanhua Han, Barry Song
Subject: [PATCH RFC v2 2/5] mm: swap: introduce swap_nr_free() for batched swap_free()
Date: Thu, 29 Feb 2024 13:37:50 +1300
Message-Id: <20240229003753.134193-3-21cnbao@gmail.com>
In-Reply-To: <20240229003753.134193-1-21cnbao@gmail.com>
References: <20240229003753.134193-1-21cnbao@gmail.com>

From: Chuanhua Han

While swapping in a large folio, we need to free the swap entries for
the whole folio. To avoid frequently acquiring and releasing swap
locks, it is better to introduce an API for batched free.
Signed-off-by: Chuanhua Han
Co-developed-by: Barry Song
Signed-off-by: Barry Song
---
 include/linux/swap.h |  6 ++++++
 mm/swapfile.c        | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 25f6368be078..b3581c976e5f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -481,6 +481,7 @@ extern void swap_shmem_alloc(swp_entry_t);
 extern int swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t);
 extern void swap_free(swp_entry_t);
+extern void swap_nr_free(swp_entry_t entry, int nr_pages);
 extern void swapcache_free_entries(swp_entry_t *entries, int n);
 extern int free_swap_and_cache(swp_entry_t);
 int swap_type_of(dev_t device, sector_t offset);
@@ -561,6 +562,11 @@ static inline void swap_free(swp_entry_t swp)
 {
 }
 
+void swap_nr_free(swp_entry_t entry, int nr_pages)
+{
+
+}
+
 static inline void put_swap_folio(struct folio *folio, swp_entry_t swp)
 {
 }
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2b3a2d85e350..c0c058ee7b69 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1340,6 +1340,41 @@ void swap_free(swp_entry_t entry)
                 __swap_entry_free(p, entry);
 }
 
+/*
+ * Called after swapping in a large folio, batched free swap entries
+ * for this large folio, entry should be for the first subpage and
+ * its offset is aligned with nr_pages
+ */
+void swap_nr_free(swp_entry_t entry, int nr_pages)
+{
+        int i;
+        struct swap_cluster_info *ci;
+        struct swap_info_struct *p;
+        unsigned type = swp_type(entry);
+        unsigned long offset = swp_offset(entry);
+        DECLARE_BITMAP(usage, SWAPFILE_CLUSTER) = { 0 };
+
+        /* all swap entries are within a cluster for mTHP */
+        VM_BUG_ON(offset % SWAPFILE_CLUSTER + nr_pages > SWAPFILE_CLUSTER);
+
+        if (nr_pages == 1) {
+                swap_free(entry);
+                return;
+        }
+
+        p = _swap_info_get(entry);
+
+        ci = lock_cluster(p, offset);
+        for (i = 0; i < nr_pages; i++) {
+                if (__swap_entry_free_locked(p, offset + i, 1))
+                        __bitmap_set(usage, i, 1);
+        }
+        unlock_cluster(ci);
+
+        for_each_clear_bit(i, usage, nr_pages)
+                free_swap_slot(swp_entry(type, offset + i));
+}
+
 /*
  * Called after dropping swapcache to decrease refcnt to swap entries.
  */
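As a usage illustration, here is a hedged sketch (not code from this series;
example_finish_swapin() is invented, and folio->swap holding the first
subpage's swap entry is an assumption) of how a large-folio swap-in path can
batch-free its entries, taking the cluster lock once instead of nr times:

/* Sketch only: free all swap entries of an nr-page folio in one call. */
static void example_finish_swapin(struct folio *folio)
{
        swp_entry_t entry = folio->swap;        /* entry of subpage 0 */
        int nr = folio_nr_pages(folio);

        /* swap_nr_free() requires an nr_pages-aligned offset for mTHP */
        VM_WARN_ON(swp_offset(entry) % nr);

        swap_nr_free(entry, nr);        /* replaces nr separate swap_free() calls */
}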
From patchwork Thu Feb 29 00:37:51 2024
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 208115
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, david@redhat.com, linux-mm@kvack.org,
 ryan.roberts@arm.com, chrisl@kernel.org
Cc: 21cnbao@gmail.com, linux-kernel@vger.kernel.org, mhocko@suse.com,
 shy828301@gmail.com, steven.price@arm.com, surenb@google.com,
 wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org,
 ying.huang@intel.com, yuzhao@google.com, kasong@tencent.com,
 yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev,
 hannes@cmpxchg.org, linux-arm-kernel@lists.infradead.org,
 Chuanhua Han, Barry Song
Subject: [PATCH RFC v2 3/5] mm: swap: make should_try_to_free_swap() support large-folio
Date: Thu, 29 Feb 2024 13:37:51 +1300
Message-Id: <20240229003753.134193-4-21cnbao@gmail.com>
In-Reply-To: <20240229003753.134193-1-21cnbao@gmail.com>
References: <20240229003753.134193-1-21cnbao@gmail.com>

From: Chuanhua Han

should_try_to_free_swap() works on the assumption that swap-in always
happens at normal page granularity, i.e., folio_nr_pages() == 1. To
support large folio swap-in, this patch removes that assumption.

Signed-off-by: Chuanhua Han
Co-developed-by: Barry Song
Signed-off-by: Barry Song
Acked-by: Chris Li
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 319b3be05e75..90b08b7cbaac 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3904,7 +3904,7 @@ static inline bool should_try_to_free_swap(struct folio *folio,
          * reference only in case it's likely that we'll be the exlusive user.
          */
         return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
-                folio_ref_count(folio) == 2;
+                folio_ref_count(folio) == (1 + folio_nr_pages(folio));
 }
 
 static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
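The new check is consistent with simple arithmetic, assuming (as in the
current add_to_swap_cache()) that the swap cache takes one reference per
subpage via folio_ref_add(folio, nr) and that the fault handler holds one
extra reference:

        order-0 folio:  1 (our ref) +  1 (swap cache) ==  2
        order-4 folio:  1 (our ref) + 16 (swap cache) == 17
                                                      == 1 + folio_nr_pages(folio)

so the old "== 2" test was just the folio_nr_pages() == 1 special case of the
new expression.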
From patchwork Thu Feb 29 00:37:52 2024
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 208116
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, david@redhat.com, linux-mm@kvack.org,
 ryan.roberts@arm.com, chrisl@kernel.org
Cc: 21cnbao@gmail.com, linux-kernel@vger.kernel.org, mhocko@suse.com,
 shy828301@gmail.com, steven.price@arm.com, surenb@google.com,
 wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org,
 ying.huang@intel.com, yuzhao@google.com, kasong@tencent.com,
 yosryahmed@google.com, nphamcs@gmail.com, chengming.zhou@linux.dev,
 hannes@cmpxchg.org, linux-arm-kernel@lists.infradead.org,
 Barry Song, Hugh Dickins, Minchan Kim, SeongJae Park
Subject: [PATCH RFC v2 4/5] mm: swap: introduce swapcache_prepare_nr and swapcache_clear_nr for large folios swap-in
Date: Thu, 29 Feb 2024 13:37:52 +1300
Message-Id: <20240229003753.134193-5-21cnbao@gmail.com>
In-Reply-To: <20240229003753.134193-1-21cnbao@gmail.com>
References: <20240229003753.134193-1-21cnbao@gmail.com>

From: Barry Song

Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
supports one swap entry only. To support large folio swap-in, we need
to handle multiple swap entries.
Cc: Kairui Song
Cc: "Huang, Ying"
Cc: Yu Zhao
Cc: David Hildenbrand
Cc: Chris Li
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Matthew Wilcox (Oracle)
Cc: Michal Hocko
Cc: Minchan Kim
Cc: Yosry Ahmed
Cc: Yu Zhao
Cc: SeongJae Park
Signed-off-by: Barry Song
---
 include/linux/swap.h |   1 +
 mm/swap.h            |   1 +
 mm/swapfile.c        | 117 ++++++++++++++++++++++++++-----------------
 3 files changed, 72 insertions(+), 47 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index b3581c976e5f..2691c739d9a4 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -480,6 +480,7 @@ extern int add_swap_count_continuation(swp_entry_t, gfp_t);
 extern void swap_shmem_alloc(swp_entry_t);
 extern int swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t);
+extern int swapcache_prepare_nr(swp_entry_t, int nr);
 extern void swap_free(swp_entry_t);
 extern void swap_nr_free(swp_entry_t entry, int nr_pages);
 extern void swapcache_free_entries(swp_entry_t *entries, int n);
diff --git a/mm/swap.h b/mm/swap.h
index fc2f6ade7f80..1cec991efcda 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -42,6 +42,7 @@ void delete_from_swap_cache(struct folio *folio);
 void clear_shadow_from_swap_cache(int type, unsigned long begin,
                 unsigned long end);
 void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry);
+void swapcache_clear_nr(struct swap_info_struct *si, swp_entry_t entry, int nr);
 struct folio *swap_cache_get_folio(swp_entry_t entry,
                 struct vm_area_struct *vma, unsigned long addr);
 struct folio *filemap_get_incore_folio(struct address_space *mapping,
diff --git a/mm/swapfile.c b/mm/swapfile.c
index c0c058ee7b69..c8c8b6dbaeda 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3308,7 +3308,7 @@ void si_swapinfo(struct sysinfo *val)
 }
 
 /*
- * Verify that a swap entry is valid and increment its swap map count.
+ * Verify that nr swap entries are valid and increment their swap map count.
  *
  * Returns error code in following case.
  * - success -> 0
@@ -3318,66 +3318,73 @@ void si_swapinfo(struct sysinfo *val)
  * - swap-cache reference is requested but the entry is not used. -> ENOENT
  * - swap-mapped reference requested but needs continued swap count. -> ENOMEM
  */
-static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+static int __swap_duplicate_nr(swp_entry_t entry, int nr, unsigned char usage)
 {
         struct swap_info_struct *p;
         struct swap_cluster_info *ci;
         unsigned long offset;
-        unsigned char count;
-        unsigned char has_cache;
-        int err;
+        unsigned char count[SWAPFILE_CLUSTER];
+        unsigned char has_cache[SWAPFILE_CLUSTER];
+        int err, i;
 
         p = swp_swap_info(entry);
 
         offset = swp_offset(entry);
         ci = lock_cluster_or_swap_info(p, offset);
 
-        count = p->swap_map[offset];
-
-        /*
-         * swapin_readahead() doesn't check if a swap entry is valid, so the
-         * swap entry could be SWAP_MAP_BAD. Check here with lock held.
-         */
-        if (unlikely(swap_count(count) == SWAP_MAP_BAD)) {
-                err = -ENOENT;
-                goto unlock_out;
-        }
+        for (i = 0; i < nr; i++) {
+                count[i] = p->swap_map[offset + i];
 
-        has_cache = count & SWAP_HAS_CACHE;
-        count &= ~SWAP_HAS_CACHE;
-        err = 0;
-
-        if (usage == SWAP_HAS_CACHE) {
-
-                /* set SWAP_HAS_CACHE if there is no cache and entry is used */
-                if (!has_cache && count)
-                        has_cache = SWAP_HAS_CACHE;
-                else if (has_cache)             /* someone else added cache */
-                        err = -EEXIST;
-                else                            /* no users remaining */
+                /*
+                 * swapin_readahead() doesn't check if a swap entry is valid, so the
+                 * swap entry could be SWAP_MAP_BAD. Check here with lock held.
+                 */
+                if (unlikely(swap_count(count[i]) == SWAP_MAP_BAD)) {
                         err = -ENOENT;
+                        goto unlock_out;
+                }
 
-        } else if (count || has_cache) {
-
-                if ((count & ~COUNT_CONTINUED) < SWAP_MAP_MAX)
-                        count += usage;
-                else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX)
-                        err = -EINVAL;
-                else if (swap_count_continued(p, offset, count))
-                        count = COUNT_CONTINUED;
-                else
-                        err = -ENOMEM;
-        } else
-                err = -ENOENT;                  /* unused swap entry */
-
-        if (!err)
-                WRITE_ONCE(p->swap_map[offset], count | has_cache);
+                has_cache[i] = count[i] & SWAP_HAS_CACHE;
+                count[i] &= ~SWAP_HAS_CACHE;
+                err = 0;
+
+                if (usage == SWAP_HAS_CACHE) {
+
+                        /* set SWAP_HAS_CACHE if there is no cache and entry is used */
+                        if (!has_cache[i] && count[i])
+                                has_cache[i] = SWAP_HAS_CACHE;
+                        else if (has_cache[i])  /* someone else added cache */
+                                err = -EEXIST;
+                        else                    /* no users remaining */
+                                err = -ENOENT;
+                } else if (count[i] || has_cache[i]) {
+
+                        if ((count[i] & ~COUNT_CONTINUED) < SWAP_MAP_MAX)
+                                count[i] += usage;
+                        else if ((count[i] & ~COUNT_CONTINUED) > SWAP_MAP_MAX)
+                                err = -EINVAL;
+                        else if (swap_count_continued(p, offset + i, count[i]))
+                                count[i] = COUNT_CONTINUED;
+                        else
+                                err = -ENOMEM;
+                } else
+                        err = -ENOENT;          /* unused swap entry */
+        }
+        if (!err) {
+                for (i = 0; i < nr; i++)
+                        WRITE_ONCE(p->swap_map[offset + i], count[i] | has_cache[i]);
+        }
 
 unlock_out:
         unlock_cluster_or_swap_info(p, ci);
         return err;
 }
 
+static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
+{
+        return __swap_duplicate_nr(entry, 1, usage);
+}
+
 /*
  * Help swapoff by noting that swap entry belongs to shmem/tmpfs
  * (in which case its reference count is never incremented).
@@ -3416,17 +3423,33 @@ int swapcache_prepare(swp_entry_t entry)
         return __swap_duplicate(entry, SWAP_HAS_CACHE);
 }
 
-void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry)
+int swapcache_prepare_nr(swp_entry_t entry, int nr)
+{
+        return __swap_duplicate_nr(entry, nr, SWAP_HAS_CACHE);
+}
+
+void swapcache_clear_nr(struct swap_info_struct *si, swp_entry_t entry, int nr)
 {
         struct swap_cluster_info *ci;
         unsigned long offset = swp_offset(entry);
-        unsigned char usage;
+        unsigned char usage[SWAPFILE_CLUSTER];
+        int i;
 
         ci = lock_cluster_or_swap_info(si, offset);
-        usage = __swap_entry_free_locked(si, offset, SWAP_HAS_CACHE);
+        for (i = 0; i < nr; i++)
+                usage[i] = __swap_entry_free_locked(si, offset + i, SWAP_HAS_CACHE);
         unlock_cluster_or_swap_info(si, ci);
-        if (!usage)
-                free_swap_slot(entry);
+        for (i = 0; i < nr; i++) {
+                if (!usage[i]) {
+                        free_swap_slot(entry);
+                        entry.val++;
+                }
+        }
+}
+
+void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry)
+{
+        swapcache_clear_nr(si, entry, 1);
 }
 
 struct swap_info_struct *swp_swap_info(swp_entry_t entry)
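To show where this pair is meant to sit, here is a hedged sketch of a
SWAP_SYNCHRONOUS-style swap-in path (assumptions: example_swapin_reserve() is
invented, nr is a power of two, and real error handling is reduced to the two
calls this patch adds). All nr entries are reserved before the folio is
allocated, and the whole reservation is dropped on failure:

/* Sketch only: all-or-nothing reservation of nr contiguous swap entries. */
static struct folio *example_swapin_reserve(struct swap_info_struct *si,
                                            swp_entry_t entry, int nr)
{
        struct folio *folio;

        /* take SWAP_HAS_CACHE on every entry, or fail as a whole */
        if (swapcache_prepare_nr(entry, nr))
                return NULL;    /* e.g. -EEXIST: raced with a concurrent swap-in */

        folio = folio_alloc(GFP_HIGHUSER_MOVABLE, ilog2(nr));
        if (!folio)
                swapcache_clear_nr(si, entry, nr);      /* undo the reservation */

        return folio;
}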
From patchwork Thu Feb 29 00:37:53 2024
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 208117
smtp.client-ip=209.85.214.176 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709167148; cv=none; b=VXn3tONwMC8wLvnOwrxJPPbVr/df6Xy6G4guOWqV2W2cZoNNebmrbZfAMSlQorq9Lu3uLkjPOm1B3k4pkXQnq0mfCQQoN12KoJasjuUj7N6WA9kHDlyUxdCSJlkRblnF+U7C6Ll5GANymEbMyNk40WO4ZwZ4RyURBJq9GkzTujU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709167148; c=relaxed/simple; bh=6f5FbMtgjFQJN+a8Ug3AwZsfgcPneqGUkmK0PoSEDgk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=LLU7DaPV6oRuf9xCddzIvRq7UMYsOUuMSZG/w7D4a8bVp/9bKeL8L0bSyhzBxlQJS7vqI+KEUGlI9rolr02C1kIA2NEBHIqpMJY6prwIfjuYWnKQzXBNX2B105swJCQ4cDVQRwOv447TAuRX20hMGH7lCbpxzvpX6fWWZZU1lzA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=eahb1Mc7; arc=none smtp.client-ip=209.85.214.176 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-pl1-f176.google.com with SMTP id d9443c01a7336-1d93edfa76dso4077485ad.1 for ; Wed, 28 Feb 2024 16:39:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1709167146; x=1709771946; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=qbI/XApED6aL6ip36JGDw3HTQoLAAxWC9U5cpk4hjWc=; b=eahb1Mc7HlHzH/r+qKtJ7ti9VTrEAZA5NIihXqg3Y30mLE9V6R2eY65vQrJ7VOSZYO 4aezt/9QII4sUjF8/FgtLLWQvG8fWIl3VlfZAmbkTJ8v0ZmFpFcQ1KnehqBJW0bJdiVN CvxrGJINTc6LIfycMUqm5t+SeWEO7Qr0FEB6kzGVvGTJ8DDSQCmUW+JKIIi2+bsZQ5/Z WVkt8buG6+u4tNKNbEnOULm8GRnh6/9cfYRh4/9/QFweFmpJu9CRsNo8oYPoLfKbjeTK 1BxpNTIvr1PVnmKtefJZ8skvlMZ+23ILbT0zFFTGKkAUeWvne492MDYfZuCQ6QOVHQ76 iazQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709167146; x=1709771946; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=qbI/XApED6aL6ip36JGDw3HTQoLAAxWC9U5cpk4hjWc=; b=dgMJtPXjoWVR/rb6CFbrCvljhouEKhL6wXJBiyZQ9qVNTURqr2rokYgHYV8kJsoJum Hc6EPyV6bzIqtsgd2yejrN8K2ff3+TnzjeG8Pvnm1oBCCidOMv2UeV2nJZsLMwaI52Za u1YlHWn++h2f2Y/t/TLC5abgO8hdunRtN4yfNf3JIjV2d0FiKqWttWtbZ04PZ3Hla/gi OUbflt9SOci5IGL+6XIXkakC3PYX4yliEyxrxBEI/h+eGZb5gvirX+FCBx6nueKTtL7+ 0NV0h5BU/UEVRJbOkPwwm92rg6HsGm/466veFR0FvICnRFsLCDu0Hk8NRFzp8jS/u4EQ KFIQ== X-Forwarded-Encrypted: i=1; AJvYcCWlY/HR+STd/WdabtDHQzzLdBQTkTztya1B4+73eAxmUCQj2LpdA/BIXDTaiV+tu1Sn2DSf+1BbvCE5hhrJ3M1+RpVHSb03/4t8Q3En X-Gm-Message-State: AOJu0YzBHlJVs/pHbsKnwZMgnaniTEOB0wXcmJHMFuGvJ8kfeZt/5+GV Ul+beHBRDdaAh8jkbwq/55jN/SWPD9oEHLILX+nNCcEDwJ8aB+Fh X-Received: by 2002:a17:902:b192:b0:1dc:5efc:8498 with SMTP id s18-20020a170902b19200b001dc5efc8498mr527487plr.56.1709167145576; Wed, 28 Feb 2024 16:39:05 -0800 (PST) Received: from localhost.localdomain ([2407:7000:8942:5500:5158:ed66:78b3:7fda]) by smtp.gmail.com with ESMTPSA id p3-20020a170902780300b001d9641003cfsm62647pll.142.2024.02.28.16.38.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 28 Feb 2024 16:39:05 -0800 (PST) From: Barry Song <21cnbao@gmail.com> To: akpm@linux-foundation.org, david@redhat.com, linux-mm@kvack.org, ryan.roberts@arm.com, chrisl@kernel.org Cc: 21cnbao@gmail.com, 
 linux-kernel@vger.kernel.org, mhocko@suse.com, shy828301@gmail.com,
 steven.price@arm.com, surenb@google.com, wangkefeng.wang@huawei.com,
 willy@infradead.org, xiang@kernel.org, ying.huang@intel.com,
 yuzhao@google.com, kasong@tencent.com, yosryahmed@google.com,
 nphamcs@gmail.com, chengming.zhou@linux.dev, hannes@cmpxchg.org,
 linux-arm-kernel@lists.infradead.org, Chuanhua Han , Barry Song
Subject: [PATCH RFC v2 5/5] mm: support large folios swapin as a whole
Date: Thu, 29 Feb 2024 13:37:53 +1300
Message-Id: <20240229003753.134193-6-21cnbao@gmail.com>
In-Reply-To: <20240229003753.134193-1-21cnbao@gmail.com>
References: <20240229003753.134193-1-21cnbao@gmail.com>

From: Chuanhua Han

On an embedded system like Android, more than half of anonymous memory is
actually in swap devices such as zRAM. For example, while an app is switched
to the background, most of its memory might be swapped out.

Now that we have mTHP features, if we don't support large-folio swap-in, then
once those large folios are swapped out, we immediately lose the performance
gain we get through large folios and hardware optimizations such as CONT-PTE.

This patch brings up mTHP swap-in support. For now, we limit mTHP swap-in to
contiguous swap entries which were likely swapped out from an mTHP as a
whole. The current implementation also covers only the SWAP_SYNCHRONOUS
case; swapin_readahead() does not yet read ahead as large folios.

In addition, we re-fault large folios which are still in the swapcache as a
whole. This effectively decreases the extra loops and early exits which were
added to arch_swap_restore() while supporting MTE restore for folios rather
than a single page, and it also decreases the number of do_swap_page() calls,
since PTEs used to be set one by one even when we hit a large folio in the
swapcache.

Signed-off-by: Chuanhua Han
Co-developed-by: Barry Song
Signed-off-by: Barry Song
---
 mm/memory.c | 191 ++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 157 insertions(+), 34 deletions(-)
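Before reading the diff, it may help to see the alignment arithmetic the patch leans on: both the faulting address and the swap entry are rounded down to a folio-sized boundary, so a large folio always covers a naturally aligned block of PTEs and swap slots. A standalone sketch follows, with a simplified ALIGN_DOWN that assumes power-of-two sizes; the address and offset values are made up for illustration.

#include <stdio.h>

#define PAGE_SIZE 4096UL
#define ALIGN_DOWN(x, a) ((x) & ~((unsigned long)(a) - 1))

int main(void)
{
	unsigned long fault_addr = 0x7f1234567000UL;	/* hypothetical fault address */
	unsigned long swap_off   = 1027;		/* hypothetical swap offset */
	int nr = 4;					/* 4 PTEs in the mTHP */

	/* the whole 4-page window starts here, not at the faulting page */
	printf("start addr: %#lx\n", ALIGN_DOWN(fault_addr, nr * PAGE_SIZE));
	/* the 4 swap slots start at a natural 4-slot boundary: 1024 */
	printf("start slot: %lu\n", ALIGN_DOWN(swap_off, nr));
	return 0;
}

The natural-alignment rule is what makes the contiguity check below cheap: given any one PTE in the range, the first slot of the candidate folio is fully determined.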
diff --git a/mm/memory.c b/mm/memory.c
index 90b08b7cbaac..471689ce4e91 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -104,9 +104,16 @@ struct page *mem_map;
 EXPORT_SYMBOL(mem_map);
 #endif
 
+/* A choice of behaviors for alloc_anon_folio() */
+enum behavior {
+	DO_SWAP_PAGE,
+	DO_ANON_PAGE,
+};
+
 static vm_fault_t do_fault(struct vm_fault *vmf);
 static vm_fault_t do_anonymous_page(struct vm_fault *vmf);
 static bool vmf_pte_changed(struct vm_fault *vmf);
+static struct folio *alloc_anon_folio(struct vm_fault *vmf, enum behavior behavior);
 
 /*
  * Return true if the original pte was a uffd-wp pte marker (so the pte was
@@ -3974,6 +3981,52 @@ static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
 	return VM_FAULT_SIGBUS;
 }
 
+/*
+ * Check that a range of PTEs are all swap entries with contiguous swap
+ * offsets and the same SWAP_HAS_CACHE status.
+ * pte must point at the first PTE in the range.
+ */
+static bool is_pte_range_contig_swap(pte_t *pte, int nr_pages)
+{
+	int i;
+	struct swap_info_struct *si;
+	swp_entry_t entry;
+	unsigned type;
+	pgoff_t start_offset;
+	char has_cache;
+
+	entry = pte_to_swp_entry(ptep_get_lockless(pte));
+	if (non_swap_entry(entry))
+		return false;
+	start_offset = swp_offset(entry);
+	if (start_offset % nr_pages)
+		return false;
+
+	si = swp_swap_info(entry);
+	type = swp_type(entry);
+	has_cache = si->swap_map[start_offset] & SWAP_HAS_CACHE;
+	for (i = 1; i < nr_pages; i++) {
+		entry = pte_to_swp_entry(ptep_get_lockless(pte + i));
+		if (non_swap_entry(entry))
+			return false;
+		if (swp_offset(entry) != start_offset + i)
+			return false;
+		if (swp_type(entry) != type)
+			return false;
+		/*
+		 * While allocating a large folio and doing swap_read_folio()
+		 * for the SWP_SYNCHRONOUS_IO path, the faulting pte has no
+		 * swapcache. We need to ensure the other PTEs have no cache
+		 * either; otherwise, we might read from the swap device while
+		 * the content is actually in the swapcache.
+		 */
+		if ((si->swap_map[start_offset + i] & SWAP_HAS_CACHE) != has_cache)
+			return false;
+	}
+
+	return true;
+}
+
 /*
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
  */
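The same test can be modeled in userspace over plain offset arrays as a quick sanity harness for the check above; note how a contiguous run that does not start on a natural nr-slot boundary is rejected. All names here (range_is_contig_swap and friends) are invented for illustration.

#include <stdio.h>

/*
 * Userspace stand-in for is_pte_range_contig_swap(): off[] plays the role
 * of the swap offsets read from the PTEs, cached[] the per-slot
 * SWAP_HAS_CACHE bit.
 */
static int range_is_contig_swap(const unsigned long *off, const int *cached, int nr)
{
	if (off[0] % nr)
		return 0;		/* must start on a natural nr-boundary */
	for (int i = 1; i < nr; i++) {
		if (off[i] != off[0] + i)	/* offsets must be consecutive */
			return 0;
		if (cached[i] != cached[0])	/* cache status must be uniform */
			return 0;
	}
	return 1;
}

int main(void)
{
	unsigned long a[4] = {1024, 1025, 1026, 1027};
	unsigned long b[4] = {1025, 1026, 1027, 1028};	/* contiguous but unaligned */
	int nc[4] = {0, 0, 0, 0};

	printf("aligned+contig:  %d\n", range_is_contig_swap(a, nc, 4));	/* 1 */
	printf("unaligned start: %d\n", range_is_contig_swap(b, nc, 4));	/* 0 */
	return 0;
}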
@@ -3995,6 +4048,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	pte_t pte;
 	vm_fault_t ret = 0;
 	void *shadow = NULL;
+	int nr_pages = 1;
+	unsigned long start_address;
+	pte_t *start_pte;
 
 	if (!pte_unmap_same(vmf))
 		goto out;
@@ -4058,28 +4114,32 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (!folio) {
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
 		    __swap_count(entry) == 1) {
-			/*
-			 * Prevent parallel swapin from proceeding with
-			 * the cache flag. Otherwise, another thread may
-			 * finish swapin first, free the entry, and swapout
-			 * reusing the same entry. It's undetectable as
-			 * pte_same() returns true due to entry reuse.
-			 */
-			if (swapcache_prepare(entry)) {
-				/* Relax a bit to prevent rapid repeated page faults */
-				schedule_timeout_uninterruptible(1);
-				goto out;
-			}
-			need_clear_cache = true;
-
 			/* skip swapcache */
-			folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
-						vma, vmf->address, false);
+			folio = alloc_anon_folio(vmf, DO_SWAP_PAGE);
 			page = &folio->page;
 			if (folio) {
 				__folio_set_locked(folio);
 				__folio_set_swapbacked(folio);
 
+				if (folio_test_large(folio)) {
+					nr_pages = folio_nr_pages(folio);
+					entry.val = ALIGN_DOWN(entry.val, nr_pages);
+				}
+
+				/*
+				 * Prevent parallel swapin from proceeding with
+				 * the cache flag. Otherwise, another thread may
+				 * finish swapin first, free the entry, and swapout
+				 * reusing the same entry. It's undetectable as
+				 * pte_same() returns true due to entry reuse.
+				 */
+				if (swapcache_prepare_nr(entry, nr_pages)) {
+					/* Relax a bit to prevent rapid repeated page faults */
+					schedule_timeout_uninterruptible(1);
+					goto out;
+				}
+				need_clear_cache = true;
+
 				if (mem_cgroup_swapin_charge_folio(folio,
 							vma->vm_mm, GFP_KERNEL,
 							entry)) {
@@ -4185,6 +4245,42 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 */
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 			&vmf->ptl);
+
+	start_address = vmf->address;
+	start_pte = vmf->pte;
+	if (folio_test_large(folio)) {
+		unsigned long nr = folio_nr_pages(folio);
+		unsigned long addr = ALIGN_DOWN(vmf->address, nr * PAGE_SIZE);
+		pte_t *aligned_pte = vmf->pte - (vmf->address - addr) / PAGE_SIZE;
+
+		/*
+		 * case 1: we are allocating a large folio; map it as a whole
+		 * iff its swap entries are still entirely mapped;
+		 * case 2: we hit a large folio in the swapcache, and all of
+		 * its swap entries are still entirely mapped; map the large
+		 * folio as a whole.
+		 * Otherwise, map only the faulting page within the large
+		 * folio, which is in the swapcache.
+		 */
+		if (!is_pte_range_contig_swap(aligned_pte, nr)) {
+			if (nr_pages > 1) /* ptes have changed for case 1 */
+				goto out_nomap;
+			goto check_pte;
+		}
+
+		start_address = addr;
+		start_pte = aligned_pte;
+		/*
+		 * The below has already been done before swap_read_folio()
+		 * for case 1.
+		 */
+		if (unlikely(folio == swapcache)) {
+			nr_pages = nr;
+			entry.val = ALIGN_DOWN(entry.val, nr_pages);
+			page = &folio->page;
+		}
+	}
+
+check_pte:
 	if (unlikely(!vmf->pte || !pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
 		goto out_nomap;
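The hunk above encodes a three-way decision: map the whole folio, bail out and retry, or fall back to mapping a single page. A toy model of that decision, in plain C rather than kernel code (the helper name is invented):

#include <stdio.h>

/*
 * Returns how many PTEs do_swap_page() would map, or -1 for "retry the
 * fault" (the out_nomap path in the real code).
 */
static int nr_ptes_to_map(int folio_pages, int fresh_alloc, int range_still_swap)
{
	if (range_still_swap)
		return folio_pages;	/* map the whole large folio */
	if (fresh_alloc)
		return -1;		/* case 1: PTEs changed under us, bail out */
	return 1;			/* case 2: map only the faulting page */
}

int main(void)
{
	printf("%d\n", nr_ptes_to_map(4, 1, 1));	/* 4: whole folio */
	printf("%d\n", nr_ptes_to_map(4, 1, 0));	/* -1: out_nomap */
	printf("%d\n", nr_ptes_to_map(4, 0, 0));	/* 1: faulting page only */
	return 0;
}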
@@ -4252,12 +4348,14 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 * We're already holding a reference on the page but haven't mapped it
 	 * yet.
 	 */
-	swap_free(entry);
+	swap_nr_free(entry, nr_pages);
 	if (should_try_to_free_swap(folio, vma, vmf->flags))
 		folio_free_swap(folio);
 
-	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
-	dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
+	folio_ref_add(folio, nr_pages - 1);
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
+	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -nr_pages);
+
 	pte = mk_pte(page, vma->vm_page_prot);
 
 	/*
@@ -4267,14 +4365,14 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 * exclusivity.
 	 */
 	if (!folio_test_ksm(folio) &&
-	    (exclusive || folio_ref_count(folio) == 1)) {
+	    (exclusive || folio_ref_count(folio) == nr_pages)) {
 		if (vmf->flags & FAULT_FLAG_WRITE) {
 			pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 			vmf->flags &= ~FAULT_FLAG_WRITE;
 		}
 		rmap_flags |= RMAP_EXCLUSIVE;
 	}
-	flush_icache_page(vma, page);
+	flush_icache_pages(vma, page, nr_pages);
 	if (pte_swp_soft_dirty(vmf->orig_pte))
 		pte = pte_mksoft_dirty(pte);
 	if (pte_swp_uffd_wp(vmf->orig_pte))
@@ -4283,17 +4381,19 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 
 	/* ksm created a completely new copy */
 	if (unlikely(folio != swapcache && swapcache)) {
-		folio_add_new_anon_rmap(folio, vma, vmf->address);
+		folio_add_new_anon_rmap(folio, vma, start_address);
 		folio_add_lru_vma(folio, vma);
+	} else if (!folio_test_anon(folio)) {
+		folio_add_new_anon_rmap(folio, vma, start_address);
 	} else {
-		folio_add_anon_rmap_pte(folio, page, vma, vmf->address,
+		folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, start_address,
 					rmap_flags);
 	}
 
 	VM_BUG_ON(!folio_test_anon(folio) ||
 			(pte_write(pte) && !PageAnonExclusive(page)));
-	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
-	arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
+	set_ptes(vma->vm_mm, start_address, start_pte, pte, nr_pages);
+	arch_do_swap_page(vma->vm_mm, vma, start_address, pte, vmf->orig_pte);
 
 	folio_unlock(folio);
 	if (folio != swapcache && swapcache) {
@@ -4310,6 +4410,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	}
 
 	if (vmf->flags & FAULT_FLAG_WRITE) {
+		if (nr_pages > 1)
+			vmf->orig_pte = ptep_get(vmf->pte);
+
 		ret |= do_wp_page(vmf);
 		if (ret & VM_FAULT_ERROR)
 			ret &= VM_FAULT_ERROR;
@@ -4317,14 +4420,14 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	}
 
 	/* No need to invalidate - it was non-present before */
-	update_mmu_cache_range(vmf, vma, vmf->address, vmf->pte, 1);
+	update_mmu_cache_range(vmf, vma, start_address, start_pte, nr_pages);
 unlock:
 	if (vmf->pte)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
 out:
 	/* Clear the swap cache pin for direct swapin after PTL unlock */
 	if (need_clear_cache)
-		swapcache_clear(si, entry);
+		swapcache_clear_nr(si, entry, nr_pages);
 	if (si)
 		put_swap_device(si);
 	return ret;
@@ -4340,7 +4443,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		folio_put(swapcache);
 	}
 	if (need_clear_cache)
-		swapcache_clear(si, entry);
+		swapcache_clear_nr(si, entry, nr_pages);
 	if (si)
 		put_swap_device(si);
 	return ret;
@@ -4358,7 +4461,7 @@ static bool pte_range_none(pte_t *pte, int nr_pages)
 	return true;
 }
 
-static struct folio *alloc_anon_folio(struct vm_fault *vmf)
+static struct folio *alloc_anon_folio(struct vm_fault *vmf, enum behavior behavior)
 {
 	struct vm_area_struct *vma = vmf->vma;
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -4376,6 +4479,19 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 	if (unlikely(userfaultfd_armed(vma)))
 		goto fallback;
 
+	/*
+	 * A large folio being swapped in could be partially in zswap and
+	 * partially in swap devices. zswap doesn't support large folios yet,
+	 * so we might get corrupted, zero-filled data by reading all subpages
+	 * from the swap devices while some of them actually live in zswap.
+	 */
+	if (behavior == DO_SWAP_PAGE && is_zswap_enabled())
+		goto fallback;
+
+	if (unlikely(behavior != DO_ANON_PAGE && behavior != DO_SWAP_PAGE))
+		return ERR_PTR(-EINVAL);
+
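The hunk that follows extends alloc_anon_folio()'s order scan so the same highest-to-lowest walk that looks for pte_none() ranges can also look for contiguous swap ranges. As a standalone illustration of that descending scan, here is a userspace mirror of what the kernel's highest_order()/next_order() helpers are assumed to do (their exact definitions are not shown in this patch):

#include <stdio.h>

/* Hypothetical userspace mirrors of highest_order()/next_order(). */
static int highest_order(unsigned long orders)
{
	return 63 - __builtin_clzl(orders);
}

static int next_order(unsigned long *orders, int prev)
{
	*orders &= ~(1UL << prev);	/* retire the order we just tried */
	return *orders ? highest_order(*orders) : 0;
}

int main(void)
{
	/* Suppose orders 4, 3 and 2 are enabled for this hypothetical VMA. */
	unsigned long orders = (1UL << 4) | (1UL << 3) | (1UL << 2);
	int order = highest_order(orders);

	while (orders) {
		printf("try order %d (%d pages)\n", order, 1 << order);
		/* the kernel would test pte_range_none() or
		 * is_pte_range_contig_swap() here and break on success */
		order = next_order(&orders, order);
	}
	return 0;
}

Trying the largest enabled order first means the fault maps as much of the swapped-out mTHP as the current PTE state still allows, falling back gracefully to smaller folios.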
 	/*
 	 * Get a list of all the (large) orders below PMD_ORDER that are enabled
 	 * for this vma. Then filter out the orders that can't be allocated over
@@ -4393,15 +4509,22 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 		return ERR_PTR(-EAGAIN);
 
 	/*
-	 * Find the highest order where the aligned range is completely
-	 * pte_none(). Note that all remaining orders will be completely
+	 * For do_anonymous_page, find the highest order where the aligned range is
+	 * completely pte_none(). Note that all remaining orders will be completely
 	 * pte_none().
+	 * For do_swap_page, find the highest order where the aligned range is
+	 * completely swap entries with contiguous swap offsets.
 	 */
 	order = highest_order(orders);
 	while (orders) {
 		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
-		if (pte_range_none(pte + pte_index(addr), 1 << order))
-			break;
+		if (behavior == DO_ANON_PAGE) {
+			if (pte_range_none(pte + pte_index(addr), 1 << order))
+				break;
+		} else {
+			if (is_pte_range_contig_swap(pte + pte_index(addr), 1 << order))
+				break;
+		}
 		order = next_order(&orders, order);
 	}
 
@@ -4485,7 +4608,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	if (unlikely(anon_vma_prepare(vma)))
 		goto oom;
 	/* Returns NULL on OOM or ERR_PTR(-EAGAIN) if we must retry the fault */
-	folio = alloc_anon_folio(vmf);
+	folio = alloc_anon_folio(vmf, DO_ANON_PAGE);
 	if (IS_ERR(folio))
 		return 0;
 	if (!folio)