From patchwork Mon Mar 4 08:13:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com> X-Patchwork-Id: 209406 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:fa17:b0:10a:f01:a869 with SMTP id ju23csp1282207dyc; Mon, 4 Mar 2024 00:14:47 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCVDGbO6qtPxJelNReH9FXctE5Tja5CNRzuDhzUNLT+rXo9Ts0KL2Z0BU909HLSiIcUX5M6sVbHXjf9tmilsr1E71F/iig== X-Google-Smtp-Source: AGHT+IE5BXdbqsH9KaFOPo2AB2Cm0a9EsBuUe2Lic+tmPjJ+Z3bO51zy98FzffUsM/nXh69L5X9S X-Received: by 2002:a05:6214:a6d:b0:690:65a9:3403 with SMTP id ef13-20020a0562140a6d00b0069065a93403mr5156030qvb.61.1709540086958; Mon, 04 Mar 2024 00:14:46 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1709540086; cv=pass; d=google.com; s=arc-20160816; b=ZviIycBouN8mLbucd1w2HjtQLJ+eJlEiGL6UBljHWVLXTPkMkdVMv2KzDAJf+3vClk ypYztKYRUgiRNhI440YQaAyHtpwm3YOosB4mDjWiub3YVRH6UD8apiQgBa4r2UStGH3s kU8FyBttqzeSYcxh2TcOh8u9zf04K5b37iYYYjxX6BzvP8WP9wUKNW59lRiCwpn+McCj NMm3HLet61u5GeTgfr4Rvf7v6Z7/SC3nGmvYwAeZcJ+M7xhZWn8GNWz09XVjMMDE6r11 kBL4KcS0qaV/FLlCOp6Flm04NsWpP9Um4cyWh2cd8HIdjVeSYOwhqUhKXNtm27FQeNPA BgSA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=y59EUmsn936TktKB/aH9UYKKlrxQ0A+jm6Fw0p9O+fA=; fh=MkhfaQVJWVvfT+2r61oUq9+Xm11iQsAtRzICp1/vz5E=; b=XUOM2L/+JZyug1q3wq1FLCRRAburW2dvGMwNmZL49Nd6ua35crU986SLeN8797NNxO tvKbytJ7u+dDF/iLJU4IEn1H5VHOJKUPI2p+b9zH1HG2KRwgQWNP1P7AHlNJRJq4Kb36 1KI8fd7xfheB+e0wWB2h5MhuJbebx/wIsFSZq52xUp6BjBd1GmVjjDgpKvnWfgn+SBRr acYcO/gus792NbPEhSlRvyu/10nAOr6gJH63o4DEoRa6JT/b2ud2ggdKJSp+8cyIxKoy 1gRoTy7YtuCmcuTlV1RBv7OcDIenzTfPasfryTndvWk8rDz9YyW8YjQ3SpB+IgXxBLrx HGkA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass 
header.i=@gmail.com header.s=20230601 header.b=Nt5iBoBh; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com dmarc=pass fromdomain=gmail.com); spf=pass (google.com: domain of linux-kernel+bounces-90176-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-90176-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id gw6-20020a0562140f0600b006903cb0e289si9103085qvb.574.2024.03.04.00.14.46 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Mar 2024 00:14:46 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-90176-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=Nt5iBoBh; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com dmarc=pass fromdomain=gmail.com); spf=pass (google.com: domain of linux-kernel+bounces-90176-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-90176-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id AFDFF1C21B54 for ; Mon, 4 Mar 2024 08:14:46 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 30A5C17588; Mon, 4 Mar 2024 08:14:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass 
(2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Nt5iBoBh" Received: from mail-pf1-f178.google.com (mail-pf1-f178.google.com [209.85.210.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A2A691756D for ; Mon, 4 Mar 2024 08:14:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.178 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709540070; cv=none; b=OoUAYIB7Uxe/8p4ZFoL02cl5u5gfNQfTLscxzGIs7LTgpW62U/sAPkuuc32ylvdMi8RryZN0PjLDYXJoiXWt+ggYU4Mvw4/rNbhjtqj4ZDPDMjrBjg821He+irYjH7D7vtQD+6xCUlH9X6+UBW7szmWwnvfloE9Fj8tsODXx3cg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709540070; c=relaxed/simple; bh=TthfiCTSHniBg07/g9kff9ziRLL3IKn2kCZUYvskwmg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=NeLUf9HSCi2hdOKoLW9W0BQdfBfpBK1YbZhHcJP9ooSDemXI7Zmy+2CbOkq8quK00fqkKEXWC2G+335HCMfrMtetyZBCXD738ZNnPRMfqUwQ94smXWvLEFJiMI8RBozOuExJYwUlS5tqKo9T0mDFlrqCGcqNVP3aywrxyNxFyJM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Nt5iBoBh; arc=none smtp.client-ip=209.85.210.178 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-pf1-f178.google.com with SMTP id d2e1a72fcca58-6e558a67f70so3659650b3a.0 for ; Mon, 04 Mar 2024 00:14:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1709540068; x=1710144868; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; 
bh=y59EUmsn936TktKB/aH9UYKKlrxQ0A+jm6Fw0p9O+fA=; b=Nt5iBoBhVkZOGXpTIU7AKV1U0/ax2pulTqFUxQn5c1/Gc59xGjn/bfVcq0ZK9RRF06 jhbesD3fT2PnqSWm62/U6Sjqr3GXhgDSvu6d0BKpJsi1L/UxpGB5NtLYZqriR3KsWscx SafRcHqx4aBs9wW4f3V580g+ybc7XNdPYaJY3kXkYLYs3iKQp9un26TwKDgqaUjQrB49 nluAgQ2kMbph4fVyO4ARD3OeObN7RfkTgTb9SMBKCbFt7daJfSSN75VrZwOnhHjJT41L UEWDcvdfN5vV+sHw6C0SdWhuf8/T+bEtzPufl6niC1dVVbul76Rv4HCwX9R2r2a9VAfA De+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709540068; x=1710144868; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=y59EUmsn936TktKB/aH9UYKKlrxQ0A+jm6Fw0p9O+fA=; b=hLKURDW81yhHmcErRvEv+vq5uL59OP8VMJvUmHKMYl1Dx9RoWuS+WvWTdAq7o2AHhB MqDBw7fwIqwrZpKiwHPn6CeXc6IRHq6VBUKPx7NIkZ6Y7uZSAv22BdlrT5Gwzkl055x2 7qGTfM/1kw7ZMv6py7qGsNkxA/0Ftc91Cv20Ken3MHpdqMqZxqGqdtLDcRnMrrf0Y1R2 3nVFw3Hb2+onja5nf55pBzLSN0us93iPKa++KAkpWvDUddCVnFEFkuAy6HoZErMyGOtt 2eTOG9f8ReirkCFnvKZqUCiS8yZW2WszHGLrgriIdlWvbanDbgaI6cideXxQybWb6WOr kXSg== X-Forwarded-Encrypted: i=1; AJvYcCU1IEuKTKH2kdxysfSodM6di+rZfrnninIkGto6vLaqqXc6MESiD6Q0JLiTx4uLeoGNVOOLGktfEp/4D6318lFZpe48c2P+mR3YUe51 X-Gm-Message-State: AOJu0Yz2ZQ7rFXSoB9qo3ECBmmbHn4JtD4GNFXm/EHKF+V/PwyxXJwwj RDsKkT4BXaBWIFsQUcjF72H1DvRvjWA6W7PC562uFSBbALwJx15J X-Received: by 2002:a05:6a00:b8f:b0:6e6:13ec:7178 with SMTP id g15-20020a056a000b8f00b006e613ec7178mr2688020pfj.32.1709540067899; Mon, 04 Mar 2024 00:14:27 -0800 (PST) Received: from localhost.localdomain ([2407:7000:8942:5500:aaa1:59ff:fe57:eb97]) by smtp.gmail.com with ESMTPSA id ka42-20020a056a0093aa00b006e558a67374sm6686387pfb.0.2024.03.04.00.14.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Mar 2024 00:14:27 -0800 (PST) From: Barry Song <21cnbao@gmail.com> To: akpm@linux-foundation.org, linux-mm@kvack.org, ryan.roberts@arm.com Cc: chengming.zhou@linux.dev, chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org, 
kasong@tencent.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, mhocko@suse.com, nphamcs@gmail.com, shy828301@gmail.com, steven.price@arm.com, surenb@google.com, wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org, ying.huang@intel.com, yosryahmed@google.com, yuzhao@google.com, Barry Song , Catalin Marinas , Will Deacon , Mark Rutland , Kemeng Shi , Anshuman Khandual , Peter Collingbourne , Peter Xu , Lorenzo Stoakes , "Mike Rapoport (IBM)" , Hugh Dickins , "Aneesh Kumar K.V" , Rick Edgecombe
Subject: [RFC PATCH v3 1/5] arm64: mm: swap: support THP_SWAP on hardware with MTE
Date: Mon, 4 Mar 2024 21:13:44 +1300
Message-Id: <20240304081348.197341-2-21cnbao@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240304081348.197341-1-21cnbao@gmail.com>
References: <20240304081348.197341-1-21cnbao@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1792582706504565075
X-GMAIL-MSGID: 1792582706504565075

From: Barry Song

Commit d0637c505f8a1 ("arm64: enable THP_SWAP for arm64") brought up
THP_SWAP on ARM64, but it did not enable THP_SWAP on hardware with MTE,
because the MTE code assumes that tag save/restore always handles a
folio with only one page.

This limitation should be removed: more and more ARM64 SoCs support MTE,
so the co-existence of MTE and THP_SWAP is becoming increasingly
important.

This patch makes MTE tag saving support large folios, so that large
folios no longer need to be split into base pages before being swapped
out on ARM64 SoCs with MTE.

arch_prepare_to_swap() should take a folio rather than a page as its
parameter, because THP swap-out is now supported as a whole. It saves
the tags for all pages in a large folio.

Because arch_swap_restore() now restores tags per folio, refaulting a
large folio that is still in the swapcache incurs extra loop iterations
and early exits in do_swap_page(). For a large folio of nr pages,
do_swap_page() only sets the PTE of the page that caused the fault, so
do_swap_page() runs nr times, and each time arch_swap_restore() loops
over all nr subpages of the folio. The algorithmic complexity is
therefore currently O(nr^2).

Once do_swap_page() supports mapping large folios, the extra loops and
early exits will decrease, but they cannot be removed completely,
because a large folio may be only partially tagged in corner cases such
as:

1. a large folio in the swapcache can be partially unmapped, so the MTE
   tags for the unmapped pages are invalidated;
2. users might use mprotect() to enable MTE on only part of a large
   folio.

arch_thp_swp_supported() is dropped, since ARM64 MTE was its only user.

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Ryan Roberts
Cc: Mark Rutland
Cc: David Hildenbrand
Cc: Kemeng Shi
Cc: "Matthew Wilcox (Oracle)"
Cc: Anshuman Khandual
Cc: Peter Collingbourne
Cc: Steven Price
Cc: Yosry Ahmed
Cc: Peter Xu
Cc: Lorenzo Stoakes
Cc: "Mike Rapoport (IBM)"
Cc: Hugh Dickins
CC: "Aneesh Kumar K.V"
Cc: Rick Edgecombe
Signed-off-by: Barry Song
Reviewed-by: Steven Price
Acked-by: Chris Li
---
 arch/arm64/include/asm/pgtable.h | 19 ++------------
 arch/arm64/mm/mteswap.c          | 43 ++++++++++++++++++++++++++++++++
 include/linux/huge_mm.h          | 12 ---------
 include/linux/pgtable.h          |  2 +-
 mm/page_io.c                     |  2 +-
 mm/swap_slots.c                  |  2 +-
 6 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 401087e8a43d..7a54750770b8 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -45,12 +45,6 @@
 	__flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1)
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-static inline bool arch_thp_swp_supported(void)
-{
-	return !system_supports_mte();
-}
-#define arch_thp_swp_supported arch_thp_swp_supported
-
 /*
  * Outside of a few very special situations (e.g. hibernation), we always
  * use broadcast TLB invalidation instructions, therefore a spurious page
@@ -1095,12 +1089,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 #ifdef CONFIG_ARM64_MTE
 
 #define __HAVE_ARCH_PREPARE_TO_SWAP
-static inline int arch_prepare_to_swap(struct page *page)
-{
-	if (system_supports_mte())
-		return mte_save_tags(page);
-	return 0;
-}
+extern int arch_prepare_to_swap(struct folio *folio);
 
 #define __HAVE_ARCH_SWAP_INVALIDATE
 static inline void arch_swap_invalidate_page(int type, pgoff_t offset)
@@ -1116,11 +1105,7 @@ static inline void arch_swap_invalidate_area(int type)
 }
 
 #define __HAVE_ARCH_SWAP_RESTORE
-static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
-{
-	if (system_supports_mte())
-		mte_restore_tags(entry, &folio->page);
-}
+extern void arch_swap_restore(swp_entry_t entry, struct folio *folio);
 
 #endif /* CONFIG_ARM64_MTE */
 
diff --git a/arch/arm64/mm/mteswap.c b/arch/arm64/mm/mteswap.c
index a31833e3ddc5..295836fef620 100644
--- a/arch/arm64/mm/mteswap.c
+++ b/arch/arm64/mm/mteswap.c
@@ -68,6 +68,13 @@ void mte_invalidate_tags(int type, pgoff_t offset)
 	mte_free_tag_storage(tags);
 }
 
+static inline void __mte_invalidate_tags(struct page *page)
+{
+	swp_entry_t entry = page_swap_entry(page);
+
+	mte_invalidate_tags(swp_type(entry), swp_offset(entry));
+}
+
 void mte_invalidate_tags_area(int type)
 {
 	swp_entry_t entry = swp_entry(type, 0);
@@ -83,3 +90,39 @@ void mte_invalidate_tags_area(int type)
 	}
 	xa_unlock(&mte_pages);
 }
+
+int arch_prepare_to_swap(struct folio *folio)
+{
+	long i, nr;
+	int err;
+
+	if (!system_supports_mte())
+		return 0;
+
+	nr = folio_nr_pages(folio);
+
+	for (i = 0; i < nr; i++) {
+		err = mte_save_tags(folio_page(folio, i));
+		if (err)
+			goto out;
+	}
+	return 0;
+
+out:
+	while (i--)
+		__mte_invalidate_tags(folio_page(folio, i));
+	return err;
+}
+
+void arch_swap_restore(swp_entry_t entry, struct folio *folio)
+{
+	if (system_supports_mte()) {
+		long i, nr = folio_nr_pages(folio);
+
+		entry.val -= swp_offset(entry) & (nr - 1);
+		for (i = 0; i < nr; i++) {
+			mte_restore_tags(entry, folio_page(folio, i));
+			entry.val++;
+		}
+	}
+}
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index de0c89105076..e04b93c43965 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -535,16 +535,4 @@ static inline int split_folio_to_order(struct folio *folio, int new_order)
 #define split_folio_to_list(f, l) split_folio_to_list_to_order(f, l, 0)
 #define split_folio(f) split_folio_to_order(f, 0)
 
-/*
- * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
- * limitations in the implementation like arm64 MTE can override this to
- * false
- */
-#ifndef arch_thp_swp_supported
-static inline bool arch_thp_swp_supported(void)
-{
-	return true;
-}
-#endif
-
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e1b22903f709..bfcfe3386934 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1106,7 +1106,7 @@ static inline int arch_unmap_one(struct mm_struct *mm,
  * prototypes must be defined in the arch-specific asm/pgtable.h file.
  */
 #ifndef __HAVE_ARCH_PREPARE_TO_SWAP
-static inline int arch_prepare_to_swap(struct page *page)
+static inline int arch_prepare_to_swap(struct folio *folio)
 {
 	return 0;
 }
diff --git a/mm/page_io.c b/mm/page_io.c
index ae2b49055e43..a9a7c236aecc 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -189,7 +189,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 	 * Arch code may have to preserve more data than just the page
 	 * contents, e.g. memory tags.
 	 */
-	ret = arch_prepare_to_swap(&folio->page);
+	ret = arch_prepare_to_swap(folio);
 	if (ret) {
 		folio_mark_dirty(folio);
 		folio_unlock(folio);
diff --git a/mm/swap_slots.c b/mm/swap_slots.c
index 90973ce7881d..53abeaf1371d 100644
--- a/mm/swap_slots.c
+++ b/mm/swap_slots.c
@@ -310,7 +310,7 @@ swp_entry_t folio_alloc_swap(struct folio *folio)
 	entry.val = 0;
 
 	if (folio_test_large(folio)) {
-		if (IS_ENABLED(CONFIG_THP_SWAP) && arch_thp_swp_supported())
+		if (IS_ENABLED(CONFIG_THP_SWAP))
 			get_swap_pages(1, &entry, folio_nr_pages(folio));
 		goto out;
 	}

From patchwork Mon Mar 4 08:13:45 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com> X-Patchwork-Id: 209407 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:fa17:b0:10a:f01:a869 with SMTP id ju23csp1282287dyc; Mon, 4 Mar 2024 00:14:59 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCWCIME6Lnp4WqEUhuPBIIXMJwVl5SxHxebSziK7yJKC8B9ruq5dZc+bPNPgnPowigmnBLyXyCvseH/nZhDoOeTk6M4s5w== X-Google-Smtp-Source: AGHT+IFS4nu1jTdAt6o2D6eqgsNKvf6LRD5pL73EgQB4o7a5HZ7fHNiSygK43MDZhkbDFbNhNroh X-Received: by 2002:a17:902:f687:b0:1dc:b003:ed7a with SMTP id l7-20020a170902f68700b001dcb003ed7amr12372959plg.5.1709540099721; Mon, 04 Mar 2024 00:14:59 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1709540099; cv=pass; d=google.com; s=arc-20160816; b=sFaG1FZWae4b6PglWhZ9x+ZjDjbNYyjuYk8NvSEtebkEZiuQF89MdVCeUo97W83MYi rPDHDI6ZA9XuSysyxf2oJTIZmS+WObAz0qkSJ/NIY9swEC1C+pKCwxjMDfCB/hZPMYzC j2BnDSEjiWFk8nVGJkR9Tx7xtZuAIiX+i6sPy5ROWfOP2gJfmFt0D7kCWnScWm+zhjjD g/bP8sq5eVtmoPZvSWeoirycOvtqc4e2vnH/wWduNopPimcBGKbm4SIr9H2l6x0FULJ5 3OdkKkZpD0x946m4UO2p72cTrQbs1+h87pMtuYiARWcAudMfn8nBQPJoCX4Oj+danIo/ GuAg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id 
:date:subject:cc:to:from:dkim-signature; bh=AOQhq+D2hctSN3bqux7faSvf1ZbbOD2cN4aBjG+5ulk=; fh=FU6o5G7a4t/GS2PwM1WGH7OVti0EMYK2DaLGVeqPRkM=; b=z6DV8c1HBspOEE54A5Iwdr2nXaKDmLrMEsFStywhx3gvy0yHLOYkG/D4Xu56snWAt0 UkQo/1Bl1IP33b6khVdVJwQklnc1rwPVSfN37+sMOq1BaxodX38+Ua86ntXfH6prj7iT 8kjoQTCsHEpMiutCdXacGzMB8NnFWNrGu7YaiIonVZSJPzwSRRgYQ/ONPmFuH7tNNhQX sc4IJQw/EyAiNjDuqWIPRlYN4IPAM0X3XK9z0FOTN6w2hdP0vhP9Qm2qGD1cqFskqLHs 0/nmSPGKCw0mVgVgUx75V8+6cMoFT7PtU88Mgu1rhD7xunDngNobkjB7ptq/w4nWOQpU I22g==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=D9d3OlLT; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com dmarc=pass fromdomain=gmail.com); spf=pass (google.com: domain of linux-kernel+bounces-90177-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-90177-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. 
[139.178.88.99]) by mx.google.com with ESMTPS id q4-20020a170902dac400b001dc4b04c51bsi7066353plx.295.2024.03.04.00.14.59 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Mar 2024 00:14:59 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-90177-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=D9d3OlLT; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com dmarc=pass fromdomain=gmail.com); spf=pass (google.com: domain of linux-kernel+bounces-90177-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-90177-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 818AF28201A for ; Mon, 4 Mar 2024 08:14:59 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 9FE1C17995; Mon, 4 Mar 2024 08:14:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="D9d3OlLT" Received: from mail-pf1-f169.google.com (mail-pf1-f169.google.com [209.85.210.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7B2F3171CE for ; Mon, 4 Mar 2024 08:14:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.169 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709540077; cv=none; 
b=si3+7K3dRAbji60nU4EdEwJc1VWV/+OAioNAzJdKFZxFG0PwX2+Go9nvC554r6+oI5pd72MuBo6NlJi28f4tHpUJ4mKGS5fKcAURlkMDerDzMYX5Ca1x7MsBctty0GCeN+0DtaRnYr8o5VopX6evHOhXsPV/X+B+RkQd7ibhCoE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709540077; c=relaxed/simple; bh=PHC2OmMBrBHeHzSR537jJ2GXTFYmQehyNnf3GsevdDE=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Uiq+5tt6PyohlMdXWPMgWyuiAzPmBi6NcqCFFsTteKsul4CMYN2csm0ChxzV88mj1GcVwAj3+jf5xmW0EpdM1X5cUxjI/YdQ3zODdK6K39v9f9upZ5t0BSVGUshnKhk+xfYQLzrEKeqlBdoFCMbeRzjMwTFxcvhpzKPhu145LJ8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=D9d3OlLT; arc=none smtp.client-ip=209.85.210.169 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-pf1-f169.google.com with SMTP id d2e1a72fcca58-6da202aa138so3020867b3a.2 for ; Mon, 04 Mar 2024 00:14:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1709540076; x=1710144876; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=AOQhq+D2hctSN3bqux7faSvf1ZbbOD2cN4aBjG+5ulk=; b=D9d3OlLTK1TC2jOvPi5Z2pb9hrOzomJa8LOHJ1pOnCCRV2Ed7pJBudmN3kYk5vvRnE UuknfVLeyC1D8M51y3+1rjX94ysmP/g4lOij6qgYb5otD3QEds50EzaMhJ5Z1IyvW2Nr 9rwEN1/Cu+weVpibnDuqd+eqEnutI5estrPx3cKCp9gV5/ffJbHn8LPC+6RF6ql7NlOu gWrB4a2kqE7ArAd3k9fxSi+JzEgGoYkash5/7B6id8M8y51awA7bvNfqxxXBbX8qv/Sx TC2NhrQhDy0yqEEQercxGTUEb3ch6BnDW66eopNBRGx1CA/V+pqI6Mi55WoeSoxG8BYm Kw+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709540076; x=1710144876; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=AOQhq+D2hctSN3bqux7faSvf1ZbbOD2cN4aBjG+5ulk=; b=h1Qpv1clVrcb2SNTYbiXr/hji/WvqIqEQaVOz/PdDkB4nNbWgto0FYESn/l5MSluh/ b5eNKKbx08CW/q+XN/5YJKnTi+vN3N9Ev9GUGBnV+SsoAOX0nOX0MBe9XGcH580eGLYO tTTrPQnmOEN8tYAfgwZD1mlSHi9QboK4YXctbpWA+T71fHPLB7Uwyg8XH9pcA8HNb6ZA 9KnCjjqreVjn7EU8JZvGIqPHlSCHntrt848CHk3+33VyKsboLPD//7CZhcCb2lcBSNnu LdUzPSenKK2ncg1Oln5avATKaHxMkidW4aoy7h0CHnglcuc8/1/tBnKT628+agM/hkbW M0XQ== X-Forwarded-Encrypted: i=1; AJvYcCWYgyxOVjpNM1daiSzF0X31qvOHd5I8TqoYSPXOSCHEyfwI4j6tC3kU0la/mDspKqiBJqQqw8cRWtP9L5u8+EXYPrTPpXmaDHHO+moc X-Gm-Message-State: AOJu0YwPtiFb5GUtuDKfBMlEfI6++ZopdsRdc0vF+QKwMmENQRCE00R5 umi+fwlIMK8D4sxkwZzVA+Na9n6F+IYQ5gSTW96yzl+D1nDoF0I9 X-Received: by 2002:a05:6a20:979b:b0:1a1:3000:ab67 with SMTP id hx27-20020a056a20979b00b001a13000ab67mr9356185pzc.46.1709540075740; Mon, 04 Mar 2024 00:14:35 -0800 (PST) Received: from localhost.localdomain ([2407:7000:8942:5500:aaa1:59ff:fe57:eb97]) by smtp.gmail.com with ESMTPSA id ka42-20020a056a0093aa00b006e558a67374sm6686387pfb.0.2024.03.04.00.14.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Mar 2024 00:14:35 -0800 (PST) From: Barry Song <21cnbao@gmail.com> To: akpm@linux-foundation.org, linux-mm@kvack.org, ryan.roberts@arm.com Cc: chengming.zhou@linux.dev, chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org, kasong@tencent.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, mhocko@suse.com, nphamcs@gmail.com, shy828301@gmail.com, steven.price@arm.com, surenb@google.com, wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org, ying.huang@intel.com, yosryahmed@google.com, yuzhao@google.com, Chuanhua Han , Barry Song Subject: [RFC PATCH v3 2/5] mm: swap: introduce swap_nr_free() for batched swap_free() Date: Mon, 4 Mar 2024 21:13:45 +1300 Message-Id: 
<20240304081348.197341-3-21cnbao@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240304081348.197341-1-21cnbao@gmail.com>
References: <20240304081348.197341-1-21cnbao@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1792582719958949211
X-GMAIL-MSGID: 1792582719958949211

From: Chuanhua Han

When swapping in a large folio, we need to free the swap entries for the
whole folio. To avoid repeatedly acquiring and releasing the swap locks,
introduce an API for batched freeing.

Signed-off-by: Chuanhua Han
Co-developed-by: Barry Song
Signed-off-by: Barry Song
---
 include/linux/swap.h |  6 ++++++
 mm/swapfile.c        | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 2955f7a78d8d..d6ab27929458 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -481,6 +481,7 @@ extern void swap_shmem_alloc(swp_entry_t);
 extern int swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t);
 extern void swap_free(swp_entry_t);
+extern void swap_nr_free(swp_entry_t entry, int nr_pages);
 extern void swapcache_free_entries(swp_entry_t *entries, int n);
 extern int free_swap_and_cache(swp_entry_t);
 int swap_type_of(dev_t device, sector_t offset);
@@ -561,6 +562,11 @@ static inline void swap_free(swp_entry_t swp)
 {
 }
 
+static inline void swap_nr_free(swp_entry_t entry, int nr_pages)
+{
+
+}
+
 static inline void put_swap_folio(struct folio *folio, swp_entry_t swp)
 {
 }
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 3f594be83b58..244106998a69 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1341,6 +1341,41 @@ void swap_free(swp_entry_t entry)
 		__swap_entry_free(p, entry);
 }
 
+/*
+ * Called after swapping in a large folio: batch-free the swap entries
+ * for this large folio. entry should be for the first subpage and
+ * its offset must be aligned with nr_pages.
+ */
+void swap_nr_free(swp_entry_t entry, int nr_pages)
+{
+	int i;
+	struct swap_cluster_info *ci;
+	struct swap_info_struct *p;
+	unsigned type = swp_type(entry);
+	unsigned long offset = swp_offset(entry);
+	DECLARE_BITMAP(usage, SWAPFILE_CLUSTER) = { 0 };
+
+	/* all swap entries are within a cluster for mTHP */
+	VM_BUG_ON(offset % SWAPFILE_CLUSTER + nr_pages > SWAPFILE_CLUSTER);
+
+	if (nr_pages == 1) {
+		swap_free(entry);
+		return;
+	}
+
+	p = _swap_info_get(entry);
+
+	ci = lock_cluster(p, offset);
+	for (i = 0; i < nr_pages; i++) {
+		if (__swap_entry_free_locked(p, offset + i, 1))
+			__bitmap_set(usage, i, 1);
+	}
+	unlock_cluster(ci);
+
+	for_each_clear_bit(i, usage, nr_pages)
+		free_swap_slot(swp_entry(type, offset + i));
+}
+
 /*
  * Called after dropping swapcache to decrease refcnt to swap entries.
  */

From patchwork Mon Mar 4 08:13:46 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com> X-Patchwork-Id: 209408 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:fa17:b0:10a:f01:a869 with SMTP id ju23csp1282377dyc; Mon, 4 Mar 2024 00:15:12 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCX8LSBkS2qXDrS4o9uwTAeEukdcputMGfniP/ltW9ubeYTHzjJC7Hi6jPWLJSQ1SzDKd28W6Grena+t6Qy2SSuF3vOSpA== X-Google-Smtp-Source: AGHT+IHmH/P5PYz2oEJvk3dMU5LBZ73MSlNluLh9vAAfuUPxNLC6IkFj6SwgDpGjJTm32wPvzCHg X-Received: by 2002:a05:620a:6202:b0:787:da35:7bb9 with SMTP id ou2-20020a05620a620200b00787da357bb9mr8687381qkn.17.1709540112102; Mon, 04 Mar 2024 00:15:12 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1709540112; cv=pass; d=google.com; s=arc-20160816; b=0yPeytvV8iWDiGKGpqZUq1jPCN02aa17S1GF30M8ukuDDcKnwxV3USR2J04dw/R8BQ /nr+SJGZFKmPcb14xnHFPkp9XeOuvb8fTKeWQHuJtUkdnsIEEDDcIUEUE7uTyZ9zrGXW HijAcjB5VRkcdqITP15Nwrgl7oZCDa2IHg+gYP9Tukm1Ber/w9g6EOjqvGO3GE3ZVgOM +MM6FSEPAcAnVZIaAVmhArQgaVqPCtuyg237o9v7/uaoX/Ad6/grZmn4qmDiQbmq/MxH 
Gl3bvWcAy6MI7GemRrp6dMSy2ibKMX4zO+pPYLUswTrIwgWXcEqKzzF6kOELOXf4q/hJ kJlA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=nPNnhD7u3Mm6wgXrNGhON5/bGI4wy7Z421ED7ypUMCI=; fh=rhVEr8MCaZXvvx+fn3MjJzZ0ungL5cVhybPOZ3ntD9U=; b=LHALYw6al6Lr7kxvt41ihQXTJokPMzz1LvmaQV6kG0PiHU84t6m+ZFM96tBMnh9sru JQ07UXdc/79QHqbsJiHB/eWK60zlX2hfZeKtTVHFCE+guLjeIT7z0MtDAxJJ6ctMn0Br 0RQJ7PvxdrMQGPIYBvIc0Xe8xCxsfIw1ohNn8t1bK/aGYzX2+ZtzE5/YQwv02aa6ESRN l1kcjiPcRCoKPg/QCqEk6Jl6BMTDIUyQwBQPSAN9l2iOstgF18fKHYYXNpYQEnork7Yh 7MK9FmPp2ne3uRqTWvgfRVXQY8qsLKIoHtZfMAGX1gdwTDKzAhm6teAZskBEyPexzXc9 N8NA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b="H6e/S3c4"; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com dmarc=pass fromdomain=gmail.com); spf=pass (google.com: domain of linux-kernel+bounces-90178-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-90178-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. 
[2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id dw12-20020a05620a600c00b007881da20dc9si4265939qkb.404.2024.03.04.00.15.12 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Mar 2024 00:15:12 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-90178-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b="H6e/S3c4"; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com dmarc=pass fromdomain=gmail.com); spf=pass (google.com: domain of linux-kernel+bounces-90178-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-90178-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id E0E421C21AF7 for ; Mon, 4 Mar 2024 08:15:11 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 693EB17588; Mon, 4 Mar 2024 08:14:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="H6e/S3c4" Received: from mail-io1-f43.google.com (mail-io1-f43.google.com [209.85.166.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7FBC617BA9 for ; Mon, 4 Mar 2024 08:14:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.166.43 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
t=1709540085; cv=none; b=A0cfFhy2sHYMMhrmWxmIj0KOuAhTqQfP7/6Fs+3yI1AEBc/e7ArnQlvS8MVUVW1QyrekqV7vQoCWdCLxCnKYxabV+lo6OXJYPuZvevbzkDAjW/slBz0L3tkyZrrrb0PDcinIcroBvV0/2jldytAxt2RJsc0mJA7svUjEhmdHa/g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709540085; c=relaxed/simple; bh=97XECp6ImGl0bxcVjD9vOm2RDIYctvChEeuK0x7tRrg=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=jjXKh7u9f6mE3BKorRqm2wB8s1/702yXd7n19NGJ44qIu2rN/BP1cVE5qpxI9opx9YRUjLUXGS8H2hJtcfd18TI8hpSlGOBEGhm9t3fCQmIR6ZqSBJrUQNNEKgO0xP1Vo5a2RgmBl7yD9Vqup7477vecYC3bObBFH5jXzjfZI2Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=H6e/S3c4; arc=none smtp.client-ip=209.85.166.43 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-io1-f43.google.com with SMTP id ca18e2360f4ac-7bed9fb159fso259233639f.1 for ; Mon, 04 Mar 2024 00:14:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1709540083; x=1710144883; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=nPNnhD7u3Mm6wgXrNGhON5/bGI4wy7Z421ED7ypUMCI=; b=H6e/S3c4QDzprosK/gcKxHAuMi17BoaNAUBWA6yvutonMR1hjZ5quIM9q4ydHdLUPK mYYbkKOxbGEFGRTYqRxYbtfbBZL/qEwNff4sjuBQNIUP1yRI3WbKTegaRgzMLwLjJv2Y 4sSItnarMVwRUXPEaiqugCursDRdZxQ5ahWFyO+UaU5pTyqpPKe7ruadjtpADL/kVsyi tsUCduR+85/UgU2pMH1nTdXEgWWgr6ovWgues7skBZ5GmhLxuzv1OCgXHUqVzKrx9Jvf JVvLKCS43ny1UPy4xp40fe77dG/bh9EtjkjSBMy4IE2upl2CMU8Bq/JtGaDtIdWZCbl/ IxoA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709540083; x=1710144883; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=nPNnhD7u3Mm6wgXrNGhON5/bGI4wy7Z421ED7ypUMCI=; b=W34g+PakdrWOj+2qyCDgX8wGJbZ34QopM9PVrGA536bR0WnMrDZ9zIiGEi3VUx8oly ck8dodIdviikGtywI78uLcIrbCfSoBCP+Ozgl1L+99s3aSKfBmXeXXDvKWIBa8rW6yY3 VZznoOLTdR78k/YoiflUrathnNFcENJtMnfPErPeAUzUzdpTolG0GLQ0AT7Z+vG9+YDT 4JVBJTgsDlcdahEVeI1f1xgXuG7dvnKzWm2bhqBnZee/vn4W7i1EGrw4821YiYX2zXBp wAKG8tQ7NAqrMHzprLG3A6YzdgUfGjBuUo3d29diYrZRtkkscaOxTu5ae7a2R6cOwRDz aqQQ== X-Forwarded-Encrypted: i=1; AJvYcCWMluaJzmg6eulZth0Fq3eX1YgGR6TMQzG+PkQIM9/T2c7dkEXLyMwr1nMurTZ0x24xAKuNbGMWe8SmeSml2RU0+DeUc2PZ408XK1C2 X-Gm-Message-State: AOJu0YxWi2WNjsV9KCUiRJmqQspwSKVIEn4jB4RIhm3qoduoyMQkb6UP AfKiHptdCycclF1p6dPo8KLOGIOnHqyODNEZW08YrtRn5NmOD01V X-Received: by 2002:a05:6e02:1567:b0:365:f8d:50c3 with SMTP id k7-20020a056e02156700b003650f8d50c3mr11150779ilu.21.1709540083664; Mon, 04 Mar 2024 00:14:43 -0800 (PST) Received: from localhost.localdomain ([2407:7000:8942:5500:aaa1:59ff:fe57:eb97]) by smtp.gmail.com with ESMTPSA id ka42-20020a056a0093aa00b006e558a67374sm6686387pfb.0.2024.03.04.00.14.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Mar 2024 00:14:43 -0800 (PST) From: Barry Song <21cnbao@gmail.com> To: akpm@linux-foundation.org, linux-mm@kvack.org, ryan.roberts@arm.com Cc: chengming.zhou@linux.dev, chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org, kasong@tencent.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, mhocko@suse.com, nphamcs@gmail.com, shy828301@gmail.com, steven.price@arm.com, surenb@google.com, wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org, ying.huang@intel.com, yosryahmed@google.com, yuzhao@google.com, Chuanhua Han , Barry Song Subject: [RFC PATCH v3 3/5] mm: swap: make should_try_to_free_swap() support large-folio Date: Mon, 4 Mar 2024 21:13:46 +1300 Message-Id: 
<20240304081348.197341-4-21cnbao@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240304081348.197341-1-21cnbao@gmail.com> References: <20240304081348.197341-1-21cnbao@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1792582732807890587 X-GMAIL-MSGID: 1792582732807890587

From: Chuanhua Han

should_try_to_free_swap() assumes that swap-in is always done at normal
page granularity, i.e. folio_nr_pages() == 1. To support large folio
swap-in, this patch removes that assumption. The swapcache holds one
reference per subpage, so the reference count expected for a likely
exclusive folio becomes 1 + folio_nr_pages(folio) instead of 2.

Signed-off-by: Chuanhua Han
Co-developed-by: Barry Song
Signed-off-by: Barry Song
Acked-by: Chris Li
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index abd4f33d62c9..e0d34d705e07 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3837,7 +3837,7 @@ static inline bool should_try_to_free_swap(struct folio *folio,
 	 * reference only in case it's likely that we'll be the exlusive user.
 	 */
 	return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
-		folio_ref_count(folio) == 2;
+		folio_ref_count(folio) == (1 + folio_nr_pages(folio));
 }
 
 static vm_fault_t pte_marker_clear(struct vm_fault *vmf)

From patchwork Mon Mar 4 08:13:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com> X-Patchwork-Id: 209409 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:fa17:b0:10a:f01:a869 with SMTP id ju23csp1282583dyc; Mon, 4 Mar 2024 00:15:39 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCX8Xg1E69Mvvme38BZg7oFi7C4Cic0RbuXM0TvW5d2eR2+lcFEi63gs8Rj8+3mSF17P5DEj6+ZApMWt6ikZOMlCZFC+SA== X-Google-Smtp-Source: AGHT+IH1jWKPKbct5rOutrrttO/nlDbMoSPLHMBJDrVKQ+yU86LNBm1HdLdNl0zlZrzoV+iMzTyF X-Received: by 2002:a17:906:8d6:b0:a43:1862:d7b with SMTP id o22-20020a17090608d600b00a4318620d7bmr5513098eje.15.1709540139124; Mon, 04 Mar 2024 00:15:39 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1709540139; cv=pass; d=google.com; s=arc-20160816; b=uxNClkKUJJdjbqFd/POsLetBLzPfSn6DN0vcN31DOSc8nbYR3Ox5YsCE7otnN0mx27 v1qhWzI9gNupwJn2EMEn9KVOMJR9se3PaZR703evb2A6XIO3VnBXSeuvV4lB8U2GP9xn zmzCHZTXoFK5uG8u68pJd1oehqmMc0LX1R5esKOUocfxl+3aP7theUx1z5mZGUn/FeoB 4gaYyht1z5s6IEoO0eRPUzJ7TV2ZuJc7uknrYW12UUKEF9jqfId48q8p5i1H6+nE7pvE PDQYzV9OZpJW1sZOp45ERZdCdjloIh9PiQeJRTvhuXiYVroBxh6YTLFuyNLK92/HVvpj YNxQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=CAJqxwL1m9o2obT3SwA5MTdLBPpoaZY5B/QAJ6xv5j8=; fh=t5JlXmvAuaXWMBY7iADnd4QDEKS0i74jkNq5Bwhnj2A=; b=VcB479MW3EslzXmsb86ByI8h5bX8R1yryBm6sv5W8tUAT0ZirUOh1YqadJA9ZLq7YM IuMHpfj7LFO5dMzWTUkZw1qPQNNGScyjICRyzVhOYVUCmZiMmrHaGP05cRSDG+y7myOi nCGAPkI7mSBqiqykB9wwdf8PTl/JoE8av/SF7+7vpWH6Ada5WOIM+q5uIJYvxvI0wR1p
nLhlCmEVu8/gJvdLsYBY1RlQCd9pVZfNJOHDBq+DZDElWki7LRGotYQnbma8HeraUW/3 jRlWnkcYff9cik5oRaDWFkzLPMTkulbHlVSzrhRX8jUFM5XLbgDAqPXfqp47jnV+ZlzJ dkAg==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=Oka7TYIe; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com dmarc=pass fromdomain=gmail.com); spf=pass (google.com: domain of linux-kernel+bounces-90180-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-90180-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [147.75.80.249]) by mx.google.com with ESMTPS id qk11-20020a1709077f8b00b00a4521c07b53si1039512ejc.298.2024.03.04.00.15.38 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Mar 2024 00:15:39 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-90180-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=Oka7TYIe; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com dmarc=pass fromdomain=gmail.com); spf=pass (google.com: domain of linux-kernel+bounces-90180-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-90180-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 7290D1F2285D for ; Mon, 4 Mar 2024 08:15:38 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain 
[127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 921DF17C6A; Mon, 4 Mar 2024 08:15:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Oka7TYIe" Received: from mail-oi1-f179.google.com (mail-oi1-f179.google.com [209.85.167.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E2E3A17731 for ; Mon, 4 Mar 2024 08:15:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.167.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709540102; cv=none; b=tvMlfWdEgqa5F+eUiHewZFBHpFRV6Lbc5LJZajLQZkVQI95/70IlOM7uoiSWl4SFkWe9U6tTbKMC9mXPX4JwvT7gRA9WqNUcwZBo7hhLRpBKZM4ANUxCazNO/HyZrTVme3YmYbs6y4Zpl7G0AkkOk2arokb2U7EFt4jdy+OF1gI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709540102; c=relaxed/simple; bh=qJPCusJ04ma2IPDts7aPIAhRV53b4FPmjk2hD0OS2RQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=CuHNjREKLh0txdgDvnW2Snif7qQ+HLV/wlZW7JGexjgAEE+77ikxE5ov17/6uO2tkAJGotVsem95weC0Bn6NFelNRrbK1ZVqrGl2lFXXmoSw8Q64NIbGLSLlm0OaK8mrSviHMrrDYJL0trtWQ8CTJM1Up+EBWKJJfIWNkheGRBc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Oka7TYIe; arc=none smtp.client-ip=209.85.167.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-oi1-f179.google.com with SMTP id 5614622812f47-3c1adc90830so2570183b6e.0 for ; Mon, 04 Mar 2024 00:15:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1709540100; 
x=1710144900; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=CAJqxwL1m9o2obT3SwA5MTdLBPpoaZY5B/QAJ6xv5j8=; b=Oka7TYIeZTjtj0Jjn8q/UYoM9aL+z2hJJNIU73PtM0LNRISK4dw2e+4xQugwJVRjFg hOs5QUBhB2v7XQxSJAD6vP5ArzS8D8BmK5eP5XbVNyW0YB6M86Ob/VJs8w2NlKS2QxIc 5Pg4LtUXMWGIbTvRK1PP056QxB2cMNzq2PMmZcBwms9FHpVyMVqMfe5/xm89/wonTUsC MVv2xIZEl9YSoOluU+kbml1f/OklpN5GTrr35Amlhbe0gKOpgO+frG2RgnbJ8I/2vMod /8fLY53WS4saOvRL65DmE4QrN08qSvgDVUEd1T4dVH/zIZDpQ3hKObOsKsVjGeVNlaZn tZXQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709540100; x=1710144900; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=CAJqxwL1m9o2obT3SwA5MTdLBPpoaZY5B/QAJ6xv5j8=; b=t81bCGQakg/oBPj4SNxB/paZKuXT5qJAye8dPXenSU1qQlvOe/C7XRA+o3c+xb8DU6 2I+X5bP5O7D3FcOrZVQtw5TrZR/kkys9EXAyLf8PrnxtSiqCPCBGSDGpB5drCIwEykRX 5THWJwkj8NXPfoUKZAeB41cRxJg2YHMChaxuc08EP+2BDZ/n5g0oD/9VtbxdstJpZsPY G69NOmFvJ1Cm9l9qkc4Uda8+hXsmjASyZ6ka5/AGv44VOlLk6f0WZ9yYN04vf0nKOBpc mNYX1hNtiJuD6MwQGMHKl8gvMNk1OeQ9fSpIHcU9SihnSIDVEs0NSfV5X70M/dWdmvep Cdyg== X-Forwarded-Encrypted: i=1; AJvYcCUKjkR5k5RtHAJAyI9TvPsuhcteNgFuNnqwk7m72EOgKNp7AdygVr+YQPplPpj4EdXSdnKPc28jlHD07WG2RxIFdlk7QqgYsjFD2PKj X-Gm-Message-State: AOJu0YySPxHca66lXica159pyaxkg8RliS1dwAixQl1hG2zfNGfBblo3 JCbIRvUYAUZMvwhGyA3x3+n0Zpn2JmQFuHdcTVlsGB3cQ/mu5fEI X-Received: by 2002:a05:6870:e91:b0:21e:d80d:3f13 with SMTP id mm17-20020a0568700e9100b0021ed80d3f13mr11743190oab.58.1709540099802; Mon, 04 Mar 2024 00:14:59 -0800 (PST) Received: from localhost.localdomain ([2407:7000:8942:5500:aaa1:59ff:fe57:eb97]) by smtp.gmail.com with ESMTPSA id ka42-20020a056a0093aa00b006e558a67374sm6686387pfb.0.2024.03.04.00.14.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Mar 2024 00:14:59 -0800 (PST) From: Barry 
Song <21cnbao@gmail.com> To: akpm@linux-foundation.org, linux-mm@kvack.org, ryan.roberts@arm.com Cc: chengming.zhou@linux.dev, chrisl@kernel.org, david@redhat.com, hannes@cmpxchg.org, kasong@tencent.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, mhocko@suse.com, nphamcs@gmail.com, shy828301@gmail.com, steven.price@arm.com, surenb@google.com, wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org, ying.huang@intel.com, yosryahmed@google.com, yuzhao@google.com, Chuanhua Han , Barry Song Subject: [RFC PATCH v3 5/5] mm: support large folios swapin as a whole Date: Mon, 4 Mar 2024 21:13:48 +1300 Message-Id: <20240304081348.197341-6-21cnbao@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240304081348.197341-1-21cnbao@gmail.com> References: <20240304081348.197341-1-21cnbao@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1792582760844938337 X-GMAIL-MSGID: 1792582760844938337

From: Chuanhua Han

On an embedded system like Android, more than half of anon memory is
actually in swap devices such as zRAM. For example, when an app is
switched to the background, most of its memory might be swapped out.

Now that we have mTHP, if we don't support large folio swap-in, then
once those large folios are swapped out we immediately lose the
performance gain we can get through large folios and hardware
optimizations such as CONT-PTE.

This patch adds mTHP swap-in support. Right now, we limit mTHP swap-in
to contiguous swap entries that were likely swapped out from an mTHP as
a whole.

Meanwhile, the current implementation only covers the SWP_SYNCHRONOUS_IO
case. It does not yet support swapin_readahead() with large folios,
since the shared memory handled by that path is much less common than
memory mapped by a single process.

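The restriction to contiguous swap entries described above can be illustrated outside the kernel. The following is only a minimal userspace sketch, not the code in this patch: `struct swap_slot` and `slots_are_contig()` are hypothetical names standing in for the values the kernel reads via swp_type(), swp_offset() and the SWAP_HAS_CACHE bit in si->swap_map[].

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of one swap PTE; not a kernel type.
 * In the patch these values come from swp_type(), swp_offset() and
 * si->swap_map[offset] & SWAP_HAS_CACHE.
 */
struct swap_slot {
	unsigned int type;	/* which swap device */
	unsigned long offset;	/* slot index on that device */
	bool has_cache;		/* slot currently has a swapcache folio */
};

/*
 * Return true if slots[0..nr-1] could back one naturally aligned large
 * folio: the first offset is a multiple of nr, offsets increase by one,
 * and every slot shares the same device and the same has_cache state.
 */
static bool slots_are_contig(const struct swap_slot *slots, unsigned long nr)
{
	unsigned long i;

	assert(nr > 0);
	if (slots[0].offset % nr)
		return false;

	for (i = 1; i < nr; i++) {
		if (slots[i].type != slots[0].type)
			return false;
		if (slots[i].offset != slots[0].offset + i)
			return false;
		if (slots[i].has_cache != slots[0].has_cache)
			return false;
	}
	return true;
}
```

With this model, four slots at offsets 8..11 on one device pass the check, while offsets 9..12 fail the alignment test and a range with mixed has_cache state is rejected, which mirrors why such ranges fall back to order-0 swap-in.
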
Large folios that are still in the swapcache are now re-faulted as a
whole. This effectively reduces the extra loops and early exits that
were added to arch_swap_restore() when MTE restore was extended from
pages to folios. It can also reduce the number of do_swap_page() calls,
since PTEs used to be set one by one even when we hit a large folio in
the swapcache.

Signed-off-by: Chuanhua Han
Co-developed-by: Barry Song
Signed-off-by: Barry Song
---
 mm/memory.c | 250 ++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 212 insertions(+), 38 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index e0d34d705e07..501ede745ef3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3907,6 +3907,136 @@ static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
 	return VM_FAULT_SIGBUS;
 }
 
+/*
+ * check a range of PTEs are completely swap entries with
+ * contiguous swap offsets and the same SWAP_HAS_CACHE.
+ * pte must be first one in the range
+ */
+static bool is_pte_range_contig_swap(pte_t *pte, int nr_pages)
+{
+	int i;
+	struct swap_info_struct *si;
+	swp_entry_t entry;
+	unsigned type;
+	pgoff_t start_offset;
+	char has_cache;
+
+	entry = pte_to_swp_entry(ptep_get_lockless(pte));
+	if (non_swap_entry(entry))
+		return false;
+	start_offset = swp_offset(entry);
+	if (start_offset % nr_pages)
+		return false;
+
+	si = swp_swap_info(entry);
+	type = swp_type(entry);
+	has_cache = si->swap_map[start_offset] & SWAP_HAS_CACHE;
+	for (i = 1; i < nr_pages; i++) {
+		entry = pte_to_swp_entry(ptep_get_lockless(pte + i));
+		if (non_swap_entry(entry))
+			return false;
+		if (swp_offset(entry) != start_offset + i)
+			return false;
+		if (swp_type(entry) != type)
+			return false;
+		/*
+		 * while allocating a large folio and doing swap_read_folio for the
+		 * SWP_SYNCHRONOUS_IO path, which is the case the being faulted pte
+		 * doesn't have swapcache. We need to ensure all PTEs have no cache
+		 * as well, otherwise, we might go to swap devices while the content
+		 * is in swapcache
+		 */
+		if ((si->swap_map[start_offset + i] & SWAP_HAS_CACHE) != has_cache)
+			return false;
+	}
+
+	return true;
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+/*
+ * Get a list of all the (large) orders below PMD_ORDER that are enabled
+ * for this vma. Then filter out the orders that can't be allocated over
+ * the faulting address and still be fully contained in the vma.
+ */
+static inline unsigned long get_alloc_folio_orders(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	unsigned long orders;
+
+	orders = thp_vma_allowable_orders(vma, vma->vm_flags, false, true, true,
+					  BIT(PMD_ORDER) - 1);
+	orders = thp_vma_suitable_orders(vma, vmf->address, orders);
+	return orders;
+}
+#endif
+
+static struct folio *alloc_swap_folio(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	unsigned long orders;
+	struct folio *folio;
+	unsigned long addr;
+	pte_t *pte;
+	gfp_t gfp;
+	int order;
+
+	/*
+	 * If uffd is active for the vma we need per-page fault fidelity to
+	 * maintain the uffd semantics.
+	 */
+	if (unlikely(userfaultfd_armed(vma)))
+		goto fallback;
+
+	/*
+	 * a large folio being swapped-in could be partially in
+	 * zswap and partially in swap devices, zswap doesn't
+	 * support large folios yet, we might get corrupted
+	 * zero-filled data by reading all subpages from swap
+	 * devices while some of them are actually in zswap
+	 */
+	if (is_zswap_enabled())
+		goto fallback;
+
+	orders = get_alloc_folio_orders(vmf);
+	if (!orders)
+		goto fallback;
+
+	pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK);
+	if (unlikely(!pte))
+		goto fallback;
+
+	/*
+	 * For do_swap_page, find the highest order where the aligned range is
+	 * completely swap entries with contiguous swap offsets.
+	 */
+	order = highest_order(orders);
+	while (orders) {
+		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
+		if (is_pte_range_contig_swap(pte + pte_index(addr), 1 << order))
+			break;
+		order = next_order(&orders, order);
+	}
+
+	pte_unmap(pte);
+
+	/* Try allocating the highest of the remaining orders. */
+	gfp = vma_thp_gfp_mask(vma);
+	while (orders) {
+		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
+		folio = vma_alloc_folio(gfp, order, vma, addr, true);
+		if (folio)
+			return folio;
+		order = next_order(&orders, order);
+	}
+
+fallback:
+#endif
+	return vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vmf->address, false);
+}
+
+
 /*
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
@@ -3928,6 +4058,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	pte_t pte;
 	vm_fault_t ret = 0;
 	void *shadow = NULL;
+	int nr_pages = 1;
+	unsigned long start_address;
+	pte_t *start_pte;
 
 	if (!pte_unmap_same(vmf))
 		goto out;
@@ -3991,35 +4124,41 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (!folio) {
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
 		    __swap_count(entry) == 1) {
-			/*
-			 * Prevent parallel swapin from proceeding with
-			 * the cache flag. Otherwise, another thread may
-			 * finish swapin first, free the entry, and swapout
-			 * reusing the same entry. It's undetectable as
-			 * pte_same() returns true due to entry reuse.
-			 */
-			if (swapcache_prepare(entry)) {
-				/* Relax a bit to prevent rapid repeated page faults */
-				schedule_timeout_uninterruptible(1);
-				goto out;
-			}
-			need_clear_cache = true;
-
 			/* skip swapcache */
-			folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
-						vma, vmf->address, false);
+			folio = alloc_swap_folio(vmf);
 			page = &folio->page;
 			if (folio) {
 				__folio_set_locked(folio);
 				__folio_set_swapbacked(folio);
 
+				if (folio_test_large(folio)) {
+					nr_pages = folio_nr_pages(folio);
+					entry.val = ALIGN_DOWN(entry.val, nr_pages);
+				}
+
+				/*
+				 * Prevent parallel swapin from proceeding with
+				 * the cache flag. Otherwise, another thread may
+				 * finish swapin first, free the entry, and swapout
+				 * reusing the same entry. It's undetectable as
+				 * pte_same() returns true due to entry reuse.
+				 */
+				if (swapcache_prepare_nr(entry, nr_pages)) {
+					/* Relax a bit to prevent rapid repeated page faults */
+					schedule_timeout_uninterruptible(1);
+					goto out;
+				}
+				need_clear_cache = true;
+
 				if (mem_cgroup_swapin_charge_folio(folio,
 							vma->vm_mm, GFP_KERNEL,
 							entry)) {
 					ret = VM_FAULT_OOM;
 					goto out_page;
 				}
-				mem_cgroup_swapin_uncharge_swap(entry);
+
+				for (swp_entry_t e = entry; e.val < entry.val + nr_pages; e.val++)
+					mem_cgroup_swapin_uncharge_swap(e);
 
 				shadow = get_shadow_from_swap_cache(entry);
 				if (shadow)
@@ -4118,6 +4257,42 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 */
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
 			&vmf->ptl);
+
+	start_address = vmf->address;
+	start_pte = vmf->pte;
+	if (start_pte && folio_test_large(folio)) {
+		unsigned long nr = folio_nr_pages(folio);
+		unsigned long addr = ALIGN_DOWN(vmf->address, nr * PAGE_SIZE);
+		pte_t *aligned_pte = vmf->pte - (vmf->address - addr) / PAGE_SIZE;
+
+		/*
+		 * case 1: we are allocating large_folio, try to map it as a whole
+		 * iff the swap entries are still entirely mapped;
+		 * case 2: we hit a large folio in swapcache, and all swap entries
+		 * are still entirely mapped, try to map a large folio as a whole.
+		 * otherwise, map only the faulting page within the large folio
+		 * which is swapcache
+		 */
+		if (!is_pte_range_contig_swap(aligned_pte, nr)) {
+			if (nr_pages > 1) /* ptes have changed for case 1 */
+				goto out_nomap;
+			goto check_pte;
+		}
+
+		start_address = addr;
+		start_pte = aligned_pte;
+		/*
+		 * the below has been done before swap_read_folio()
+		 * for case 1
+		 */
+		if (unlikely(folio == swapcache)) {
+			nr_pages = nr;
+			entry.val = ALIGN_DOWN(entry.val, nr_pages);
+			page = &folio->page;
+		}
+	}
+
+check_pte:
 	if (unlikely(!vmf->pte || !pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
 		goto out_nomap;
 
@@ -4185,12 +4360,14 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 * We're already holding a reference on the page but haven't mapped it
 	 * yet.
 	 */
-	swap_free(entry);
+	swap_nr_free(entry, nr_pages);
 	if (should_try_to_free_swap(folio, vma, vmf->flags))
 		folio_free_swap(folio);
 
-	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
-	dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
+	folio_ref_add(folio, nr_pages - 1);
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
+	add_mm_counter(vma->vm_mm, MM_SWAPENTS, -nr_pages);
+
 	pte = mk_pte(page, vma->vm_page_prot);
 
 	/*
@@ -4200,14 +4377,14 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 * exclusivity.
 	 */
 	if (!folio_test_ksm(folio) &&
-	    (exclusive || folio_ref_count(folio) == 1)) {
+	    (exclusive || folio_ref_count(folio) == nr_pages)) {
 		if (vmf->flags & FAULT_FLAG_WRITE) {
 			pte = maybe_mkwrite(pte_mkdirty(pte), vma);
 			vmf->flags &= ~FAULT_FLAG_WRITE;
 		}
 		rmap_flags |= RMAP_EXCLUSIVE;
 	}
-	flush_icache_page(vma, page);
+	flush_icache_pages(vma, page, nr_pages);
 	if (pte_swp_soft_dirty(vmf->orig_pte))
 		pte = pte_mksoft_dirty(pte);
 	if (pte_swp_uffd_wp(vmf->orig_pte))
@@ -4216,17 +4393,19 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 
 	/* ksm created a completely new copy */
 	if (unlikely(folio != swapcache && swapcache)) {
-		folio_add_new_anon_rmap(folio, vma, vmf->address);
+		folio_add_new_anon_rmap(folio, vma, start_address);
 		folio_add_lru_vma(folio, vma);
+	} else if (!folio_test_anon(folio)) {
+		folio_add_new_anon_rmap(folio, vma, start_address);
 	} else {
-		folio_add_anon_rmap_pte(folio, page, vma, vmf->address,
+		folio_add_anon_rmap_ptes(folio, page, nr_pages, vma, start_address,
 					rmap_flags);
 	}
 
 	VM_BUG_ON(!folio_test_anon(folio) ||
 			(pte_write(pte) && !PageAnonExclusive(page)));
-	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
-	arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
+	set_ptes(vma->vm_mm, start_address, start_pte, pte, nr_pages);
+	arch_do_swap_page(vma->vm_mm, vma, start_address, pte, vmf->orig_pte);
 
 	folio_unlock(folio);
 	if (folio != swapcache && swapcache) {
@@ -4243,6 +4422,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	}
 
 	if (vmf->flags & FAULT_FLAG_WRITE) {
+		if (nr_pages > 1)
+			vmf->orig_pte = ptep_get(vmf->pte);
+
 		ret |= do_wp_page(vmf);
 		if (ret & VM_FAULT_ERROR)
 			ret &= VM_FAULT_ERROR;
@@ -4250,14 +4432,14 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	}
 
 	/* No need to invalidate - it was non-present before */
-	update_mmu_cache_range(vmf, vma, vmf->address, vmf->pte, 1);
+	update_mmu_cache_range(vmf, vma, start_address, start_pte, nr_pages);
 unlock:
 	if (vmf->pte)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
 out:
 	/* Clear the swap cache pin for direct swapin after PTL unlock */
 	if (need_clear_cache)
-		swapcache_clear(si, entry);
+		swapcache_clear_nr(si, entry, nr_pages);
 	if (si)
 		put_swap_device(si);
 	return ret;
@@ -4273,7 +4455,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		folio_put(swapcache);
 	}
 	if (need_clear_cache)
-		swapcache_clear(si, entry);
+		swapcache_clear_nr(si, entry, nr_pages);
 	if (si)
 		put_swap_device(si);
 	return ret;
@@ -4309,15 +4491,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 	if (unlikely(userfaultfd_armed(vma)))
 		goto fallback;
 
-	/*
-	 * Get a list of all the (large) orders below PMD_ORDER that are enabled
-	 * for this vma. Then filter out the orders that can't be allocated over
-	 * the faulting address and still be fully contained in the vma.
-	 */
-	orders = thp_vma_allowable_orders(vma, vma->vm_flags, false, true, true,
-					  BIT(PMD_ORDER) - 1);
-	orders = thp_vma_suitable_orders(vma, vmf->address, orders);
-
+	orders = get_alloc_folio_orders(vmf);
 	if (!orders)
 		goto fallback;
 