From patchwork Wed Jul 26 09:51:42 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 126249
Subject: [PATCH v4 1/5] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()
From: Ryan Roberts
To: Andrew Morton, Matthew Wilcox, Yin Fengwei, David Hildenbrand, Yu Zhao, Catalin Marinas, Will Deacon, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan, Luis Chamberlain, Itaru Kitayama
Cc: Ryan Roberts, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Date: Wed, 26 Jul 2023 10:51:42 +0100
Message-Id: <20230726095146.2826796-2-ryan.roberts@arm.com>
In-Reply-To: <20230726095146.2826796-1-ryan.roberts@arm.com>
References: <20230726095146.2826796-1-ryan.roberts@arm.com>
In preparation for LARGE_ANON_FOLIO support, improve
folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
passed to it. In this case, all contained pages are accounted using the
order-0 folio (or base page) scheme.

Reviewed-by: Yu Zhao
Reviewed-by: Yin Fengwei
Signed-off-by: Ryan Roberts
---
 mm/rmap.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 0c0d8857dfce..b3e3006738e4 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1278,31 +1278,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
  * This means the inc-and-test can be bypassed.
  * The folio does not have to be locked.
  *
- * If the folio is large, it is accounted as a THP. As the folio
+ * If the folio is pmd-mappable, it is accounted as a THP. As the folio
  * is new, it's assumed to be mapped exclusively by a single process.
  */
 void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 		unsigned long address)
 {
-	int nr;
+	int nr = folio_nr_pages(folio);
 
-	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
+	VM_BUG_ON_VMA(address < vma->vm_start ||
+			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
 	__folio_set_swapbacked(folio);
 
-	if (likely(!folio_test_pmd_mappable(folio))) {
+	if (likely(!folio_test_large(folio))) {
 		/* increment count (starts at -1) */
 		atomic_set(&folio->_mapcount, 0);
-		nr = 1;
+		__page_set_anon_rmap(folio, &folio->page, vma, address, 1);
+	} else if (!folio_test_pmd_mappable(folio)) {
+		int i;
+
+		for (i = 0; i < nr; i++) {
+			struct page *page = folio_page(folio, i);
+
+			/* increment count (starts at -1) */
+			atomic_set(&page->_mapcount, 0);
+			__page_set_anon_rmap(folio, page, vma,
+					address + (i << PAGE_SHIFT), 1);
+		}
+
+		atomic_set(&folio->_nr_pages_mapped, nr);
 	} else {
 		/* increment count (starts at -1) */
 		atomic_set(&folio->_entire_mapcount, 0);
 		atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED);
-		nr = folio_nr_pages(folio);
+		__page_set_anon_rmap(folio, &folio->page, vma, address, 1);
 		__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr);
 	}
 
 	__lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
-	__page_set_anon_rmap(folio, &folio->page, vma, address, 1);
 }
 
 /**
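[Editorial note] A note on the widened VM_BUG_ON_VMA check: the folio now
spans nr pages, so the assert must cover the whole mapped range, not just
the faulting address. A minimal userspace sketch of the same arithmetic
(illustrative only; PAGE_SHIFT and the addresses are assumed, not taken
from the patch):

	#include <assert.h>
	#include <stdio.h>

	#define PAGE_SHIFT 12	/* assumes 4K base pages */

	/* Mirror of the patch's range check: the folio must end within the vma. */
	static void check_folio_fits(unsigned long address, unsigned long nr,
				     unsigned long vm_start, unsigned long vm_end)
	{
		assert(address >= vm_start &&
		       address + (nr << PAGE_SHIFT) <= vm_end);
	}

	int main(void)
	{
		/* order-4 folio: 16 pages * 4K = 64K must fit inside a 1M vma */
		check_folio_fits(0x400000, 16, 0x400000, 0x500000);
		printf("64K folio fits in 1M vma\n");
		return 0;
	}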
From patchwork Wed Jul 26 09:51:43 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 126258
Subject: [PATCH v4 2/5] mm: LARGE_ANON_FOLIO for improved performance
From: Ryan Roberts
To: Andrew Morton, Matthew Wilcox, Yin Fengwei, David Hildenbrand, Yu Zhao, Catalin Marinas, Will Deacon, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan, Luis Chamberlain, Itaru Kitayama
Cc: Ryan Roberts, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Date: Wed, 26 Jul 2023 10:51:43 +0100
Message-Id: <20230726095146.2826796-3-ryan.roberts@arm.com>
In-Reply-To: <20230726095146.2826796-1-ryan.roberts@arm.com>
References: <20230726095146.2826796-1-ryan.roberts@arm.com>

Introduce the LARGE_ANON_FOLIO feature, which allows anonymous memory to
be allocated in large folios of a determined order. All pages of the
large folio are pte-mapped during the same page fault, significantly
reducing the number of page faults. The number of per-page operations
(e.g. ref counting, rmap management, lru list management) is also
significantly reduced since those ops now become per-folio.

The new behaviour is hidden behind the new LARGE_ANON_FOLIO Kconfig,
which defaults to disabled for now; the long term aim is for this to
default to enabled, but there are some risks around internal
fragmentation that need to be better understood first.

When enabled, the folio order is determined as follows: for a vma,
process or system that has explicitly disabled THP, we continue to
allocate order-0. THP is most likely disabled to avoid any possible
internal fragmentation, so we honour that request.

Otherwise, the return value of arch_wants_pte_order() is used. For vmas
that have not explicitly opted-in to use transparent hugepages (e.g.
where thp=madvise and the vma does not have MADV_HUGEPAGE), then
arch_wants_pte_order() is limited to 64K (or PAGE_SIZE, whichever is
bigger). This allows for a performance boost without requiring any
explicit opt-in from the workload while limiting internal fragmentation.

If the preferred order can't be used (e.g. because the folio would
breach the bounds of the vma, or because ptes in the region are already
mapped) then we fall back to a suitable lower order; first
PAGE_ALLOC_COSTLY_ORDER, then order-0.

arch_wants_pte_order() can be overridden by the architecture if desired.
Some architectures (e.g. arm64) can coalesce TLB entries if a contiguous
set of ptes map physically contiguous, naturally aligned memory, so this
mechanism allows the architecture to optimize as required.

Here we add the default implementation of arch_wants_pte_order(), used
when the architecture does not define it, which returns -1, implying
that the HW has no preference. In this case, mm will choose its own
default order.
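[Editorial note] To make the policy above concrete, here is a condensed,
self-contained model of the order selection (illustrative only; it
assumes PAGE_ALLOC_COSTLY_ORDER is 3, as in current kernels, and a 4K
base page so the unhinted cap is order-4):

	#include <stdio.h>

	#define PAGE_ALLOC_COSTLY_ORDER	3	/* current kernel value */
	#define MAX_ORDER_UNHINTED	4	/* 64K / 4K = order-4 */

	/* Condensed model of this patch's anon_folio_order(). */
	static int model_order(int thp_disabled, int vma_opted_in, int arch_order)
	{
		int order;

		if (thp_disabled)
			return 0;

		/* arch preference, but never below the costly order */
		order = arch_order > PAGE_ALLOC_COSTLY_ORDER ?
				arch_order : PAGE_ALLOC_COSTLY_ORDER;

		/* cap for vmas that did not opt in via MADV_HUGEPAGE */
		if (!vma_opted_in && order > MAX_ORDER_UNHINTED)
			order = MAX_ORDER_UNHINTED;

		return order;
	}

	int main(void)
	{
		printf("THP disabled:   order %d\n", model_order(1, 0, 6));	/* 0 */
		printf("unhinted vma:   order %d\n", model_order(0, 0, 6));	/* 4 */
		printf("MADV_HUGEPAGE:  order %d\n", model_order(0, 1, 6));	/* 6 */
		return 0;
	}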
Signed-off-by: Ryan Roberts
---
 include/linux/pgtable.h |  13 ++++
 mm/Kconfig              |  10 +++
 mm/memory.c             | 166 ++++++++++++++++++++++++++++++++++++----
 3 files changed, 172 insertions(+), 17 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 5063b482e34f..2a1d83775837 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -313,6 +313,19 @@ static inline bool arch_has_hw_pte_young(void)
 }
 #endif
 
+#ifndef arch_wants_pte_order
+/*
+ * Returns preferred folio order for pte-mapped memory. Must be in range [0,
+ * PMD_SHIFT-PAGE_SHIFT) and must not be order-1 since THP requires large folios
+ * to be at least order-2. Negative value implies that the HW has no preference
+ * and mm will choose its own default order.
+ */
+static inline int arch_wants_pte_order(void)
+{
+	return -1;
+}
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 				       unsigned long address,
diff --git a/mm/Kconfig b/mm/Kconfig
index 09130434e30d..fa61ea160447 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1238,4 +1238,14 @@ config LOCK_MM_AND_FIND_VMA
 
 source "mm/damon/Kconfig"
 
+config LARGE_ANON_FOLIO
+	bool "Allocate large folios for anonymous memory"
+	depends on TRANSPARENT_HUGEPAGE
+	default n
+	help
+	  Use large (bigger than order-0) folios to back anonymous memory where
+	  possible, even for pte-mapped memory. This reduces the number of page
+	  faults, as well as other per-page overheads to improve performance for
+	  many workloads.
+
 endmenu
diff --git a/mm/memory.c b/mm/memory.c
index 01f39e8144ef..64c3f242c49a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4050,6 +4050,127 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	return ret;
 }
 
+static bool vmf_pte_range_changed(struct vm_fault *vmf, int nr_pages)
+{
+	int i;
+
+	if (nr_pages == 1)
+		return vmf_pte_changed(vmf);
+
+	for (i = 0; i < nr_pages; i++) {
+		if (!pte_none(ptep_get_lockless(vmf->pte + i)))
+			return true;
+	}
+
+	return false;
+}
+
+#ifdef CONFIG_LARGE_ANON_FOLIO
+#define ANON_FOLIO_MAX_ORDER_UNHINTED \
+		(ilog2(max_t(unsigned long, SZ_64K, PAGE_SIZE)) - PAGE_SHIFT)
+
+static int anon_folio_order(struct vm_area_struct *vma)
+{
+	int order;
+
+	/*
+	 * If THP is explicitly disabled for either the vma, the process or the
+	 * system, then this is very likely intended to limit internal
+	 * fragmentation; in this case, don't attempt to allocate a large
+	 * anonymous folio.
+	 *
+	 * Else, if the vma is eligible for thp, allocate a large folio of the
+	 * size preferred by the arch. Or if the arch requested a very small
+	 * size or didn't request a size, then use PAGE_ALLOC_COSTLY_ORDER,
+	 * which still meets the arch's requirements but means we still take
+	 * advantage of SW optimizations (e.g. fewer page faults).
+	 *
+	 * Finally if thp is enabled but the vma isn't eligible, take the
+	 * arch-preferred size and limit it to ANON_FOLIO_MAX_ORDER_UNHINTED.
+	 * This ensures workloads that have not explicitly opted-in take benefit
+	 * while capping the potential for internal fragmentation.
+	 */
+
+	if ((vma->vm_flags & VM_NOHUGEPAGE) ||
+	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags) ||
+	    !hugepage_flags_enabled())
+		order = 0;
+	else {
+		order = max(arch_wants_pte_order(), PAGE_ALLOC_COSTLY_ORDER);
+
+		if (!hugepage_vma_check(vma, vma->vm_flags, false, true, true))
+			order = min(order, ANON_FOLIO_MAX_ORDER_UNHINTED);
+	}
+
+	return order;
+}
+
+static int alloc_anon_folio(struct vm_fault *vmf, struct folio **folio)
+{
+	int i;
+	gfp_t gfp;
+	pte_t *pte;
+	unsigned long addr;
+	struct vm_area_struct *vma = vmf->vma;
+	int prefer = anon_folio_order(vma);
+	int orders[] = {
+		prefer,
+		prefer > PAGE_ALLOC_COSTLY_ORDER ? PAGE_ALLOC_COSTLY_ORDER : 0,
+		0,
+	};
+
+	*folio = NULL;
+
+	if (vmf_orig_pte_uffd_wp(vmf))
+		goto fallback;
+
+	for (i = 0; orders[i]; i++) {
+		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << orders[i]);
+		if (addr >= vma->vm_start &&
+		    addr + (PAGE_SIZE << orders[i]) <= vma->vm_end)
+			break;
+	}
+
+	if (!orders[i])
+		goto fallback;
+
+	pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK);
+	if (!pte)
+		return -EAGAIN;
+
+	for (; orders[i]; i++) {
+		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << orders[i]);
+		vmf->pte = pte + pte_index(addr);
+		if (!vmf_pte_range_changed(vmf, 1 << orders[i]))
+			break;
+	}
+
+	vmf->pte = NULL;
+	pte_unmap(pte);
+
+	gfp = vma_thp_gfp_mask(vma);
+
+	for (; orders[i]; i++) {
+		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << orders[i]);
+		*folio = vma_alloc_folio(gfp, orders[i], vma, addr, true);
+		if (*folio) {
+			clear_huge_page(&(*folio)->page, addr, 1 << orders[i]);
+			return 0;
+		}
+	}
+
+fallback:
+	*folio = vma_alloc_zeroed_movable_folio(vma, vmf->address);
+	return *folio ? 0 : -ENOMEM;
+}
+#else
+static inline int alloc_anon_folio(struct vm_fault *vmf, struct folio **folio)
+{
+	*folio = vma_alloc_zeroed_movable_folio(vmf->vma, vmf->address);
+	return *folio ? 0 : -ENOMEM;
+}
+#endif
+
 /*
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
@@ -4057,6 +4178,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
  */
 static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 {
+	int i = 0;
+	int nr_pages = 1;
+	unsigned long addr = vmf->address;
 	bool uffd_wp = vmf_orig_pte_uffd_wp(vmf);
 	struct vm_area_struct *vma = vmf->vma;
 	struct folio *folio;
@@ -4101,10 +4225,15 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	/* Allocate our own private page. */
 	if (unlikely(anon_vma_prepare(vma)))
 		goto oom;
-	folio = vma_alloc_zeroed_movable_folio(vma, vmf->address);
+	ret = alloc_anon_folio(vmf, &folio);
+	if (unlikely(ret == -EAGAIN))
+		return 0;
 	if (!folio)
 		goto oom;
 
+	nr_pages = folio_nr_pages(folio);
+	addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
+
 	if (mem_cgroup_charge(folio, vma->vm_mm, GFP_KERNEL))
 		goto oom_free_page;
 	folio_throttle_swaprate(folio, GFP_KERNEL);
@@ -4116,17 +4245,12 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	 */
 	__folio_mark_uptodate(folio);
 
-	entry = mk_pte(&folio->page, vma->vm_page_prot);
-	entry = pte_sw_mkyoung(entry);
-	if (vma->vm_flags & VM_WRITE)
-		entry = pte_mkwrite(pte_mkdirty(entry));
-
-	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
-			&vmf->ptl);
+	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
 	if (!vmf->pte)
 		goto release;
-	if (vmf_pte_changed(vmf)) {
-		update_mmu_tlb(vma, vmf->address, vmf->pte);
+	if (vmf_pte_range_changed(vmf, nr_pages)) {
+		for (i = 0; i < nr_pages; i++)
+			update_mmu_tlb(vma, addr + PAGE_SIZE * i, vmf->pte + i);
 		goto release;
 	}
 
@@ -4141,16 +4265,24 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 		return handle_userfault(vmf, VM_UFFD_MISSING);
 	}
 
-	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
-	folio_add_new_anon_rmap(folio, vma, vmf->address);
+	folio_ref_add(folio, nr_pages - 1);
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
+	folio_add_new_anon_rmap(folio, vma, addr);
 	folio_add_lru_vma(folio, vma);
+
+	for (i = 0; i < nr_pages; i++) {
+		entry = mk_pte(folio_page(folio, i), vma->vm_page_prot);
+		entry = pte_sw_mkyoung(entry);
+		if (vma->vm_flags & VM_WRITE)
+			entry = pte_mkwrite(pte_mkdirty(entry));
 setpte:
-	if (uffd_wp)
-		entry = pte_mkuffd_wp(entry);
-	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
+		if (uffd_wp)
+			entry = pte_mkuffd_wp(entry);
+		set_pte_at(vma->vm_mm, addr + PAGE_SIZE * i, vmf->pte + i, entry);
 
-	/* No need to invalidate - it was non-present before */
-	update_mmu_cache(vma, vmf->address, vmf->pte);
+		/* No need to invalidate - it was non-present before */
+		update_mmu_cache(vma, addr + PAGE_SIZE * i, vmf->pte + i);
+	}
 unlock:
 	if (vmf->pte)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
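[Editorial note] The effect of this patch is observable from userspace (a
sketch, assuming LARGE_ANON_FOLIO is enabled and the effective order gives
64K folios): touch a single byte of a fresh anonymous mapping, then read
/proc/self/pagemap, where bit 63 of each 64-bit entry is the present bit,
to see whether neighbouring pages were populated by the same fault.
Depending on the alignment of the mapping and the fallback path taken,
fewer pages may be populated.

	#include <fcntl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <unistd.h>

	/* pagemap entry bit 63: page present */
	static int is_populated(int fd, void *addr, size_t pagesize)
	{
		uint64_t ent;
		off_t off = ((uintptr_t)addr / pagesize) * sizeof(ent);

		if (pread(fd, &ent, sizeof(ent), off) != sizeof(ent))
			return -1;
		return !!(ent & (1ULL << 63));
	}

	int main(void)
	{
		size_t pagesize = getpagesize();
		size_t len = 64 * 1024;	/* assumed preferred folio size */
		int fd = open("/proc/self/pagemap", O_RDONLY);
		char *mem;
		size_t i;

		if (fd < 0)
			return 1;
		mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (mem == MAP_FAILED)
			return 1;

		mem[0] = 0;	/* single write fault */
		for (i = 0; i < len / pagesize; i++)
			printf("page %2zu populated: %d\n", i,
			       is_populated(fd, mem + i * pagesize, pagesize));
		return 0;
	}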
From patchwork Wed Jul 26 09:51:44 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 126266
Subject: [PATCH v4 3/5] arm64: mm: Override arch_wants_pte_order()
From: Ryan Roberts
To: Andrew Morton, Matthew Wilcox, Yin Fengwei, David Hildenbrand, Yu Zhao, Catalin Marinas, Will Deacon, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan, Luis Chamberlain, Itaru Kitayama
Cc: Ryan Roberts, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Date: Wed, 26 Jul 2023 10:51:44 +0100
Message-Id: <20230726095146.2826796-4-ryan.roberts@arm.com>
In-Reply-To: <20230726095146.2826796-1-ryan.roberts@arm.com>
References: <20230726095146.2826796-1-ryan.roberts@arm.com>
Define an arch-specific override of arch_wants_pte_order() so that when
LARGE_ANON_FOLIO is enabled, large folios will be allocated for anonymous
memory with an order that is compatible with arm64's contpte mappings.

Reviewed-by: Yu Zhao
Signed-off-by: Ryan Roberts
---
 arch/arm64/include/asm/pgtable.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 0bd18de9fd97..d00bb26fe28f 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1106,6 +1106,12 @@ extern pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
 extern void ptep_modify_prot_commit(struct vm_area_struct *vma,
 				    unsigned long addr, pte_t *ptep,
 				    pte_t old_pte, pte_t new_pte);
+
+#define arch_wants_pte_order arch_wants_pte_order
+static inline int arch_wants_pte_order(void)
+{
+	return CONT_PTE_SHIFT - PAGE_SHIFT;
+}
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_PGTABLE_H */
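[Editorial note] For a concrete value: with 4K base pages, arm64 groups 16
PTEs per contiguous-PTE block, so CONT_PTE_SHIFT is 16 and the override
returns order 4, i.e. 64K folios. A worked example of the arithmetic (the
constants are for the 4K-page configuration; 16K and 64K page
configurations yield different values):

	#include <stdio.h>

	int main(void)
	{
		/* arm64 with 4K pages: 16 contiguous PTEs per contpte block */
		int page_shift = 12;
		int cont_pte_shift = 4 + page_shift;	/* log2(16) + PAGE_SHIFT */
		int order = cont_pte_shift - page_shift;

		/* prints: order = 4 -> folio size = 64 KiB */
		printf("order = %d -> folio size = %d KiB\n",
		       order, (1 << (order + page_shift)) / 1024);
		return 0;
	}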
From patchwork Wed Jul 26 09:51:45 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 126268
Subject: [PATCH v4 4/5] selftests/mm/cow: Generalize do_run_with_thp() helper
From: Ryan Roberts
To: Andrew Morton, Matthew Wilcox, Yin Fengwei, David Hildenbrand, Yu Zhao, Catalin Marinas, Will Deacon, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan, Luis Chamberlain, Itaru Kitayama
Cc: Ryan Roberts, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Date: Wed, 26 Jul 2023 10:51:45 +0100
Message-Id: <20230726095146.2826796-5-ryan.roberts@arm.com>
In-Reply-To: <20230726095146.2826796-1-ryan.roberts@arm.com>
References: <20230726095146.2826796-1-ryan.roberts@arm.com>

do_run_with_thp() prepares THP memory into different states before
running tests. We would like to reuse this logic to also test large anon
folios. So let's add a size parameter which tells the function what size
of memory it should operate on.

Remove references to THP and replace them with LARGE, and fix up all
existing call sites to pass thpsize as the required size.

No functional change intended here, but a separate commit will add new
large anon folio tests that use this new capability.
Signed-off-by: Ryan Roberts
---
 tools/testing/selftests/mm/cow.c | 118 ++++++++++++++++---------------
 1 file changed, 61 insertions(+), 57 deletions(-)

diff --git a/tools/testing/selftests/mm/cow.c b/tools/testing/selftests/mm/cow.c
index 7324ce5363c0..304882bf2e5d 100644
--- a/tools/testing/selftests/mm/cow.c
+++ b/tools/testing/selftests/mm/cow.c
@@ -723,25 +723,25 @@ static void run_with_base_page_swap(test_fn fn, const char *desc)
 	do_run_with_base_page(fn, true);
 }
 
-enum thp_run {
-	THP_RUN_PMD,
-	THP_RUN_PMD_SWAPOUT,
-	THP_RUN_PTE,
-	THP_RUN_PTE_SWAPOUT,
-	THP_RUN_SINGLE_PTE,
-	THP_RUN_SINGLE_PTE_SWAPOUT,
-	THP_RUN_PARTIAL_MREMAP,
-	THP_RUN_PARTIAL_SHARED,
+enum large_run {
+	LARGE_RUN_PMD,
+	LARGE_RUN_PMD_SWAPOUT,
+	LARGE_RUN_PTE,
+	LARGE_RUN_PTE_SWAPOUT,
+	LARGE_RUN_SINGLE_PTE,
+	LARGE_RUN_SINGLE_PTE_SWAPOUT,
+	LARGE_RUN_PARTIAL_MREMAP,
+	LARGE_RUN_PARTIAL_SHARED,
 };
 
-static void do_run_with_thp(test_fn fn, enum thp_run thp_run)
+static void do_run_with_large(test_fn fn, enum large_run large_run, size_t size)
 {
 	char *mem, *mmap_mem, *tmp, *mremap_mem = MAP_FAILED;
-	size_t size, mmap_size, mremap_size;
+	size_t mmap_size, mremap_size;
 	int ret;
 
-	/* For alignment purposes, we need twice the thp size. */
-	mmap_size = 2 * thpsize;
+	/* For alignment purposes, we need twice the requested size. */
+	mmap_size = 2 * size;
 	mmap_mem = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
 			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
 	if (mmap_mem == MAP_FAILED) {
@@ -749,36 +749,40 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run)
 		return;
 	}
 
-	/* We need a THP-aligned memory area. */
-	mem = (char *)(((uintptr_t)mmap_mem + thpsize) & ~(thpsize - 1));
+	/* We need to naturally align the memory area. */
+	mem = (char *)(((uintptr_t)mmap_mem + size) & ~(size - 1));
 
-	ret = madvise(mem, thpsize, MADV_HUGEPAGE);
+	ret = madvise(mem, size, MADV_HUGEPAGE);
 	if (ret) {
 		ksft_test_result_fail("MADV_HUGEPAGE failed\n");
 		goto munmap;
 	}
 
 	/*
-	 * Try to populate a THP. Touch the first sub-page and test if we get
-	 * another sub-page populated automatically.
+	 * Try to populate a large folio. Touch the first sub-page and test if
+	 * we get the last sub-page populated automatically.
 	 */
 	mem[0] = 0;
-	if (!pagemap_is_populated(pagemap_fd, mem + pagesize)) {
-		ksft_test_result_skip("Did not get a THP populated\n");
+	if (!pagemap_is_populated(pagemap_fd, mem + size - pagesize)) {
+		ksft_test_result_skip("Did not get fully populated\n");
 		goto munmap;
 	}
-	memset(mem, 0, thpsize);
+	memset(mem, 0, size);
 
-	size = thpsize;
-	switch (thp_run) {
-	case THP_RUN_PMD:
-	case THP_RUN_PMD_SWAPOUT:
+	switch (large_run) {
+	case LARGE_RUN_PMD:
+	case LARGE_RUN_PMD_SWAPOUT:
+		if (size != thpsize) {
+			ksft_test_result_fail("test bug: can't PMD-map size\n");
+			goto munmap;
+		}
 		break;
-	case THP_RUN_PTE:
-	case THP_RUN_PTE_SWAPOUT:
+	case LARGE_RUN_PTE:
+	case LARGE_RUN_PTE_SWAPOUT:
 		/*
-		 * Trigger PTE-mapping the THP by temporarily mapping a single
-		 * subpage R/O.
+		 * Trigger PTE-mapping the large folio by temporarily mapping a
+		 * single subpage R/O. This is a noop if the large folio is not
+		 * thpsize (and therefore already PTE-mapped).
 		 */
 		ret = mprotect(mem + pagesize, pagesize, PROT_READ);
 		if (ret) {
@@ -791,25 +795,25 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run)
 			goto munmap;
 		}
 		break;
-	case THP_RUN_SINGLE_PTE:
-	case THP_RUN_SINGLE_PTE_SWAPOUT:
+	case LARGE_RUN_SINGLE_PTE:
+	case LARGE_RUN_SINGLE_PTE_SWAPOUT:
 		/*
-		 * Discard all but a single subpage of that PTE-mapped THP. What
-		 * remains is a single PTE mapping a single subpage.
+		 * Discard all but a single subpage of that PTE-mapped large
+		 * folio. What remains is a single PTE mapping a single subpage.
 		 */
-		ret = madvise(mem + pagesize, thpsize - pagesize, MADV_DONTNEED);
+		ret = madvise(mem + pagesize, size - pagesize, MADV_DONTNEED);
 		if (ret) {
 			ksft_test_result_fail("MADV_DONTNEED failed\n");
 			goto munmap;
 		}
 		size = pagesize;
 		break;
-	case THP_RUN_PARTIAL_MREMAP:
+	case LARGE_RUN_PARTIAL_MREMAP:
 		/*
-		 * Remap half of the THP. We need some new memory location
-		 * for that.
+		 * Remap half of the large folio. We need some new memory
+		 * location for that.
 		 */
-		mremap_size = thpsize / 2;
+		mremap_size = size / 2;
 		mremap_mem = mmap(NULL, mremap_size, PROT_NONE,
 				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
 		if (mem == MAP_FAILED) {
@@ -824,13 +828,13 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run)
 		}
 		size = mremap_size;
 		break;
-	case THP_RUN_PARTIAL_SHARED:
+	case LARGE_RUN_PARTIAL_SHARED:
 		/*
-		 * Share the first page of the THP with a child and quit the
-		 * child. This will result in some parts of the THP never
-		 * have been shared.
+		 * Share the first page of the large folio with a child and quit
+		 * the child. This will result in some parts of the large folio
+		 * never having been shared.
 		 */
-		ret = madvise(mem + pagesize, thpsize - pagesize, MADV_DONTFORK);
+		ret = madvise(mem + pagesize, size - pagesize, MADV_DONTFORK);
 		if (ret) {
 			ksft_test_result_fail("MADV_DONTFORK failed\n");
 			goto munmap;
@@ -844,7 +848,7 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run)
 		}
 		wait(&ret);
 		/* Allow for sharing all pages again. */
-		ret = madvise(mem + pagesize, thpsize - pagesize, MADV_DOFORK);
+		ret = madvise(mem + pagesize, size - pagesize, MADV_DOFORK);
 		if (ret) {
 			ksft_test_result_fail("MADV_DOFORK failed\n");
 			goto munmap;
@@ -854,10 +858,10 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run)
 		assert(false);
 	}
 
-	switch (thp_run) {
-	case THP_RUN_PMD_SWAPOUT:
-	case THP_RUN_PTE_SWAPOUT:
-	case THP_RUN_SINGLE_PTE_SWAPOUT:
+	switch (large_run) {
+	case LARGE_RUN_PMD_SWAPOUT:
+	case LARGE_RUN_PTE_SWAPOUT:
+	case LARGE_RUN_SINGLE_PTE_SWAPOUT:
 		madvise(mem, size, MADV_PAGEOUT);
 		if (!range_is_swapped(mem, size)) {
 			ksft_test_result_skip("MADV_PAGEOUT did not work, is swap enabled?\n");
@@ -878,49 +882,49 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run)
 static void run_with_thp(test_fn fn, const char *desc)
 {
 	ksft_print_msg("[RUN] %s ... with THP\n", desc);
-	do_run_with_thp(fn, THP_RUN_PMD);
+	do_run_with_large(fn, LARGE_RUN_PMD, thpsize);
 }
 
 static void run_with_thp_swap(test_fn fn, const char *desc)
 {
 	ksft_print_msg("[RUN] %s ... with swapped-out THP\n", desc);
-	do_run_with_thp(fn, THP_RUN_PMD_SWAPOUT);
+	do_run_with_large(fn, LARGE_RUN_PMD_SWAPOUT, thpsize);
 }
 
 static void run_with_pte_mapped_thp(test_fn fn, const char *desc)
 {
 	ksft_print_msg("[RUN] %s ... with PTE-mapped THP\n", desc);
-	do_run_with_thp(fn, THP_RUN_PTE);
+	do_run_with_large(fn, LARGE_RUN_PTE, thpsize);
 }
 
 static void run_with_pte_mapped_thp_swap(test_fn fn, const char *desc)
 {
 	ksft_print_msg("[RUN] %s ... with swapped-out, PTE-mapped THP\n", desc);
-	do_run_with_thp(fn, THP_RUN_PTE_SWAPOUT);
+	do_run_with_large(fn, LARGE_RUN_PTE_SWAPOUT, thpsize);
 }
 
 static void run_with_single_pte_of_thp(test_fn fn, const char *desc)
 {
 	ksft_print_msg("[RUN] %s ... with single PTE of THP\n", desc);
-	do_run_with_thp(fn, THP_RUN_SINGLE_PTE);
+	do_run_with_large(fn, LARGE_RUN_SINGLE_PTE, thpsize);
 }
 
 static void run_with_single_pte_of_thp_swap(test_fn fn, const char *desc)
 {
 	ksft_print_msg("[RUN] %s ... with single PTE of swapped-out THP\n", desc);
-	do_run_with_thp(fn, THP_RUN_SINGLE_PTE_SWAPOUT);
+	do_run_with_large(fn, LARGE_RUN_SINGLE_PTE_SWAPOUT, thpsize);
 }
 
 static void run_with_partial_mremap_thp(test_fn fn, const char *desc)
 {
 	ksft_print_msg("[RUN] %s ... with partially mremap()'ed THP\n", desc);
-	do_run_with_thp(fn, THP_RUN_PARTIAL_MREMAP);
+	do_run_with_large(fn, LARGE_RUN_PARTIAL_MREMAP, thpsize);
 }
 
 static void run_with_partial_shared_thp(test_fn fn, const char *desc)
 {
 	ksft_print_msg("[RUN] %s ... with partially shared THP\n", desc);
-	do_run_with_thp(fn, THP_RUN_PARTIAL_SHARED);
+	do_run_with_large(fn, LARGE_RUN_PARTIAL_SHARED, thpsize);
 }
 
 static void run_with_hugetlb(test_fn fn, const char *desc, size_t hugetlbsize)
@@ -1338,7 +1342,7 @@ static void run_anon_thp_test_cases(void)
 		struct test_case const *test_case = &anon_thp_test_cases[i];
 
 		ksft_print_msg("[RUN] %s\n", test_case->desc);
-		do_run_with_thp(test_case->fn, THP_RUN_PMD);
+		do_run_with_large(test_case->fn, LARGE_RUN_PMD, thpsize);
 	}
 }
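[Editorial note] As an example of what the generalization enables (a
hypothetical caller, not part of this patch; it assumes the helpers and
globals of cow.c are in scope), a follow-up can exercise the same states
on a non-PMD-sized folio simply by passing a different size:

	/*
	 * Hypothetical caller sketch: run a test on a PTE-mapped large folio
	 * of 16 pages rather than a full THP.
	 */
	static void run_with_16_page_folio(test_fn fn, const char *desc)
	{
		ksft_print_msg("[RUN] %s ... with a 16-page large folio\n", desc);
		do_run_with_large(fn, LARGE_RUN_PTE, 16 * pagesize);
	}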
From patchwork Wed Jul 26 09:51:46 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 126250
Subject: [PATCH v4 5/5] selftests/mm/cow: Add large anon folio tests
From: Ryan Roberts
To: Andrew Morton, Matthew Wilcox, Yin Fengwei, David Hildenbrand, Yu Zhao, Catalin Marinas, Will Deacon, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan, Luis Chamberlain, Itaru Kitayama
Cc: Ryan Roberts, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Date: Wed, 26 Jul 2023 10:51:46 +0100
Message-Id: <20230726095146.2826796-6-ryan.roberts@arm.com>
In-Reply-To: <20230726095146.2826796-1-ryan.roberts@arm.com>
References: <20230726095146.2826796-1-ryan.roberts@arm.com>

Add tests similar to the existing THP tests, but which operate on memory
backed by large anonymous folios, which are smaller than THP. This reuses
all the existing infrastructure. If the test suite detects that large
anonymous folios are not supported by the kernel, the new tests are
skipped.
Signed-off-by: Ryan Roberts
---
 tools/testing/selftests/mm/cow.c | 111 +++++++++++++++++++++++++++++--
 1 file changed, 106 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/mm/cow.c b/tools/testing/selftests/mm/cow.c
index 304882bf2e5d..932242c965a4 100644
--- a/tools/testing/selftests/mm/cow.c
+++ b/tools/testing/selftests/mm/cow.c
@@ -33,6 +33,7 @@
 static size_t pagesize;
 static int pagemap_fd;
 static size_t thpsize;
+static size_t lafsize;
 static int nr_hugetlbsizes;
 static size_t hugetlbsizes[10];
 static int gup_fd;
@@ -927,6 +928,42 @@ static void run_with_partial_shared_thp(test_fn fn, const char *desc)
 	do_run_with_large(fn, LARGE_RUN_PARTIAL_SHARED, thpsize);
 }
 
+static void run_with_laf(test_fn fn, const char *desc)
+{
+	ksft_print_msg("[RUN] %s ... with large anon folio\n", desc);
+	do_run_with_large(fn, LARGE_RUN_PTE, lafsize);
+}
+
+static void run_with_laf_swap(test_fn fn, const char *desc)
+{
+	ksft_print_msg("[RUN] %s ... with swapped-out large anon folio\n", desc);
+	do_run_with_large(fn, LARGE_RUN_PTE_SWAPOUT, lafsize);
+}
+
+static void run_with_single_pte_of_laf(test_fn fn, const char *desc)
+{
+	ksft_print_msg("[RUN] %s ... with single PTE of large anon folio\n", desc);
+	do_run_with_large(fn, LARGE_RUN_SINGLE_PTE, lafsize);
+}
+
+static void run_with_single_pte_of_laf_swap(test_fn fn, const char *desc)
+{
+	ksft_print_msg("[RUN] %s ... with single PTE of swapped-out large anon folio\n", desc);
+	do_run_with_large(fn, LARGE_RUN_SINGLE_PTE_SWAPOUT, lafsize);
+}
+
+static void run_with_partial_mremap_laf(test_fn fn, const char *desc)
+{
+	ksft_print_msg("[RUN] %s ... with partially mremap()'ed large anon folio\n", desc);
+	do_run_with_large(fn, LARGE_RUN_PARTIAL_MREMAP, lafsize);
+}
+
+static void run_with_partial_shared_laf(test_fn fn, const char *desc)
+{
+	ksft_print_msg("[RUN] %s ... with partially shared large anon folio\n", desc);
+	do_run_with_large(fn, LARGE_RUN_PARTIAL_SHARED, lafsize);
+}
+
 static void run_with_hugetlb(test_fn fn, const char *desc, size_t hugetlbsize)
 {
 	int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB;
@@ -1105,6 +1142,14 @@ static void run_anon_test_case(struct test_case const *test_case)
 		run_with_partial_mremap_thp(test_case->fn, test_case->desc);
 		run_with_partial_shared_thp(test_case->fn, test_case->desc);
 	}
+	if (lafsize) {
+		run_with_laf(test_case->fn, test_case->desc);
+		run_with_laf_swap(test_case->fn, test_case->desc);
+		run_with_single_pte_of_laf(test_case->fn, test_case->desc);
+		run_with_single_pte_of_laf_swap(test_case->fn, test_case->desc);
+		run_with_partial_mremap_laf(test_case->fn, test_case->desc);
+		run_with_partial_shared_laf(test_case->fn, test_case->desc);
+	}
 	for (i = 0; i < nr_hugetlbsizes; i++)
 		run_with_hugetlb(test_case->fn, test_case->desc,
 				 hugetlbsizes[i]);
@@ -1126,6 +1171,8 @@ static int tests_per_anon_test_case(void)
 	if (thpsize)
 		tests += 8;
+	if (lafsize)
+		tests += 6;
 	return tests;
 }
@@ -1680,15 +1727,74 @@ static int tests_per_non_anon_test_case(void)
 	return tests;
 }
 
+static size_t large_anon_folio_size(void)
+{
+	/*
+	 * There is no interface to query this. But we know that it must be less
+	 * than thpsize. So we map a thpsize area, aligned to thpsize offset by
+	 * thpsize/2 (to avoid a hugepage being allocated), then touch the first
+	 * page and see how many pages get faulted in.
+	 */
+
+	int max_order = __builtin_ctz(thpsize);
+	size_t mmap_size = thpsize * 3;
+	char *mmap_mem = NULL;
+	int order = 0;
+	char *mem;
+	size_t offset;
+	int ret;
+
+	/* For alignment purposes, we need 2.5x the requested size. */
+	mmap_mem = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (mmap_mem == MAP_FAILED)
+		goto out;
+
+	/* Align the memory area to thpsize then offset it by thpsize/2. */
+	mem = (char *)(((uintptr_t)mmap_mem + thpsize) & ~(thpsize - 1));
+	mem += thpsize / 2;
+
+	/* We might get a bigger large anon folio when MADV_HUGEPAGE is set. */
+	ret = madvise(mem, thpsize, MADV_HUGEPAGE);
+	if (ret)
+		goto out;
+
+	/* Probe the memory to see how much is populated. */
+	mem[0] = 0;
+	for (order = 0; order < max_order; order++) {
+		offset = (1 << order) * pagesize;
+		if (!pagemap_is_populated(pagemap_fd, mem + offset))
+			break;
+	}
+
+out:
+	if (mmap_mem)
+		munmap(mmap_mem, mmap_size);
+
+	if (order == 0)
+		return 0;
+
+	return offset;
+}
+
 int main(int argc, char **argv)
 {
 	int err;
 
+	gup_fd = open("/sys/kernel/debug/gup_test", O_RDWR);
+	pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
+	if (pagemap_fd < 0)
+		ksft_exit_fail_msg("opening pagemap failed\n");
+
 	pagesize = getpagesize();
 	thpsize = read_pmd_pagesize();
 	if (thpsize)
 		ksft_print_msg("[INFO] detected THP size: %zu KiB\n",
 			       thpsize / 1024);
+	lafsize = large_anon_folio_size();
+	if (lafsize)
+		ksft_print_msg("[INFO] detected large anon folio size: %zu KiB\n",
+			       lafsize / 1024);
 	nr_hugetlbsizes = detect_hugetlb_page_sizes(hugetlbsizes,
 						    ARRAY_SIZE(hugetlbsizes));
 	detect_huge_zeropage();
@@ -1698,11 +1804,6 @@ int main(int argc, char **argv)
 		      ARRAY_SIZE(anon_thp_test_cases) * tests_per_anon_thp_test_case() +
 		      ARRAY_SIZE(non_anon_test_cases) * tests_per_non_anon_test_case());
 
-	gup_fd = open("/sys/kernel/debug/gup_test", O_RDWR);
-	pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
-	if (pagemap_fd < 0)
-		ksft_exit_fail_msg("opening pagemap failed\n");
-
 	run_anon_test_cases();
 	run_anon_thp_test_cases();
 	run_non_anon_test_cases();
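[Editorial note] The alignment trick in large_anon_folio_size() above is
the load-bearing detail: aligning up to thpsize and then offsetting by
thpsize/2 guarantees the probed range straddles a PMD boundary, so no
PMD-sized THP can back the fault and any populated neighbours must come
from a smaller (large anon) folio. A standalone sketch of that arithmetic
(assuming a 2M thpsize and an arbitrary mmap return value):

	#include <assert.h>
	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uintptr_t thpsize = 2UL << 20;		/* assume 2M PMD size */
		uintptr_t mmap_mem = 0x7f0000001000;	/* arbitrary unaligned base */

		/* Align up to thpsize, then offset by thpsize/2, as in the patch. */
		uintptr_t mem = ((mmap_mem + thpsize) & ~(thpsize - 1)) + thpsize / 2;

		/* mem is misaligned w.r.t. thpsize, so no PMD mapping is possible. */
		assert(mem % thpsize == thpsize / 2);
		printf("probe address offset into PMD block: %#lx\n",
		       (unsigned long)(mem % thpsize));
		return 0;
	}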