From patchwork Thu Jun 22 14:41:56 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 111696
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Anshuman Khandual, Matthew Wilcox, Yu Zhao, Mark Rutland
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v1 01/14] arm64/mm: set_pte(): New layer to manage contig bit
Date: Thu, 22 Jun 2023 15:41:56 +0100
Message-Id: <20230622144210.2623299-2-ryan.roberts@arm.com>
In-Reply-To: <20230622144210.2623299-1-ryan.roberts@arm.com>
References: <20230622144210.2623299-1-ryan.roberts@arm.com>

Create a new layer for the in-table PTE manipulation APIs. For now, the
existing API is prefixed with a double underscore to become the
arch-private API, and the public API is just a simple wrapper that calls
the private API.

The public API implementation will subsequently be used to transparently
manipulate the contiguous bit where appropriate. But since there are
already some contig-aware users (e.g. hugetlb, kernel mapper), we must
first ensure those users call the private API directly, so that the
future contig-bit manipulations in the public API do not interfere with
those existing uses.

No behavioural changes intended.
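To make the new layering concrete, here is a condensed sketch of the shape
of the change, distilled from the diff below (simplified for illustration;
the real definitions are in the patch itself):

    /* Arch-private primitive: writes the entry exactly as given. */
    static inline void __set_pte(pte_t *ptep, pte_t pte)
    {
            WRITE_ONCE(*ptep, pte);
            /* ... existing barrier/debug logic, unchanged ... */
    }

    /* Public API used by core-mm: a pass-through today, and the place
     * where the contiguous bit will later be applied transparently. */
    static inline void set_pte(pte_t *ptep, pte_t pte)
    {
            __set_pte(ptep, pte);
    }

Contig-aware or early callers (hugetlb, the kernel mapper, efi, kasan,
fixmap, trans_pgd) are switched to __set_pte() so they never pick up the
public wrapper's future contig-bit behaviour.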
Signed-off-by: Ryan Roberts
---
 arch/arm64/include/asm/pgtable.h | 23 ++++++++++++++++++++---
 arch/arm64/kernel/efi.c          |  2 +-
 arch/arm64/mm/fixmap.c           |  2 +-
 arch/arm64/mm/kasan_init.c       |  4 ++--
 arch/arm64/mm/mmu.c              |  2 +-
 arch/arm64/mm/pageattr.c         |  2 +-
 arch/arm64/mm/trans_pgd.c        |  4 ++--
 7 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 6fd012663a01..7f5ce5687466 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -93,7 +93,8 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
 	__pte(__phys_to_pte_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
 
 #define pte_none(pte)		(!pte_val(pte))
-#define pte_clear(mm,addr,ptep)	set_pte(ptep, __pte(0))
+#define pte_clear(mm, addr, ptep) \
+				__set_pte(ptep, __pte(0))
 #define pte_page(pte)		(pfn_to_page(pte_pfn(pte)))
 
 /*
@@ -260,7 +261,7 @@ static inline pte_t pte_mkdevmap(pte_t pte)
 	return set_pte_bit(pte, __pgprot(PTE_DEVMAP | PTE_SPECIAL));
 }
 
-static inline void set_pte(pte_t *ptep, pte_t pte)
+static inline void __set_pte(pte_t *ptep, pte_t pte)
 {
 	WRITE_ONCE(*ptep, pte);
 
@@ -352,7 +353,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 
 	__check_safe_pte_update(mm, ptep, pte);
 
-	set_pte(ptep, pte);
+	__set_pte(ptep, pte);
 }
 
 static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
@@ -1117,6 +1118,22 @@ extern pte_t ptep_modify_prot_start(struct vm_area_struct *vma,
 extern void ptep_modify_prot_commit(struct vm_area_struct *vma,
 				    unsigned long addr, pte_t *ptep,
 				    pte_t old_pte, pte_t new_pte);
+
+/*
+ * The below functions constitute the public API that arm64 presents to the
+ * core-mm to manipulate PTE entries within the their page tables (or at least
+ * this is the subset of the API that arm64 needs to implement). These public
+ * versions will automatically and transparently apply the contiguous bit where
+ * it makes sense to do so. Therefore any users that are contig-aware (e.g.
+ * hugetlb, kernel mapper) should NOT use these APIs, but instead use the
+ * private versions, which are prefixed with double underscore.
+ */
+
+static inline void set_pte(pte_t *ptep, pte_t pte)
+{
+	__set_pte(ptep, pte);
+}
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_PGTABLE_H */
diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index baab8dd3ead3..7a28b6a08a82 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -115,7 +115,7 @@ static int __init set_permissions(pte_t *ptep, unsigned long addr, void *data)
 	else if (IS_ENABLED(CONFIG_ARM64_BTI_KERNEL) &&
 		 system_supports_bti() && spd->has_bti)
 		pte = set_pte_bit(pte, __pgprot(PTE_GP));
-	set_pte(ptep, pte);
+	__set_pte(ptep, pte);
 	return 0;
 }
 
diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
index c0a3301203bd..51cd4501816d 100644
--- a/arch/arm64/mm/fixmap.c
+++ b/arch/arm64/mm/fixmap.c
@@ -121,7 +121,7 @@ void __set_fixmap(enum fixed_addresses idx,
 	ptep = fixmap_pte(addr);
 
 	if (pgprot_val(flags)) {
-		set_pte(ptep, pfn_pte(phys >> PAGE_SHIFT, flags));
+		__set_pte(ptep, pfn_pte(phys >> PAGE_SHIFT, flags));
 	} else {
 		pte_clear(&init_mm, addr, ptep);
 		flush_tlb_kernel_range(addr, addr+PAGE_SIZE);
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index e969e68de005..40125b217195 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -112,7 +112,7 @@ static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr,
 		if (!early)
 			memset(__va(page_phys), KASAN_SHADOW_INIT, PAGE_SIZE);
 		next = addr + PAGE_SIZE;
-		set_pte(ptep, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL));
+		__set_pte(ptep, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL));
 	} while (ptep++, addr = next, addr != end && pte_none(READ_ONCE(*ptep)));
 }
 
@@ -275,7 +275,7 @@ static void __init kasan_init_shadow(void)
 	 * so we should make sure that it maps the zero page read-only.
 	 */
 	for (i = 0; i < PTRS_PER_PTE; i++)
-		set_pte(&kasan_early_shadow_pte[i],
+		__set_pte(&kasan_early_shadow_pte[i],
 			pfn_pte(sym_to_pfn(kasan_early_shadow_page),
 				PAGE_KERNEL_RO));
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index af6bc8403ee4..c84dc87d08b9 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -178,7 +178,7 @@ static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end,
 	do {
 		pte_t old_pte = READ_ONCE(*ptep);
 
-		set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot));
+		__set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot));
 
 		/*
 		 * After the PTE entry has been populated once, we
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 8e2017ba5f1b..057097acf9e0 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -41,7 +41,7 @@ static int change_page_range(pte_t *ptep, unsigned long addr, void *data)
 	pte = clear_pte_bit(pte, cdata->clear_mask);
 	pte = set_pte_bit(pte, cdata->set_mask);
 
-	set_pte(ptep, pte);
+	__set_pte(ptep, pte);
 	return 0;
 }
 
diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
index 4ea2eefbc053..f9997b226614 100644
--- a/arch/arm64/mm/trans_pgd.c
+++ b/arch/arm64/mm/trans_pgd.c
@@ -40,7 +40,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
 		 * read only (code, rodata). Clear the RDONLY bit from
 		 * the temporary mappings we use during restore.
 		 */
-		set_pte(dst_ptep, pte_mkwrite(pte));
+		__set_pte(dst_ptep, pte_mkwrite(pte));
 	} else if (debug_pagealloc_enabled() && !pte_none(pte)) {
 		/*
 		 * debug_pagealloc will removed the PTE_VALID bit if
@@ -53,7 +53,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
 		 */
 		BUG_ON(!pfn_valid(pte_pfn(pte)));
 
-		set_pte(dst_ptep, pte_mkpresent(pte_mkwrite(pte)));
+		__set_pte(dst_ptep, pte_mkpresent(pte_mkwrite(pte)));
 	}
 }

From patchwork Thu Jun 22 14:41:57 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 111700
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Anshuman Khandual, Matthew Wilcox, Yu Zhao, Mark Rutland
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v1 02/14] arm64/mm: set_ptes()/set_pte_at(): New layer to manage contig bit
Date: Thu, 22 Jun 2023 15:41:57 +0100
Message-Id: <20230622144210.2623299-3-ryan.roberts@arm.com>
In-Reply-To: <20230622144210.2623299-1-ryan.roberts@arm.com>
References: <20230622144210.2623299-1-ryan.roberts@arm.com>

Create a new layer for the in-table PTE manipulation APIs. For now, the
existing API is prefixed with a double underscore to become the
arch-private API, and the public API is just a simple wrapper that calls
the private API.

The public API implementation will subsequently be used to transparently
manipulate the contiguous bit where appropriate. But since there are
already some contig-aware users (e.g. hugetlb, kernel mapper), we must
first ensure those users call the private API directly, so that the
future contig-bit manipulations in the public API do not interfere with
those existing uses.

set_pte_at() is a core macro that forwards to set_ptes() (with nr=1).
Instead of creating a __set_pte_at() internal macro, convert all arch
users to use set_ptes()/__set_ptes() directly, as appropriate.
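As an illustration of that conversion, condensed from the diff below (not a
complete listing), the public entry point becomes:

    /* arm64 now provides set_ptes(); the core set_pte_at() macro forwards
     * to it with nr=1, so no arch-specific set_pte_at() is needed. */
    #define set_ptes set_ptes
    static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
                                pte_t *ptep, pte_t pte, unsigned int nr)
    {
            __set_ptes(mm, addr, ptep, pte, nr);
    }

while a contig-aware call site such as the hugetlb loop is converted to the
private form, one entry at a time for now:

    -		set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
    +		__set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1);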
Callers in hugetlb may benefit from calling __set_ptes() once for their
whole range rather than managing their own loop. This is left for future
improvement.

No behavioural changes intended.

Signed-off-by: Ryan Roberts
---
 arch/arm64/include/asm/pgtable.h | 12 +++++++++---
 arch/arm64/kernel/mte.c          |  2 +-
 arch/arm64/kvm/guest.c           |  2 +-
 arch/arm64/mm/fault.c            |  2 +-
 arch/arm64/mm/hugetlbpage.c      | 10 +++++-----
 5 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7f5ce5687466..84919a3c558e 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -356,7 +356,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 	__set_pte(ptep, pte);
 }
 
-static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+static inline void __set_ptes(struct mm_struct *mm, unsigned long addr,
 			    pte_t *ptep, pte_t pte, unsigned int nr)
 {
 	page_table_check_ptes_set(mm, addr, ptep, pte, nr);
@@ -370,7 +370,6 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 		pte_val(pte) += PAGE_SIZE;
 	}
 }
-#define set_ptes set_ptes
 
 /*
  * Huge pte definitions.
@@ -1067,7 +1066,7 @@ static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
 #endif /* CONFIG_ARM64_MTE */
 
 /*
- * On AArch64, the cache coherency is handled via the set_pte_at() function.
+ * On AArch64, the cache coherency is handled via the __set_ptes() function.
  */
 static inline void update_mmu_cache_range(struct vm_area_struct *vma,
 		unsigned long addr, pte_t *ptep, unsigned int nr)
@@ -1134,6 +1133,13 @@ static inline void set_pte(pte_t *ptep, pte_t pte)
 	__set_pte(ptep, pte);
 }
 
+#define set_ptes set_ptes
+static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
+			    pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	__set_ptes(mm, addr, ptep, pte, nr);
+}
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_PGTABLE_H */
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index 7e89968bd282..9b248549a020 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -90,7 +90,7 @@ int memcmp_pages(struct page *page1, struct page *page2)
 	/*
 	 * If the page content is identical but at least one of the pages is
 	 * tagged, return non-zero to avoid KSM merging. If only one of the
-	 * pages is tagged, set_pte_at() may zero or change the tags of the
+	 * pages is tagged, __set_ptes() may zero or change the tags of the
 	 * other page via mte_sync_tags().
 	 */
 	if (page_mte_tagged(page1) || page_mte_tagged(page2))
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 20280a5233f6..478df2edcf99 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -1087,7 +1087,7 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
 		} else {
 			/*
 			 * Only locking to serialise with a concurrent
-			 * set_pte_at() in the VMM but still overriding the
+			 * __set_ptes() in the VMM but still overriding the
 			 * tags, hence ignoring the return value.
 			 */
 			try_page_mte_tagging(page);
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 6045a5117ac1..d3a64624ed88 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -191,7 +191,7 @@ static void show_pte(unsigned long addr)
  *
  * It needs to cope with hardware update of the accessed/dirty state by other
  * agents in the system and can safely skip the __sync_icache_dcache() call as,
- * like set_pte_at(), the PTE is never changed from no-exec to exec here.
+ * like __set_ptes(), the PTE is never changed from no-exec to exec here.
  *
  * Returns whether or not the PTE actually changed.
  */
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 95364e8bdc19..31a1da655bf1 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -264,12 +264,12 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 		ncontig = num_contig_ptes(folio_size(folio), &pgsize);
 
 		for (i = 0; i < ncontig; i++, ptep++)
-			set_pte_at(mm, addr, ptep, pte);
+			__set_ptes(mm, addr, ptep, pte, 1);
 		return;
 	}
 
 	if (!pte_cont(pte)) {
-		set_pte_at(mm, addr, ptep, pte);
+		__set_ptes(mm, addr, ptep, pte, 1);
 		return;
 	}
 
@@ -281,7 +281,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 	clear_flush(mm, addr, ptep, pgsize, ncontig);
 
 	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
-		set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
+		__set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1);
 }
 
 pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
@@ -496,7 +496,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 	hugeprot = pte_pgprot(pte);
 	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
-		set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
+		__set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1);
 
 	return 1;
 }
@@ -525,7 +525,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
 	pfn = pte_pfn(pte);
 
 	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
-		set_pte_at(mm, addr, ptep, pfn_pte(pfn, hugeprot));
+		__set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1);
 }
 
 pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,

From patchwork Thu Jun 22 14:41:58 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 111713
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Anshuman Khandual, Matthew Wilcox, Yu Zhao, Mark Rutland
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v1 03/14] arm64/mm: pte_clear(): New layer to manage contig bit
Date: Thu, 22 Jun 2023 15:41:58 +0100
Message-Id: <20230622144210.2623299-4-ryan.roberts@arm.com>
In-Reply-To: <20230622144210.2623299-1-ryan.roberts@arm.com>
References: <20230622144210.2623299-1-ryan.roberts@arm.com>

Create a new layer for the in-table PTE manipulation APIs. For now, the
existing API is prefixed with a double underscore to become the
arch-private API, and the public API is just a simple wrapper that calls
the private API.

The public API implementation will subsequently be used to transparently
manipulate the contiguous bit where appropriate. But since there are
already some contig-aware users (e.g. hugetlb, kernel mapper), we must
first ensure those users call the private API directly, so that the
future contig-bit manipulations in the public API do not interfere with
those existing uses.

No behavioural changes intended.
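The same split as in the earlier patches, now for pte_clear(); a condensed
sketch distilled from the diff below, showing the public wrapper and a
contig-aware hugetlb loop that keeps using the private form:

    #define __pte_clear(mm, addr, ptep) \
            __set_pte(ptep, __pte(0))

    /* Public wrapper for core-mm; the later contig-bit hook point. */
    static inline void pte_clear(struct mm_struct *mm,
                                 unsigned long addr, pte_t *ptep)
    {
            __pte_clear(mm, addr, ptep);
    }

    /* Contig-aware user (hugetlb's clear_flush()) stays on the private call: */
    for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
            __pte_clear(mm, addr, ptep);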
Signed-off-by: Ryan Roberts
---
 arch/arm64/include/asm/pgtable.h | 8 +++++++-
 arch/arm64/mm/fixmap.c           | 2 +-
 arch/arm64/mm/hugetlbpage.c      | 4 ++--
 arch/arm64/mm/mmu.c              | 2 +-
 4 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 84919a3c558e..06b5dca469f5 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -93,7 +93,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
 	__pte(__phys_to_pte_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
 
 #define pte_none(pte)		(!pte_val(pte))
-#define pte_clear(mm, addr, ptep) \
+#define __pte_clear(mm, addr, ptep) \
 				__set_pte(ptep, __pte(0))
 #define pte_page(pte)		(pfn_to_page(pte_pfn(pte)))
 
@@ -1140,6 +1140,12 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 	__set_ptes(mm, addr, ptep, pte, nr);
 }
 
+static inline void pte_clear(struct mm_struct *mm,
+			     unsigned long addr, pte_t *ptep)
+{
+	__pte_clear(mm, addr, ptep);
+}
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_PGTABLE_H */
diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
index 51cd4501816d..bfc02568805a 100644
--- a/arch/arm64/mm/fixmap.c
+++ b/arch/arm64/mm/fixmap.c
@@ -123,7 +123,7 @@ void __set_fixmap(enum fixed_addresses idx,
 	if (pgprot_val(flags)) {
 		__set_pte(ptep, pfn_pte(phys >> PAGE_SHIFT, flags));
 	} else {
-		pte_clear(&init_mm, addr, ptep);
+		__pte_clear(&init_mm, addr, ptep);
 		flush_tlb_kernel_range(addr, addr+PAGE_SIZE);
 	}
 }
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 31a1da655bf1..eebd3107c7d2 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -236,7 +236,7 @@ static void clear_flush(struct mm_struct *mm,
 	unsigned long i, saddr = addr;
 
 	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
-		pte_clear(mm, addr, ptep);
+		__pte_clear(mm, addr, ptep);
 
 	flush_tlb_range(&vma, saddr, addr);
 }
@@ -418,7 +418,7 @@ void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
 	ncontig = num_contig_ptes(sz, &pgsize);
 
 	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
-		pte_clear(mm, addr, ptep);
+		__pte_clear(mm, addr, ptep);
 }
 
 pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index c84dc87d08b9..085a7e3eec98 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -853,7 +853,7 @@ static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
 			continue;
 
 		WARN_ON(!pte_present(pte));
-		pte_clear(&init_mm, addr, ptep);
+		__pte_clear(&init_mm, addr, ptep);
 		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
 		if (free_mapped)
 			free_hotplug_page_range(pte_page(pte),

From patchwork Thu Jun 22 14:41:59 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 111712
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Anshuman Khandual, Matthew Wilcox, Yu Zhao, Mark Rutland
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v1 04/14] arm64/mm: ptep_get_and_clear(): New layer to manage contig bit
Date: Thu, 22 Jun 2023 15:41:59 +0100
Message-Id: <20230622144210.2623299-5-ryan.roberts@arm.com>
In-Reply-To: <20230622144210.2623299-1-ryan.roberts@arm.com>
References: <20230622144210.2623299-1-ryan.roberts@arm.com>
Create a new layer for the in-table PTE manipulation APIs. For now, the
existing API is prefixed with a double underscore to become the
arch-private API, and the public API is just a simple wrapper that calls
the private API.

The public API implementation will subsequently be used to transparently
manipulate the contiguous bit where appropriate. But since there are
already some contig-aware users (e.g. hugetlb, kernel mapper), we must
first ensure those users call the private API directly, so that the
future contig-bit manipulations in the public API do not interfere with
those existing uses.

No behavioural changes intended.

Signed-off-by: Ryan Roberts
---
 arch/arm64/include/asm/pgtable.h | 10 ++++++++--
 arch/arm64/mm/hugetlbpage.c      |  4 ++--
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 06b5dca469f5..2a525e72537d 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -941,8 +941,7 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
-static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
+static inline pte_t __ptep_get_and_clear(struct mm_struct *mm,
 				       unsigned long address, pte_t *ptep)
 {
 	pte_t pte = __pte(xchg_relaxed(&pte_val(*ptep), 0));
@@ -1146,6 +1145,13 @@ static inline void pte_clear(struct mm_struct *mm,
 	__pte_clear(mm, addr, ptep);
 }
 
+#define __HAVE_ARCH_PTEP_GET_AND_CLEAR
+static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
+				       unsigned long addr, pte_t *ptep)
+{
+	return __ptep_get_and_clear(mm, addr, ptep);
+}
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_PGTABLE_H */
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index eebd3107c7d2..931a17f3c3fb 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -188,7 +188,7 @@ static pte_t get_clear_contig(struct mm_struct *mm,
 	unsigned long i;
 
 	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
-		pte_t pte = ptep_get_and_clear(mm, addr, ptep);
+		pte_t pte = __ptep_get_and_clear(mm, addr, ptep);
 
 		/*
 		 * If HW_AFDBM is enabled, then the HW could turn on
@@ -429,7 +429,7 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 	pte_t orig_pte = ptep_get(ptep);
 
 	if (!pte_cont(orig_pte))
-		return ptep_get_and_clear(mm, addr, ptep);
+		return __ptep_get_and_clear(mm, addr, ptep);
 
 	ncontig = find_num_contig(mm, addr, ptep, &pgsize);

From patchwork Thu Jun 22 14:42:00 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 111707
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Anshuman Khandual, Matthew Wilcox, Yu Zhao, Mark Rutland
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v1 05/14] arm64/mm: ptep_test_and_clear_young(): New layer to manage contig bit
Date: Thu, 22 Jun 2023 15:42:00 +0100
Message-Id: <20230622144210.2623299-6-ryan.roberts@arm.com>
In-Reply-To: <20230622144210.2623299-1-ryan.roberts@arm.com>
References: <20230622144210.2623299-1-ryan.roberts@arm.com>
Create a new layer for the in-table PTE manipulation APIs. For now, the
existing API is prefixed with a double underscore to become the
arch-private API, and the public API is just a simple wrapper that calls
the private API.

The public API implementation will subsequently be used to transparently
manipulate the contiguous bit where appropriate. But since there are
already some contig-aware users (e.g. hugetlb, kernel mapper), we must
first ensure those users call the private API directly, so that the
future contig-bit manipulations in the public API do not interfere with
those existing uses.

No behavioural changes intended.

Signed-off-by: Ryan Roberts
---
 arch/arm64/include/asm/pgtable.h | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 2a525e72537d..1f4efa17cc39 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -887,8 +887,9 @@ static inline bool pud_user_accessible_page(pud_t pud)
 /*
  * Atomic pte/pmd modifications.
  */
-#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
-static inline int __ptep_test_and_clear_young(pte_t *ptep)
+static inline int __ptep_test_and_clear_young(struct vm_area_struct *vma,
+					      unsigned long address,
+					      pte_t *ptep)
 {
 	pte_t old_pte, pte;
 
@@ -903,18 +904,11 @@ static inline int __ptep_test_and_clear_young(pte_t *ptep)
 	return pte_young(pte);
 }
 
-static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
-					    unsigned long address,
-					    pte_t *ptep)
-{
-	return __ptep_test_and_clear_young(ptep);
-}
-
 #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
 static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
 					 unsigned long address, pte_t *ptep)
 {
-	int young = ptep_test_and_clear_young(vma, address, ptep);
+	int young = __ptep_test_and_clear_young(vma, address, ptep);
 
 	if (young) {
 		/*
@@ -937,7 +931,7 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 					    unsigned long address,
 					    pmd_t *pmdp)
 {
-	return ptep_test_and_clear_young(vma, address, (pte_t *)pmdp);
+	return __ptep_test_and_clear_young(vma, address, (pte_t *)pmdp);
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
@@ -1152,6 +1146,13 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
 	return __ptep_get_and_clear(mm, addr, ptep);
 }
 
+#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
+static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
+					    unsigned long addr, pte_t *ptep)
+{
+	return __ptep_test_and_clear_young(vma, addr, ptep);
+}
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_PGTABLE_H */

From patchwork Thu Jun 22 14:42:01 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 111701
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Anshuman Khandual, Matthew Wilcox, Yu Zhao, Mark Rutland
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v1 06/14] arm64/mm: ptep_clear_flush_young(): New layer to manage contig bit
Date: Thu, 22 Jun 2023 15:42:01 +0100
Message-Id: <20230622144210.2623299-7-ryan.roberts@arm.com>
In-Reply-To: <20230622144210.2623299-1-ryan.roberts@arm.com>
References: <20230622144210.2623299-1-ryan.roberts@arm.com>

Create a new layer for the in-table PTE manipulation APIs. For now, the
existing API is prefixed with a double underscore to become the
arch-private API, and the public API is just a simple wrapper that calls
the private API.

The public API implementation will subsequently be used to transparently
manipulate the contiguous bit where appropriate. But since there are
already some contig-aware users (e.g. hugetlb, kernel mapper), we must
first ensure those users call the private API directly, so that the
future contig-bit manipulations in the public API do not interfere with
those existing uses.

No behavioural changes intended.

Signed-off-by: Ryan Roberts
---
 arch/arm64/include/asm/pgtable.h | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 1f4efa17cc39..450428b11c49 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -137,7 +137,7 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
  * so that we don't erroneously return false for pages that have been
  * remapped as PROT_NONE but are yet to be flushed from the TLB.
  * Note that we can't make any assumptions based on the state of the access
- * flag, since ptep_clear_flush_young() elides a DSB when invalidating the
+ * flag, since __ptep_clear_flush_young() elides a DSB when invalidating the
  * TLB.
  */
 #define pte_accessible(mm, pte)	\
@@ -904,8 +904,7 @@ static inline int __ptep_test_and_clear_young(struct vm_area_struct *vma,
 	return pte_young(pte);
 }
 
-#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
-static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
+static inline int __ptep_clear_flush_young(struct vm_area_struct *vma,
 					 unsigned long address, pte_t *ptep)
 {
 	int young = __ptep_test_and_clear_young(vma, address, ptep);
@@ -1153,6 +1152,13 @@ static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
 	return __ptep_test_and_clear_young(vma, addr, ptep);
 }
 
+#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
+static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
+					 unsigned long addr, pte_t *ptep)
+{
+	return __ptep_clear_flush_young(vma, addr, ptep);
+}
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_PGTABLE_H */

From patchwork Thu Jun 22 14:42:02 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 111698
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Anshuman Khandual, Matthew Wilcox, Yu Zhao, Mark Rutland
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v1 07/14] arm64/mm: ptep_set_wrprotect(): New layer to manage contig bit
Date: Thu, 22 Jun 2023 15:42:02 +0100
Message-Id: <20230622144210.2623299-8-ryan.roberts@arm.com>
In-Reply-To: <20230622144210.2623299-1-ryan.roberts@arm.com>
References: <20230622144210.2623299-1-ryan.roberts@arm.com>

Create a new layer for the in-table PTE manipulation APIs. For now, the
existing API is prefixed with a double underscore to become the
arch-private API, and the public API is just a simple wrapper that calls
the private API.

The public API implementation will subsequently be used to transparently
manipulate the contiguous bit where appropriate. But since there are
already some contig-aware users (e.g. hugetlb, kernel mapper), we must
first ensure those users call the private API directly, so that the
future contig-bit manipulations in the public API do not interfere with
those existing uses.

No behavioural changes intended.
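One detail visible in the diff below: besides the hugetlb caller, the
pmd-level helper pmdp_set_wrprotect() is also converted to call the private
function directly, so the THP path does not go through the PTE-level public
wrapper. A condensed sketch of both pieces (distilled from the diff, not a
complete listing):

    /* Public wrapper, the later hook point for contig-bit handling. */
    #define __HAVE_ARCH_PTEP_SET_WRPROTECT
    static inline void ptep_set_wrprotect(struct mm_struct *mm,
                                          unsigned long addr, pte_t *ptep)
    {
            __ptep_set_wrprotect(mm, addr, ptep);
    }

    /* The pmd-level (THP) caller goes straight to the private primitive. */
    static inline void pmdp_set_wrprotect(struct mm_struct *mm,
                                          unsigned long address, pmd_t *pmdp)
    {
            __ptep_set_wrprotect(mm, address, (pte_t *)pmdp);
    }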
Signed-off-by: Ryan Roberts
---
 arch/arm64/include/asm/pgtable.h | 15 +++++++++++----
 arch/arm64/mm/hugetlbpage.c      |  2 +-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 450428b11c49..2fcc3b19c873 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -958,11 +958,11 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
 /*
- * ptep_set_wrprotect - mark read-only while trasferring potential hardware
+ * __ptep_set_wrprotect - mark read-only while trasferring potential hardware
  * dirty status (PTE_DBM && !PTE_RDONLY) to the software PTE_DIRTY bit.
  */
-#define __HAVE_ARCH_PTEP_SET_WRPROTECT
-static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long address, pte_t *ptep)
+static inline void __ptep_set_wrprotect(struct mm_struct *mm,
+					unsigned long address, pte_t *ptep)
 {
 	pte_t old_pte, pte;
 
@@ -980,7 +980,7 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres
 static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 				      unsigned long address, pmd_t *pmdp)
 {
-	ptep_set_wrprotect(mm, address, (pte_t *)pmdp);
+	__ptep_set_wrprotect(mm, address, (pte_t *)pmdp);
 }
 
 #define pmdp_establish pmdp_establish
@@ -1159,6 +1159,13 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
 	return __ptep_clear_flush_young(vma, addr, ptep);
 }
 
+#define __HAVE_ARCH_PTEP_SET_WRPROTECT
+static inline void ptep_set_wrprotect(struct mm_struct *mm,
+				      unsigned long addr, pte_t *ptep)
+{
+	__ptep_set_wrprotect(mm, addr, ptep);
+}
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_PGTABLE_H */
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 931a17f3c3fb..7d5eb71db396 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -511,7 +511,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
 	pte_t pte;
 
 	if (!pte_cont(READ_ONCE(*ptep))) {
-		ptep_set_wrprotect(mm, addr, ptep);
+		__ptep_set_wrprotect(mm, addr, ptep);
 		return;
 	}

From patchwork Thu Jun 22 14:42:03 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 111737
ydZpbdGofNql7o5FwIFOHgeLDs5jWy21wKvqvSbtSRT1LTaYzpG5kVv4uX1ewbt5gjh1 65HYTHv1qWuf64ZsA1QjS8TC1ImWsKbPe6fSKPp5ucza9mUt5hRbk4sVGXsmrDInda5c byATGvg0hOWF8gSk1OINLdRN1vdHcUNws01UENtM2l0197Nh+e0MjQQojUpe9Va+SNRz Nq2w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b13-20020aa7950d000000b0063b8f0a6f51si5208327pfp.117.2023.06.22.08.25.40; Thu, 22 Jun 2023 08:25:52 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231754AbjFVOnX (ORCPT + 99 others); Thu, 22 Jun 2023 10:43:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48758 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231715AbjFVOnG (ORCPT ); Thu, 22 Jun 2023 10:43:06 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id CE4D52134 for ; Thu, 22 Jun 2023 07:42:50 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 7A4C61042; Thu, 22 Jun 2023 07:43:34 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id EAAD23F663; Thu, 22 Jun 2023 07:42:47 -0700 (PDT) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v1 08/14] arm64/mm: ptep_set_access_flags(): New layer to manage contig bit Date: Thu, 22 Jun 2023 15:42:03 +0100 Message-Id: <20230622144210.2623299-9-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230622144210.2623299-1-ryan.roberts@arm.com> References: <20230622144210.2623299-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769417005176517899?= X-GMAIL-MSGID: =?utf-8?q?1769417005176517899?= Create a new layer for the in-table PTE manipulation APIs. For now, The existing API is prefixed with double underscore to become the arch-private API and the public API is just a simple wrapper that calls the private API. 
The public API implementation will subsequently be used to transparently manipulate the contiguous bit where appropriate. But since there are already some contig-aware users (e.g. hugetlb, kernel mapper), we must first ensure those users use the private API directly so that the future contig-bit manipulations in the public API do not interfere with those existing uses. No behavioural changes intended. Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 16 ++++++++++++---- arch/arm64/mm/fault.c | 6 +++--- arch/arm64/mm/hugetlbpage.c | 2 +- 3 files changed, 16 insertions(+), 8 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 2fcc3b19c873..ff79578fd806 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -311,7 +311,7 @@ static inline void __check_safe_pte_update(struct mm_struct *mm, pte_t *ptep, /* * Check for potential race with hardware updates of the pte - * (ptep_set_access_flags safely changes valid ptes without going + * (__ptep_set_access_flags safely changes valid ptes without going * through an invalid entry). */ VM_WARN_ONCE(!pte_young(pte), @@ -842,8 +842,7 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot) return pte_pmd(pte_modify(pmd_pte(pmd), newprot)); } -#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS -extern int ptep_set_access_flags(struct vm_area_struct *vma, +extern int __ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address, pte_t *ptep, pte_t entry, int dirty); @@ -853,7 +852,8 @@ static inline int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp, pmd_t entry, int dirty) { - return ptep_set_access_flags(vma, address, (pte_t *)pmdp, pmd_pte(entry), dirty); + return __ptep_set_access_flags(vma, address, (pte_t *)pmdp, + pmd_pte(entry), dirty); } static inline int pud_devmap(pud_t pud) @@ -1166,6 +1166,14 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, __ptep_set_wrprotect(mm, addr, ptep); } +#define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS +static inline int ptep_set_access_flags(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, + pte_t entry, int dirty) +{ + return __ptep_set_access_flags(vma, addr, ptep, entry, dirty); +} + #endif /* !__ASSEMBLY__ */ #endif /* __ASM_PGTABLE_H */ diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index d3a64624ed88..f5a7a5ff6814 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -195,9 +195,9 @@ static void show_pte(unsigned long addr) * * Returns whether or not the PTE actually changed. 
*/ -int ptep_set_access_flags(struct vm_area_struct *vma, - unsigned long address, pte_t *ptep, - pte_t entry, int dirty) +int __ptep_set_access_flags(struct vm_area_struct *vma, + unsigned long address, pte_t *ptep, + pte_t entry, int dirty) { pteval_t old_pteval, pteval; pte_t pte = READ_ONCE(*ptep); diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 7d5eb71db396..9a87b1c5661a 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -477,7 +477,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma, pte_t orig_pte; if (!pte_cont(pte)) - return ptep_set_access_flags(vma, addr, ptep, pte, dirty); + return __ptep_set_access_flags(vma, addr, ptep, pte, dirty); ncontig = find_num_contig(mm, addr, ptep, &pgsize); dpfn = pgsize >> PAGE_SHIFT; From patchwork Thu Jun 22 14:42:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 111703 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp5126964vqr; Thu, 22 Jun 2023 07:54:45 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6+sO3U64JC4X8moOlupp6UgNT7gCH4CoKc4V9O63yCMC+s1+ttOeakM5+MKCfXDyooDnKW X-Received: by 2002:a05:6a20:7d8f:b0:121:98a3:d7c1 with SMTP id v15-20020a056a207d8f00b0012198a3d7c1mr15088854pzj.28.1687445685053; Thu, 22 Jun 2023 07:54:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1687445685; cv=none; d=google.com; s=arc-20160816; b=lZAsowaoPatJf0VgGGY9i5K6PgLwVjt9kE+geqqowbbE7qRa4Lb5BWjSp6fj7KF6xP h0sHyS+f3HfH04xDQzoGXdkm06NDEYdgcZbq83NXHbQsbf8bcl9vLlnHcMwxhY/A7d2g PrhV4TmBouV4CtWX0sj+KdBFl8HtH0vp3gpSI74WUAcIX3YgTx6mgxoeEvMdwAGPa3IY QWAa9ILhf6UqSs185cnbGLjNB8f24JfRaLRsz02CuwbJJORfnfNnS/3YbHdqCMjjelZk iEFWfFP3d3IC+W19W42xeC8/5MIFzSNsJ3ZXVdcmhEL+3YwUbhdFnDXGCSRxtAQ/27ZD sYEw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=DGRckod/otio/3IodOPAW6O8qQnfn5Pej2aY1zORQlE=; b=jnhLmbRf1IH/H7o2MycNIrBXgKs0+vhkOBi0WA24em84P6jG453j1PCxl8bmvNPnEE 30Y40stkq7/4MN6s5X8a7+17Kd1IKIOcrji+I95/hg94bgmFahib2djFkXBDLuHshiI4 Lv2kk45JTAId6RBiPBjCcTg0uQgCXK9wSbVTY56N1NbihAfTrnElSn3DY88d04SG0dza IJ6NfRI92n8BGPcuwpGy5An+RJt38frgr3KPJS4aYKkgnk0GVxOBjDXCVif0fu/z4QZA gTQNNFKsbNJvdIMQ8EdAoiOjGpikJK8dvz6ij5jvklZx3QudwcR6xjsN1Rj64+IZgThL iE3w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id e1-20020aa798c1000000b0066a1f3a9190si401881pfm.207.2023.06.22.07.54.32; Thu, 22 Jun 2023 07:54:45 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231843AbjFVOnd (ORCPT + 99 others); Thu, 22 Jun 2023 10:43:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48632 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231857AbjFVOnO (ORCPT ); Thu, 22 Jun 2023 10:43:14 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id CFAFC2690 for ; Thu, 22 Jun 2023 07:42:53 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6E9EB14BF; Thu, 22 Jun 2023 07:43:37 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id C4A133F663; Thu, 22 Jun 2023 07:42:50 -0700 (PDT) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v1 09/14] arm64/mm: ptep_get(): New layer to manage contig bit Date: Thu, 22 Jun 2023 15:42:04 +0100 Message-Id: <20230622144210.2623299-10-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230622144210.2623299-1-ryan.roberts@arm.com> References: <20230622144210.2623299-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769415046751001022?= X-GMAIL-MSGID: =?utf-8?q?1769415046751001022?= Create a new layer for the in-table PTE manipulation APIs. For now, The existing API is prefixed with double underscore to become the arch-private API and the public API is just a simple wrapper that calls the private API. The public API implementation will subsequently be used to transparently manipulate the contiguous bit where appropriate. But since there are already some contig-aware users (e.g. hugetlb, kernel mapper), we must first ensure those users use the private API directly so that the future contig-bit manipulations in the public API do not interfere with those existing uses. arm64 did not previously define an arch-specific ptep_get(), so override the default version in the arch code, and also define the private __ptep_get() version. 
Currently they both do the same thing that the default version does (READ_ONCE()). Some arch users (hugetlb) were already using ptep_get() so convert those to the private API. While other callsites were doing direct READ_ONCE(), so convert those to use the appropriate (public/private) API too. There are some core kernel locations that directly dereference the ptep, so these will need to be updated separately. No behavioural changes intended. Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 19 +++++++++++++++---- arch/arm64/kernel/efi.c | 2 +- arch/arm64/mm/fault.c | 4 ++-- arch/arm64/mm/hugetlbpage.c | 18 +++++++++--------- arch/arm64/mm/kasan_init.c | 2 +- arch/arm64/mm/mmu.c | 12 ++++++------ arch/arm64/mm/pageattr.c | 4 ++-- arch/arm64/mm/trans_pgd.c | 2 +- 8 files changed, 37 insertions(+), 26 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index ff79578fd806..31df4d73f9ac 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -275,6 +275,11 @@ static inline void __set_pte(pte_t *ptep, pte_t pte) } } +static inline pte_t __ptep_get(pte_t *ptep) +{ + return READ_ONCE(*ptep); +} + extern void __sync_icache_dcache(pte_t pteval); bool pgattr_change_is_safe(u64 old, u64 new); @@ -302,7 +307,7 @@ static inline void __check_safe_pte_update(struct mm_struct *mm, pte_t *ptep, if (!IS_ENABLED(CONFIG_DEBUG_VM)) return; - old_pte = READ_ONCE(*ptep); + old_pte = __ptep_get(ptep); if (!pte_valid(old_pte) || !pte_valid(pte)) return; @@ -339,7 +344,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr, */ if (system_supports_mte() && pte_access_permitted(pte, false) && !pte_special(pte)) { - pte_t old_pte = READ_ONCE(*ptep); + pte_t old_pte = __ptep_get(ptep); /* * We only need to synchronise if the new PTE has tags enabled * or if swapping in (in which case another mapping may have @@ -893,7 +898,7 @@ static inline int __ptep_test_and_clear_young(struct vm_area_struct *vma, { pte_t old_pte, pte; - pte = READ_ONCE(*ptep); + pte = __ptep_get(ptep); do { old_pte = pte; pte = pte_mkold(pte); @@ -966,7 +971,7 @@ static inline void __ptep_set_wrprotect(struct mm_struct *mm, { pte_t old_pte, pte; - pte = READ_ONCE(*ptep); + pte = __ptep_get(ptep); do { old_pte = pte; pte = pte_wrprotect(pte); @@ -1120,6 +1125,12 @@ extern void ptep_modify_prot_commit(struct vm_area_struct *vma, * private versions, which are prefixed with double underscore. 
*/ +#define ptep_get ptep_get +static inline pte_t ptep_get(pte_t *ptep) +{ + return __ptep_get(ptep); +} + static inline void set_pte(pte_t *ptep, pte_t pte) { __set_pte(ptep, pte); diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c index 7a28b6a08a82..9536fbce77a2 100644 --- a/arch/arm64/kernel/efi.c +++ b/arch/arm64/kernel/efi.c @@ -106,7 +106,7 @@ static int __init set_permissions(pte_t *ptep, unsigned long addr, void *data) { struct set_perm_data *spd = data; const efi_memory_desc_t *md = spd->md; - pte_t pte = READ_ONCE(*ptep); + pte_t pte = __ptep_get(ptep); if (md->attribute & EFI_MEMORY_RO) pte = set_pte_bit(pte, __pgprot(PTE_RDONLY)); diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index f5a7a5ff6814..3193526b226d 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -177,7 +177,7 @@ static void show_pte(unsigned long addr) break; ptep = pte_offset_map(pmdp, addr); - pte = READ_ONCE(*ptep); + pte = __ptep_get(ptep); pr_cont(", pte=%016llx", pte_val(pte)); pte_unmap(ptep); } while(0); @@ -200,7 +200,7 @@ int __ptep_set_access_flags(struct vm_area_struct *vma, pte_t entry, int dirty) { pteval_t old_pteval, pteval; - pte_t pte = READ_ONCE(*ptep); + pte_t pte = __ptep_get(ptep); if (pte_same(pte, entry)) return 0; diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 9a87b1c5661a..82b2036dbe2f 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -152,14 +152,14 @@ pte_t huge_ptep_get(pte_t *ptep) { int ncontig, i; size_t pgsize; - pte_t orig_pte = ptep_get(ptep); + pte_t orig_pte = __ptep_get(ptep); if (!pte_present(orig_pte) || !pte_cont(orig_pte)) return orig_pte; ncontig = num_contig_ptes(page_size(pte_page(orig_pte)), &pgsize); for (i = 0; i < ncontig; i++, ptep++) { - pte_t pte = ptep_get(ptep); + pte_t pte = __ptep_get(ptep); if (pte_dirty(pte)) orig_pte = pte_mkdirty(orig_pte); @@ -184,7 +184,7 @@ static pte_t get_clear_contig(struct mm_struct *mm, unsigned long pgsize, unsigned long ncontig) { - pte_t orig_pte = ptep_get(ptep); + pte_t orig_pte = __ptep_get(ptep); unsigned long i; for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) { @@ -426,7 +426,7 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, { int ncontig; size_t pgsize; - pte_t orig_pte = ptep_get(ptep); + pte_t orig_pte = __ptep_get(ptep); if (!pte_cont(orig_pte)) return __ptep_get_and_clear(mm, addr, ptep); @@ -449,11 +449,11 @@ static int __cont_access_flags_changed(pte_t *ptep, pte_t pte, int ncontig) { int i; - if (pte_write(pte) != pte_write(ptep_get(ptep))) + if (pte_write(pte) != pte_write(__ptep_get(ptep))) return 1; for (i = 0; i < ncontig; i++) { - pte_t orig_pte = ptep_get(ptep + i); + pte_t orig_pte = __ptep_get(ptep + i); if (pte_dirty(pte) != pte_dirty(orig_pte)) return 1; @@ -510,7 +510,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm, size_t pgsize; pte_t pte; - if (!pte_cont(READ_ONCE(*ptep))) { + if (!pte_cont(__ptep_get(ptep))) { __ptep_set_wrprotect(mm, addr, ptep); return; } @@ -535,7 +535,7 @@ pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, size_t pgsize; int ncontig; - if (!pte_cont(READ_ONCE(*ptep))) + if (!pte_cont(__ptep_get(ptep))) return ptep_clear_flush(vma, addr, ptep); ncontig = find_num_contig(mm, addr, ptep, &pgsize); @@ -569,7 +569,7 @@ pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr * when the permission changes from executable to non-executable * in cases where cpu is affected with errata #2645198. 
*/ - if (pte_user_exec(READ_ONCE(*ptep))) + if (pte_user_exec(__ptep_get(ptep))) return huge_ptep_clear_flush(vma, addr, ptep); } return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c index 40125b217195..65074cf7f3a3 100644 --- a/arch/arm64/mm/kasan_init.c +++ b/arch/arm64/mm/kasan_init.c @@ -113,7 +113,7 @@ static void __init kasan_pte_populate(pmd_t *pmdp, unsigned long addr, memset(__va(page_phys), KASAN_SHADOW_INIT, PAGE_SIZE); next = addr + PAGE_SIZE; __set_pte(ptep, pfn_pte(__phys_to_pfn(page_phys), PAGE_KERNEL)); - } while (ptep++, addr = next, addr != end && pte_none(READ_ONCE(*ptep))); + } while (ptep++, addr = next, addr != end && pte_none(__ptep_get(ptep))); } static void __init kasan_pmd_populate(pud_t *pudp, unsigned long addr, diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 085a7e3eec98..d5dc36ff3827 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -176,7 +176,7 @@ static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end, ptep = pte_set_fixmap_offset(pmdp, addr); do { - pte_t old_pte = READ_ONCE(*ptep); + pte_t old_pte = __ptep_get(ptep); __set_pte(ptep, pfn_pte(__phys_to_pfn(phys), prot)); @@ -185,7 +185,7 @@ static void init_pte(pmd_t *pmdp, unsigned long addr, unsigned long end, * only allow updates to the permission attributes. */ BUG_ON(!pgattr_change_is_safe(pte_val(old_pte), - READ_ONCE(pte_val(*ptep)))); + pte_val(__ptep_get(ptep)))); phys += PAGE_SIZE; } while (ptep++, addr += PAGE_SIZE, addr != end); @@ -848,7 +848,7 @@ static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr, do { ptep = pte_offset_kernel(pmdp, addr); - pte = READ_ONCE(*ptep); + pte = __ptep_get(ptep); if (pte_none(pte)) continue; @@ -981,7 +981,7 @@ static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr, do { ptep = pte_offset_kernel(pmdp, addr); - pte = READ_ONCE(*ptep); + pte = __ptep_get(ptep); /* * This is just a sanity check here which verifies that @@ -1000,7 +1000,7 @@ static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr, */ ptep = pte_offset_kernel(pmdp, 0UL); for (i = 0; i < PTRS_PER_PTE; i++) { - if (!pte_none(READ_ONCE(ptep[i]))) + if (!pte_none(__ptep_get(ptep++))) return; } @@ -1470,7 +1470,7 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte * when the permission changes from executable to non-executable * in cases where cpu is affected with errata #2645198. 
*/ - if (pte_user_exec(READ_ONCE(*ptep))) + if (pte_user_exec(ptep_get(ptep))) return ptep_clear_flush(vma, addr, ptep); } return ptep_get_and_clear(vma->vm_mm, addr, ptep); diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c index 057097acf9e0..624b0b0982e3 100644 --- a/arch/arm64/mm/pageattr.c +++ b/arch/arm64/mm/pageattr.c @@ -36,7 +36,7 @@ bool can_set_direct_map(void) static int change_page_range(pte_t *ptep, unsigned long addr, void *data) { struct page_change_data *cdata = data; - pte_t pte = READ_ONCE(*ptep); + pte_t pte = __ptep_get(ptep); pte = clear_pte_bit(pte, cdata->clear_mask); pte = set_pte_bit(pte, cdata->set_mask); @@ -246,5 +246,5 @@ bool kernel_page_present(struct page *page) return true; ptep = pte_offset_kernel(pmdp, addr); - return pte_valid(READ_ONCE(*ptep)); + return pte_valid(__ptep_get(ptep)); } diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c index f9997b226614..b130a65092c1 100644 --- a/arch/arm64/mm/trans_pgd.c +++ b/arch/arm64/mm/trans_pgd.c @@ -32,7 +32,7 @@ static void *trans_alloc(struct trans_pgd_info *info) static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr) { - pte_t pte = READ_ONCE(*src_ptep); + pte_t pte = __ptep_get(src_ptep); if (pte_valid(pte)) { /* From patchwork Thu Jun 22 14:42:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 111708 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp5129567vqr; Thu, 22 Jun 2023 07:59:22 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6a/V3i1H0u1JZcSpS6KsMsy6DLLvNzn5U2+bb9dl7lmeclriu+gepCmElJsw6E9UknRI/J X-Received: by 2002:a92:4b02:0:b0:33d:ab70:3447 with SMTP id m2-20020a924b02000000b0033dab703447mr17854095ilg.19.1687445962085; Thu, 22 Jun 2023 07:59:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1687445962; cv=none; d=google.com; s=arc-20160816; b=syWEmUJ7dhLwHlXp9r0+TMwaUlY5OsDtopHcB3ax4nt9JQIvpFftA/YH0wKJVmir91 +urYhHB4hHayZA9YgqFik8HQ1XhuNsvJ9GVPszLQZHN2iX77UtjW34RqhFonPkCT6/Eq livcx4xa8DjVIsNU4B4ue2mrI1ZFvTvfwOnZVqP92+BkLLSo+WBTAGOTI2tzxug7qOtu 06fAIleFHDrz7mgT/obbM+pCLYw0kdkxpdF6sGhRNEHzgF5Tt+fEV9fGCg6I7rc2JfDN rHgWKlkKbRrCJheoT6qoeLbNt1I/iThyPjlyFsLzyNZhj0QN5HwzFenNLfwkN+624/Lw DiYA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=WAztuwwCVy8qP9L2dunxZQePeRG4wFUsOFlwxBwE1/E=; b=w0I8PPa2wCiHw2Bj6UOMnOdk0WnLMTjPTuaEeJ47ihBJ8UQgX9kp8SAQ0XEiZfEcIP sdL+TFoSYl05vqJOj2d5Thaoo0b9MwFIga3u2/VLEeoTX+xColSukHlmTyqkSoV/GWvl a9be/bjXqZqUCo+wYEMSrpvNFrrjPnQ9xqbCNkHnTpMLA4/TrZqkHenMxRLsQCRoPnnO FTXvq4C3HA6w5qIxIJJwtd7ZHZ+RdYcm0RXNciQvIGQCCBn8+224yQGx1w3nUJT4sk0i MxffmMhFCwlleF/h2O4NC0vfOU68vtFTJ3oa6ZztqlO9wsOs6hFnD1wk12ckoNBHdzCh b15A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id 14-20020a63010e000000b0054033bf35f4si6379054pgb.773.2023.06.22.07.59.09; Thu, 22 Jun 2023 07:59:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231806AbjFVOnf (ORCPT + 99 others); Thu, 22 Jun 2023 10:43:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48408 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231791AbjFVOnS (ORCPT ); Thu, 22 Jun 2023 10:43:18 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id B24F726AE for ; Thu, 22 Jun 2023 07:42:56 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4843F113E; Thu, 22 Jun 2023 07:43:40 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id B85DE3F663; Thu, 22 Jun 2023 07:42:53 -0700 (PDT) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v1 10/14] arm64/mm: Split __flush_tlb_range() to elide trailing DSB Date: Thu, 22 Jun 2023 15:42:05 +0100 Message-Id: <20230622144210.2623299-11-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230622144210.2623299-1-ryan.roberts@arm.com> References: <20230622144210.2623299-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769415337034285367?= X-GMAIL-MSGID: =?utf-8?q?1769415337034285367?= Split __flush_tlb_range() into __flush_tlb_range_nosync() + __flush_tlb_range(), in the same way as the existing flush_tlb_page() arrangement. This allows calling __flush_tlb_range_nosync() to elide the trailing DSB. Forthcoming "contpte" code will take advantage of this when clearing the young bit from a contiguous range of ptes. 
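The resulting arrangement, condensed from the hunk below with the TLBI loop elided, looks like this:

static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
				     unsigned long start, unsigned long end,
				     unsigned long stride, bool last_level,
				     int tlb_level)
{
	/* ... issue the range TLBI instructions, no trailing barrier ... */
}

static inline void __flush_tlb_range(struct vm_area_struct *vma,
				     unsigned long start, unsigned long end,
				     unsigned long stride, bool last_level,
				     int tlb_level)
{
	__flush_tlb_range_nosync(vma, start, end, stride,
				 last_level, tlb_level);
	dsb(ish);	/* existing callers keep the completion barrier */
}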
Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/tlbflush.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h index 412a3b9a3c25..de1f5d9a546e 100644 --- a/arch/arm64/include/asm/tlbflush.h +++ b/arch/arm64/include/asm/tlbflush.h @@ -278,7 +278,7 @@ static inline void flush_tlb_page(struct vm_area_struct *vma, */ #define MAX_TLBI_OPS PTRS_PER_PTE -static inline void __flush_tlb_range(struct vm_area_struct *vma, +static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma, unsigned long start, unsigned long end, unsigned long stride, bool last_level, int tlb_level) @@ -357,6 +357,15 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma, } scale++; } +} + +static inline void __flush_tlb_range(struct vm_area_struct *vma, + unsigned long start, unsigned long end, + unsigned long stride, bool last_level, + int tlb_level) +{ + __flush_tlb_range_nosync(vma, start, end, stride, + last_level, tlb_level); dsb(ish); } From patchwork Thu Jun 22 14:42:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 111710 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp5129671vqr; Thu, 22 Jun 2023 07:59:30 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7tYThnSxZOHm3SrMfs/D0pbDxp/CcHFX6JxdIO7R3UiYJvxrj/G7sN+8uRO585Tj9B/hJA X-Received: by 2002:a05:6359:6389:b0:132:d30e:9e7a with SMTP id sg9-20020a056359638900b00132d30e9e7amr631568rwb.8.1687445970338; Thu, 22 Jun 2023 07:59:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1687445970; cv=none; d=google.com; s=arc-20160816; b=PC0zC2pGnWtTzvDsHi9OXyNgcIMK1YzU+nOJBMsxc/gvaYPG/4nnsIeQU/Kz40+wbS qInD12/2q8BAZ9+mQBSx1EHIxssBLjDO2sTyoTfazOyqHQlOq4ITtRR99P1ZQW4f44Uz 2mCg10BkXHELahPYpv6uEKqaaXq8Vr6OOL7xbRmrjrZu/tSsZN9hcABBdDYp+maS3aTk Z/V+ZodPzcMWWr2XlojQNgdd9LsnH5fVwMnNZyOryi82KpnD9y/gP9IE/W/fj+zNoN9E uzrib6ghzvme7kd2uE46jSEwqDZ/xWVREKB7NO4e5fpJK9EjBeMJW+uJZhbyx8PNApwg APjQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=upLk2HLYgqwAW4MQvQEZ09lc7CQdtN33REcZ0u51mAs=; b=VWBmZHz+fK3ZlqwsKb2r4J0IP2SD77CTdsqKA9/c/6SeRXRqi9j9kHSLzvtdroZCjP wgmg3p6iA74OG7l2MdxZJw3z3v+F/cgqW6HWm770y9v00aHIfnptgKvdPQdKiw952Spn 6LBPj/VI8u7PgxJ1QbX+ynNveBIStQcdkcdyvVC9AXEIayqDo/PEx4JX/ckQTxjqt9t0 SEfZiYEht+Mtj+oipAPR66TqZ4pBrft+IJlr6voVTJOfVMNNcI/DHvM3QGwz/ij/5bKX slXUu+MuZlj3lGymC80BdHU3Gm+FYcQGET5nca8/pMZtGHzA1Yn97wIWaYuw+gBBU38m U/7w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id e30-20020a63371e000000b0055337ca9ce9si6213512pga.248.2023.06.22.07.59.17; Thu, 22 Jun 2023 07:59:30 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231829AbjFVOnr (ORCPT + 99 others); Thu, 22 Jun 2023 10:43:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49472 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231769AbjFVOn2 (ORCPT ); Thu, 22 Jun 2023 10:43:28 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 465A22705 for ; Thu, 22 Jun 2023 07:43:02 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 22AEF1515; Thu, 22 Jun 2023 07:43:43 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 925DC3F663; Thu, 22 Jun 2023 07:42:56 -0700 (PDT) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v1 11/14] arm64/mm: Wire up PTE_CONT for user mappings Date: Thu, 22 Jun 2023 15:42:06 +0100 Message-Id: <20230622144210.2623299-12-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230622144210.2623299-1-ryan.roberts@arm.com> References: <20230622144210.2623299-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769415345577260567?= X-GMAIL-MSGID: =?utf-8?q?1769415345577260567?= With the ptep API sufficiently refactored, we can now introduce a new "contpte" API layer, which transparently manages the PTE_CONT bit for user mappings. Whenever it detects a set of PTEs that meet the requirements for a contiguous range, the PTEs are re-painted with the PTE_CONT bit. This initial change provides a baseline that can be optimized in future commits. That said, fold/unfold operations (which imply tlb invalidation) are avoided where possible with a few tricks for access/dirty bit management. Write-enable and write-protect modifications are likely non-optimal and likely incure a regression in fork() performance. This will be addressed separately. 
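As an illustration of the dispatch pattern used by the public wrappers, ptep_get() ends up with the shape sketched below (condensed from the pgtable.h hunk that follows):

#define ptep_get ptep_get
static inline pte_t ptep_get(pte_t *ptep)
{
	pte_t pte = __ptep_get(ptep);

	/* Non-present and non-contig entries take the old fast path. */
	if (!pte_present(pte) || !pte_cont(pte))
		return pte;

	/* Contig entries gather access/dirty bits across the whole range. */
	return contpte_ptep_get(ptep, pte);
}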
Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 137 ++++++++++++- arch/arm64/mm/Makefile | 3 +- arch/arm64/mm/contpte.c | 334 +++++++++++++++++++++++++++++++ 3 files changed, 466 insertions(+), 8 deletions(-) create mode 100644 arch/arm64/mm/contpte.c diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 31df4d73f9ac..17ea534bc5b0 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -1115,6 +1115,71 @@ extern void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep, pte_t old_pte, pte_t new_pte); +/* + * The contpte APIs are used to transparently manage the contiguous bit in ptes + * where it is possible and makes sense to do so. The PTE_CONT bit is considered + * a private implementation detail of the public ptep API (see below). + */ +extern void __contpte_try_fold(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte); +extern void __contpte_try_unfold(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte); +extern pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte); +extern pte_t contpte_ptep_get_lockless(pte_t *orig_ptep); +extern void contpte_set_ptes(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte, unsigned int nr); +extern int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep); +extern int contpte_ptep_clear_flush_young(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep); +extern int contpte_ptep_set_access_flags(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, + pte_t entry, int dirty); + +static inline pte_t *contpte_align_down(pte_t *ptep) +{ + return (pte_t *)(ALIGN_DOWN((unsigned long)ptep >> 3, CONT_PTES) << 3); +} + +static inline bool contpte_is_enabled(struct mm_struct *mm) +{ + /* + * Don't attempt to apply the contig bit to kernel mappings, because + * dynamically adding/removing the contig bit can cause page faults. + * These racing faults are ok for user space, since they get serialized + * on the PTL. But kernel mappings can't tolerate faults. + */ + + return mm != &init_mm; +} + +static inline void contpte_try_fold(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) +{ + /* + * Only bother trying if both the virtual and physical addresses are + * aligned and correspond to the last entry in a contig range. The core + * code mostly modifies ranges from low to high, so this is the likely + * the last modification in the contig range, so a good time to fold. + */ + + bool valign = ((unsigned long)ptep >> 3) % CONT_PTES == CONT_PTES - 1; + bool palign = pte_pfn(pte) % CONT_PTES == CONT_PTES - 1; + + if (contpte_is_enabled(mm) && + pte_present(pte) && !pte_cont(pte) && + valign && palign) + __contpte_try_fold(mm, addr, ptep, pte); +} + +static inline void contpte_try_unfold(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) +{ + if (contpte_is_enabled(mm) && + pte_present(pte) && pte_cont(pte)) + __contpte_try_unfold(mm, addr, ptep, pte); +} + /* * The below functions constitute the public API that arm64 presents to the * core-mm to manipulate PTE entries within the their page tables (or at least @@ -1122,30 +1187,68 @@ extern void ptep_modify_prot_commit(struct vm_area_struct *vma, * versions will automatically and transparently apply the contiguous bit where * it makes sense to do so. Therefore any users that are contig-aware (e.g. 
* hugetlb, kernel mapper) should NOT use these APIs, but instead use the - * private versions, which are prefixed with double underscore. + * private versions, which are prefixed with double underscore. All of these + * APIs except for ptep_get_lockless() are expected to be called with the PTL + * held. */ #define ptep_get ptep_get static inline pte_t ptep_get(pte_t *ptep) { - return __ptep_get(ptep); + pte_t pte = __ptep_get(ptep); + + if (!pte_present(pte) || !pte_cont(pte)) + return pte; + + return contpte_ptep_get(ptep, pte); +} + +#define ptep_get_lockless ptep_get_lockless +static inline pte_t ptep_get_lockless(pte_t *ptep) +{ + pte_t pte = __ptep_get(ptep); + + if (!pte_present(pte) || !pte_cont(pte)) + return pte; + + return contpte_ptep_get_lockless(ptep); } static inline void set_pte(pte_t *ptep, pte_t pte) { - __set_pte(ptep, pte); + /* + * We don't have the mm or vaddr so cannot unfold or fold contig entries + * (since it requires tlb maintenance). set_pte() is not used in core + * code, so this should never even be called. Regardless do our best to + * service any call and emit a warning if there is any attempt to set a + * pte on top of an existing contig range. + */ + pte_t orig_pte = __ptep_get(ptep); + + WARN_ON_ONCE(pte_present(orig_pte) && pte_cont(orig_pte)); + __set_pte(ptep, pte_mknoncont(pte)); } #define set_ptes set_ptes static inline void set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte, unsigned int nr) { - __set_ptes(mm, addr, ptep, pte, nr); + pte = pte_mknoncont(pte); + + if (!contpte_is_enabled(mm)) + __set_ptes(mm, addr, ptep, pte, nr); + else if (nr == 1) { + contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); + __set_ptes(mm, addr, ptep, pte, nr); + contpte_try_fold(mm, addr, ptep, pte); + } else + contpte_set_ptes(mm, addr, ptep, pte, nr); } static inline void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { + contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); __pte_clear(mm, addr, ptep); } @@ -1153,6 +1256,7 @@ static inline void pte_clear(struct mm_struct *mm, static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { + contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); return __ptep_get_and_clear(mm, addr, ptep); } @@ -1160,21 +1264,33 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { - return __ptep_test_and_clear_young(vma, addr, ptep); + pte_t orig_pte = __ptep_get(ptep); + + if (!pte_present(orig_pte) || !pte_cont(orig_pte)) + return __ptep_test_and_clear_young(vma, addr, ptep); + + return contpte_ptep_test_and_clear_young(vma, addr, ptep); } #define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH static inline int ptep_clear_flush_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { - return __ptep_clear_flush_young(vma, addr, ptep); + pte_t orig_pte = __ptep_get(ptep); + + if (!pte_present(orig_pte) || !pte_cont(orig_pte)) + return __ptep_clear_flush_young(vma, addr, ptep); + + return contpte_ptep_clear_flush_young(vma, addr, ptep); } #define __HAVE_ARCH_PTEP_SET_WRPROTECT static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { + contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); __ptep_set_wrprotect(mm, addr, ptep); + contpte_try_fold(mm, addr, ptep, __ptep_get(ptep)); } #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS @@ -1182,7 +1298,14 @@ static inline int ptep_set_access_flags(struct vm_area_struct 
*vma, unsigned long addr, pte_t *ptep, pte_t entry, int dirty) { - return __ptep_set_access_flags(vma, addr, ptep, entry, dirty); + pte_t orig_pte = __ptep_get(ptep); + + entry = pte_mknoncont(entry); + + if (!pte_present(orig_pte) || !pte_cont(orig_pte)) + return __ptep_set_access_flags(vma, addr, ptep, entry, dirty); + + return contpte_ptep_set_access_flags(vma, addr, ptep, entry, dirty); } #endif /* !__ASSEMBLY__ */ diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile index dbd1bc95967d..70b6aba09b5d 100644 --- a/arch/arm64/mm/Makefile +++ b/arch/arm64/mm/Makefile @@ -2,7 +2,8 @@ obj-y := dma-mapping.o extable.o fault.o init.o \ cache.o copypage.o flush.o \ ioremap.o mmap.o pgd.o mmu.o \ - context.o proc.o pageattr.o fixmap.o + context.o proc.o pageattr.o fixmap.o \ + contpte.o obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o obj-$(CONFIG_PTDUMP_CORE) += ptdump.o obj-$(CONFIG_PTDUMP_DEBUGFS) += ptdump_debugfs.o diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c new file mode 100644 index 000000000000..e8e4a298fd53 --- /dev/null +++ b/arch/arm64/mm/contpte.c @@ -0,0 +1,334 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2023 ARM Ltd. + */ + +#include +#include + +static void ptep_clear_flush_range(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, int nr) +{ + struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0); + unsigned long start_addr = addr; + int i; + + for (i = 0; i < nr; i++, ptep++, addr += PAGE_SIZE) + __pte_clear(mm, addr, ptep); + + __flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3); +} + +static bool ptep_any_present(pte_t *ptep, int nr) +{ + int i; + + for (i = 0; i < nr; i++, ptep++) { + if (pte_present(__ptep_get(ptep))) + return true; + } + + return false; +} + +static void contpte_fold(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte, bool fold) +{ + struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0); + unsigned long start_addr; + pte_t *start_ptep; + int i; + + start_ptep = ptep = contpte_align_down(ptep); + start_addr = addr = ALIGN_DOWN(addr, CONT_PTE_SIZE); + pte = pfn_pte(ALIGN_DOWN(pte_pfn(pte), CONT_PTES), pte_pgprot(pte)); + pte = fold ? pte_mkcont(pte) : pte_mknoncont(pte); + + for (i = 0; i < CONT_PTES; i++, ptep++, addr += PAGE_SIZE) { + pte_t ptent = __ptep_get_and_clear(mm, addr, ptep); + + if (pte_dirty(ptent)) + pte = pte_mkdirty(pte); + + if (pte_young(ptent)) + pte = pte_mkyoung(pte); + } + + __flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3); + + __set_ptes(mm, start_addr, start_ptep, pte, CONT_PTES); +} + +void __contpte_try_fold(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) +{ + /* + * We have already checked that the virtual and pysical addresses are + * correctly aligned for a contig mapping in contpte_try_fold() so the + * remaining checks are to ensure that the contig range is fully covered + * by a single folio, and ensure that all the ptes are present with + * contiguous PFNs and matching prots. 
+ */ + + struct page *page = pte_page(pte); + struct folio *folio = page_folio(page); + unsigned long folio_saddr = addr - (page - &folio->page) * PAGE_SIZE; + unsigned long folio_eaddr = folio_saddr + folio_nr_pages(folio) * PAGE_SIZE; + unsigned long cont_saddr = ALIGN_DOWN(addr, CONT_PTE_SIZE); + unsigned long cont_eaddr = cont_saddr + CONT_PTE_SIZE; + unsigned long pfn; + pgprot_t prot; + pte_t subpte; + pte_t *orig_ptep; + int i; + + if (folio_saddr > cont_saddr || folio_eaddr < cont_eaddr) + return; + + pfn = pte_pfn(pte) - ((addr - cont_saddr) >> PAGE_SHIFT); + prot = pte_pgprot(pte_mkold(pte_mkclean(pte))); + orig_ptep = ptep; + ptep = contpte_align_down(ptep); + + for (i = 0; i < CONT_PTES; i++, ptep++, pfn++) { + subpte = __ptep_get(ptep); + subpte = pte_mkold(pte_mkclean(subpte)); + + if (!pte_present(subpte) || + pte_pfn(subpte) != pfn || + pgprot_val(pte_pgprot(subpte)) != pgprot_val(prot)) + return; + } + + contpte_fold(mm, addr, orig_ptep, pte, true); +} + +void __contpte_try_unfold(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) +{ + /* + * We have already checked that the ptes are contiguous in + * contpte_try_unfold(), so we can unfold unconditionally here. + */ + + contpte_fold(mm, addr, ptep, pte, false); +} + +pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte) +{ + /* + * Gather access/dirty bits, which may be populated in any of the ptes + * of the contig range. We are guarranteed to be holding the PTL, so any + * contiguous range cannot be unfolded or otherwise modified under our + * feet. + */ + + pte_t pte; + int i; + + ptep = contpte_align_down(ptep); + + for (i = 0; i < CONT_PTES; i++, ptep++) { + pte = __ptep_get(ptep); + + /* + * Deal with the partial contpte_ptep_get_and_clear_full() case, + * where some of the ptes in the range may be cleared but others + * are still to do. See contpte_ptep_get_and_clear_full(). + */ + if (pte_val(pte) == 0) + continue; + + if (pte_dirty(pte)) + orig_pte = pte_mkdirty(orig_pte); + + if (pte_young(pte)) + orig_pte = pte_mkyoung(orig_pte); + } + + return orig_pte; +} + +pte_t contpte_ptep_get_lockless(pte_t *orig_ptep) +{ + /* + * Gather access/dirty bits, which may be populated in any of the ptes + * of the contig range. We may not be holding the PTL, so any contiguous + * range may be unfolded/modified/refolded under our feet. 
+ */ + + pte_t orig_pte; + pgprot_t orig_prot; + pte_t *ptep; + unsigned long pfn; + pte_t pte; + pgprot_t prot; + int i; + +retry: + orig_pte = __ptep_get(orig_ptep); + + if (!pte_present(orig_pte) || !pte_cont(orig_pte)) + return orig_pte; + + orig_prot = pte_pgprot(pte_mkold(pte_mkclean(orig_pte))); + ptep = contpte_align_down(orig_ptep); + pfn = pte_pfn(orig_pte) - (orig_ptep - ptep); + + for (i = 0; i < CONT_PTES; i++, ptep++, pfn++) { + pte = __ptep_get(ptep); + prot = pte_pgprot(pte_mkold(pte_mkclean(pte))); + + if (!pte_present(pte) || !pte_cont(pte) || + pte_pfn(pte) != pfn || + pgprot_val(prot) != pgprot_val(orig_prot)) + goto retry; + + if (pte_dirty(pte)) + orig_pte = pte_mkdirty(orig_pte); + + if (pte_young(pte)) + orig_pte = pte_mkyoung(orig_pte); + } + + return orig_pte; +} + +void contpte_set_ptes(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte, unsigned int nr) +{ + unsigned long next; + unsigned long end = addr + (nr << PAGE_SHIFT); + unsigned long pfn = pte_pfn(pte); + pgprot_t prot = pte_pgprot(pte); + pte_t orig_pte; + + do { + next = pte_cont_addr_end(addr, end); + nr = (next - addr) >> PAGE_SHIFT; + pte = pfn_pte(pfn, prot); + + if (((addr | next | (pfn << PAGE_SHIFT)) & ~CONT_PTE_MASK) == 0) + pte = pte_mkcont(pte); + else + pte = pte_mknoncont(pte); + + /* + * If operating on a partial contiguous range then we must first + * unfold the contiguous range if it was previously folded. + * Otherwise we could end up with overlapping tlb entries. + */ + if (nr != CONT_PTES) + contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); + + /* + * If we are replacing ptes that were contiguous or if the new + * ptes are contiguous and any of the ptes being replaced are + * present, we need to clear and flush the range to prevent + * overlapping tlb entries. + */ + orig_pte = __ptep_get(ptep); + if ((pte_present(orig_pte) && pte_cont(orig_pte)) || + (pte_cont(pte) && ptep_any_present(ptep, nr))) + ptep_clear_flush_range(mm, addr, ptep, nr); + + __set_ptes(mm, addr, ptep, pte, nr); + + addr = next; + ptep += nr; + pfn += nr; + + } while (addr != end); +} + +int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ + /* + * ptep_clear_flush_young() technically requires us to clear the access + * flag for a _single_ pte. However, the core-mm code actually tracks + * access/dirty per folio, not per page. And since we only create a + * contig range when the range is covered by a single folio, we can get + * away with clearing young for the whole contig range here, so we avoid + * having to unfold. + */ + + int i; + int young = 0; + + ptep = contpte_align_down(ptep); + addr = ALIGN_DOWN(addr, CONT_PTE_SIZE); + + for (i = 0; i < CONT_PTES; i++, ptep++, addr += PAGE_SIZE) + young |= __ptep_test_and_clear_young(vma, addr, ptep); + + return young; +} + +int contpte_ptep_clear_flush_young(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ + int young; + + young = contpte_ptep_test_and_clear_young(vma, addr, ptep); + addr = ALIGN_DOWN(addr, CONT_PTE_SIZE); + + if (young) { + /* + * See comment in __ptep_clear_flush_young(); same rationale for + * eliding the trailing DSB applies here. + */ + __flush_tlb_range_nosync(vma, addr, addr + CONT_PTE_SIZE, + PAGE_SIZE, true, 3); + } + + return young; +} + +int contpte_ptep_set_access_flags(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, + pte_t entry, int dirty) +{ + pte_t orig_pte; + int i; + + /* + * Gather the access/dirty bits for the contiguous range. 
If nothing has + * changed, its a noop. + */ + orig_pte = ptep_get(ptep); + if (pte_val(orig_pte) == pte_val(entry)) + return 0; + + /* + * We can fix up access/dirty bits without having to unfold/fold the + * contig range. But if the write bit is changing, we need to go through + * the full unfold/fold cycle. + */ + if (pte_write(orig_pte) == pte_write(entry)) { + /* + * No need to flush here; This is always "more permissive" so we + * can only be _adding_ the access or dirty bit. And since the + * tlb can't cache an entry without the AF set and the dirty bit + * is a SW bit, there can be no confusion. For HW access + * management, we technically only need to update the flag on a + * single pte in the range. But for SW access management, we + * need to update all the ptes to prevent extra faults. + */ + ptep = contpte_align_down(ptep); + addr = ALIGN_DOWN(addr, CONT_PTE_SIZE); + + for (i = 0; i < CONT_PTES; i++, ptep++, addr += PAGE_SIZE) + __ptep_set_access_flags(vma, addr, ptep, entry, 0); + } else { + /* + * No need to flush in __ptep_set_access_flags() because we just + * flushed the whole range in __contpte_try_unfold(). + */ + __contpte_try_unfold(vma->vm_mm, addr, ptep, orig_pte); + __ptep_set_access_flags(vma, addr, ptep, entry, 0); + contpte_try_fold(vma->vm_mm, addr, ptep, entry); + } + + return 1; +} From patchwork Thu Jun 22 14:42:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ryan Roberts X-Patchwork-Id: 111705 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp5128645vqr; Thu, 22 Jun 2023 07:57:40 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5/RMcxNc0zQ6Zw2z4LGOIhlRg0Td/cZ705uARvaoQr0XxtpegJQixxvnmX0F84tJZoddBu X-Received: by 2002:a05:6808:408e:b0:394:2868:d51f with SMTP id db14-20020a056808408e00b003942868d51fmr16376229oib.4.1687445860445; Thu, 22 Jun 2023 07:57:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1687445860; cv=none; d=google.com; s=arc-20160816; b=z+JIAZjKXterWrFVo1zOXPAorwvyoyBCi4CZn0/1qKTZs0WAISBQtGI4OlBGBjkUex B5AmeBIeEIi+I2spTSrc9Fvj0AhbR91uHvhE8TtZMTd1c4unFN1WGMe55GhybhggbEzw Gf1jtvEQ1L1GzZfEq/W4/LC9bQSVaOpOYO0pViD6rR8pceGxm2jYC+M3forDKIpwQy99 yt3oWXgvrce42/cE4PFbq+3OhlCsN0j63QflUKrTPtrvWsEjLvuMjSTeDZnAH9FM4Adw VUBhqx8r0lg/ERuyh5Jx9kVOqEtu87z9GMdl7BQoOI9tliujK4lmVRGCQRD9qQrXd6Y0 eySg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=dsj19i4WO7Xm5RCy72VgkaRoah4zje0Yv8SkvWhAYkk=; b=m8/tPDgY2AtL/I77MhL4IPinFPdTroGPexLlOSwYMf6sD9f42blYwkrj5QRG4Qbb86 d/DlfqE4ANHypoCEljIkw3TfWKcgq7g24me1Hoaomqp58p72Ec5t3xIwytlrc4IvLYGi cmygXE7xqhnuKJuGLxP+N96t/yq/z2GvpR8q5lEa8/3y69umevZ13S3QnnqXpC2f4Jzi 4V3fsPXgeGUBlSdnb9scWEpdWzgFqGKOhJ8dLbJFo61ifkYv72ldyNSFDyk0mj8OP8rc mvqhCaHoi1g38sDnjkJu7SJq+hG/LaSDPD3k7i/IkMSxlrTmDB+SHJ8cfJe1EjDYcXm2 SO9Q== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id kx18-20020a17090b229200b00259aa59bf3esi720772pjb.176.2023.06.22.07.57.02; Thu, 22 Jun 2023 07:57:40 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231861AbjFVOoD (ORCPT + 99 others); Thu, 22 Jun 2023 10:44:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49370 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231848AbjFVOng (ORCPT ); Thu, 22 Jun 2023 10:43:36 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 1E009270B for ; Thu, 22 Jun 2023 07:43:12 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id F0FCAC14; Thu, 22 Jun 2023 07:43:45 -0700 (PDT) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 6C4793F663; Thu, 22 Jun 2023 07:42:59 -0700 (PDT) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v1 12/14] arm64/mm: Add ptep_get_and_clear_full() to optimize process teardown Date: Thu, 22 Jun 2023 15:42:07 +0100 Message-Id: <20230622144210.2623299-13-ryan.roberts@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230622144210.2623299-1-ryan.roberts@arm.com> References: <20230622144210.2623299-1-ryan.roberts@arm.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769415230808446159?= X-GMAIL-MSGID: =?utf-8?q?1769415230808446159?= ptep_get_and_clear_full() adds a 'full' parameter which is not present for the fallback ptep_get_and_clear() function. 'full' is set to 1 when a full address space teardown is in progress. We use this information to optimize arm64_sys_exit_group() by avoiding unfolding (and therefore tlbi) contiguous ranges. Instead we just clear the PTE but allow all the contiguous neighbours to keep their contig bit set, because we know we are about to clear the rest too. Before this optimization, the cost of arm64_sys_exit_group() exploded to 32x what it was before PTE_CONT support was wired up, when compiling the kernel. With this optimization in place, we are back down to the original cost. 
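The dispatch on the 'full' flag, condensed from the pgtable.h hunk below, is:

#define __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL
static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
				unsigned long addr, pte_t *ptep, int full)
{
	pte_t orig_pte = __ptep_get(ptep);

	if (!pte_present(orig_pte) || !pte_cont(orig_pte) || !full) {
		/* Normal case: unfold (and tlbi) before clearing. */
		contpte_try_unfold(mm, addr, ptep, orig_pte);
		return __ptep_get_and_clear(mm, addr, ptep);
	}

	/*
	 * Full teardown: just clear this pte; its neighbours keep
	 * PTE_CONT because they are about to be cleared too.
	 */
	return contpte_ptep_get_and_clear_full(mm, addr, ptep);
}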
Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 18 ++++++++- arch/arm64/mm/contpte.c | 68 ++++++++++++++++++++++++++++++++ 2 files changed, 84 insertions(+), 2 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 17ea534bc5b0..5963da651da7 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -1128,6 +1128,8 @@ extern pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte); extern pte_t contpte_ptep_get_lockless(pte_t *orig_ptep); extern void contpte_set_ptes(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte, unsigned int nr); +extern pte_t contpte_ptep_get_and_clear_full(struct mm_struct *mm, + unsigned long addr, pte_t *ptep); extern int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep); extern int contpte_ptep_clear_flush_young(struct vm_area_struct *vma, @@ -1252,12 +1254,24 @@ static inline void pte_clear(struct mm_struct *mm, __pte_clear(mm, addr, ptep); } +#define __HAVE_ARCH_PTEP_GET_AND_CLEAR_FULL +static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, + unsigned long addr, pte_t *ptep, int full) +{ + pte_t orig_pte = __ptep_get(ptep); + + if (!pte_present(orig_pte) || !pte_cont(orig_pte) || !full) { + contpte_try_unfold(mm, addr, ptep, orig_pte); + return __ptep_get_and_clear(mm, addr, ptep); + } else + return contpte_ptep_get_and_clear_full(mm, addr, ptep); +} + #define __HAVE_ARCH_PTEP_GET_AND_CLEAR static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { - contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); - return __ptep_get_and_clear(mm, addr, ptep); + return ptep_get_and_clear_full(mm, addr, ptep, 0); } #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c index e8e4a298fd53..0b585d1c4c94 100644 --- a/arch/arm64/mm/contpte.c +++ b/arch/arm64/mm/contpte.c @@ -241,6 +241,74 @@ void contpte_set_ptes(struct mm_struct *mm, unsigned long addr, } while (addr != end); } +pte_t contpte_ptep_get_and_clear_full(struct mm_struct *mm, + unsigned long addr, pte_t *ptep) +{ + /* + * When doing a full address space teardown, we can avoid unfolding the + * contiguous range, and therefore avoid the associated tlbi. Instead, + * just clear the pte. The caller is promising to call us for every pte, + * so every pte in the range will be cleared by the time the tlbi is + * issued. + * + * However, this approach will leave the ptes in an inconsistent state + * until ptep_get_and_clear_full() has been called for every pte in the + * range. This could cause ptep_get() to fail to return the correct + * access/dirty bits, if ptep_get() calls are interleved with + * ptep_get_and_clear_full() (which they are). Solve this by copying the + * access/dirty bits to every pte in the range so that ptep_get() still + * sees them if we have already cleared pte that the hw chose to update. + * Note that a full teardown will only happen when the process is + * exiting, so we do not expect anymore accesses and therefore no more + * access/dirty bit updates, so there is no race here. + */ + + pte_t *orig_ptep = ptep; + pte_t pte; + bool flags_propagated = false; + bool dirty = false; + bool young = false; + int i; + + /* First, gather access and dirty bits. 
+	ptep = contpte_align_down(orig_ptep);
+	for (i = 0; i < CONT_PTES; i++, ptep++) {
+		pte = __ptep_get(ptep);
+
+		/*
+		 * If we find a zeroed PTE, contpte_ptep_get_and_clear_full()
+		 * must have already been called for it, so we have already
+		 * propagated the flags to the other ptes.
+		 */
+		if (pte_val(pte) == 0) {
+			flags_propagated = true;
+			break;
+		}
+
+		if (pte_dirty(pte))
+			dirty = true;
+
+		if (pte_young(pte))
+			young = true;
+	}
+
+	/* Now copy the access and dirty bits into each pte in the range. */
+	if (!flags_propagated) {
+		ptep = contpte_align_down(orig_ptep);
+		for (i = 0; i < CONT_PTES; i++, ptep++) {
+			pte = __ptep_get(ptep);
+
+			if (dirty)
+				pte = pte_mkdirty(pte);
+
+			if (young)
+				pte = pte_mkyoung(pte);
+			__set_pte(ptep, pte);
+		}
+	}
+
+	return __ptep_get_and_clear(mm, addr, orig_ptep);
+}
+
 int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma,
 				unsigned long addr, pte_t *ptep)
 {

From patchwork Thu Jun 22 14:42:08 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 111714
From: Ryan Roberts
To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland
Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v1 13/14] mm: Batch-copy PTE ranges during fork()
Date: Thu, 22 Jun 2023 15:42:08 +0100
Message-Id: <20230622144210.2623299-14-ryan.roberts@arm.com>
In-Reply-To: <20230622144210.2623299-1-ryan.roberts@arm.com>
References: <20230622144210.2623299-1-ryan.roberts@arm.com>

Convert copy_pte_range() to copy a set of ptes that map a physically contiguous block of memory in a batch. This will likely improve performance by a tiny amount due to batching the folio reference count management and calling set_ptes() rather than making individual calls to set_pte_at().

However, the primary motivation for this change is to reduce the number of tlb maintenance operations that the arm64 backend has to perform during fork, now that it transparently supports the "contiguous bit" in its ptes. By write-protecting the parent using the new ptep_set_wrprotects() (note the 's' at the end) function, the backend can avoid having to unfold contig ranges of PTEs, which is expensive, when all ptes in the range are being write-protected.
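To make the intended calling pattern concrete before the diff, here is a schematic sketch of the batching idea (the helper name and most details are invented for illustration and assume a private COW mapping; the real logic is copy_present_ptes() in the diff below):

-------
/*
 * Schematic sketch only (hypothetical helper, assumes a private COW
 * mapping). Returns the number of ptes handled in this batch.
 */
static int copy_pte_batch_sketch(struct mm_struct *src_mm,
				 struct mm_struct *dst_mm,
				 unsigned long addr,
				 pte_t *src_pte, pte_t *dst_pte, int max_nr)
{
	pte_t pte = ptep_get(src_pte);
	unsigned long pfn = pte_pfn(pte);
	int nr = 1;

	/* Count how many consecutive ptes map consecutive pfns. */
	while (nr < max_nr) {
		pte_t next = ptep_get(src_pte + nr);

		if (!pte_present(next) || pte_pfn(next) != pfn + nr)
			break;
		nr++;
	}

	/* One batched wrprotect: arm64 can keep the contig bit intact. */
	ptep_set_wrprotects(src_mm, addr, src_pte, nr);

	/* One batched set: arm64 can set the contig bit up front. */
	pte = pte_wrprotect(pte);
	set_ptes(dst_mm, addr, dst_pte, pte, nr);

	return nr;
}
-------

Passing the whole batch to the backend in one call is what lets arm64 skip the per-pte unfold/fold work described above.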
Similarly, by using set_ptes() rather than set_pte_at() to set up ptes in the child, the backend does not need to fold a contiguous range once they are all populated - they can be initially populated as a contiguous range in the first place. This change addresses the core-mm refactoring only, and introduces ptep_set_wrprotects() with a default implementation that calls ptep_set_wrprotect() for each pte in the range. A separate change will implement ptep_set_wrprotects() in the arm64 backend to realize the performance improvement. Signed-off-by: Ryan Roberts --- include/linux/pgtable.h | 13 ++++ mm/memory.c | 149 +++++++++++++++++++++++++++++++--------- 2 files changed, 128 insertions(+), 34 deletions(-) diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index a661a17173fa..6a7b28d520de 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -547,6 +547,19 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addres } #endif +#ifndef ptep_set_wrprotects +struct mm_struct; +static inline void ptep_set_wrprotects(struct mm_struct *mm, + unsigned long address, pte_t *ptep, + unsigned int nr) +{ + unsigned int i; + + for (i = 0; i < nr; i++, address += PAGE_SIZE, ptep++) + ptep_set_wrprotect(mm, address, ptep); +} +#endif + /* * On some architectures hardware does not set page access bit when accessing * memory page, it is responsibility of software setting this bit. It brings diff --git a/mm/memory.c b/mm/memory.c index fb30f7523550..9a041cc31c74 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -911,57 +911,126 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma /* Uffd-wp needs to be delivered to dest pte as well */ pte = pte_mkuffd_wp(pte); set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte); - return 0; + return 1; +} + +static inline unsigned long page_addr(struct page *page, + struct page *anchor, unsigned long anchor_addr) +{ + unsigned long offset; + unsigned long addr; + + offset = (page_to_pfn(page) - page_to_pfn(anchor)) << PAGE_SHIFT; + addr = anchor_addr + offset; + + if (anchor > page) { + if (addr > anchor_addr) + return 0; + } else { + if (addr < anchor_addr) + return ULONG_MAX; + } + + return addr; +} + +static int calc_anon_folio_map_pgcount(struct folio *folio, + struct page *page, pte_t *pte, + unsigned long addr, unsigned long end) +{ + pte_t ptent; + int floops; + int i; + unsigned long pfn; + + end = min(page_addr(&folio->page + folio_nr_pages(folio), page, addr), + end); + floops = (end - addr) >> PAGE_SHIFT; + pfn = page_to_pfn(page); + pfn++; + pte++; + + for (i = 1; i < floops; i++) { + ptent = ptep_get(pte); + + if (!pte_present(ptent) || + pte_pfn(ptent) != pfn) { + return i; + } + + pfn++; + pte++; + } + + return floops; } /* - * Copy one pte. Returns 0 if succeeded, or -EAGAIN if one preallocated page - * is required to copy this pte. + * Copy set of contiguous ptes. Returns number of ptes copied if succeeded + * (always gte 1), or -EAGAIN if one preallocated page is required to copy the + * first pte. 
*/ static inline int -copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, - pte_t *dst_pte, pte_t *src_pte, unsigned long addr, int *rss, - struct folio **prealloc) +copy_present_ptes(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, + pte_t *dst_pte, pte_t *src_pte, + unsigned long addr, unsigned long end, + int *rss, struct folio **prealloc) { struct mm_struct *src_mm = src_vma->vm_mm; unsigned long vm_flags = src_vma->vm_flags; pte_t pte = ptep_get(src_pte); struct page *page; struct folio *folio; + bool anon; + int nr; + int i; page = vm_normal_page(src_vma, addr, pte); - if (page) + if (page) { folio = page_folio(page); - if (page && folio_test_anon(folio)) { - /* - * If this page may have been pinned by the parent process, - * copy the page immediately for the child so that we'll always - * guarantee the pinned page won't be randomly replaced in the - * future. - */ - folio_get(folio); - if (unlikely(page_try_dup_anon_rmap(page, false, src_vma))) { - /* Page may be pinned, we have to copy. */ - folio_put(folio); - return copy_present_page(dst_vma, src_vma, dst_pte, src_pte, - addr, rss, prealloc, page); + anon = folio_test_anon(folio); + nr = calc_anon_folio_map_pgcount(folio, page, src_pte, addr, end); + + for (i = 0; i < nr; i++, page++) { + if (anon) { + /* + * If this page may have been pinned by the + * parent process, copy the page immediately for + * the child so that we'll always guarantee the + * pinned page won't be randomly replaced in the + * future. + */ + if (unlikely(page_try_dup_anon_rmap( + page, false, src_vma))) { + if (i != 0) + break; + /* Page may be pinned, we have to copy. */ + return copy_present_page(dst_vma, src_vma, + dst_pte, src_pte, + addr, rss, + prealloc, page); + } + rss[MM_ANONPAGES]++; + VM_BUG_ON(PageAnonExclusive(page)); + } else { + page_dup_file_rmap(page, false); + rss[mm_counter_file(page)]++; + } } - rss[MM_ANONPAGES]++; - } else if (page) { - folio_get(folio); - page_dup_file_rmap(page, false); - rss[mm_counter_file(page)]++; - } + + nr = i; + folio_ref_add(folio, nr); + } else + nr = 1; /* * If it's a COW mapping, write protect it both * in the parent and the child */ if (is_cow_mapping(vm_flags) && pte_write(pte)) { - ptep_set_wrprotect(src_mm, addr, src_pte); + ptep_set_wrprotects(src_mm, addr, src_pte, nr); pte = pte_wrprotect(pte); } - VM_BUG_ON(page && folio_test_anon(folio) && PageAnonExclusive(page)); /* * If it's a shared mapping, mark it clean in @@ -974,8 +1043,8 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, if (!userfaultfd_wp(dst_vma)) pte = pte_clear_uffd_wp(pte); - set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte); - return 0; + set_ptes(dst_vma->vm_mm, addr, dst_pte, pte, nr); + return nr; } static inline struct folio *page_copy_prealloc(struct mm_struct *src_mm, @@ -1065,15 +1134,28 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, */ WARN_ON_ONCE(ret != -ENOENT); } - /* copy_present_pte() will clear `*prealloc' if consumed */ - ret = copy_present_pte(dst_vma, src_vma, dst_pte, src_pte, - addr, rss, &prealloc); + /* copy_present_ptes() will clear `*prealloc' if consumed */ + ret = copy_present_ptes(dst_vma, src_vma, dst_pte, src_pte, + addr, end, rss, &prealloc); + /* * If we need a pre-allocated page for this pte, drop the * locks, allocate, and try again. */ if (unlikely(ret == -EAGAIN)) break; + + /* + * Positive return value is the number of ptes copied. 
+		 */
+		VM_WARN_ON_ONCE(ret < 1);
+		progress += 8 * ret;
+		ret--;
+		dst_pte += ret;
+		src_pte += ret;
+		addr += ret << PAGE_SHIFT;
+		ret = 0;
+
 		if (unlikely(prealloc)) {
 			/*
 			 * pre-alloc page cannot be reused by next time so as
@@ -1084,7 +1166,6 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 			folio_put(prealloc);
 			prealloc = NULL;
 		}
-		progress += 8;
 	} while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end);
 
 	arch_leave_lazy_mmu_mode();

From patchwork Thu Jun 22 14:42:09 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 111702
From: Ryan Roberts
To: Catalin Marinas , Will Deacon , Ard Biesheuvel , Marc Zyngier , Oliver Upton , James Morse , Suzuki K Poulose , Zenghui Yu , Andrey Ryabinin , Alexander Potapenko , Andrey Konovalov , Dmitry Vyukov , Vincenzo Frascino , Andrew Morton , Anshuman Khandual , Matthew Wilcox , Yu Zhao , Mark Rutland
Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v1 14/14] arm64/mm: Implement ptep_set_wrprotects() to optimize fork()
Date: Thu, 22 Jun 2023 15:42:09 +0100
Message-Id: <20230622144210.2623299-15-ryan.roberts@arm.com>
In-Reply-To: <20230622144210.2623299-1-ryan.roberts@arm.com>
References: <20230622144210.2623299-1-ryan.roberts@arm.com>

With the core-mm changes in place to batch-copy ptes during fork, we can take advantage of this in arm64 to greatly reduce the number of tlbis we have to issue, and to recover the fork performance that was lost (incurred) when support for transparent contiguous ptes was added.

If we are write-protecting a whole contig range, we can apply the write-protection to the whole range and know that it won't change whether the range should have the contiguous bit set or not. For ranges smaller than the contig range, we will still have to unfold, apply the write-protection, and then fold if the change now means the range is foldable.
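As a rough illustration of that rule, here is a minimal sketch (hypothetical helper, assuming a single span that does not cross a contpte block boundary; the real contpte_set_wrprotects() in the diff below walks pte_cont_addr_end() chunks and handles arbitrary spans):

-------
/* Minimal sketch of the whole-block vs partial-block decision. */
static void wrprotect_span_sketch(struct mm_struct *mm, unsigned long addr,
				  pte_t *ptep, unsigned int nr)
{
	if (nr == CONT_PTES && IS_ALIGNED(addr, CONT_PTES * PAGE_SIZE)) {
		/*
		 * Whole, aligned contpte block: the contig bit remains valid
		 * across the change, so no unfold (and no tlbi) is needed.
		 */
		__ptep_set_wrprotects(mm, addr, ptep, nr);
	} else {
		/* Partial block: unfold, apply, then see if it can re-fold. */
		contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
		__ptep_set_wrprotects(mm, addr, ptep, nr);
		contpte_try_fold(mm, addr + (nr - 1) * PAGE_SIZE, ptep + nr - 1,
				 __ptep_get(ptep + nr - 1));
	}
}
-------

The fold attempt uses the last pte of the span because contpte_try_fold() only triggers when acting on the last pte of a candidate contig range.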
Performance tested with the following test written for the will-it-scale framework: ------- char *testcase_description = "fork and exit"; void testcase(unsigned long long *iterations, unsigned long nr) { int pid; char *mem; mem = malloc(SZ_128M); assert(mem); memset(mem, 1, SZ_128M); while (1) { pid = fork(); assert(pid >= 0); if (!pid) exit(0); waitpid(pid, NULL, 0); (*iterations)++; } } ------- I see huge performance regression when PTE_CONT support was added, then the regression is mostly fixed with the addition of this change. The following shows regression relative to before PTE_CONT was enabled (bigger negative value is bigger regression): | cpus | before opt | after opt | |-------:|-------------:|------------:| | 1 | -10.4% | -5.2% | | 8 | -15.4% | -3.5% | | 16 | -38.7% | -3.7% | | 24 | -57.0% | -4.4% | | 32 | -65.8% | -5.4% | Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 48 ++++++++++++++++++++++---------- arch/arm64/mm/contpte.c | 41 +++++++++++++++++++++++++++ arch/arm64/mm/hugetlbpage.c | 2 +- 3 files changed, 75 insertions(+), 16 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 5963da651da7..479a9e5ab756 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -963,21 +963,25 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ /* - * __ptep_set_wrprotect - mark read-only while trasferring potential hardware + * __ptep_set_wrprotects - mark read-only while trasferring potential hardware * dirty status (PTE_DBM && !PTE_RDONLY) to the software PTE_DIRTY bit. */ -static inline void __ptep_set_wrprotect(struct mm_struct *mm, - unsigned long address, pte_t *ptep) +static inline void __ptep_set_wrprotects(struct mm_struct *mm, + unsigned long address, pte_t *ptep, + unsigned int nr) { pte_t old_pte, pte; - - pte = __ptep_get(ptep); - do { - old_pte = pte; - pte = pte_wrprotect(pte); - pte_val(pte) = cmpxchg_relaxed(&pte_val(*ptep), - pte_val(old_pte), pte_val(pte)); - } while (pte_val(pte) != pte_val(old_pte)); + unsigned int i; + + for (i = 0; i < nr; i++, address += PAGE_SIZE, ptep++) { + pte = __ptep_get(ptep); + do { + old_pte = pte; + pte = pte_wrprotect(pte); + pte_val(pte) = cmpxchg_relaxed(&pte_val(*ptep), + pte_val(old_pte), pte_val(pte)); + } while (pte_val(pte) != pte_val(old_pte)); + } } #ifdef CONFIG_TRANSPARENT_HUGEPAGE @@ -985,7 +989,7 @@ static inline void __ptep_set_wrprotect(struct mm_struct *mm, static inline void pmdp_set_wrprotect(struct mm_struct *mm, unsigned long address, pmd_t *pmdp) { - __ptep_set_wrprotect(mm, address, (pte_t *)pmdp); + __ptep_set_wrprotects(mm, address, (pte_t *)pmdp, 1); } #define pmdp_establish pmdp_establish @@ -1134,6 +1138,8 @@ extern int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep); extern int contpte_ptep_clear_flush_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep); +extern void contpte_set_wrprotects(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, unsigned int nr); extern int contpte_ptep_set_access_flags(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep, pte_t entry, int dirty); @@ -1298,13 +1304,25 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma, return contpte_ptep_clear_flush_young(vma, addr, ptep); } +#define ptep_set_wrprotects ptep_set_wrprotects +static inline void ptep_set_wrprotects(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, unsigned int nr) 
+{ + if (!contpte_is_enabled(mm)) + __ptep_set_wrprotects(mm, addr, ptep, nr); + else if (nr == 1) { + contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); + __ptep_set_wrprotects(mm, addr, ptep, 1); + contpte_try_fold(mm, addr, ptep, __ptep_get(ptep)); + } else + contpte_set_wrprotects(mm, addr, ptep, nr); +} + #define __HAVE_ARCH_PTEP_SET_WRPROTECT static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { - contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); - __ptep_set_wrprotect(mm, addr, ptep); - contpte_try_fold(mm, addr, ptep, __ptep_get(ptep)); + ptep_set_wrprotects(mm, addr, ptep, 1); } #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c index 0b585d1c4c94..4f666697547d 100644 --- a/arch/arm64/mm/contpte.c +++ b/arch/arm64/mm/contpte.c @@ -353,6 +353,47 @@ int contpte_ptep_clear_flush_young(struct vm_area_struct *vma, return young; } +void contpte_set_wrprotects(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, unsigned int nr) +{ + unsigned long next; + unsigned long end = addr + (nr << PAGE_SHIFT); + + do { + next = pte_cont_addr_end(addr, end); + nr = (next - addr) >> PAGE_SHIFT; + + /* + * If wrprotecting an entire contig range, we can avoid + * unfolding. Just set wrprotect and wait for the later + * mmu_gather flush to invalidate the tlb. Until the flush, the + * page may or may not be wrprotected. After the flush, it is + * guarranteed wrprotected. If its a partial range though, we + * must unfold, because we can't have a case where CONT_PTE is + * set but wrprotect applies to a subset of the PTEs; this would + * cause it to continue to be unpredictable after the flush. + */ + if (nr != CONT_PTES) + contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep)); + + __ptep_set_wrprotects(mm, addr, ptep, nr); + + addr = next; + ptep += nr; + + /* + * If applying to a partial contig range, the change could have + * made the range foldable. Use the last pte in the range we + * just set for comparison, since contpte_try_fold() only + * triggers when acting on the last pte in the contig range. + */ + if (nr != CONT_PTES) + contpte_try_fold(mm, addr - PAGE_SIZE, ptep - 1, + __ptep_get(ptep - 1)); + + } while (addr != end); +} + int contpte_ptep_set_access_flags(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep, pte_t entry, int dirty) diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 82b2036dbe2f..ecf7bfa761c3 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -511,7 +511,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm, pte_t pte; if (!pte_cont(__ptep_get(ptep))) { - __ptep_set_wrprotect(mm, addr, ptep); + __ptep_set_wrprotects(mm, addr, ptep, 1); return; }