From patchwork Tue Dec 19 07:55:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 180809 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp1779716dyi; Tue, 19 Dec 2023 00:00:44 -0800 (PST) X-Google-Smtp-Source: AGHT+IHDxHGsES+oPYV3MrZUjTi6JxnQo7KxwghlT/x4VCNU/2MLj1zguc4PpRu7c72YUZRFMcZW X-Received: by 2002:a05:620a:4e1:b0:77e:fba3:9d34 with SMTP id b1-20020a05620a04e100b0077efba39d34mr20504456qkh.152.1702972844138; Tue, 19 Dec 2023 00:00:44 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702972844; cv=none; d=google.com; s=arc-20160816; b=TJjL55Wm0w9Ca9CkuMw61A2oV3xgEeoLAjOKOuCXDxsBPOnqFVXkwAq4J4BSW9a34B 4rQoyxgvoomhTjHmG7Zl2oEnpLAyOle0HBAjsBMu2JVkOzSj70Q0xAO2EU2tIkUPBtb9 zCyJZMjOr9mk2+2TVZP5kl1TU+3e9SzkphBPIBcyVK0qydbQpH7xevoIM6o5mpLSxWWu KNW90O4E7LRcVCbsoAlKsh5Vd1fV2esgLOnCidk5KGTUy+fPJ6zypN+Fsd1KcACqfyvO 1WpoUtd3dZQMOQZk/Nefp98WBh3rGR4BjqaGO5Vthc7KIZSSCfDvhi8XKpZN8iP2dWZz KvPg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=C+SULC41WW+Kd90iko+LfCmzwEshCatJrP9IQa4Pxio=; fh=3mpu6Yyr5hMzDKtkQvzcApl7ZSQGU4G8QyIY2BAU7VY=; b=GH0Dc0NgKYxjfxs2eQW5bwno8iwIu91LlddhsPuc73a7TydHlzKzd9fGY6z97YEl8z Gl1RcKnu0ViPc2ckBUCv41l5WEFlb1POv6UP0q+Kos7LFHZpdwnDMP7BAZ4z/Epm1fUT a14HbjUT+KYO0IDSKBvSGY8yhKqWgoX8e0MRigvXC7V4j5Kuw6iCH5IJzWI5iirkfGEF UYFiSLEtvSCEB0jzbWefj7AiCPaET+TepPD5POPrE9M0j96khyuJayi86iwoXsU16Jtb aK8Q443h0PJE2F6UTNQbK/8naEMKo2fXEi/IZrc0jrsv/7ubwG/M3YuAD6epsR/8gDvt aTfg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=HzGYs5D1; spf=pass (google.com: domain of linux-kernel+bounces-4848-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-4848-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id pb22-20020a05620a839600b00774173d6bb4si26261305qkn.190.2023.12.19.00.00.44 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Dec 2023 00:00:44 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-4848-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=HzGYs5D1; spf=pass (google.com: domain of linux-kernel+bounces-4848-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-4848-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 99D441C2403E for ; Tue, 19 Dec 2023 08:00:19 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 326AB168CB; Tue, 19 Dec 2023 07:58:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="HzGYs5D1" X-Original-To: linux-kernel@vger.kernel.org Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8F29E156EA for ; Tue, 19 Dec 2023 07:58:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1702972714; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=C+SULC41WW+Kd90iko+LfCmzwEshCatJrP9IQa4Pxio=; b=HzGYs5D1vLbUuJVmAykZPFsCzxrnjFycfrLCcmIOKg+r0BCb/RhiwCkwN6v8l5rTBdiMMC D6OEbZqVnMsZ2kJmReZCP5it19ahGc5eynOSTCEz4EtKWAi7ihjgqZgln949+/3ffJevPl SQADVOSweryz+2NcZdc9DCHxOHUDue4= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-443-hR7CaQnoO4OWgrV4n0G44w-1; Tue, 19 Dec 2023 02:58:29 -0500 X-MC-Unique: hR7CaQnoO4OWgrV4n0G44w-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 462A5185A782; Tue, 19 Dec 2023 07:58:28 +0000 (UTC) Received: from x1n.redhat.com (unknown [10.72.116.117]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4D2F42026D66; Tue, 19 Dec 2023 07:58:16 +0000 (UTC) From: peterx@redhat.com To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Matthew Wilcox , Christophe Leroy , Lorenzo Stoakes , David Hildenbrand , Vlastimil Babka , Mike Kravetz , Mike Rapoport , Christoph Hellwig , John Hubbard , Andrew Jones , linux-arm-kernel@lists.infradead.org, Michael Ellerman , "Kirill A . Shutemov" , linuxppc-dev@lists.ozlabs.org, Rik van Riel , linux-riscv@lists.infradead.org, Yang Shi , James Houghton , "Aneesh Kumar K . V" , Andrew Morton , Jason Gunthorpe , Andrea Arcangeli , peterx@redhat.com, Axel Rasmussen Subject: [PATCH 13/13] mm/gup: Handle hugetlb in the generic follow_page_mask code Date: Tue, 19 Dec 2023 15:55:38 +0800 Message-ID: <20231219075538.414708-14-peterx@redhat.com> In-Reply-To: <20231219075538.414708-1-peterx@redhat.com> References: <20231219075538.414708-1-peterx@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.4 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785696453072306744 X-GMAIL-MSGID: 1785696453072306744 From: Peter Xu Now follow_page() is ready to handle hugetlb pages in whatever form, and over all architectures. Switch to the generic code path. Time to retire hugetlb_follow_page_mask(), following the previous retirement of follow_hugetlb_page() in 4849807114b8. There may be a slight difference of how the loops run when processing slow GUP over a large hugetlb range on cont_pte/cont_pmd supported archs: each loop of __get_user_pages() will resolve one pgtable entry with the patch applied, rather than relying on the size of hugetlb hstate, the latter may cover multiple entries in one loop. A quick performance test on an aarch64 VM on M1 chip shows 15% degrade over a tight loop of slow gup after the path switched. That shouldn't be a problem because slow-gup should not be a hot path for GUP in general: when page is commonly present, fast-gup will already succeed, while when the page is indeed missing and require a follow up page fault, the slow gup degrade will probably buried in the fault paths anyway. It also explains why slow gup for THP used to be very slow before 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"") lands, the latter not part of a performance analysis but a side benefit. If the performance will be a concern, we can consider handle CONT_PTE in follow_page(). Before that is justified to be necessary, keep everything clean and simple. Signed-off-by: Peter Xu --- include/linux/hugetlb.h | 7 ---- mm/gup.c | 15 +++------ mm/hugetlb.c | 71 ----------------------------------------- 3 files changed, 5 insertions(+), 88 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index f8c5c174c8a6..8a352d577ebf 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -332,13 +332,6 @@ static inline void hugetlb_zap_end( { } -static inline struct page *hugetlb_follow_page_mask( - struct vm_area_struct *vma, unsigned long address, unsigned int flags, - unsigned int *page_mask) -{ - BUILD_BUG(); /* should never be compiled in if !CONFIG_HUGETLB_PAGE*/ -} - static inline int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, struct vm_area_struct *dst_vma, diff --git a/mm/gup.c b/mm/gup.c index 14a7d13e7bd6..f34c0a912311 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -997,18 +997,11 @@ static struct page *follow_page_mask(struct vm_area_struct *vma, { pgd_t *pgd, pgdval; struct mm_struct *mm = vma->vm_mm; + struct page *page; - ctx->page_mask = 0; - - /* - * Call hugetlb_follow_page_mask for hugetlb vmas as it will use - * special hugetlb page table walking code. This eliminates the - * need to check for hugetlb entries in the general walking code. - */ - if (is_vm_hugetlb_page(vma)) - return hugetlb_follow_page_mask(vma, address, flags, - &ctx->page_mask); + vma_pgtable_walk_begin(vma); + ctx->page_mask = 0; pgd = pgd_offset(mm, address); pgdval = *pgd; @@ -1020,6 +1013,8 @@ static struct page *follow_page_mask(struct vm_area_struct *vma, else page = follow_p4d_mask(vma, address, pgd, flags, ctx); + vma_pgtable_walk_end(vma); + return page; } diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 29705e5c6f40..3013122a739f 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6783,77 +6783,6 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte, } #endif /* CONFIG_USERFAULTFD */ -struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma, - unsigned long address, unsigned int flags, - unsigned int *page_mask) -{ - struct hstate *h = hstate_vma(vma); - struct mm_struct *mm = vma->vm_mm; - unsigned long haddr = address & huge_page_mask(h); - struct page *page = NULL; - spinlock_t *ptl; - pte_t *pte, entry; - int ret; - - hugetlb_vma_lock_read(vma); - pte = hugetlb_walk(vma, haddr, huge_page_size(h)); - if (!pte) - goto out_unlock; - - ptl = huge_pte_lock(h, mm, pte); - entry = huge_ptep_get(pte); - if (pte_present(entry)) { - page = pte_page(entry); - - if (!huge_pte_write(entry)) { - if (flags & FOLL_WRITE) { - page = NULL; - goto out; - } - - if (gup_must_unshare(vma, flags, page)) { - /* Tell the caller to do unsharing */ - page = ERR_PTR(-EMLINK); - goto out; - } - } - - page = nth_page(page, ((address & ~huge_page_mask(h)) >> PAGE_SHIFT)); - - /* - * Note that page may be a sub-page, and with vmemmap - * optimizations the page struct may be read only. - * try_grab_page() will increase the ref count on the - * head page, so this will be OK. - * - * try_grab_page() should always be able to get the page here, - * because we hold the ptl lock and have verified pte_present(). - */ - ret = try_grab_page(page, flags); - - if (WARN_ON_ONCE(ret)) { - page = ERR_PTR(ret); - goto out; - } - - *page_mask = (1U << huge_page_order(h)) - 1; - } -out: - spin_unlock(ptl); -out_unlock: - hugetlb_vma_unlock_read(vma); - - /* - * Fixup retval for dump requests: if pagecache doesn't exist, - * don't try to allocate a new page but just skip it. - */ - if (!page && (flags & FOLL_DUMP) && - !hugetlbfs_pagecache_present(h, vma, address)) - page = ERR_PTR(-EFAULT); - - return page; -} - long hugetlb_change_protection(struct vm_area_struct *vma, unsigned long address, unsigned long end, pgprot_t newprot, unsigned long cp_flags)