From patchwork Thu Jan 5 10:17:59 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39424
Date: Thu, 5 Jan 2023 10:17:59 +0000
Message-ID: <20230105101844.1893104-2-jthoughton@google.com>
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Subject: [PATCH 01/46] hugetlb: don't set PageUptodate for UFFDIO_CONTINUE
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
    Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, James Houghton
It would be bad if we actually set PageUptodate with UFFDIO_CONTINUE;
PageUptodate indicates that the page has been zeroed, and we don't want
to give a non-zeroed page to the user.

The reason this change is being made now is that UFFDIO_CONTINUEs on
subpages definitely shouldn't set this page flag on the head page.

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b39b74e0591a..b061e31c1fb8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6229,7 +6229,16 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	 * preceding stores to the page contents become visible before
 	 * the set_pte_at() write.
 	 */
-	__SetPageUptodate(page);
+	if (!is_continue)
+		__SetPageUptodate(page);
+	else if (!PageUptodate(page)) {
+		/*
+		 * This should never happen; HugeTLB pages are always Uptodate
+		 * as soon as they are allocated.
+		 */
+		ret = -EFAULT;
+		goto out_release_nounlock;
+	}

 	/* Add shared, newly allocated pages to the page cache. */
 	if (vm_shared && !is_continue) {
From patchwork Thu Jan 5 10:18:00 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39426
Date: Thu, 5 Jan 2023 10:18:00 +0000
Message-ID: <20230105101844.1893104-3-jthoughton@google.com>
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Subject: [PATCH 02/46] hugetlb: remove mk_huge_pte; it is unused
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
    Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, James Houghton
mk_huge_pte is unused and not necessary. pte_mkhuge is the appropriate
function to call to create a HugeTLB PTE (see
Documentation/mm/arch_pgtable_helpers.rst). It is being removed now to
avoid complicating the implementation of HugeTLB high-granularity
mapping.

Acked-by: Peter Xu
Acked-by: Mina Almasry
Reviewed-by: Mike Kravetz
Signed-off-by: James Houghton
---
 arch/s390/include/asm/hugetlb.h | 5 -----
 include/asm-generic/hugetlb.h   | 5 -----
 mm/debug_vm_pgtable.c           | 2 +-
 mm/hugetlb.c                    | 7 +++----
 4 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
index ccdbccfde148..c34893719715 100644
--- a/arch/s390/include/asm/hugetlb.h
+++ b/arch/s390/include/asm/hugetlb.h
@@ -77,11 +77,6 @@ static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
 	set_huge_pte_at(mm, addr, ptep, pte_wrprotect(pte));
 }

-static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot)
-{
-	return mk_pte(page, pgprot);
-}
-
 static inline int huge_pte_none(pte_t pte)
 {
 	return pte_none(pte);
diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index d7f6335d3999..be2e763e956f 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -5,11 +5,6 @@
 #include
 #include

-static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot)
-{
-	return mk_pte(page, pgprot);
-}
-
 static inline unsigned long huge_pte_write(pte_t pte)
 {
 	return pte_write(pte);
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index c631ade3f1d2..643cce3493cc 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -900,7 +900,7 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args)
 	 * as it was previously derived from a real kernel symbol.
 	 */
 	page = pfn_to_page(args->fixed_pmd_pfn);
-	pte = mk_huge_pte(page, args->page_prot);
+	pte = mk_pte(page, args->page_prot);

 	WARN_ON(!huge_pte_dirty(huge_pte_mkdirty(pte)));
 	WARN_ON(!huge_pte_write(huge_pte_mkwrite(huge_pte_wrprotect(pte))));
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b061e31c1fb8..7e9793b602ac 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4870,11 +4870,10 @@ static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
 	unsigned int shift = huge_page_shift(hstate_vma(vma));

 	if (writable) {
-		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page,
-					 vma->vm_page_prot)));
+		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_pte(page,
+					 vma->vm_page_prot)));
 	} else {
-		entry = huge_pte_wrprotect(mk_huge_pte(page,
-					   vma->vm_page_prot));
+		entry = huge_pte_wrprotect(mk_pte(page, vma->vm_page_prot));
 	}
 	entry = pte_mkyoung(entry);
 	entry = arch_make_huge_pte(entry, shift, vma->vm_flags);
From patchwork Thu Jan 5 10:18:01 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39427
Date: Thu, 5 Jan 2023 10:18:01 +0000
Message-ID: <20230105101844.1893104-4-jthoughton@google.com>
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Subject: [PATCH 03/46] hugetlb: remove redundant pte_mkhuge in migration path
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
    Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, James Houghton
arch_make_huge_pte, which is called immediately following pte_mkhuge,
already makes the necessary changes to the PTE that pte_mkhuge would
have. The generic implementation of arch_make_huge_pte simply calls
pte_mkhuge.

Acked-by: Peter Xu
Acked-by: Mina Almasry
Reviewed-by: Mike Kravetz
Signed-off-by: James Houghton
---
 mm/migrate.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 494b3753fda9..b5032c3e940a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -246,7 +246,6 @@ static bool remove_migration_pte(struct folio *folio,
 		if (folio_test_hugetlb(folio)) {
 			unsigned int shift = huge_page_shift(hstate_vma(vma));

-			pte = pte_mkhuge(pte);
 			pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
 			if (folio_test_anon(folio))
 				hugepage_add_anon_rmap(new, vma, pvmw.address,
From patchwork Thu Jan 5 10:18:02 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39428
Date: Thu, 5 Jan 2023 10:18:02 +0000
Message-ID: <20230105101844.1893104-5-jthoughton@google.com>
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Subject: [PATCH 04/46] hugetlb: only adjust address ranges when VMAs want PMD sharing
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
    Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, James Houghton
Currently this check is overly aggressive. For some userfaultfd VMAs,
PMD sharing is disabled, yet we still widen the address range that is
used for flushing TLBs and sending MMU notifiers. This is being fixed
now because HGM VMAs also have sharing disabled, yet would still have
their flush ranges adjusted. Overaggressively flushing TLBs and
triggering MMU notifiers is particularly harmful with lots of
high-granularity operations.

Acked-by: Peter Xu
Reviewed-by: Mike Kravetz
Signed-off-by: James Houghton
---
 mm/hugetlb.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7e9793b602ac..99fadd7680ec 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6961,22 +6961,31 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
 	return saddr;
 }

-bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
+static bool pmd_sharing_possible(struct vm_area_struct *vma)
 {
-	unsigned long start = addr & PUD_MASK;
-	unsigned long end = start + PUD_SIZE;
-
 #ifdef CONFIG_USERFAULTFD
 	if (uffd_disable_huge_pmd_share(vma))
 		return false;
 #endif
 	/*
-	 * check on proper vm_flags and page table alignment
+	 * Only shared VMAs can share PMDs.
 	 */
 	if (!(vma->vm_flags & VM_MAYSHARE))
 		return false;
 	if (!vma->vm_private_data)	/* vma lock required for sharing */
 		return false;
+	return true;
+}
+
+bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
+{
+	unsigned long start = addr & PUD_MASK;
+	unsigned long end = start + PUD_SIZE;
+
+	/*
+	 * check on proper vm_flags and page table alignment
+	 */
+	if (!pmd_sharing_possible(vma))
+		return false;
 	if (!range_in_vma(vma, start, end))
 		return false;
 	return true;
@@ -6997,7 +7006,7 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 	 * vma needs to span at least one aligned PUD size, and the range
 	 * must be at least partially within in.
 	 */
-	if (!(vma->vm_flags & VM_MAYSHARE) || !(v_end > v_start) ||
+	if (!pmd_sharing_possible(vma) || !(v_end > v_start) ||
 	    (*end <= v_start) || (*start >= v_end))
 		return;
Date: Thu, 5 Jan 2023 10:18:03 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-6-jthoughton@google.com>
Subject: [PATCH 05/46] hugetlb: add CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu

This adds the Kconfig to enable or disable high-granularity mapping.
Each architecture must explicitly opt in to it (via
ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING), but once an architecture
has opted in, HGM is enabled by default whenever HUGETLB_PAGE is
enabled.

Signed-off-by: James Houghton
---
 fs/Kconfig | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/Kconfig b/fs/Kconfig
index 2685a4d0d353..ce2567946016 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -267,6 +267,13 @@ config HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON
 	  enable HVO by default. It can be disabled via hugetlb_free_vmemmap=off
 	  (boot command line) or hugetlb_optimize_vmemmap (sysctl).

+config ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING
+	bool
+
+config HUGETLB_HIGH_GRANULARITY_MAPPING
+	def_bool HUGETLB_PAGE
+	depends on ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING
+
 config MEMFD_CREATE
 	def_bool TMPFS || HUGETLBFS

From patchwork Thu Jan 5 10:18:04 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39431
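As an illustration of how the opt-in above is meant to be consumed (a hypothetical fragment, not part of this patch: no architecture selects the symbol at this point in the series), an architecture would enable HGM from its own Kconfig like so:

```
# Hypothetical architecture opt-in. With this select in place,
# HUGETLB_HIGH_GRANULARITY_MAPPING defaults to y whenever
# HUGETLB_PAGE is enabled, with no separate user prompt.
config ARCH_EXAMPLE
	def_bool y
	select ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING
```

Because HUGETLB_HIGH_GRANULARITY_MAPPING is a non-visible def_bool, users cannot toggle it directly; the architecture's select decides availability.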
Date: Thu, 5 Jan 2023 10:18:04 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-7-jthoughton@google.com>
Subject: [PATCH 06/46] mm: add VM_HUGETLB_HGM VMA flag
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
VM_HUGETLB_HGM indicates that a HugeTLB VMA may contain high-granularity
mappings. Its VmFlags string is "hm".

Signed-off-by: James Houghton
---
 fs/proc/task_mmu.c             | 3 +++
 include/linux/mm.h             | 7 +++++++
 include/trace/events/mmflags.h | 7 +++++++
 3 files changed, 17 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e35a0398db63..41b5509bde0e 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -711,6 +711,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
 		[ilog2(VM_UFFD_MINOR)]	= "ui",
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+		[ilog2(VM_HUGETLB_HGM)]	= "hm",
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
 	};
 	size_t i;

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c37f9330f14e..738b3605f80e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -372,6 +372,13 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_UFFD_MINOR		VM_NONE
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */

+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+# define VM_HUGETLB_HGM_BIT	38
+# define VM_HUGETLB_HGM		BIT(VM_HUGETLB_HGM_BIT)	/* HugeTLB high-granularity mapping */
+#else /* !CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+# define VM_HUGETLB_HGM		VM_NONE
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+
 /* Bits set in the VMA until the stack is in its final location */
 #define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ)

diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 412b5a46374c..88ce04b2ff69 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -163,6 +163,12 @@ IF_HAVE_PG_SKIP_KASAN_POISON(PG_skip_kasan_poison, "skip_kasan_poison")
 # define IF_HAVE_UFFD_MINOR(flag, name)
 #endif

+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+# define IF_HAVE_HUGETLB_HGM(flag, name) {flag, name},
+#else
+# define IF_HAVE_HUGETLB_HGM(flag, name)
+#endif
+
 #define __def_vmaflag_names \
 	{VM_READ,	"read"		}, \
 	{VM_WRITE,	"write"		}, \
@@ -187,6 +193,7 @@ IF_HAVE_UFFD_MINOR(VM_UFFD_MINOR,	"uffd_minor"	) \
 	{VM_ACCOUNT,	"account"	}, \
 	{VM_NORESERVE,	"noreserve"	}, \
 	{VM_HUGETLB,	"hugetlb"	}, \
+IF_HAVE_HUGETLB_HGM(VM_HUGETLB_HGM,	"hugetlb_hgm"	) \
 	{VM_SYNC,	"sync"		}, \
 	__VM_ARCH_SPECIFIC_1	, \
 	{VM_WIPEONFORK,	"wipeonfork"	}, \

From patchwork Thu Jan 5 10:18:05 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39432
Date: Thu, 5 Jan 2023 10:18:05 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-8-jthoughton@google.com>
Subject: [PATCH 07/46] hugetlb: rename __vma_shareable_flags_pmd to __vma_has_hugetlb_vma_lock
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Previously, the presence of the hugetlb VMA lock meant that the VMA was
PMD-shareable. Now the VMA lock may be allocated for a VMA that is not
PMD-shareable: a high-granularity VMA. A high-granularity VMA may also
lack a VMA lock entirely; in that case, MADV_COLLAPSE will not be able
to collapse its mappings.
Signed-off-by: James Houghton
---
 include/linux/hugetlb.h | 15 ++++++++++-----
 mm/hugetlb.c            | 16 ++++++++--------
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b6b10101bea7..aa49fd8cb47c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1235,7 +1235,8 @@ bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr);
 #define flush_hugetlb_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
 #endif

-static inline bool __vma_shareable_lock(struct vm_area_struct *vma)
+static inline bool
+__vma_has_hugetlb_vma_lock(struct vm_area_struct *vma)
 {
 	return (vma->vm_flags & VM_MAYSHARE) && vma->vm_private_data;
 }
@@ -1252,13 +1253,17 @@ hugetlb_walk(struct vm_area_struct *vma, unsigned long addr, unsigned long sz)
 	struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;

 	/*
-	 * If pmd sharing possible, locking needed to safely walk the
-	 * hugetlb pgtables.  More information can be found at the comment
-	 * above huge_pte_offset() in the same file.
+	 * If the VMA has the hugetlb vma lock (PMD sharable or HGM
+	 * collapsible), locking needed to safely walk the hugetlb pgtables.
+	 * More information can be found at the comment above huge_pte_offset()
+	 * in the same file.
+	 *
+	 * This doesn't do a full high-granularity walk, so we are concerned
+	 * only with PMD unsharing.
 	 *
 	 * NOTE: lockdep_is_held() is only defined with CONFIG_LOCKDEP.
 	 */
-	if (__vma_shareable_lock(vma))
+	if (__vma_has_hugetlb_vma_lock(vma))
 		WARN_ON_ONCE(!lockdep_is_held(&vma_lock->rw_sema) &&
 			     !lockdep_is_held(
 				 &vma->vm_file->f_mapping->i_mmap_rwsem));

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 99fadd7680ec..2f86fedef283 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -260,7 +260,7 @@ static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma)
  */
 void hugetlb_vma_lock_read(struct vm_area_struct *vma)
 {
-	if (__vma_shareable_lock(vma)) {
+	if (__vma_has_hugetlb_vma_lock(vma)) {
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;

 		down_read(&vma_lock->rw_sema);
@@ -269,7 +269,7 @@ void hugetlb_vma_lock_read(struct vm_area_struct *vma)

 void hugetlb_vma_unlock_read(struct vm_area_struct *vma)
 {
-	if (__vma_shareable_lock(vma)) {
+	if (__vma_has_hugetlb_vma_lock(vma)) {
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;

 		up_read(&vma_lock->rw_sema);
@@ -278,7 +278,7 @@ void hugetlb_vma_unlock_read(struct vm_area_struct *vma)

 void hugetlb_vma_lock_write(struct vm_area_struct *vma)
 {
-	if (__vma_shareable_lock(vma)) {
+	if (__vma_has_hugetlb_vma_lock(vma)) {
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;

 		down_write(&vma_lock->rw_sema);
@@ -287,7 +287,7 @@ void hugetlb_vma_lock_write(struct vm_area_struct *vma)

 void hugetlb_vma_unlock_write(struct vm_area_struct *vma)
 {
-	if (__vma_shareable_lock(vma)) {
+	if (__vma_has_hugetlb_vma_lock(vma)) {
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;

 		up_write(&vma_lock->rw_sema);
@@ -298,7 +298,7 @@ int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
 {
 	struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;

-	if (!__vma_shareable_lock(vma))
+	if (!__vma_has_hugetlb_vma_lock(vma))
 		return 1;

 	return down_write_trylock(&vma_lock->rw_sema);
@@ -306,7 +306,7 @@ int hugetlb_vma_trylock_write(struct vm_area_struct *vma)

 void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
 {
-	if (__vma_shareable_lock(vma)) {
+	if (__vma_has_hugetlb_vma_lock(vma)) {
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;

 		lockdep_assert_held(&vma_lock->rw_sema);
@@ -338,7 +338,7 @@ static void __hugetlb_vma_unlock_write_put(struct hugetlb_vma_lock *vma_lock)

 static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma)
 {
-	if (__vma_shareable_lock(vma)) {
+	if (__vma_has_hugetlb_vma_lock(vma)) {
 		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;

 		__hugetlb_vma_unlock_write_put(vma_lock);
@@ -350,7 +350,7 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
 	/*
 	 * Only present in sharable vmas.
 	 */
-	if (!vma || !__vma_shareable_lock(vma))
+	if (!vma || !__vma_has_hugetlb_vma_lock(vma))
 		return;

 	if (vma->vm_private_data) {

From patchwork Thu Jan 5 10:18:06 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39433
Date: Thu, 5 Jan 2023 10:18:06 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-9-jthoughton@google.com>
Subject: [PATCH 08/46] hugetlb: add HugeTLB HGM enablement helpers
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu

hugetlb_hgm_eligible indicates that a VMA is eligible to have HGM
explicitly enabled via MADV_SPLIT, and hugetlb_hgm_enabled indicates
that HGM has been enabled.
Signed-off-by: James Houghton
---
 include/linux/hugetlb.h | 14 ++++++++++++++
 mm/hugetlb.c            | 23 +++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index aa49fd8cb47c..8713d9c4f86c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1207,6 +1207,20 @@ static inline void hugetlb_unregister_node(struct node *node)
 }
 #endif	/* CONFIG_HUGETLB_PAGE */

+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
+bool hugetlb_hgm_eligible(struct vm_area_struct *vma);
+#else
+static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
+{
+	return false;
+}
+static inline bool hugetlb_hgm_eligible(struct vm_area_struct *vma)
+{
+	return false;
+}
+#endif
+
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
 					struct mm_struct *mm, pte_t *pte)
 {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2f86fedef283..d27fe05d5ef6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6966,6 +6966,10 @@ static bool pmd_sharing_possible(struct vm_area_struct *vma)
 #ifdef CONFIG_USERFAULTFD
 	if (uffd_disable_huge_pmd_share(vma))
 		return false;
+#endif
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+	if (hugetlb_hgm_enabled(vma))
+		return false;
 #endif
 	/*
 	 * Only shared VMAs can share PMDs.
@@ -7229,6 +7233,25 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h)

 #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */

+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+bool hugetlb_hgm_eligible(struct vm_area_struct *vma)
+{
+	/*
+	 * All shared VMAs may have HGM.
+	 *
+	 * HGM requires using the VMA lock, which only exists for shared VMAs.
+	 * To make HGM work for private VMAs, we would need to use another
+	 * scheme to prevent collapsing/splitting from invalidating other
+	 * threads' page table walks.
+	 */
+	return vma && (vma->vm_flags & VM_MAYSHARE);
+}
+bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
+{
+	return vma && (vma->vm_flags & VM_HUGETLB_HGM);
+}
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+
 /*
  * These functions are overwritable if your architecture needs its own
  * behavior.

From patchwork Thu Jan 5 10:18:07 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39434
Date: Thu, 5 Jan 2023 10:18:07 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
Mime-Version: 1.0
References: <20230105101844.1893104-1-jthoughton@google.com>
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog
Message-ID: <20230105101844.1893104-10-jthoughton@google.com>
Subject: [PATCH 09/46] mm: add MADV_SPLIT to enable HugeTLB HGM
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

Issuing madvise(MADV_SPLIT) on a HugeTLB address range will enable HugeTLB HGM. MADV_SPLIT was chosen as the name so that this API can be applied to non-HugeTLB memory in the future, if such an application arises.

MADV_SPLIT makes several API changes for some syscalls on HugeTLB address ranges:

1. UFFDIO_CONTINUE is allowed for MAP_SHARED VMAs at PAGE_SIZE alignment.
2. read()ing a page fault event from a userfaultfd will yield a PAGE_SIZE-rounded address, instead of a huge-page-size-rounded address (unless UFFD_FEATURE_EXACT_ADDRESS is used).

There is no way to disable the API changes that come with issuing MADV_SPLIT. MADV_COLLAPSE can be used to collapse the high-granularity page table mappings created through the extended functionality that MADV_SPLIT provides.

For post-copy live migration, the expected use-case is:

1. mmap(MAP_SHARED, some_fd) primary mapping
2. mmap(MAP_SHARED, some_fd) alias mapping
3. MADV_SPLIT the primary mapping
4. UFFDIO_REGISTER/etc. the primary mapping
5. Copy memory contents into the alias mapping and UFFDIO_CONTINUE the corresponding PAGE_SIZE sections in the primary mapping.

More API changes may be added in the future.
Signed-off-by: James Houghton
---
 arch/alpha/include/uapi/asm/mman.h     |  2 ++
 arch/mips/include/uapi/asm/mman.h      |  2 ++
 arch/parisc/include/uapi/asm/mman.h    |  2 ++
 arch/xtensa/include/uapi/asm/mman.h    |  2 ++
 include/linux/hugetlb.h                |  2 ++
 include/uapi/asm-generic/mman-common.h |  2 ++
 mm/hugetlb.c                           |  3 +--
 mm/madvise.c                           | 26 ++++++++++++++++++++++++++
 8 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 763929e814e9..7a26f3648b90 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -78,6 +78,8 @@
 #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */

+#define MADV_SPLIT	26		/* Enable hugepage high-granularity APIs */
+
 /* compatibility flags */
 #define MAP_FILE	0
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index c6e1fc77c996..f8a74a3a0928 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -105,6 +105,8 @@
 #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */

+#define MADV_SPLIT	26		/* Enable hugepage high-granularity APIs */
+
 /* compatibility flags */
 #define MAP_FILE	0
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 68c44f99bc93..a6dc6a56c941 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -72,6 +72,8 @@
 #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */

+#define MADV_SPLIT	74		/* Enable hugepage high-granularity APIs */
+
 #define MADV_HWPOISON	100		/* poison a page for testing */
 #define MADV_SOFT_OFFLINE 101		/* soft offline page for testing */
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 1ff0c858544f..f98a77c430a9 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -113,6 +113,8 @@
 #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */

+#define MADV_SPLIT	26		/* Enable hugepage high-granularity APIs */
+
 /* compatibility flags */
 #define MAP_FILE	0
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8713d9c4f86c..16fc3e381801 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -109,6 +109,8 @@ struct hugetlb_vma_lock {
 	struct vm_area_struct *vma;
 };

+void hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
+
 extern struct resv_map *resv_map_alloc(void);
 void resv_map_release(struct kref *ref);

diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 6ce1f1ceb432..996e8ded092f 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -79,6 +79,8 @@
 #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */

+#define MADV_SPLIT	26		/* Enable hugepage high-granularity APIs */
+
 /* compatibility flags */
 #define MAP_FILE	0
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d27fe05d5ef6..5bd53ae8ca4b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -92,7 +92,6 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
 /* Forward declaration */
 static int hugetlb_acct_memory(struct hstate *h, long delta);
 static void hugetlb_vma_lock_free(struct vm_area_struct *vma);
-static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
 static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);

 static inline bool subpool_is_free(struct hugepage_subpool *spool)
@@ -361,7 +360,7 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
 	}
 }

-static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
+void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
 {
 	struct hugetlb_vma_lock *vma_lock;

diff --git a/mm/madvise.c b/mm/madvise.c
index 025be3517af1..04ee28992e52 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1011,6 +1011,24 @@ static long madvise_remove(struct vm_area_struct *vma,
 	return error;
 }

+static int madvise_split(struct vm_area_struct *vma,
+			 unsigned long *new_flags)
+{
+	if (!is_vm_hugetlb_page(vma) || !hugetlb_hgm_eligible(vma))
+		return -EINVAL;
+
+	/*
+	 * Attempt to allocate the VMA lock again. If it isn't allocated,
+	 * MADV_COLLAPSE won't work.
+	 */
+	hugetlb_vma_lock_alloc(vma);
+
+	/* PMD sharing doesn't work with HGM. */
+	hugetlb_unshare_all_pmds(vma);
+
+	*new_flags |= VM_HUGETLB_HGM;
+	return 0;
+}
+
 /*
  * Apply an madvise behavior to a region of a vma.  madvise_update_vma
  * will handle splitting a vm area into separate areas, each area with its own
@@ -1089,6 +1107,11 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
 		break;
 	case MADV_COLLAPSE:
 		return madvise_collapse(vma, prev, start, end);
+	case MADV_SPLIT:
+		error = madvise_split(vma, &new_flags);
+		if (error)
+			goto out;
+		break;
 	}

 	anon_name = anon_vma_name(vma);
@@ -1183,6 +1206,9 @@ madvise_behavior_valid(int behavior)
 	case MADV_HUGEPAGE:
 	case MADV_NOHUGEPAGE:
 	case MADV_COLLAPSE:
+#endif
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+	case MADV_SPLIT:
 #endif
 	case MADV_DONTDUMP:
 	case MADV_DODUMP:

From patchwork Thu Jan 5 10:18:08 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39435
Date: Thu, 5 Jan 2023 10:18:08 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-11-jthoughton@google.com>
Subject: [PATCH 10/46] hugetlb: make huge_pte_lockptr take an explicit shift argument
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

This is needed to handle PTL locking with high-granularity mapping. We won't always be using the PMD-level PTL even if we're using the 2M hugepage hstate. It's possible that we're dealing with 4K PTEs, in which case we need to lock the PTL for the 4K PTE.

Reviewed-by: Mina Almasry
Acked-by: Mike Kravetz
Signed-off-by: James Houghton
---
 arch/powerpc/mm/pgtable.c | 3 ++-
 include/linux/hugetlb.h   | 9 ++++-----
 mm/hugetlb.c              | 7 ++++---
 mm/migrate.c              | 3 ++-
 4 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cb2dcdb18f8e..035a0df47af0 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -261,7 +261,8 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 	psize = hstate_get_psize(h);
 #ifdef CONFIG_DEBUG_VM
-	assert_spin_locked(huge_pte_lockptr(h, vma->vm_mm, ptep));
+	assert_spin_locked(huge_pte_lockptr(huge_page_shift(h),
+					    vma->vm_mm, ptep));
 #endif
 #else
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 16fc3e381801..3f098363cd6e 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -956,12 +956,11 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
 	return modified_mask;
 }

-static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
+static inline spinlock_t *huge_pte_lockptr(unsigned int shift,
 					   struct mm_struct *mm, pte_t *pte)
 {
-	if (huge_page_size(h) == PMD_SIZE)
+	if (shift == PMD_SHIFT)
 		return pmd_lockptr(mm, (pmd_t *) pte);
-	VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
 	return &mm->page_table_lock;
 }

@@ -1171,7 +1170,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
 	return 0;
 }

-static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
+static inline spinlock_t *huge_pte_lockptr(unsigned int shift,
 					   struct mm_struct *mm, pte_t *pte)
 {
 	return &mm->page_table_lock;
@@ -1228,7 +1227,7 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h,
 {
 	spinlock_t *ptl;

-	ptl = huge_pte_lockptr(h, mm, pte);
+	ptl = huge_pte_lockptr(huge_page_shift(h), mm, pte);
 	spin_lock(ptl);
 	return ptl;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5bd53ae8ca4b..4db38dc79d0e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4987,7 +4987,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		}

 		dst_ptl = huge_pte_lock(h, dst, dst_pte);
-		src_ptl = huge_pte_lockptr(h, src, src_pte);
+		src_ptl = huge_pte_lockptr(huge_page_shift(h), src, src_pte);
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 		entry = huge_ptep_get(src_pte);
again:
@@ -5068,7 +5068,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,

 			/* Install the new huge page if src pte stable */
 			dst_ptl = huge_pte_lock(h, dst, dst_pte);
-			src_ptl = huge_pte_lockptr(h, src, src_pte);
+			src_ptl = huge_pte_lockptr(huge_page_shift(h),
+						   src, src_pte);
 			spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 			entry = huge_ptep_get(src_pte);
 			if (!pte_same(src_pte_old, entry)) {
@@ -5122,7 +5123,7 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
 	pte_t pte;

 	dst_ptl = huge_pte_lock(h, mm, dst_pte);
-	src_ptl = huge_pte_lockptr(h, mm, src_pte);
+	src_ptl = huge_pte_lockptr(huge_page_shift(h), mm, src_pte);

 	/*
 	 * We don't have to worry about the ordering of src and dst ptlocks
diff --git a/mm/migrate.c b/mm/migrate.c
index b5032c3e940a..832f639fc49a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -360,7 +360,8 @@ void __migration_entry_wait_huge(struct vm_area_struct *vma,

 void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte)
 {
-	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma->vm_mm, pte);
+	spinlock_t *ptl = huge_pte_lockptr(huge_page_shift(hstate_vma(vma)),
+					   vma->vm_mm, pte);

 	__migration_entry_wait_huge(vma, pte, ptl);
 }

From patchwork Thu Jan 5 10:18:09 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39436
Date: Thu, 5 Jan 2023 10:18:09 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-12-jthoughton@google.com>
Subject: [PATCH 11/46] hugetlb: add hugetlb_pte to track HugeTLB page table entries
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

After high-granularity mapping, page table entries for HugeTLB pages can be of any size/type. (For example, we can have a 1G page mapped with a mix of PMDs and PTEs.) This struct is to help keep track of a HugeTLB PTE after we have done a page table walk.

Without this, we'd have to pass around the "size" of the PTE everywhere. We effectively did this before; it could be fetched from the hstate, which we pass around pretty much everywhere.

hugetlb_pte_present_leaf is included here as a helper function that will be used frequently later on.
Signed-off-by: James Houghton
---
 include/linux/hugetlb.h | 72 +++++++++++++++++++++++++++++++++++++++++
 mm/hugetlb.c            | 29 +++++++++++++++++
 2 files changed, 101 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 3f098363cd6e..bf441d8a1b52 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -38,6 +38,54 @@ typedef struct { unsigned long pd; } hugepd_t;
  */
 #define __NR_USED_SUBPAGE 3
 
+enum hugetlb_level {
+        HUGETLB_LEVEL_PTE = 1,
+        /*
+         * We always include PMD, PUD, and P4D in this enum definition so that,
+         * when logged as an integer, we can easily tell which level it is.
+         */
+        HUGETLB_LEVEL_PMD,
+        HUGETLB_LEVEL_PUD,
+        HUGETLB_LEVEL_P4D,
+        HUGETLB_LEVEL_PGD,
+};
+
+struct hugetlb_pte {
+        pte_t *ptep;
+        unsigned int shift;
+        enum hugetlb_level level;
+        spinlock_t *ptl;
+};
+
+static inline
+void __hugetlb_pte_populate(struct hugetlb_pte *hpte, pte_t *ptep,
+                            unsigned int shift, enum hugetlb_level level,
+                            spinlock_t *ptl)
+{
+        /*
+         * If 'shift' indicates that this PTE is contiguous, then @ptep must
+         * be the first pte of the contiguous bunch.
+         */
+        hpte->ptl = ptl;
+        hpte->ptep = ptep;
+        hpte->shift = shift;
+        hpte->level = level;
+}
+
+static inline
+unsigned long hugetlb_pte_size(const struct hugetlb_pte *hpte)
+{
+        return 1UL << hpte->shift;
+}
+
+static inline
+unsigned long hugetlb_pte_mask(const struct hugetlb_pte *hpte)
+{
+        return ~(hugetlb_pte_size(hpte) - 1);
+}
+
+bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte, pte_t pte);
+
 struct hugepage_subpool {
         spinlock_t lock;
         long count;
@@ -1232,6 +1280,30 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h,
         return ptl;
 }
 
+static inline
+spinlock_t *hugetlb_pte_lockptr(struct hugetlb_pte *hpte)
+{
+        return hpte->ptl;
+}
+
+static inline
+spinlock_t *hugetlb_pte_lock(struct hugetlb_pte *hpte)
+{
+        spinlock_t *ptl = hugetlb_pte_lockptr(hpte);
+
+        spin_lock(ptl);
+        return ptl;
+}
+
+static inline
+void hugetlb_pte_populate(struct mm_struct *mm, struct hugetlb_pte *hpte,
+                          pte_t *ptep, unsigned int shift,
+                          enum hugetlb_level level)
+{
+        __hugetlb_pte_populate(hpte, ptep, shift, level,
+                               huge_pte_lockptr(shift, mm, ptep));
+}
+
 #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA)
 extern void __init hugetlb_cma_reserve(int order);
 #else
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4db38dc79d0e..2d83a2c359a2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1266,6 +1266,35 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
         return false;
 }
 
+bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte, pte_t pte)
+{
+        pgd_t pgd;
+        p4d_t p4d;
+        pud_t pud;
+        pmd_t pmd;
+
+        switch (hpte->level) {
+        case HUGETLB_LEVEL_PGD:
+                pgd = __pgd(pte_val(pte));
+                return pgd_present(pgd) && pgd_leaf(pgd);
+        case HUGETLB_LEVEL_P4D:
+                p4d = __p4d(pte_val(pte));
+                return p4d_present(p4d) && p4d_leaf(p4d);
+        case HUGETLB_LEVEL_PUD:
+                pud = __pud(pte_val(pte));
+                return pud_present(pud) && pud_leaf(pud);
+        case HUGETLB_LEVEL_PMD:
+                pmd = __pmd(pte_val(pte));
+                return pmd_present(pmd) && pmd_leaf(pmd);
+        case HUGETLB_LEVEL_PTE:
+                return pte_present(pte);
+        default:
+                WARN_ON_ONCE(1);
+                return false;
+        }
+}
+
+
 static void enqueue_hugetlb_folio(struct hstate *h, struct folio *folio)
 {
         int nid = folio_nid(folio);

From patchwork Thu Jan 5 10:18:10 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39437
Date: Thu, 5 Jan 2023 10:18:10 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-13-jthoughton@google.com>
Subject: [PATCH 12/46] hugetlb: add hugetlb_alloc_pmd and hugetlb_alloc_pte
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
    "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
    Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

These functions are used to allocate new PTEs below the hstate PTE. This
will be used by hugetlb_walk_step, which implements stepping forwards in
a HugeTLB high-granularity page table walk.

The reasons that we don't use the standard pmd_alloc/pte_alloc*
functions are:
 1) This prevents us from accidentally overwriting swap entries or
    attempting to use swap entries as present non-leaf PTEs (see
    pmd_alloc(); we assume that !pte_none means pte_present and
    non-leaf).
 2) Locking hugetlb PTEs can be different from locking regular PTEs.
    (Although, as implemented right now, locking is the same.)
 3) We can maintain compatibility with CONFIG_HIGHPTE. That is, HugeTLB
    HGM won't use HIGHPTE, but the kernel can still be built with it,
    and other mm code will use it.

When GENERAL_HUGETLB supports P4D-based hugepages, we will need to
implement hugetlb_pud_alloc to implement hugetlb_walk_step.
Signed-off-by: James Houghton
---
 include/linux/hugetlb.h |   5 ++
 mm/hugetlb.c            | 114 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 119 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index bf441d8a1b52..ad9d19f0d1b9 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -86,6 +86,11 @@ unsigned long hugetlb_pte_mask(const struct hugetlb_pte *hpte)
 
 bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte, pte_t pte);
 
+pmd_t *hugetlb_alloc_pmd(struct mm_struct *mm, struct hugetlb_pte *hpte,
+                         unsigned long addr);
+pte_t *hugetlb_alloc_pte(struct mm_struct *mm, struct hugetlb_pte *hpte,
+                         unsigned long addr);
+
 struct hugepage_subpool {
         spinlock_t lock;
         long count;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2d83a2c359a2..2160cbaf3311 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -480,6 +480,120 @@ static bool has_same_uncharge_info(struct file_region *rg,
 #endif
 }
 
+/*
+ * hugetlb_alloc_pmd -- Allocate or find a PMD beneath a PUD-level hpte.
+ *
+ * This is meant to be used to implement hugetlb_walk_step when one must
+ * step down to a PMD. Different architectures may implement
+ * hugetlb_walk_step differently, but hugetlb_alloc_pmd and
+ * hugetlb_alloc_pte are architecture-independent.
+ *
+ * Returns:
+ *      On success: the pointer to the PMD. This should be placed into a
+ *                  hugetlb_pte. @hpte is not changed.
+ *      ERR_PTR(-EINVAL): hpte is not PUD-level
+ *      ERR_PTR(-EEXIST): there is a non-leaf and non-empty PUD in @hpte
+ *      ERR_PTR(-ENOMEM): could not allocate the new PMD
+ */
+pmd_t *hugetlb_alloc_pmd(struct mm_struct *mm, struct hugetlb_pte *hpte,
+                         unsigned long addr)
+{
+        spinlock_t *ptl = hugetlb_pte_lockptr(hpte);
+        pmd_t *new;
+        pud_t *pudp;
+        pud_t pud;
+
+        if (hpte->level != HUGETLB_LEVEL_PUD)
+                return ERR_PTR(-EINVAL);
+
+        pudp = (pud_t *)hpte->ptep;
+retry:
+        pud = READ_ONCE(*pudp);
+        if (likely(pud_present(pud)))
+                return unlikely(pud_leaf(pud))
+                        ? ERR_PTR(-EEXIST)
+                        : pmd_offset(pudp, addr);
+        else if (!pud_none(pud))
+                /*
+                 * Not present and not none means that a swap entry lives here,
+                 * and we can't get rid of it.
+                 */
+                return ERR_PTR(-EEXIST);
+
+        new = pmd_alloc_one(mm, addr);
+        if (!new)
+                return ERR_PTR(-ENOMEM);
+
+        spin_lock(ptl);
+        if (!pud_same(pud, *pudp)) {
+                spin_unlock(ptl);
+                pmd_free(mm, new);
+                goto retry;
+        }
+
+        mm_inc_nr_pmds(mm);
+        smp_wmb(); /* See comment in pmd_install() */
+        pud_populate(mm, pudp, new);
+        spin_unlock(ptl);
+        return pmd_offset(pudp, addr);
+}
+
+/*
+ * hugetlb_alloc_pte -- Allocate a PTE beneath a pmd_none PMD-level hpte.
+ *
+ * See the comment above hugetlb_alloc_pmd.
+ */
+pte_t *hugetlb_alloc_pte(struct mm_struct *mm, struct hugetlb_pte *hpte,
+                         unsigned long addr)
+{
+        spinlock_t *ptl = hugetlb_pte_lockptr(hpte);
+        pgtable_t new;
+        pmd_t *pmdp;
+        pmd_t pmd;
+
+        if (hpte->level != HUGETLB_LEVEL_PMD)
+                return ERR_PTR(-EINVAL);
+
+        pmdp = (pmd_t *)hpte->ptep;
+retry:
+        pmd = READ_ONCE(*pmdp);
+        if (likely(pmd_present(pmd)))
+                return unlikely(pmd_leaf(pmd))
+                        ? ERR_PTR(-EEXIST)
+                        : pte_offset_kernel(pmdp, addr);
+        else if (!pmd_none(pmd))
+                /*
+                 * Not present and not none means that a swap entry lives here,
+                 * and we can't get rid of it.
+                 */
+                return ERR_PTR(-EEXIST);
+
+        /*
+         * With CONFIG_HIGHPTE, calling `pte_alloc_one` directly may result
+         * in page tables being allocated in high memory, needing a kmap to
+         * access. Instead, we call __pte_alloc_one directly with
+         * GFP_PGTABLE_USER to prevent these PTEs being allocated in high
+         * memory.
+         */
+        new = __pte_alloc_one(mm, GFP_PGTABLE_USER);
+        if (!new)
+                return ERR_PTR(-ENOMEM);
+
+        spin_lock(ptl);
+        if (!pmd_same(pmd, *pmdp)) {
+                spin_unlock(ptl);
+                pgtable_pte_page_dtor(new);
+                __free_page(new);
+                goto retry;
+        }
+
+        mm_inc_nr_ptes(mm);
+        smp_wmb(); /* See comment in pmd_install() */
+        pmd_populate(mm, pmdp, new);
+        spin_unlock(ptl);
+        return pte_offset_kernel(pmdp, addr);
+}
+
 static void coalesce_file_region(struct resv_map *resv, struct file_region *rg)
 {
         struct file_region *nrg, *prg;

From patchwork Thu Jan 5 10:18:11 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39438
Date: Thu, 5 Jan 2023 10:18:11 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-14-jthoughton@google.com>
Subject: [PATCH 13/46] hugetlb: add hugetlb_hgm_walk and hugetlb_walk_step
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
    "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
    Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

hugetlb_hgm_walk implements high-granularity page table walks for
HugeTLB. It is safe to call on VMAs without HGM enabled; it will return
immediately.

hugetlb_walk_step implements how we step forwards in the walk. For
architectures that don't use GENERAL_HUGETLB, they will need to provide
their own implementation.
Signed-off-by: James Houghton
---
 include/linux/hugetlb.h |  35 +++++--
 mm/hugetlb.c            | 213 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 242 insertions(+), 6 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ad9d19f0d1b9..2fcd8f313628 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -239,6 +239,14 @@ u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
                       unsigned long addr, pud_t *pud);
 
+int hugetlb_full_walk(struct hugetlb_pte *hpte, struct vm_area_struct *vma,
+                      unsigned long addr);
+void hugetlb_full_walk_continue(struct hugetlb_pte *hpte,
+                                struct vm_area_struct *vma, unsigned long addr);
+int hugetlb_full_walk_alloc(struct hugetlb_pte *hpte,
+                            struct vm_area_struct *vma, unsigned long addr,
+                            unsigned long target_sz);
+
 struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage);
 
 extern int sysctl_hugetlb_shm_group;
@@ -288,6 +296,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 pte_t *huge_pte_offset(struct mm_struct *mm,
                        unsigned long addr, unsigned long sz);
 unsigned long hugetlb_mask_last_page(struct hstate *h);
+int hugetlb_walk_step(struct mm_struct *mm, struct hugetlb_pte *hpte,
+                      unsigned long addr, unsigned long sz);
 int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
                      unsigned long addr, pte_t *ptep);
 void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
@@ -1067,6 +1077,8 @@ void hugetlb_register_node(struct node *node);
 void hugetlb_unregister_node(struct node *node);
 #endif
 
+enum hugetlb_level hpage_size_to_level(unsigned long sz);
+
 #else /* CONFIG_HUGETLB_PAGE */
 struct hstate {};
@@ -1259,6 +1271,11 @@ static inline void hugetlb_register_node(struct node *node)
 static inline void hugetlb_unregister_node(struct node *node)
 {
 }
+
+static inline enum hugetlb_level hpage_size_to_level(unsigned long sz)
+{
+        return HUGETLB_LEVEL_PTE;
+}
 #endif /* CONFIG_HUGETLB_PAGE */
 
 #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
@@ -1333,12 +1350,8 @@ __vma_has_hugetlb_vma_lock(struct vm_area_struct *vma)
         return (vma->vm_flags & VM_MAYSHARE) && vma->vm_private_data;
 }
 
-/*
- * Safe version of huge_pte_offset() to check the locks. See comments
- * above huge_pte_offset().
- */
-static inline pte_t *
-hugetlb_walk(struct vm_area_struct *vma, unsigned long addr, unsigned long sz)
+static inline void
+hugetlb_walk_lock_check(struct vm_area_struct *vma)
 {
 #if defined(CONFIG_HUGETLB_PAGE) && \
         defined(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) && defined(CONFIG_LOCKDEP)
@@ -1360,6 +1373,16 @@ hugetlb_walk(struct vm_area_struct *vma, unsigned long addr, unsigned long sz)
                      !lockdep_is_held(
                                  &vma->vm_file->f_mapping->i_mmap_rwsem));
 #endif
+}
+
+/*
+ * Safe version of huge_pte_offset() to check the locks. See comments
+ * above huge_pte_offset().
+ */
+static inline pte_t *
+hugetlb_walk(struct vm_area_struct *vma, unsigned long addr, unsigned long sz)
+{
+        hugetlb_walk_lock_check(vma);
         return huge_pte_offset(vma->vm_mm, addr, sz);
 }
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2160cbaf3311..aa8e59cbca69 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -94,6 +94,29 @@ static int hugetlb_acct_memory(struct hstate *h, long delta);
 static void hugetlb_vma_lock_free(struct vm_area_struct *vma);
 static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);
 
+/*
+ * hpage_size_to_level() - convert @sz to the corresponding page table level
+ *
+ * @sz must be less than or equal to a valid hugepage size.
+ */
+enum hugetlb_level hpage_size_to_level(unsigned long sz)
+{
+        /*
+         * We order the conditionals from smallest to largest to pick the
+         * smallest level when multiple levels have the same size (i.e.,
+         * when levels are folded).
+         */
+        if (sz < PMD_SIZE)
+                return HUGETLB_LEVEL_PTE;
+        if (sz < PUD_SIZE)
+                return HUGETLB_LEVEL_PMD;
+        if (sz < P4D_SIZE)
+                return HUGETLB_LEVEL_PUD;
+        if (sz < PGDIR_SIZE)
+                return HUGETLB_LEVEL_P4D;
+        return HUGETLB_LEVEL_PGD;
+}
+
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
 {
         if (spool->count)
@@ -7276,6 +7299,153 @@ bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
 }
 #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
 
+/* hugetlb_hgm_walk - walks a high-granularity HugeTLB page table to resolve
+ *      the page table entry for @addr. We might allocate new PTEs.
+ *
+ * @hpte must always be pointing at an hstate-level PTE or deeper.
+ *
+ * This function will never walk further if it encounters a PTE of a size
+ * less than or equal to @sz.
+ *
+ * @alloc determines what we do when we encounter an empty PTE. If false,
+ * we stop walking. If true and @sz is less than the current PTE's size,
+ * we make that PTE point to the next level down, going until @sz is the same
+ * as our current PTE.
+ *
+ * If @alloc is false and @sz is PAGE_SIZE, this function will always
+ * succeed, but that does not guarantee that hugetlb_pte_size(hpte) is @sz.
+ *
+ * Return:
+ *      -ENOMEM if we couldn't allocate new PTEs.
+ *      -EEXIST if the caller wanted to walk further than a migration PTE,
+ *              poison PTE, or a PTE marker. The caller needs to manually deal
+ *              with this scenario.
+ *      -EINVAL if called with invalid arguments (@sz invalid, @hpte not
+ *              initialized).
+ *      0 otherwise.
+ *
+ * Even if this function fails, @hpte is guaranteed to always remain
+ * valid.
+ */
+static int hugetlb_hgm_walk(struct mm_struct *mm, struct vm_area_struct *vma,
+                            struct hugetlb_pte *hpte, unsigned long addr,
+                            unsigned long sz, bool alloc)
+{
+        int ret = 0;
+        pte_t pte;
+
+        if (WARN_ON_ONCE(sz < PAGE_SIZE))
+                return -EINVAL;
+
+        if (WARN_ON_ONCE(!hpte->ptep))
+                return -EINVAL;
+
+        /* We have the same synchronization requirements as hugetlb_walk. */
+        hugetlb_walk_lock_check(vma);
+
+        while (hugetlb_pte_size(hpte) > sz && !ret) {
+                pte = huge_ptep_get(hpte->ptep);
+                if (!pte_present(pte)) {
+                        if (!alloc)
+                                return 0;
+                        if (unlikely(!huge_pte_none(pte)))
+                                return -EEXIST;
+                } else if (hugetlb_pte_present_leaf(hpte, pte))
+                        return 0;
+                ret = hugetlb_walk_step(mm, hpte, addr, sz);
+        }
+
+        return ret;
+}
+
+static int hugetlb_hgm_walk_uninit(struct hugetlb_pte *hpte,
+                                   pte_t *ptep,
+                                   struct vm_area_struct *vma,
+                                   unsigned long addr,
+                                   unsigned long target_sz,
+                                   bool alloc)
+{
+        struct hstate *h = hstate_vma(vma);
+
+        hugetlb_pte_populate(vma->vm_mm, hpte, ptep, huge_page_shift(h),
+                             hpage_size_to_level(huge_page_size(h)));
+        return hugetlb_hgm_walk(vma->vm_mm, vma, hpte, addr, target_sz,
+                                alloc);
+}
+
+/*
+ * hugetlb_full_walk_continue - continue a high-granularity page-table walk.
+ *
+ * If a user has a valid @hpte but knows that @hpte is not a leaf, they can
+ * attempt to continue walking by calling this function.
+ *
+ * This function may never fail, but @hpte might not change.
+ *
+ * If @hpte is not valid, then this function is a no-op.
+ */
+void hugetlb_full_walk_continue(struct hugetlb_pte *hpte,
+                                struct vm_area_struct *vma,
+                                unsigned long addr)
+{
+        /* hugetlb_hgm_walk will never fail with these arguments. */
+        WARN_ON_ONCE(hugetlb_hgm_walk(vma->vm_mm, vma, hpte, addr,
+                                      PAGE_SIZE, false));
+}
+
+/*
+ * hugetlb_full_walk - do a high-granularity page-table walk; never allocate.
+ *
+ * This function can only fail if we find that the hstate-level PTE is not
+ * allocated. Callers can take advantage of this fact to skip address regions
+ * that cannot be mapped in that case.
+ *
+ * If this function succeeds, @hpte is guaranteed to be valid.
+ */
+int hugetlb_full_walk(struct hugetlb_pte *hpte,
+                      struct vm_area_struct *vma,
+                      unsigned long addr)
+{
+        struct hstate *h = hstate_vma(vma);
+        unsigned long sz = huge_page_size(h);
+        /*
+         * We must mask the address appropriately so that we pick up the first
+         * PTE in a contiguous group.
+         */
+        pte_t *ptep = hugetlb_walk(vma, addr & huge_page_mask(h), sz);
+
+        if (!ptep)
+                return -ENOMEM;
+
+        /* hugetlb_hgm_walk_uninit will never fail with these arguments. */
+        WARN_ON_ONCE(hugetlb_hgm_walk_uninit(hpte, ptep, vma, addr,
+                                             PAGE_SIZE, false));
+        return 0;
+}
+
+/*
+ * hugetlb_full_walk_alloc - do a high-granularity walk, potentially allocate
+ *      new PTEs.
+ */
+int hugetlb_full_walk_alloc(struct hugetlb_pte *hpte,
+                            struct vm_area_struct *vma,
+                            unsigned long addr,
+                            unsigned long target_sz)
+{
+        struct hstate *h = hstate_vma(vma);
+        unsigned long sz = huge_page_size(h);
+        /*
+         * We must mask the address appropriately so that we pick up the first
+         * PTE in a contiguous group.
+         */
+        pte_t *ptep = huge_pte_alloc(vma->vm_mm, vma, addr & huge_page_mask(h),
+                                     sz);
+
+        if (!ptep)
+                return -ENOMEM;
+
+        return hugetlb_hgm_walk_uninit(hpte, ptep, vma, addr, target_sz, true);
+}
+
 #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
 pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
                       unsigned long addr, unsigned long sz)
@@ -7343,6 +7513,49 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
         return (pte_t *)pmd;
 }
 
+/*
+ * hugetlb_walk_step() - Walk the page table one step to resolve the page
+ *      (hugepage or subpage) entry at address @addr.
+ *
+ * @sz always points at the final target PTE size (e.g. PAGE_SIZE for the
+ * lowest level PTE).
+ *
+ * @hpte will always remain valid, even if this function fails.
+ *
+ * Architectures that implement this function must ensure that if @hpte does
+ * not change levels, then its PTL must also stay the same.
+ */
+int hugetlb_walk_step(struct mm_struct *mm, struct hugetlb_pte *hpte,
+                      unsigned long addr, unsigned long sz)
+{
+        pte_t *ptep;
+        spinlock_t *ptl;
+
+        switch (hpte->level) {
+        case HUGETLB_LEVEL_PUD:
+                ptep = (pte_t *)hugetlb_alloc_pmd(mm, hpte, addr);
+                if (IS_ERR(ptep))
+                        return PTR_ERR(ptep);
+                hugetlb_pte_populate(mm, hpte, ptep, PMD_SHIFT,
+                                     HUGETLB_LEVEL_PMD);
+                break;
+        case HUGETLB_LEVEL_PMD:
+                ptep = hugetlb_alloc_pte(mm, hpte, addr);
+                if (IS_ERR(ptep))
+                        return PTR_ERR(ptep);
+                ptl = pte_lockptr(mm, (pmd_t *)hpte->ptep);
+                __hugetlb_pte_populate(hpte, ptep, PAGE_SHIFT,
+                                       HUGETLB_LEVEL_PTE, ptl);
+                hpte->ptl = ptl;
+                break;
+        default:
+                WARN_ONCE(1, "%s: got invalid level: %d (shift: %d)\n",
+                          __func__, hpte->level, hpte->shift);
+                return -EINVAL;
+        }
+        return 0;
+}
+
 /*
  * Return a mask that can be used to update an address to the last huge
  * page in a page table page mapping size. Used to skip non-present

From patchwork Thu Jan 5 10:18:12 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39439
Date: Thu, 5 Jan 2023
10:18:12 +0000 In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com> Mime-Version: 1.0 References: <20230105101844.1893104-1-jthoughton@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20230105101844.1893104-15-jthoughton@google.com> Subject: [PATCH 14/46] hugetlb: add make_huge_pte_with_shift From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1754177615329904239?= X-GMAIL-MSGID: =?utf-8?q?1754177615329904239?= This allows us to make huge PTEs at shifts other than the hstate shift, which will be necessary for high-granularity mappings. 
Acked-by: Mike Kravetz Signed-off-by: James Houghton --- mm/hugetlb.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index aa8e59cbca69..3a75833d7aba 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5028,11 +5028,11 @@ const struct vm_operations_struct hugetlb_vm_ops = { .pagesize = hugetlb_vm_op_pagesize, }; -static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page, - int writable) +static pte_t make_huge_pte_with_shift(struct vm_area_struct *vma, + struct page *page, int writable, + int shift) { pte_t entry; - unsigned int shift = huge_page_shift(hstate_vma(vma)); if (writable) { entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_pte(page, @@ -5046,6 +5046,14 @@ static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page, return entry; } +static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page, + int writable) +{ + unsigned int shift = huge_page_shift(hstate_vma(vma)); + + return make_huge_pte_with_shift(vma, page, writable, shift); +} + static void set_huge_ptep_writable(struct vm_area_struct *vma, unsigned long address, pte_t *ptep) { From patchwork Thu Jan 5 10:18:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 39440 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4e01:0:0:0:0:0 with SMTP id p1csp227657wrt; Thu, 5 Jan 2023 02:22:23 -0800 (PST) X-Google-Smtp-Source: AMrXdXt/UURZ22J+GluguSuKd9rpaOsh45DprWAqfoNq/UZP1SWfmULDNAZk/GRjY+YMTRKfnLAH X-Received: by 2002:a17:902:7c8a:b0:192:835d:c861 with SMTP id y10-20020a1709027c8a00b00192835dc861mr33003906pll.68.1672914143488; Thu, 05 Jan 2023 02:22:23 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1672914143; cv=none; d=google.com; s=arc-20160816; b=CvHCZdhz5hY/rUs5DstfXrYPPVbuFEHCcXK9L9Oyw6CoTxX6uLMc/Jl+nNzPn5IkdR vZ4w8W296iC2dzh0cYwYXVfRJB+iaVIrEoYz2j8S8BmwT4T17ZgJmrQER9wVt7Po+yml 
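The refactor above can be sketched as a userspace toy model (not kernel code; the `toy_*` names and `TOY_HSTATE_SHIFT` are invented for illustration): the generalized helper takes the page-size shift as an explicit parameter, and the old entry point becomes a thin wrapper passing the hstate's default shift, so existing callers see no behavior change.

```c
#include <assert.h>

#define TOY_HSTATE_SHIFT 21u	/* e.g. a 2 MiB hstate on x86-64 */

struct toy_pte {
	unsigned long pfn;
	unsigned int shift;
	int writable;
};

/* Generalized helper: the caller chooses the mapping size. */
static struct toy_pte toy_make_huge_pte_with_shift(unsigned long pfn,
						   int writable,
						   unsigned int shift)
{
	struct toy_pte entry = {
		.pfn = pfn, .shift = shift, .writable = writable,
	};
	return entry;
}

/* Old API: delegates with the hstate's shift; behavior is unchanged. */
static struct toy_pte toy_make_huge_pte(unsigned long pfn, int writable)
{
	return toy_make_huge_pte_with_shift(pfn, writable, TOY_HSTATE_SHIFT);
}
```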
From patchwork Thu Jan 5 10:18:13 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39440
Date: Thu, 5 Jan 2023 10:18:13 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-16-jthoughton@google.com>
Subject: [PATCH 15/46] hugetlb: make default arch_make_huge_pte understand
 small mappings
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
 Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

This is a simple change: don't create a "huge" PTE if we are making a
regular, PAGE_SIZE PTE. All architectures that want to implement HGM
likely need to be changed in a similar way if they implement their own
version of arch_make_huge_pte.

Signed-off-by: James Houghton
---
 include/linux/hugetlb.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2fcd8f313628..b7cf45535d64 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -912,7 +912,7 @@ static inline void arch_clear_hugepage_flags(struct page *page) { }
 static inline pte_t arch_make_huge_pte(pte_t entry, unsigned int shift,
 				       vm_flags_t flags)
 {
-	return pte_mkhuge(entry);
+	return shift > PAGE_SHIFT ? pte_mkhuge(entry) : entry;
 }
 #endif
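The one-line change can be modeled in plain C (a toy sketch; `TOY_PTE_HUGE` and the `toy_*` names are invented stand-ins, not kernel APIs): the "huge" bit is set on the entry only when the mapping covers more than one base page, so a PAGE_SIZE-shift entry passes through untouched.

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12u
#define TOY_PTE_HUGE   (1ul << 7)	/* stand-in for an arch "huge" PTE bit */

/* Toy analogue of the patched generic arch_make_huge_pte(): only mark
 * the entry huge when it maps more than one base page. */
static unsigned long toy_arch_make_huge_pte(unsigned long entry,
					    unsigned int shift)
{
	return shift > TOY_PAGE_SHIFT ? (entry | TOY_PTE_HUGE) : entry;
}
```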
dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id o4-20020a654584000000b00478496a1373si36781952pgq.382.2023.01.05.02.22.13; Thu, 05 Jan 2023 02:22:26 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=XMejQEid; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232512AbjAEKU7 (ORCPT + 99 others); Thu, 5 Jan 2023 05:20:59 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40054 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232616AbjAEKT4 (ORCPT ); Thu, 5 Jan 2023 05:19:56 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2E46E551F0 for ; Thu, 5 Jan 2023 02:19:19 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id s6-20020a259006000000b00706c8bfd130so36459466ybl.11 for ; Thu, 05 Jan 2023 02:19:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=JBa/YNOWHc8BdkByQMj3j16NgHnHDUWMHmpwThmwYW8=; b=XMejQEid82qJD40DqqAVmKO7KaK2I6fqGVaQFn3qa+xsUhE5GeK5QsvXmIgXrlYAYp ZglFLDAuM74Ul8HnkiBVKndyeFl7TfYD6lrTc+G+Y0y5590vBgUhLji2pkV/uo0dTtba JOhfbgAT9W5Ih3mvcLPzzoGXyJzqEfRKoNqybN3iBy+HEU29ORkAZ56QrvlR8wMgJBPv /fiJ3Qjur/kHePLpxQf6mexQ2zmKFEUxMYH6n8AvNHRLmu3tqS42gPfHnw2VZCW+sun6 
KjrZlQt7j8nYKl5hgQmUrLHKrYGRJUYde6DMPUl9jm0rRYEyRoygWOPbOjS00/ls6W8J sVQg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=JBa/YNOWHc8BdkByQMj3j16NgHnHDUWMHmpwThmwYW8=; b=D4RD7j7Ekivsr4r2NsylfwUwbgTqsia9/1zw9jii9p+vvBfoJdqD5eb6SXHydy6FA/ vm5rbp0QsfLsoRmSIE20NoTmYBJQsgdDJkwU9Ib6n+VLFgaz8SQ6XVeGUK3ZslsoRWdX BbGpoAMiudw93TRqnHfHkXkk+6UKHRCFJPVGW7jFHYU+Z/d4ddGXnj++8X9xsfMvOWmH MXL+N2zdFbhpP6lYP3e2PXUclxVHYfqOlsN1GUW5D2kuXYb0USfbYe7CBZlAVN3QfPOO vGCjUmmBgzKK214ZrLG+zYTD4oNiQkj3uoaFzVLfe30fmA87JTzwrs97VKEElHvAcRog +kkQ== X-Gm-Message-State: AFqh2krUTD4u5LrTe0bAitJJIQkD6jmMvaxStRTPJlTPJDYTzJLB2J7h hIPk1jwM/QMV0kCEPY9uF5znIXyruGT4OyTC X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a05:6902:b14:b0:6fc:c88a:1c6d with SMTP id ch20-20020a0569020b1400b006fcc88a1c6dmr5728371ybb.486.1672913959168; Thu, 05 Jan 2023 02:19:19 -0800 (PST) Date: Thu, 5 Jan 2023 10:18:14 +0000 In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com> Mime-Version: 1.0 References: <20230105101844.1893104-1-jthoughton@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20230105101844.1893104-17-jthoughton@google.com> Subject: [PATCH 16/46] hugetlbfs: do a full walk to check if vma maps a page From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . 
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1754177624931585618?= X-GMAIL-MSGID: =?utf-8?q?1754177624931585618?= Because it is safe to do so, we can do a full high-granularity page table walk to check if the page is mapped. If it were not safe to do so, we could bail out early in the case of a high-granularity mapped PTE, indicating that the page could have been mapped. Signed-off-by: James Houghton --- fs/hugetlbfs/inode.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 48f1a8ad2243..d34ce79da595 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -386,17 +386,24 @@ static void hugetlb_delete_from_page_cache(struct folio *folio) static bool hugetlb_vma_maps_page(struct vm_area_struct *vma, unsigned long addr, struct page *page) { - pte_t *ptep, pte; + pte_t pte; + struct hugetlb_pte hpte; - ptep = hugetlb_walk(vma, addr, huge_page_size(hstate_vma(vma))); - if (!ptep) + if (hugetlb_full_walk(&hpte, vma, addr)) return false; - pte = huge_ptep_get(ptep); + pte = huge_ptep_get(hpte.ptep); if (huge_pte_none(pte) || !pte_present(pte)) return false; - if (pte_page(pte) == page) + if (unlikely(!hugetlb_pte_present_leaf(&hpte, pte))) + /* + * We raced with someone splitting us, and the only case + * where this is impossible is when the pte was none. 
+ */ + return false; + + if (compound_head(pte_page(pte)) == page) return true; return false; From patchwork Thu Jan 5 10:18:15 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 39450 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4e01:0:0:0:0:0 with SMTP id p1csp228000wrt; Thu, 5 Jan 2023 02:23:22 -0800 (PST) X-Google-Smtp-Source: AMrXdXvTX/Bh9eRFFnoFIsNQpm01epIilym5A4zm93nPpuUm2B1UN2HTJVGHSty7O8aAcqproo01 X-Received: by 2002:a17:902:e889:b0:18e:4a20:5b89 with SMTP id w9-20020a170902e88900b0018e4a205b89mr77727048plg.23.1672914201761; Thu, 05 Jan 2023 02:23:21 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1672914201; cv=none; d=google.com; s=arc-20160816; b=vXn2UoZHytFE268Tm8yDfPZ1qUxfGo1qARvXAxcXQ6XKP3OSkwEADj/UNvfuc0Ebs8 q7a3BqzOCeRAYNfgx1ohxNq+kDIR5/6mD5uf+bUh0y9ZDVq4rSv9VTLJ1qbNrylnH8vd turr5k/kgdlMCEb05g1SwPGs9hMxVlShdD17j0wtb7V/J1TzLzVEPMqXSvf0jo/Ny9Xu rHqYXGTVy1pL/kKk9uIZyM9ZdRTRg37JX7KFLKaqSTV6qz9ew4XH4MazTu/jSe89cbu0 MPH8lNAkh+E//0txcxmL05sl/RhVCDH9qsY7gTo4ejgxi3+8/DvASVEDP7hdXySWtSZX Us7A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=3kWM8DYu7DnV7Jv/QGJ6f3ppiIBnNL94nY50ZsYLZ48=; b=W1j1sIY0BxDiWLlYE83HuuYug2mw0G4fccKHkMbnc+WISG32f/URgVJs09//rSbFda 9pcjc4M+scNTr+yzSB+satTkXpaXG8btFz39EuoQf8kCSEaGRHrqL8AMzNuqyvPkWG20 8msq8xME7lwDNGb7NHjwpOBZgTFF/RPQhgHFmn2wAwvUoLROKKRlVpBJm11qdzucfVc+ oi6IzYsnQyfaq5J+uJn6tzKPxK6eMeMI2hs10bzdWpRP9BWWSlUqL2U6Zr7q/0WxeqRK BM8Aass+DjrAD16NWBEXa023TqLVOUxPwavtbToNKjBM1RCXs4OKq6l+ysvpyORS0Jbr YeJQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=L2aTbeQA; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) 
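The switch from comparing `pte_page(pte)` to comparing `compound_head(pte_page(pte))` can be illustrated with a small userspace model (all `toy_*` names are invented for this sketch; it is not kernel code): once a huge page may be mapped at high granularity, a PTE can point at any subpage, so membership must be decided by the subpage's compound head rather than by direct equality.

```c
#include <assert.h>

#define TOY_PAGES_PER_HPAGE 512ul	/* 2 MiB hpage / 4 KiB base pages */

/* In this model a "page" is just an index; the compound head of any
 * subpage is the first page of its 512-page block. */
static unsigned long toy_compound_head(unsigned long page)
{
	return page - (page % TOY_PAGES_PER_HPAGE);
}

/* Head-page comparison, mirroring the patched hugetlb_vma_maps_page():
 * a PTE pointing at any subpage of hpage counts as mapping it. */
static int toy_vma_maps_page(unsigned long pte_page, unsigned long hpage)
{
	return toy_compound_head(pte_page) == hpage;
}
```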
From patchwork Thu Jan 5 10:18:15 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39450
Date: Thu, 5 Jan 2023 10:18:15 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-18-jthoughton@google.com>
Subject: [PATCH 17/46] hugetlb: make unmapping compatible with
 high-granularity mappings
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
 Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

Enlighten __unmap_hugepage_range to deal with high-granularity
mappings. This doesn't change its API; it still must be called with
hugepage alignment, but it will correctly unmap hugepages that have
been mapped at high granularity.

The rules for mapcount and refcount here are:

 1. Refcount and mapcount are tracked on the head page.
 2. Each page table mapping into some of an hpage will increase that
    hpage's mapcount and refcount by 1.

Eventually, functionality here can be expanded to allow users to call
MADV_DONTNEED on PAGE_SIZE-aligned sections of a hugepage, but that is
not done here.

Signed-off-by: James Houghton
---
 include/asm-generic/tlb.h |  6 ++--
 mm/hugetlb.c              | 74 ++++++++++++++++++++++++---------------
 2 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index b46617207c93..31267471760e 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -598,9 +598,9 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
 		__tlb_remove_tlb_entry(tlb, ptep, address);	\
 	} while (0)
 
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
+#define tlb_remove_huge_tlb_entry(tlb, hpte, address)	\
 	do {							\
-		unsigned long _sz = huge_page_size(h);		\
+		unsigned long _sz = hugetlb_pte_size(&hpte);	\
 		if (_sz >= P4D_SIZE)				\
 			tlb_flush_p4d_range(tlb, address, _sz);	\
 		else if (_sz >= PUD_SIZE)			\
@@ -609,7 +609,7 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
 			tlb_flush_pmd_range(tlb, address, _sz);	\
 		else						\
 			tlb_flush_pte_range(tlb, address, _sz);	\
-		__tlb_remove_tlb_entry(tlb, ptep, address);	\
+		__tlb_remove_tlb_entry(tlb, hpte.ptep, address);\
 	} while (0)
 
 /**
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3a75833d7aba..dfd6c1491ac3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5384,10 +5384,10 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
-	pte_t *ptep;
+	struct hugetlb_pte hpte;
 	pte_t pte;
 	spinlock_t *ptl;
-	struct page *page;
+	struct page *hpage, *subpage;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
 	unsigned long last_addr_mask;
@@ -5397,35 +5397,33 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 	BUG_ON(start & ~huge_page_mask(h));
 	BUG_ON(end & ~huge_page_mask(h));
 
-	/*
-	 * This is a hugetlb vma, all the pte entries should point
-	 * to huge page.
-	 */
-	tlb_change_page_size(tlb, sz);
 	tlb_start_vma(tlb, vma);
 
 	last_addr_mask = hugetlb_mask_last_page(h);
 	address = start;
-	for (; address < end; address += sz) {
-		ptep = hugetlb_walk(vma, address, sz);
-		if (!ptep) {
-			address |= last_addr_mask;
+
+	while (address < end) {
+		if (hugetlb_full_walk(&hpte, vma, address)) {
+			address = (address | last_addr_mask) + sz;
 			continue;
 		}
 
-		ptl = huge_pte_lock(h, mm, ptep);
-		if (huge_pmd_unshare(mm, vma, address, ptep)) {
+		ptl = hugetlb_pte_lock(&hpte);
+		if (hugetlb_pte_size(&hpte) == sz &&
+		    huge_pmd_unshare(mm, vma, address, hpte.ptep)) {
 			spin_unlock(ptl);
 			tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
 			force_flush = true;
 			address |= last_addr_mask;
+			address += sz;
 			continue;
 		}
 
-		pte = huge_ptep_get(ptep);
+		pte = huge_ptep_get(hpte.ptep);
+
 		if (huge_pte_none(pte)) {
 			spin_unlock(ptl);
-			continue;
+			goto next_hpte;
 		}
 
 		/*
@@ -5441,24 +5439,35 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 			 */
 			if (pte_swp_uffd_wp_any(pte) &&
 			    !(zap_flags & ZAP_FLAG_DROP_MARKER))
-				set_huge_pte_at(mm, address, ptep,
+				set_huge_pte_at(mm, address, hpte.ptep,
 						make_pte_marker(PTE_MARKER_UFFD_WP));
 			else
-				huge_pte_clear(mm, address, ptep, sz);
+				huge_pte_clear(mm, address, hpte.ptep,
+					       hugetlb_pte_size(&hpte));
+			spin_unlock(ptl);
+			goto next_hpte;
+		}
+
+		if (unlikely(!hugetlb_pte_present_leaf(&hpte, pte))) {
+			/*
+			 * We raced with someone splitting out from under us.
+			 * Retry the walk.
+			 */
 			spin_unlock(ptl);
 			continue;
 		}
 
-		page = pte_page(pte);
+		subpage = pte_page(pte);
+		hpage = compound_head(subpage);
 
 		/*
 		 * If a reference page is supplied, it is because a specific
 		 * page is being unmapped, not a range. Ensure the page we
 		 * are about to unmap is the actual page of interest.
 		 */
 		if (ref_page) {
-			if (page != ref_page) {
+			if (hpage != ref_page) {
 				spin_unlock(ptl);
-				continue;
+				goto next_hpte;
 			}
 			/*
 			 * Mark the VMA as having unmapped its page so that
@@ -5468,25 +5477,32 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 			set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED);
 		}
 
-		pte = huge_ptep_get_and_clear(mm, address, ptep);
-		tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
+		pte = huge_ptep_get_and_clear(mm, address, hpte.ptep);
+		tlb_change_page_size(tlb, hugetlb_pte_size(&hpte));
+		tlb_remove_huge_tlb_entry(tlb, hpte, address);
 		if (huge_pte_dirty(pte))
-			set_page_dirty(page);
+			set_page_dirty(hpage);
 		/* Leave a uffd-wp pte marker if needed */
 		if (huge_pte_uffd_wp(pte) &&
 		    !(zap_flags & ZAP_FLAG_DROP_MARKER))
-			set_huge_pte_at(mm, address, ptep,
+			set_huge_pte_at(mm, address, hpte.ptep,
 					make_pte_marker(PTE_MARKER_UFFD_WP));
-		hugetlb_count_sub(pages_per_huge_page(h), mm);
-		page_remove_rmap(page, vma, true);
+		hugetlb_count_sub(hugetlb_pte_size(&hpte)/PAGE_SIZE, mm);
+		page_remove_rmap(hpage, vma, true);
 
 		spin_unlock(ptl);
-		tlb_remove_page_size(tlb, page, huge_page_size(h));
 		/*
-		 * Bail out after unmapping reference page if supplied
+		 * Lower the reference count on the head page.
+		 */
+		tlb_remove_page_size(tlb, hpage, sz);
+		/*
+		 * Bail out after unmapping reference page if supplied,
+		 * and there's only one PTE mapping this page.
 		 */
-		if (ref_page)
+		if (ref_page && hugetlb_pte_size(&hpte) == sz)
 			break;
+next_hpte:
+		address += hugetlb_pte_size(&hpte);
 	}
 	tlb_end_vma(tlb, vma);
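The mapcount/refcount rules stated in the commit message, and the switch from `pages_per_huge_page(h)` to `hugetlb_pte_size(&hpte)/PAGE_SIZE` in the RSS accounting, can be illustrated with a userspace toy model (all `toy_*` names are invented for this sketch; it is not kernel code): each page table entry mapping any part of the huge page contributes 1 to both counts on the head page, while RSS drops by however many base pages that entry covered.

```c
#include <assert.h>

#define TOY_PAGE_SIZE  4096ul
#define TOY_HPAGE_SIZE (512ul * TOY_PAGE_SIZE)	/* 2 MiB */

struct toy_hpage {
	int refcount;	/* both counts live on the head page */
	int mapcount;
};

/* One page table entry mapping any part of the hpage: +1 to both. */
static void toy_map_entry(struct toy_hpage *hp)
{
	hp->refcount++;
	hp->mapcount++;
}

/* Unmapping one entry: -1 to both, and RSS drops by size/PAGE_SIZE
 * pages, whether the entry covered one base page or the whole hpage. */
static void toy_unmap_entry(struct toy_hpage *hp, unsigned long size,
			    long *rss_pages)
{
	hp->refcount--;
	hp->mapcount--;
	*rss_pages -= (long)(size / TOY_PAGE_SIZE);
}
```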
Date: Thu, 5 Jan 2023 10:18:16 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-19-jthoughton@google.com>
Subject: [PATCH 18/46] hugetlb: add HGM support for hugetlb_change_protection
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org

The main change here is to do a high-granularity walk and pull the shift
from the walk (not from the hstate).

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 59 +++++++++++++++++++++++++++++++++-------------------
 1 file changed, 38 insertions(+), 21 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dfd6c1491ac3..73672d806172 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6798,15 +6798,15 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long start = address;
-	pte_t *ptep;
 	pte_t pte;
 	struct hstate *h = hstate_vma(vma);
-	unsigned long pages = 0, psize = huge_page_size(h);
+	unsigned long base_pages = 0, psize = huge_page_size(h);
 	bool shared_pmd = false;
 	struct mmu_notifier_range range;
 	unsigned long last_addr_mask;
 	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
 	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
+	struct hugetlb_pte hpte;

 	/*
 	 * In the case of shared PMDs, the area to flush could be beyond
@@ -6824,28 +6824,30 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	hugetlb_vma_lock_write(vma);
 	i_mmap_lock_write(vma->vm_file->f_mapping);
 	last_addr_mask = hugetlb_mask_last_page(h);
-	for (; address < end; address += psize) {
+	while (address < end) {
 		spinlock_t *ptl;
-		ptep = hugetlb_walk(vma, address, psize);
-		if (!ptep) {
-			address |= last_addr_mask;
+
+		if (hugetlb_full_walk(&hpte, vma, address)) {
+			address = (address | last_addr_mask) + psize;
 			continue;
 		}
-		ptl = huge_pte_lock(h, mm, ptep);
-		if (huge_pmd_unshare(mm, vma, address, ptep)) {
+
+		ptl = hugetlb_pte_lock(&hpte);
+		if (hugetlb_pte_size(&hpte) == psize &&
+		    huge_pmd_unshare(mm, vma, address, hpte.ptep)) {
 			/*
 			 * When uffd-wp is enabled on the vma, unshare
 			 * shouldn't happen at all.  Warn about it if it
 			 * happened due to some reason.
 			 */
			WARN_ON_ONCE(uffd_wp || uffd_wp_resolve);
-			pages++;
+			base_pages += psize / PAGE_SIZE;
 			spin_unlock(ptl);
 			shared_pmd = true;
-			address |= last_addr_mask;
+			address = (address | last_addr_mask) + psize;
 			continue;
 		}
-		pte = huge_ptep_get(ptep);
+		pte = huge_ptep_get(hpte.ptep);
 		if (unlikely(is_hugetlb_entry_hwpoisoned(pte))) {
 			/* Nothing to do. */
 		} else if (unlikely(is_hugetlb_entry_migration(pte))) {
@@ -6861,7 +6863,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 				entry = make_readable_migration_entry(
 							swp_offset(entry));
 				newpte = swp_entry_to_pte(entry);
-				pages++;
+				base_pages += hugetlb_pte_size(&hpte) / PAGE_SIZE;
 			}

 			if (uffd_wp)
@@ -6869,34 +6871,49 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 			else if (uffd_wp_resolve)
 				newpte = pte_swp_clear_uffd_wp(newpte);
 			if (!pte_same(pte, newpte))
-				set_huge_pte_at(mm, address, ptep, newpte);
+				set_huge_pte_at(mm, address, hpte.ptep, newpte);
 		} else if (unlikely(is_pte_marker(pte))) {
 			/* No other markers apply for now. */
 			WARN_ON_ONCE(!pte_marker_uffd_wp(pte));
 			if (uffd_wp_resolve)
 				/* Safe to modify directly (non-present->none). */
-				huge_pte_clear(mm, address, ptep, psize);
+				huge_pte_clear(mm, address, hpte.ptep,
+					       hugetlb_pte_size(&hpte));
 		} else if (!huge_pte_none(pte)) {
 			pte_t old_pte;
-			unsigned int shift = huge_page_shift(hstate_vma(vma));
+			unsigned int shift = hpte.shift;

-			old_pte = huge_ptep_modify_prot_start(vma, address, ptep);
+			if (unlikely(!hugetlb_pte_present_leaf(&hpte, pte))) {
+				/*
+				 * Someone split the PTE from under us, so retry
+				 * the walk.
+				 */
+				spin_unlock(ptl);
+				continue;
+			}
+
+			old_pte = huge_ptep_modify_prot_start(
+					vma, address, hpte.ptep);
 			pte = huge_pte_modify(old_pte, newprot);
-			pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
+			pte = arch_make_huge_pte(
+					pte, shift, vma->vm_flags);
 			if (uffd_wp)
 				pte = huge_pte_mkuffd_wp(pte);
 			else if (uffd_wp_resolve)
 				pte = huge_pte_clear_uffd_wp(pte);
-			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
-			pages++;
+			huge_ptep_modify_prot_commit(
+					vma, address, hpte.ptep,
+					old_pte, pte);
+			base_pages += hugetlb_pte_size(&hpte) / PAGE_SIZE;
 		} else {
 			/* None pte */
 			if (unlikely(uffd_wp))
 				/* Safe to modify directly (none->non-present). */
-				set_huge_pte_at(mm, address, ptep,
+				set_huge_pte_at(mm, address, hpte.ptep,
 						make_pte_marker(PTE_MARKER_UFFD_WP));
 		}
 		spin_unlock(ptl);
+		address += hugetlb_pte_size(&hpte);
 	}
 	/*
 	 * Must flush TLB before releasing i_mmap_rwsem: x86's huge_pmd_unshare
@@ -6919,7 +6936,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	hugetlb_vma_unlock_write(vma);
 	mmu_notifier_invalidate_range_end(&range);

-	return pages << h->order;
+	return base_pages;
 }

 /* Return true if reservation was successful, false otherwise.
 */

From patchwork Thu Jan 5 10:18:17 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39444
Date: Thu, 5 Jan 2023 10:18:17 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-20-jthoughton@google.com>
Subject: [PATCH 19/46] hugetlb: add HGM support for follow_hugetlb_page
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org

This enables high-granularity mapping support in GUP. Note that
pfn_offset is the offset (in PAGE_SIZE units) of vaddr within the
subpage that hpte points to.

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 59 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 37 insertions(+), 22 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 73672d806172..30fea414d9ee 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6532,11 +6532,9 @@ static void record_subpages_vmas(struct page *page, struct vm_area_struct *vma,
 }

 static inline bool __follow_hugetlb_must_fault(struct vm_area_struct *vma,
-					       unsigned int flags, pte_t *pte,
+					       unsigned int flags, pte_t pteval,
 					       bool *unshare)
 {
-	pte_t pteval = huge_ptep_get(pte);
-
 	*unshare = false;
 	if (is_swap_pte(pteval))
 		return true;
@@ -6611,11 +6609,13 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	int err = -EFAULT, refs;

 	while (vaddr < vma->vm_end && remainder) {
-		pte_t *pte;
+		pte_t *ptep, pte;
 		spinlock_t *ptl = NULL;
 		bool unshare = false;
 		int absent;
-		struct page *page;
+		unsigned long pages_per_hpte;
+		struct page *page, *subpage;
+		struct hugetlb_pte hpte;

 		/*
 		 * If we have a pending SIGKILL, don't keep faulting pages and
@@ -6632,13 +6632,19 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * each hugepage.  We have to make sure we get the
 		 * first, for the page indexing below to work.
 		 *
-		 * Note that page table lock is not held when pte is null.
+		 * hugetlb_full_walk will mask the address appropriately.
+		 *
+		 * Note that page table lock is not held when ptep is null.
 		 */
-		pte = hugetlb_walk(vma, vaddr & huge_page_mask(h),
-				   huge_page_size(h));
-		if (pte)
-			ptl = huge_pte_lock(h, mm, pte);
-		absent = !pte || huge_pte_none(huge_ptep_get(pte));
+		if (hugetlb_full_walk(&hpte, vma, vaddr)) {
+			ptep = NULL;
+			absent = true;
+		} else {
+			ptl = hugetlb_pte_lock(&hpte);
+			ptep = hpte.ptep;
+			pte = huge_ptep_get(ptep);
+			absent = huge_pte_none(pte);
+		}

 		/*
 		 * When coredumping, it suits get_dump_page if we just return
@@ -6649,13 +6655,20 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		if (absent && (flags & FOLL_DUMP) &&
 		    !hugetlbfs_pagecache_present(h, vma, vaddr)) {
-			if (pte)
+			if (ptep)
 				spin_unlock(ptl);
 			hugetlb_vma_unlock_read(vma);
 			remainder = 0;
 			break;
 		}

+		if (!absent && pte_present(pte) &&
+		    !hugetlb_pte_present_leaf(&hpte, pte)) {
+			/* We raced with someone splitting the PTE, so retry. */
+			spin_unlock(ptl);
+			continue;
+		}
+
 		/*
 		 * We need call hugetlb_fault for both hugepages under migration
 		 * (in which case hugetlb_fault waits for the migration,) and
@@ -6671,7 +6684,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			vm_fault_t ret;
 			unsigned int fault_flags = 0;

-			if (pte)
+			if (ptep)
 				spin_unlock(ptl);
 			hugetlb_vma_unlock_read(vma);

@@ -6720,8 +6733,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			continue;
 		}

-		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
-		page = pte_page(huge_ptep_get(pte));
+		pfn_offset = (vaddr & ~hugetlb_pte_mask(&hpte)) >> PAGE_SHIFT;
+		subpage = pte_page(pte);
+		pages_per_hpte = hugetlb_pte_size(&hpte) / PAGE_SIZE;
+		page = compound_head(subpage);

 		VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
 			       !PageAnonExclusive(page), page);
@@ -6731,22 +6746,22 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * and skip the same_page loop below.
 		 */
 		if (!pages && !vmas && !pfn_offset &&
-		    (vaddr + huge_page_size(h) < vma->vm_end) &&
-		    (remainder >= pages_per_huge_page(h))) {
-			vaddr += huge_page_size(h);
-			remainder -= pages_per_huge_page(h);
-			i += pages_per_huge_page(h);
+		    (vaddr + pages_per_hpte < vma->vm_end) &&
+		    (remainder >= pages_per_hpte)) {
+			vaddr += pages_per_hpte;
+			remainder -= pages_per_hpte;
+			i += pages_per_hpte;
 			spin_unlock(ptl);
 			hugetlb_vma_unlock_read(vma);
 			continue;
 		}

 		/* vaddr may not be aligned to PAGE_SIZE */
-		refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
+		refs = min3(pages_per_hpte - pfn_offset, remainder,
 			(vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);

 		if (pages || vmas)
-			record_subpages_vmas(nth_page(page, pfn_offset),
+			record_subpages_vmas(nth_page(subpage, pfn_offset),
 					     vma, refs,
 					     likely(pages) ? pages + i : NULL,
 					     vmas ? vmas + i : NULL);

From patchwork Thu Jan 5 10:18:18 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39445
Date: Thu, 5 Jan 2023 10:18:18 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-21-jthoughton@google.com>
Subject: [PATCH 20/46] hugetlb: add HGM support for hugetlb_follow_page_mask
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org

The change here is very simple: do a high-granularity walk.

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 30fea414d9ee..718572444a73 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6553,11 +6553,10 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 				      unsigned long address, unsigned int flags)
 {
 	struct hstate *h = hstate_vma(vma);
-	struct mm_struct *mm = vma->vm_mm;
-	unsigned long haddr = address & huge_page_mask(h);
 	struct page *page = NULL;
 	spinlock_t *ptl;
-	pte_t *pte, entry;
+	pte_t entry;
+	struct hugetlb_pte hpte;

 	/*
 	 * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
@@ -6567,13 +6566,24 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 		return NULL;

 	hugetlb_vma_lock_read(vma);
-	pte = hugetlb_walk(vma, haddr, huge_page_size(h));
-	if (!pte)
+
+	if (hugetlb_full_walk(&hpte, vma, address))
 		goto out_unlock;

-	ptl = huge_pte_lock(h, mm, pte);
-	entry = huge_ptep_get(pte);
+retry:
+	ptl = hugetlb_pte_lock(&hpte);
+	entry = huge_ptep_get(hpte.ptep);
 	if (pte_present(entry)) {
+		if (unlikely(!hugetlb_pte_present_leaf(&hpte, entry))) {
+			/*
+			 * We raced with someone splitting from under us.
+			 * Keep walking to get to the real leaf.
+			 */
+			spin_unlock(ptl);
+			hugetlb_full_walk_continue(&hpte, vma, address);
+			goto retry;
+		}
+
 		page = pte_page(entry) +
 			((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
 		/*

From patchwork Thu Jan 5 10:18:19 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39446
Date: Thu, 5 Jan 2023 10:18:19 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References:
<20230105101844.1893104-1-jthoughton@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20230105101844.1893104-22-jthoughton@google.com> Subject: [PATCH 21/46] hugetlb: use struct hugetlb_pte for walk_hugetlb_range From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1754177645228085650?= X-GMAIL-MSGID: =?utf-8?q?1754177645228085650?= The main change in this commit is to walk_hugetlb_range to support walking HGM mappings, but all walk_hugetlb_range callers must be updated to use the new API and take the correct action. Listing all the changes to the callers: For s390 changes, we simply ignore HGM PTEs (we don't support s390 yet). For smaps, shared_hugetlb (and private_hugetlb, although private mappings don't support HGM) may now not be divisible by the hugepage size. The appropriate changes have been made to support analyzing HGM PTEs. For pagemap, we ignore non-leaf PTEs by treating that as if they were none PTEs. We can only end up with non-leaf PTEs if they had just been updated from a none PTE. 
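As an aside for reviewers, the arithmetic that the new hugetlb_entry
callbacks rely on can be sketched in userspace. This is a model, not
kernel code: struct hugetlb_pte_model, MODEL_PAGE_SHIFT, and
is_high_granularity are illustrative names standing in for the series'
struct hugetlb_pte and its helpers; the real helpers derive everything
from the recorded shift, so callbacks no longer need the VMA's hstate.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of the series' struct hugetlb_pte: 'shift' records the
 * page-table level the walk stopped at, and 'hstate_shift' is the shift of
 * the backing hugepage. All names here are illustrative, not kernel API.
 */
struct hugetlb_pte_model {
	unsigned long shift;        /* e.g. 21 for a 2 MiB PMD, 12 for a 4 KiB PTE */
	unsigned long hstate_shift; /* shift of the backing hugepage's hstate */
};

/* Size of the region this entry maps, derived from the shift alone. */
static unsigned long hugetlb_pte_size(const struct hugetlb_pte_model *hpte)
{
	return 1UL << hpte->shift;
}

/* Mask that rounds an address down to this entry's boundary. */
static unsigned long hugetlb_pte_mask(const struct hugetlb_pte_model *hpte)
{
	return ~(hugetlb_pte_size(hpte) - 1);
}

/* The walk saw a high-granularity entry iff it stopped below hstate level. */
static bool is_high_granularity(const struct hugetlb_pte_model *hpte)
{
	return hpte->shift < hpte->hstate_shift;
}
```

This is why callers like smaps can accumulate hugetlb_pte_size(hpte)
rather than huge_page_size(hstate_vma(vma)): for a 2 MiB hugepage mapped
at 4 KiB granularity, each callback invocation accounts only the 4 KiB
the leaf entry actually maps.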
For show_numa_map, the challenge is that, if any part of a hugepage is
mapped, we have to count that entire page exactly once, as the results
are given in units of hugepages. To support HGM mappings, we keep track
of the last page that we looked at. If the hugepage we are currently
looking at is the same as the last one, then it must be a page that has
been mapped at high granularity, and we've already accounted for it.

For DAMON, we treat non-leaf PTEs as if they were blank, for the same
reason as pagemap.

For hwpoison, we proactively update the logic to support the case when
hpte is pointing to a subpage within the poisoned hugepage.

For queue_pages_hugetlb/migration, we ignore all HGM-enabled VMAs for
now.

For mincore, we ignore non-leaf PTEs for the same reason as pagemap.

For mprotect/prot_none_hugetlb_entry, we retry the walk when we get a
non-leaf PTE.

Signed-off-by: James Houghton
---
 arch/s390/mm/gmap.c      | 20 ++++++++--
 fs/proc/task_mmu.c       | 83 +++++++++++++++++++++++++++++-----------
 include/linux/pagewalk.h | 10 +++--
 mm/damon/vaddr.c         | 42 +++++++++++++-------
 mm/hmm.c                 | 20 +++++++---
 mm/memory-failure.c      | 17 ++++----
 mm/mempolicy.c           | 12 ++++--
 mm/mincore.c             | 17 ++++++--
 mm/mprotect.c            | 18 ++++++---
 mm/pagewalk.c            | 20 +++++-----
 10 files changed, 180 insertions(+), 79 deletions(-)

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 74e1d873dce0..284466bf4f25 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2626,13 +2626,25 @@ static int __s390_enable_skey_pmd(pmd_t *pmd, unsigned long addr,
 	return 0;
 }
 
-static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr,
-				      unsigned long hmask, unsigned long next,
+static int __s390_enable_skey_hugetlb(struct hugetlb_pte *hpte,
+				      unsigned long addr,
 				      struct mm_walk *walk)
 {
-	pmd_t *pmd = (pmd_t *)pte;
+	struct hstate *h = hstate_vma(walk->vma);
+	pmd_t *pmd;
 	unsigned long start, end;
-	struct page *page = pmd_page(*pmd);
+	struct page *page;
+
+	if (huge_page_size(h) != hugetlb_pte_size(hpte))
+		/* Ignore high-granularity PTEs. */
+		return 0;
+
+	if (!pte_present(huge_ptep_get(hpte->ptep)))
+		/* Ignore non-present PTEs. */
+		return 0;
+
+	pmd = (pmd_t *)hpte->ptep;
+	page = pmd_page(*pmd);
 
 	/*
 	 * The write check makes sure we do not set a key on shared
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 41b5509bde0e..c353cab11eee 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -731,18 +731,28 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
-			       unsigned long addr, unsigned long end,
-			       struct mm_walk *walk)
+static int smaps_hugetlb_range(struct hugetlb_pte *hpte,
+			       unsigned long addr,
+			       struct mm_walk *walk)
 {
 	struct mem_size_stats *mss = walk->private;
 	struct vm_area_struct *vma = walk->vma;
 	struct page *page = NULL;
+	pte_t pte = huge_ptep_get(hpte->ptep);
 
-	if (pte_present(*pte)) {
-		page = vm_normal_page(vma, addr, *pte);
-	} else if (is_swap_pte(*pte)) {
-		swp_entry_t swpent = pte_to_swp_entry(*pte);
+	if (pte_present(pte)) {
+		/* We only care about leaf-level PTEs. */
+		if (!hugetlb_pte_present_leaf(hpte, pte))
+			/*
+			 * The only case where hpte is not a leaf is that
+			 * it was originally none, but it was split from
+			 * under us. It was originally none, so exclude it.
+			 */
+			return 0;
+
+		page = vm_normal_page(vma, addr, pte);
+	} else if (is_swap_pte(pte)) {
+		swp_entry_t swpent = pte_to_swp_entry(pte);
 
 		if (is_pfn_swap_entry(swpent))
 			page = pfn_swap_entry_to_page(swpent);
@@ -751,9 +761,9 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
 		int mapcount = page_mapcount(page);
 
 		if (mapcount >= 2)
-			mss->shared_hugetlb += huge_page_size(hstate_vma(vma));
+			mss->shared_hugetlb += hugetlb_pte_size(hpte);
 		else
-			mss->private_hugetlb += huge_page_size(hstate_vma(vma));
+			mss->private_hugetlb += hugetlb_pte_size(hpte);
 	}
 	return 0;
 }
@@ -1572,22 +1582,31 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 
 #ifdef CONFIG_HUGETLB_PAGE
 /* This function walks within one hugetlb entry in the single call */
-static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
-				 unsigned long addr, unsigned long end,
+static int pagemap_hugetlb_range(struct hugetlb_pte *hpte,
+				 unsigned long addr,
 				 struct mm_walk *walk)
 {
 	struct pagemapread *pm = walk->private;
 	struct vm_area_struct *vma = walk->vma;
 	u64 flags = 0, frame = 0;
 	int err = 0;
-	pte_t pte;
+	unsigned long hmask = hugetlb_pte_mask(hpte);
+	unsigned long end = addr + hugetlb_pte_size(hpte);
+	pte_t pte = huge_ptep_get(hpte->ptep);
+	struct page *page;
 
 	if (vma->vm_flags & VM_SOFTDIRTY)
 		flags |= PM_SOFT_DIRTY;
 
-	pte = huge_ptep_get(ptep);
 	if (pte_present(pte)) {
-		struct page *page = pte_page(pte);
+		/*
+		 * We raced with this PTE being split, which can only happen if
+		 * it was blank before. Treat it as if it were blank.
+		 */
+		if (!hugetlb_pte_present_leaf(hpte, pte))
+			return 0;
+
+		page = pte_page(pte);
 
 		if (!PageAnon(page))
 			flags |= PM_FILE;
@@ -1868,10 +1887,16 @@ static struct page *can_gather_numa_stats_pmd(pmd_t pmd,
 }
 #endif
 
+struct show_numa_map_private {
+	struct numa_maps *md;
+	struct page *last_page;
+};
+
 static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 		unsigned long end, struct mm_walk *walk)
 {
-	struct numa_maps *md = walk->private;
+	struct show_numa_map_private *priv = walk->private;
+	struct numa_maps *md = priv->md;
 	struct vm_area_struct *vma = walk->vma;
 	spinlock_t *ptl;
 	pte_t *orig_pte;
@@ -1883,6 +1908,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 		struct page *page;
 
 		page = can_gather_numa_stats_pmd(*pmd, vma, addr);
+		priv->last_page = page;
 		if (page)
 			gather_stats(page, md, pmd_dirty(*pmd),
 				     HPAGE_PMD_SIZE/PAGE_SIZE);
@@ -1896,6 +1922,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 	orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
 	do {
 		struct page *page = can_gather_numa_stats(*pte, vma, addr);
+		priv->last_page = page;
 		if (!page)
 			continue;
 		gather_stats(page, md, pte_dirty(*pte), 1);
@@ -1906,19 +1933,25 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 	return 0;
 }
 #ifdef CONFIG_HUGETLB_PAGE
-static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
-		unsigned long addr, unsigned long end, struct mm_walk *walk)
+static int gather_hugetlb_stats(struct hugetlb_pte *hpte, unsigned long addr,
+		struct mm_walk *walk)
 {
-	pte_t huge_pte = huge_ptep_get(pte);
+	struct show_numa_map_private *priv = walk->private;
+	pte_t huge_pte = huge_ptep_get(hpte->ptep);
 	struct numa_maps *md;
 	struct page *page;
 
-	if (!pte_present(huge_pte))
+	if (!hugetlb_pte_present_leaf(hpte, huge_pte))
+		return 0;
+
+	page = compound_head(pte_page(huge_pte));
+	if (priv->last_page == page)
+		/* we've already accounted for this page */
 		return 0;
 
-	page = pte_page(huge_pte);
+	priv->last_page = page;
 
-	md = walk->private;
+	md = priv->md;
 	gather_stats(page, md, pte_dirty(huge_pte), 1);
 	return 0;
 }
@@ -1948,9 +1981,15 @@ static int show_numa_map(struct seq_file *m, void *v)
 	struct file *file = vma->vm_file;
 	struct mm_struct *mm = vma->vm_mm;
 	struct mempolicy *pol;
 	char buffer[64];
 	int nid;
+	struct show_numa_map_private numa_map_private;
+
+	numa_map_private.md = md;
+	numa_map_private.last_page = NULL;
+
 	if (!mm)
 		return 0;
@@ -1980,7 +2019,7 @@ static int show_numa_map(struct seq_file *m, void *v)
 		seq_puts(m, " huge");
 
 	/* mmap_lock is held by m_start */
-	walk_page_vma(vma, &show_numa_ops, md);
+	walk_page_vma(vma, &show_numa_ops, &numa_map_private);
 
 	if (!md->pages)
 		goto out;
diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index 27a6df448ee5..f4bddad615c2 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -3,6 +3,7 @@
 #define _LINUX_PAGEWALK_H
 
 #include
+#include
 
 struct mm_walk;
 
@@ -31,6 +32,10 @@ struct mm_walk;
  *			ptl after dropping the vma lock, or else revalidate
  *			those items after re-acquiring the vma lock and before
  *			accessing them.
+ *			In the presence of high-granularity hugetlb entries,
+ *			@hugetlb_entry is called only for leaf-level entries
+ *			(hstate-level entries are ignored if they are not
+ *			leaves).
 * @test_walk:		caller specific callback function to determine whether
 *			we walk over the current vma or not. Returning 0 means
 *			"do page table walk over the current vma", returning
@@ -58,9 +63,8 @@ struct mm_walk_ops {
 			     unsigned long next, struct mm_walk *walk);
 	int (*pte_hole)(unsigned long addr, unsigned long next,
 			int depth, struct mm_walk *walk);
-	int (*hugetlb_entry)(pte_t *pte, unsigned long hmask,
-			     unsigned long addr, unsigned long next,
-			     struct mm_walk *walk);
+	int (*hugetlb_entry)(struct hugetlb_pte *hpte,
+			     unsigned long addr, struct mm_walk *walk);
 	int (*test_walk)(unsigned long addr, unsigned long next,
 			struct mm_walk *walk);
 	int (*pre_vma)(unsigned long start, unsigned long end,
diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index 9d92c5eb3a1f..2383f647f202 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -330,11 +330,12 @@ static int damon_mkold_pmd_entry(pmd_t *pmd, unsigned long addr,
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static void damon_hugetlb_mkold(pte_t *pte, struct mm_struct *mm,
+static void damon_hugetlb_mkold(struct hugetlb_pte *hpte, pte_t entry,
+				struct mm_struct *mm,
 				struct vm_area_struct *vma, unsigned long addr)
 {
 	bool referenced = false;
-	pte_t entry = huge_ptep_get(pte);
+	pte_t entry = huge_ptep_get(hpte->ptep);
 	struct folio *folio = pfn_folio(pte_pfn(entry));
 
 	folio_get(folio);
@@ -342,12 +343,12 @@ static void damon_hugetlb_mkold(pte_t *pte, struct mm_struct *mm,
 	if (pte_young(entry)) {
 		referenced = true;
 		entry = pte_mkold(entry);
-		set_huge_pte_at(mm, addr, pte, entry);
+		set_huge_pte_at(mm, addr, hpte->ptep, entry);
 	}
 
 #ifdef CONFIG_MMU_NOTIFIER
 	if (mmu_notifier_clear_young(mm, addr,
-				     addr + huge_page_size(hstate_vma(vma))))
+				     addr + hugetlb_pte_size(hpte)))
 		referenced = true;
 #endif /* CONFIG_MMU_NOTIFIER */
 
@@ -358,20 +359,26 @@ static void damon_hugetlb_mkold(pte_t *pte, struct mm_struct *mm,
 	folio_put(folio);
 }
 
-static int damon_mkold_hugetlb_entry(pte_t *pte, unsigned long hmask,
-				     unsigned long addr, unsigned long end,
+static int damon_mkold_hugetlb_entry(struct hugetlb_pte *hpte,
+				     unsigned long addr,
 				     struct mm_walk *walk)
 {
-	struct hstate *h = hstate_vma(walk->vma);
 	spinlock_t *ptl;
 	pte_t entry;
 
-	ptl = huge_pte_lock(h, walk->mm, pte);
-	entry = huge_ptep_get(pte);
+	ptl = hugetlb_pte_lock(hpte);
+	entry = huge_ptep_get(hpte->ptep);
 	if (!pte_present(entry))
 		goto out;
 
-	damon_hugetlb_mkold(pte, walk->mm, walk->vma, addr);
+	if (!hugetlb_pte_present_leaf(hpte, entry))
+		/*
+		 * We raced with someone splitting a blank PTE. Treat this PTE
+		 * as if it were blank.
+		 */
+		goto out;
+
+	damon_hugetlb_mkold(hpte, entry, walk->mm, walk->vma, addr);
 
 out:
 	spin_unlock(ptl);
@@ -484,8 +491,8 @@ static int damon_young_pmd_entry(pmd_t *pmd, unsigned long addr,
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static int damon_young_hugetlb_entry(pte_t *pte, unsigned long hmask,
-				     unsigned long addr, unsigned long end,
+static int damon_young_hugetlb_entry(struct hugetlb_pte *hpte,
+				     unsigned long addr,
 				     struct mm_walk *walk)
 {
 	struct damon_young_walk_private *priv = walk->private;
@@ -494,11 +501,18 @@ static int damon_young_hugetlb_entry(pte_t *pte, unsigned long hmask,
 	spinlock_t *ptl;
 	pte_t entry;
 
-	ptl = huge_pte_lock(h, walk->mm, pte);
-	entry = huge_ptep_get(pte);
+	ptl = hugetlb_pte_lock(hpte);
+	entry = huge_ptep_get(hpte->ptep);
 	if (!pte_present(entry))
 		goto out;
 
+	if (!hugetlb_pte_present_leaf(hpte, entry))
+		/*
+		 * We raced with someone splitting a blank PTE. Treat this PTE
+		 * as if it were blank.
+		 */
+		goto out;
+
 	folio = pfn_folio(pte_pfn(entry));
 	folio_get(folio);
diff --git a/mm/hmm.c b/mm/hmm.c
index 6a151c09de5e..d3e40cfdd4cb 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -468,8 +468,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
 #endif
 
 #ifdef CONFIG_HUGETLB_PAGE
-static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
-				      unsigned long start, unsigned long end,
+static int hmm_vma_walk_hugetlb_entry(struct hugetlb_pte *hpte,
+				      unsigned long start,
 				      struct mm_walk *walk)
 {
 	unsigned long addr = start, i, pfn;
@@ -479,16 +479,24 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 	unsigned int required_fault;
 	unsigned long pfn_req_flags;
 	unsigned long cpu_flags;
+	unsigned long hmask = hugetlb_pte_mask(hpte);
+	unsigned int order = hpte->shift - PAGE_SHIFT;
+	unsigned long end = start + hugetlb_pte_size(hpte);
 	spinlock_t *ptl;
 	pte_t entry;
 
-	ptl = huge_pte_lock(hstate_vma(vma), walk->mm, pte);
-	entry = huge_ptep_get(pte);
+	ptl = hugetlb_pte_lock(hpte);
+	entry = huge_ptep_get(hpte->ptep);
+
+	if (!hugetlb_pte_present_leaf(hpte, entry)) {
+		spin_unlock(ptl);
+		return -EAGAIN;
+	}
 
 	i = (start - range->start) >> PAGE_SHIFT;
 	pfn_req_flags = range->hmm_pfns[i];
 	cpu_flags = pte_to_hmm_pfn_flags(range, entry) |
-		hmm_pfn_flags_order(huge_page_order(hstate_vma(vma)));
+		hmm_pfn_flags_order(order);
 	required_fault =
 		hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags);
 	if (required_fault) {
@@ -605,7 +613,7 @@ int hmm_range_fault(struct hmm_range *range)
 		 * in pfns. All entries < last in the pfn array are set to their
 		 * output, and all >= are still at their input values.
 		 */
-	} while (ret == -EBUSY);
+	} while (ret == -EBUSY || ret == -EAGAIN);
 	return ret;
 }
 EXPORT_SYMBOL(hmm_range_fault);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c77a9e37e27e..e7e56298d305 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -641,6 +641,7 @@ static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
 				  unsigned long poisoned_pfn, struct to_kill *tk)
 {
 	unsigned long pfn = 0;
+	unsigned long base_pages_poisoned = (1UL << shift) / PAGE_SIZE;
 
 	if (pte_present(pte)) {
 		pfn = pte_pfn(pte);
@@ -651,7 +652,8 @@ static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
 			pfn = swp_offset_pfn(swp);
 	}
 
-	if (!pfn || pfn != poisoned_pfn)
+	if (!pfn || pfn < poisoned_pfn ||
+	    pfn >= poisoned_pfn + base_pages_poisoned)
 		return 0;
 
 	set_to_kill(tk, addr, shift);
@@ -717,16 +719,15 @@ static int hwpoison_pte_range(pmd_t *pmdp, unsigned long addr,
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static int hwpoison_hugetlb_range(pte_t *ptep, unsigned long hmask,
-				  unsigned long addr, unsigned long end,
-				  struct mm_walk *walk)
+static int hwpoison_hugetlb_range(struct hugetlb_pte *hpte,
+				  unsigned long addr,
+				  struct mm_walk *walk)
 {
 	struct hwp_walk *hwp = walk->private;
-	pte_t pte = huge_ptep_get(ptep);
-	struct hstate *h = hstate_vma(walk->vma);
+	pte_t pte = huge_ptep_get(hpte->ptep);
 
-	return check_hwpoisoned_entry(pte, addr, huge_page_shift(h),
-				      hwp->pfn, &hwp->tk);
+	return check_hwpoisoned_entry(pte, addr & hugetlb_pte_mask(hpte),
+				      hpte->shift, hwp->pfn, &hwp->tk);
 }
 #else
 #define hwpoison_hugetlb_range NULL
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d3558248a0f0..e5859ed34e90 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -558,8 +558,8 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
 	return addr != end ? -EIO : 0;
 }
 
-static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
-			       unsigned long addr, unsigned long end,
+static int queue_pages_hugetlb(struct hugetlb_pte *hpte,
+			       unsigned long addr,
 			       struct mm_walk *walk)
 {
 	int ret = 0;
@@ -570,8 +570,12 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
 	spinlock_t *ptl;
 	pte_t entry;
 
-	ptl = huge_pte_lock(hstate_vma(walk->vma), walk->mm, pte);
-	entry = huge_ptep_get(pte);
+	/* We don't migrate high-granularity HugeTLB mappings for now. */
+	if (hugetlb_hgm_enabled(walk->vma))
+		return -EINVAL;
+
+	ptl = hugetlb_pte_lock(hpte);
+	entry = huge_ptep_get(hpte->ptep);
 	if (!pte_present(entry))
 		goto unlock;
 	page = pte_page(entry);
diff --git a/mm/mincore.c b/mm/mincore.c
index a085a2aeabd8..0894965b3944 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -22,18 +22,29 @@
 #include
 #include "swap.h"
 
-static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
-			   unsigned long end, struct mm_walk *walk)
+static int mincore_hugetlb(struct hugetlb_pte *hpte, unsigned long addr,
+			   struct mm_walk *walk)
 {
 #ifdef CONFIG_HUGETLB_PAGE
 	unsigned char present;
+	unsigned long end = addr + hugetlb_pte_size(hpte);
 	unsigned char *vec = walk->private;
+	pte_t pte = huge_ptep_get(hpte->ptep);
 
 	/*
 	 * Hugepages under user process are always in RAM and never
 	 * swapped out, but theoretically it needs to be checked.
 	 */
-	present = pte && !huge_pte_none(huge_ptep_get(pte));
+	present = !huge_pte_none(pte);
+
+	/*
+	 * If the pte is present but not a leaf, we raced with someone
+	 * splitting it. For someone to have split it, it must have been
+	 * huge_pte_none before, so treat it as such.
+	 */
+	if (pte_present(pte) && !hugetlb_pte_present_leaf(hpte, pte))
+		present = false;
+
 	for (; addr != end; vec++, addr += PAGE_SIZE)
 		*vec = present;
 	walk->private = vec;
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 71358e45a742..62d8c5f7bc92 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -543,12 +543,16 @@ static int prot_none_pte_entry(pte_t *pte, unsigned long addr,
 		0 : -EACCES;
 }
 
-static int prot_none_hugetlb_entry(pte_t *pte, unsigned long hmask,
-				   unsigned long addr, unsigned long next,
+static int prot_none_hugetlb_entry(struct hugetlb_pte *hpte,
+				   unsigned long addr,
 				   struct mm_walk *walk)
 {
-	return pfn_modify_allowed(pte_pfn(*pte), *(pgprot_t *)(walk->private)) ?
-		0 : -EACCES;
+	pte_t pte = huge_ptep_get(hpte->ptep);
+
+	if (!hugetlb_pte_present_leaf(hpte, pte))
+		return -EAGAIN;
+	return pfn_modify_allowed(pte_pfn(pte),
+				  *(pgprot_t *)(walk->private)) ? 0 : -EACCES;
 }
 
 static int prot_none_test(unsigned long addr, unsigned long next,
@@ -591,8 +595,10 @@ mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	    (newflags & VM_ACCESS_FLAGS) == 0) {
 		pgprot_t new_pgprot = vm_get_page_prot(newflags);
 
-		error = walk_page_range(current->mm, start, end,
-				&prot_none_walk_ops, &new_pgprot);
+		do {
+			error = walk_page_range(current->mm, start, end,
+					&prot_none_walk_ops, &new_pgprot);
+		} while (error == -EAGAIN);
 		if (error)
 			return error;
 	}
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index cb23f8a15c13..05ce242f8b7e 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -3,6 +3,7 @@
 #include
 #include
 #include
+#include
 
 /*
  * We want to know the real level where a entry is located ignoring any
@@ -296,20 +297,21 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
 	struct vm_area_struct *vma = walk->vma;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long next;
-	unsigned long hmask = huge_page_mask(h);
-	unsigned long sz = huge_page_size(h);
-	pte_t *pte;
 	const struct mm_walk_ops *ops = walk->ops;
 	int err = 0;
+	struct hugetlb_pte hpte;
 	hugetlb_vma_lock_read(vma);
 	do {
-		next = hugetlb_entry_end(h, addr, end);
-		pte = hugetlb_walk(vma, addr & hmask, sz);
-		if (pte)
-			err = ops->hugetlb_entry(pte, hmask, addr, next, walk);
-		else if (ops->pte_hole)
-			err = ops->pte_hole(addr, next, -1, walk);
+		if (hugetlb_full_walk(&hpte, vma, addr)) {
+			next = hugetlb_entry_end(h, addr, end);
+			if (ops->pte_hole)
+				err = ops->pte_hole(addr, next, -1, walk);
+		} else {
+			err = ops->hugetlb_entry(
+					&hpte, addr, walk);
+			next = min(addr + hugetlb_pte_size(&hpte), end);
+		}
 		if (err)
 			break;
 	} while (addr = next, addr != end);

From patchwork Thu Jan 5 10:18:20 2023
Date: Thu, 5 Jan 2023 10:18:20 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-23-jthoughton@google.com>
Subject: [PATCH 22/46] mm: rmap: provide pte_order in page_vma_mapped_walk
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
 Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

page_vma_mapped_walk callers will need this information to know how
HugeTLB pages are mapped. pte_order only applies if pte is not NULL.
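As an aside, the relationship between pte_order and the quantities the
callers derive from it can be sketched in userspace. This is a model,
not kernel code: MODEL_PAGE_SHIFT and the two helper names are
illustrative. The point is that a single order value is enough for
callers to recover both the mapping shift (as remove_migration_pte does
with pvmw.pte_order + PAGE_SHIFT) and the number of base pages mapped
(as the hugetlb_count_sub caller does with 1UL << pvmw.pte_order).

```c
#include <assert.h>

/* Base page shift on most architectures; an assumption of this model. */
#define MODEL_PAGE_SHIFT 12UL

/* Shift of the mapping, as used for arch_make_huge_pte's 'shift' argument. */
static unsigned long pte_order_to_shift(unsigned int pte_order)
{
	return pte_order + MODEL_PAGE_SHIFT;
}

/* Number of base pages the mapping covers, as used for mm counters. */
static unsigned long pte_order_to_pages(unsigned int pte_order)
{
	return 1UL << pte_order;
}
```

With this scheme, a normal PTE reports order 0, while an hstate-level
hugetlb entry reports huge_page_order(hstate); a later subpage-mapped
hugetlb entry can report any order in between without the callers
needing to consult the hstate at all.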
Signed-off-by: James Houghton
---
 include/linux/rmap.h | 1 +
 mm/page_vma_mapped.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index bd3504d11b15..e0557ede2951 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -378,6 +378,7 @@ struct page_vma_mapped_walk {
 	pmd_t *pmd;
 	pte_t *pte;
 	spinlock_t *ptl;
+	unsigned int pte_order;
 	unsigned int flags;
 };
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 4e448cfbc6ef..08295b122ad6 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -16,6 +16,7 @@ static inline bool not_found(struct page_vma_mapped_walk *pvmw)
 static bool map_pte(struct page_vma_mapped_walk *pvmw)
 {
 	pvmw->pte = pte_offset_map(pvmw->pmd, pvmw->address);
+	pvmw->pte_order = 0;
 	if (!(pvmw->flags & PVMW_SYNC)) {
 		if (pvmw->flags & PVMW_MIGRATION) {
 			if (!is_swap_pte(*pvmw->pte))
@@ -177,6 +178,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 		if (!pvmw->pte)
 			return false;
 
+		pvmw->pte_order = huge_page_order(hstate);
 		pvmw->ptl = huge_pte_lock(hstate, mm, pvmw->pte);
 		if (!check_pte(pvmw))
 			return not_found(pvmw);
@@ -272,6 +274,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 			}
 			pte_unmap(pvmw->pte);
 			pvmw->pte = NULL;
+			pvmw->pte_order = 0;
 			goto restart;
 		}
 		pvmw->pte++;

From patchwork Thu Jan 5 10:18:21 2023
Date: Thu, 5 Jan 2023 10:18:21 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-24-jthoughton@google.com>
Subject: [PATCH 23/46] mm: rmap: make page_vma_mapped_walk callers use
 pte_order
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
 Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

This also updates the callers' hugetlb mapcounting code to handle
mapcount properly for subpage-mapped hugetlb pages.

Signed-off-by: James Houghton
---
 mm/migrate.c |  2 +-
 mm/rmap.c    | 17 +++++++++++++----
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 832f639fc49a..0062689f4878 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -244,7 +244,7 @@ static bool remove_migration_pte(struct folio *folio,
 
 #ifdef CONFIG_HUGETLB_PAGE
 		if (folio_test_hugetlb(folio)) {
-			unsigned int shift = huge_page_shift(hstate_vma(vma));
+			unsigned int shift = pvmw.pte_order + PAGE_SHIFT;
 
 			pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
 			if (folio_test_anon(folio))
diff --git a/mm/rmap.c b/mm/rmap.c
index 8a24b90d9531..ff7e6c770b0a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1608,7 +1608,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) {
 			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
 			if (folio_test_hugetlb(folio)) {
-				hugetlb_count_sub(folio_nr_pages(folio), mm);
+				hugetlb_count_sub(1UL << pvmw.pte_order, mm);
 				set_huge_pte_at(mm, address, pvmw.pte, pteval);
 			} else {
 				dec_mm_counter(mm, mm_counter(&folio->page));
@@ -1767,7 +1767,11
@@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, * * See Documentation/mm/mmu_notifier.rst */ - page_remove_rmap(subpage, vma, folio_test_hugetlb(folio)); + if (folio_test_hugetlb(folio)) + page_remove_rmap(&folio->page, vma, true); + else + page_remove_rmap(subpage, vma, false); + if (vma->vm_flags & VM_LOCKED) mlock_page_drain_local(); folio_put(folio); @@ -2030,7 +2034,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, } else if (PageHWPoison(subpage)) { pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); if (folio_test_hugetlb(folio)) { - hugetlb_count_sub(folio_nr_pages(folio), mm); + hugetlb_count_sub(1L << pvmw.pte_order, mm); set_huge_pte_at(mm, address, pvmw.pte, pteval); } else { dec_mm_counter(mm, mm_counter(&folio->page)); @@ -2122,7 +2126,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, * * See Documentation/mm/mmu_notifier.rst */ - page_remove_rmap(subpage, vma, folio_test_hugetlb(folio)); + if (folio_test_hugetlb(folio)) + page_remove_rmap(&folio->page, vma, true); + else + page_remove_rmap(subpage, vma, false); if (vma->vm_flags & VM_LOCKED) mlock_page_drain_local(); folio_put(folio); @@ -2206,6 +2213,8 @@ static bool page_make_device_exclusive_one(struct folio *folio, args->owner); mmu_notifier_invalidate_range_start(&range); + VM_BUG_ON_FOLIO(folio_test_hugetlb(folio), folio); + while (page_vma_mapped_walk(&pvmw)) { /* Unexpected PMD-mapped THP? 
*/ VM_BUG_ON_FOLIO(!pvmw.pte, folio); From patchwork Thu Jan 5 10:18:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 39447 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4e01:0:0:0:0:0 with SMTP id p1csp227787wrt; Thu, 5 Jan 2023 02:22:48 -0800 (PST) X-Google-Smtp-Source: AMrXdXsJemcKvQg8VvB4UbJK74y20kwXYP24dOX9ZwFtfhyeOaEUTqLmidwGF/LMaljsuKOO0a3w X-Received: by 2002:a17:902:f788:b0:192:dda4:30e2 with SMTP id q8-20020a170902f78800b00192dda430e2mr9529706pln.52.1672914168264; Thu, 05 Jan 2023 02:22:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1672914168; cv=none; d=google.com; s=arc-20160816; b=kZv2RkvaY1YkaxZ6DbY2NSuyZJACiS6rEMPa9WLBDiJ/UIofFmJIzQ2XVP+/jeHqd/ OxeXX0AGZ78/tFWaIzo+3m6UyCeAsxzJsotaYXIVUQ5NkeI4CoIAwI4CVOVNZ4FUH5lz E3BOzfYxT7Wp51iKxXGHMUScpzShXSieLNC/h9Os1jc/aXLvJn6sKfSi8NJ45Ere3BiF oW9VNs7Y+zXxKjZ+fDhCMw9eE8Omkzw++ClvxeWdZ4pkwegeha2qdgl6/QEC+jlefJNE RiQqCLIZbuBzcm1bJ1uBhYgUpmupxkzrU71rBHjOh6+hyI1P5iINenP8Y1ANtKcXVZlg HmoA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=NiUhznaM6wSBHvFTCRNfcVBZtE9H57LW0vfZq/kiKq4=; b=ZJ9+Or7y34DI4IRErlJFVGMZ5qME+OwsmRR8Cek9v2MwWM6y5X1uvqaYFlDyXcQEam PMJP3dAYDGc9dHwbJlRkd6HYEyNcCN28YmuUYB+dPVLlrc21PZ0Ifx1fUmpJmit+tvbM MSK8NnDlRsGo1+zvjWP6bMLOQI0wGUJFqDrq8INUDhwwyyrfaK3KFV8C2fYDOB694mDv mDC8t00zt2Ee/2x5XysaZdTpG6YsbiocLExAco4E6wPIcrHupRYnRM6k6uOZ0UYSuQpg NHhNqYI6sssc7drt0o2F+recQb1yYq6N5Ek0Xns6Qpx//H0KF0szhXV3Jse3jx0629Y+ vQGA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=R0e0fs1q; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT 
From patchwork Thu Jan 5 10:18:22 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39447
Date: Thu, 5 Jan 2023 10:18:22 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-25-jthoughton@google.com>
Subject: [PATCH 24/46] rmap: update hugetlb lock comment for HGM
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
    Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org

The VMA lock is used to prevent high-granularity HugeTLB mappings from
being collapsed while other threads are doing high-granularity page
table walks.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/linux/hugetlb.h | 12 ++++++++++++
 mm/rmap.c               |  3 ++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b7cf45535d64..daf993fdbc38 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -156,6 +156,18 @@ struct file_region {
 #endif
 };
 
+/*
+ * The HugeTLB VMA lock is used to synchronize HugeTLB page table walks.
+ * Right now, it is only used for VM_SHARED mappings.
+ * - The read lock is held when we want to stabilize mappings (prevent PMD
+ *   unsharing or MADV_COLLAPSE for high-granularity mappings).
+ * - The write lock is held when we want to free mappings (PMD unsharing and
+ *   MADV_COLLAPSE for high-granularity mappings).
+ *
+ * Note: For PMD unsharing and MADV_COLLAPSE, the i_mmap_rwsem is held for
+ * writing as well, so page table walkers will also be safe if they hold
+ * i_mmap_rwsem for at least reading. See hugetlb_walk() for more information.
+ */
 struct hugetlb_vma_lock {
 	struct kref refs;
 	struct rw_semaphore rw_sema;
diff --git a/mm/rmap.c b/mm/rmap.c
index ff7e6c770b0a..076ea77010e5 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -47,7 +47,8 @@
  *
  * hugetlbfs PageHuge() take locks in this order:
  *   hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
- *   vma_lock (hugetlb specific lock for pmd_sharing)
+ *   vma_lock (hugetlb specific lock for pmd_sharing and high-granularity
+ *             mapping)
  *     mapping->i_mmap_rwsem (also used for hugetlb pmd sharing)
  *       page->flags PG_locked (lock_page)
  */
From patchwork Thu Jan 5 10:18:23 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39464
Date: Thu, 5 Jan 2023 10:18:23 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-26-jthoughton@google.com>
Subject: [PATCH 25/46] hugetlb: update page_vma_mapped to do high-granularity walks
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
    Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org

Update the HugeTLB logic to look a lot more like the PTE-mapped THP
logic. When a user calls us in a loop, we will update pvmw->address to
walk to each page table entry that could possibly map the hugepage
containing pvmw->pfn.

Make use of the new pte_order so callers know what size PTE they're
getting.

The !pte failure case is changed to call not_found() instead of just
returning false. This should be a no-op, but if somehow the hstate-level
PTE were deallocated between iterations, not_found() should be called to
drop locks.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 mm/page_vma_mapped.c | 59 +++++++++++++++++++++++++++++++-------------
 1 file changed, 42 insertions(+), 17 deletions(-)

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 08295b122ad6..03e8a4987272 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -133,7 +133,8 @@ static void step_forward(struct page_vma_mapped_walk *pvmw, unsigned long size)
 *
 * Returns true if the page is mapped in the vma. @pvmw->pmd and @pvmw->pte point
 * to relevant page table entries. @pvmw->ptl is locked. @pvmw->address is
-* adjusted if needed (for PTE-mapped THPs).
+* adjusted if needed (for PTE-mapped THPs and high-granularity-mapped HugeTLB
+* pages).
 *
 * If @pvmw->pmd is set but @pvmw->pte is not, you have found PMD-mapped page
 * (usually THP). For PTE-mapped THP, you should run page_vma_mapped_walk() in
@@ -165,23 +166,47 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 
 	if (unlikely(is_vm_hugetlb_page(vma))) {
 		struct hstate *hstate = hstate_vma(vma);
-		unsigned long size = huge_page_size(hstate);
-		/* The only possible mapping was handled on last iteration */
-		if (pvmw->pte)
-			return not_found(pvmw);
-		/*
-		 * All callers that get here will already hold the
-		 * i_mmap_rwsem. Therefore, no additional locks need to be
-		 * taken before calling hugetlb_walk().
-		 */
-		pvmw->pte = hugetlb_walk(vma, pvmw->address, size);
-		if (!pvmw->pte)
-			return false;
+		struct hugetlb_pte hpte;
+		pte_t pteval;
+
+		end = (pvmw->address & huge_page_mask(hstate)) +
+			huge_page_size(hstate);
+
+		do {
+			if (pvmw->pte) {
+				if (pvmw->ptl)
+					spin_unlock(pvmw->ptl);
+				pvmw->ptl = NULL;
+				pvmw->address += PAGE_SIZE << pvmw->pte_order;
+				if (pvmw->address >= end)
+					return not_found(pvmw);
+			}
 
-		pvmw->pte_order = huge_page_order(hstate);
-		pvmw->ptl = huge_pte_lock(hstate, mm, pvmw->pte);
-		if (!check_pte(pvmw))
-			return not_found(pvmw);
+			/*
+			 * All callers that get here will already hold the
+			 * i_mmap_rwsem. Therefore, no additional locks need to
+			 * be taken before calling hugetlb_walk().
+			 */
+			if (hugetlb_full_walk(&hpte, vma, pvmw->address))
+				return not_found(pvmw);
+
+retry:
+			pvmw->pte = hpte.ptep;
+			pvmw->pte_order = hpte.shift - PAGE_SHIFT;
+			pvmw->ptl = hugetlb_pte_lock(&hpte);
+			pteval = huge_ptep_get(hpte.ptep);
+			if (pte_present(pteval) && !hugetlb_pte_present_leaf(
+						&hpte, pteval)) {
+				/*
+				 * Someone split from under us, so keep
+				 * walking.
+				 */
+				spin_unlock(pvmw->ptl);
+				hugetlb_full_walk_continue(&hpte, vma,
+							   pvmw->address);
+				goto retry;
+			}
+		} while (!check_pte(pvmw));
 		return true;
 	}
From patchwork Thu Jan 5 10:18:24 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39452
Date: Thu, 5 Jan 2023 10:18:24 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-27-jthoughton@google.com>
Subject: [PATCH 26/46] hugetlb: add HGM support for copy_hugetlb_page_range
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
    Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org

This allows fork() to work with high-granularity mappings. The page
table structure is copied such that partially mapped regions will remain
partially mapped in the same way for the new process.

A page's reference count is incremented for *each* portion of it that is
mapped in the page table. For example, if you have a PMD-mapped 1G page,
the reference count and mapcount will be incremented by 512.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 mm/hugetlb.c | 75 ++++++++++++++++++++++++++++++++++------------------
 1 file changed, 50 insertions(+), 25 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 718572444a73..21a5116f509b 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5106,7 +5106,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			    struct vm_area_struct *src_vma)
 {
 	pte_t *src_pte, *dst_pte, entry;
-	struct page *ptepage;
+	struct hugetlb_pte src_hpte, dst_hpte;
+	struct page *ptepage, *hpage;
 	unsigned long addr;
 	bool cow = is_cow_mapping(src_vma->vm_flags);
 	struct hstate *h = hstate_vma(src_vma);
@@ -5126,26 +5127,34 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	} else {
 		/*
 		 * For shared mappings the vma lock must be held before
-		 * calling hugetlb_walk() in the src vma. Otherwise, the
-		 * returned ptep could go away if part of a shared pmd and
-		 * another thread calls huge_pmd_unshare.
+		 * calling hugetlb_full_walk() in the src vma. Otherwise, the
+		 * returned hpte could go away if
+		 *  - part of a shared pmd and another thread calls
+		 *    huge_pmd_unshare, or
+		 *  - another thread collapses a high-granularity mapping.
 		 */
 		hugetlb_vma_lock_read(src_vma);
 	}
 
 	last_addr_mask = hugetlb_mask_last_page(h);
-	for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) {
+	addr = src_vma->vm_start;
+	while (addr < src_vma->vm_end) {
 		spinlock_t *src_ptl, *dst_ptl;
-		src_pte = hugetlb_walk(src_vma, addr, sz);
-		if (!src_pte) {
-			addr |= last_addr_mask;
+		unsigned long hpte_sz;
+
+		if (hugetlb_full_walk(&src_hpte, src_vma, addr)) {
+			addr = (addr | last_addr_mask) + sz;
 			continue;
 		}
-		dst_pte = huge_pte_alloc(dst, dst_vma, addr, sz);
-		if (!dst_pte) {
-			ret = -ENOMEM;
+		ret = hugetlb_full_walk_alloc(&dst_hpte, dst_vma, addr,
+					      hugetlb_pte_size(&src_hpte));
+		if (ret)
 			break;
-		}
+
+		src_pte = src_hpte.ptep;
+		dst_pte = dst_hpte.ptep;
+
+		hpte_sz = hugetlb_pte_size(&src_hpte);
 
 		/*
 		 * If the pagetables are shared don't copy or take references.
@@ -5155,13 +5164,14 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		 * another vma. So page_count of ptep page is checked instead
 		 * to reliably determine whether pte is shared.
 		 */
-		if (page_count(virt_to_page(dst_pte)) > 1) {
-			addr |= last_addr_mask;
+		if (hugetlb_pte_size(&dst_hpte) == sz &&
+		    page_count(virt_to_page(dst_pte)) > 1) {
+			addr = (addr | last_addr_mask) + sz;
 			continue;
 		}
 
-		dst_ptl = huge_pte_lock(h, dst, dst_pte);
-		src_ptl = huge_pte_lockptr(huge_page_shift(h), src, src_pte);
+		dst_ptl = hugetlb_pte_lock(&dst_hpte);
+		src_ptl = hugetlb_pte_lockptr(&src_hpte);
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 		entry = huge_ptep_get(src_pte);
 again:
@@ -5205,10 +5215,15 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			 */
 			if (userfaultfd_wp(dst_vma))
 				set_huge_pte_at(dst, addr, dst_pte, entry);
+		} else if (!hugetlb_pte_present_leaf(&src_hpte, entry)) {
+			/* Retry the walk. */
+			spin_unlock(src_ptl);
+			spin_unlock(dst_ptl);
+			continue;
 		} else {
-			entry = huge_ptep_get(src_pte);
 			ptepage = pte_page(entry);
-			get_page(ptepage);
+			hpage = compound_head(ptepage);
+			get_page(hpage);
 
 			/*
 			 * Failing to duplicate the anon rmap is a rare case
@@ -5220,25 +5235,31 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			 * need to be without the pgtable locks since we could
 			 * sleep during the process.
 			 */
-			if (!PageAnon(ptepage)) {
-				page_dup_file_rmap(ptepage, true);
-			} else if (page_try_dup_anon_rmap(ptepage, true,
+			if (!PageAnon(hpage)) {
+				page_dup_file_rmap(hpage, true);
+			} else if (page_try_dup_anon_rmap(hpage, true,
 							  src_vma)) {
 				pte_t src_pte_old = entry;
 				struct page *new;
 
+				if (hugetlb_pte_size(&src_hpte) != sz) {
+					put_page(hpage);
+					ret = -EINVAL;
+					break;
+				}
+
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
 				/* Do not use reserve as it's private owned */
 				new = alloc_huge_page(dst_vma, addr, 1);
 				if (IS_ERR(new)) {
-					put_page(ptepage);
+					put_page(hpage);
 					ret = PTR_ERR(new);
 					break;
 				}
-				copy_user_huge_page(new, ptepage, addr, dst_vma,
+				copy_user_huge_page(new, hpage, addr, dst_vma,
 						    npages);
-				put_page(ptepage);
+				put_page(hpage);
 
 				/* Install the new huge page if src pte stable */
 				dst_ptl = huge_pte_lock(h, dst, dst_pte);
@@ -5256,6 +5277,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				hugetlb_install_page(dst_vma, dst_pte, addr, new);
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
+				addr += hugetlb_pte_size(&src_hpte);
 				continue;
 			}
 
@@ -5272,10 +5294,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			}
 
 			set_huge_pte_at(dst, addr, dst_pte, entry);
-			hugetlb_count_add(npages, dst);
+			hugetlb_count_add(
+					hugetlb_pte_size(&dst_hpte) / PAGE_SIZE,
+					dst);
 		}
 		spin_unlock(src_ptl);
 		spin_unlock(dst_ptl);
+		addr += hugetlb_pte_size(&src_hpte);
 	}
 
 	if (cow) {
Date: Thu, 5 Jan 2023 10:18:25 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-28-jthoughton@google.com>
Subject: [PATCH 27/46] hugetlb: add HGM support for move_hugetlb_page_tables
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Zach O'Keefe, Manish Mishra, Naoya Horiguchi, "Dr .
David Alan Gilbert", Matthew Wilcox (Oracle), Vlastimil Babka,
 Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This is very similar to the support that was added to
copy_hugetlb_page_range. We simply do a high-granularity walk now, and
most of the rest of the code stays the same.

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 47 +++++++++++++++++++++++++++--------------------
 1 file changed, 27 insertions(+), 20 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 21a5116f509b..582d14a206b5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5313,16 +5313,16 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	return ret;
 }

-static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
-			  unsigned long new_addr, pte_t *src_pte, pte_t *dst_pte)
+static void move_hugetlb_pte(struct vm_area_struct *vma, unsigned long old_addr,
+			     unsigned long new_addr, struct hugetlb_pte *src_hpte,
+			     struct hugetlb_pte *dst_hpte)
 {
-	struct hstate *h = hstate_vma(vma);
 	struct mm_struct *mm = vma->vm_mm;
 	spinlock_t *src_ptl, *dst_ptl;
 	pte_t pte;

-	dst_ptl = huge_pte_lock(h, mm, dst_pte);
-	src_ptl = huge_pte_lockptr(huge_page_shift(h), mm, src_pte);
+	dst_ptl = hugetlb_pte_lock(dst_hpte);
+	src_ptl = hugetlb_pte_lockptr(src_hpte);

 	/*
 	 * We don't have to worry about the ordering of src and dst ptlocks
@@ -5331,8 +5331,8 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
 	if (src_ptl != dst_ptl)
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);

-	pte = huge_ptep_get_and_clear(mm, old_addr, src_pte);
-	set_huge_pte_at(mm, new_addr, dst_pte, pte);
+	pte = huge_ptep_get_and_clear(mm, old_addr, src_hpte->ptep);
+	set_huge_pte_at(mm, new_addr, dst_hpte->ptep, pte);

 	if (src_ptl != dst_ptl)
 		spin_unlock(src_ptl);
@@ -5350,9 +5350,9 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long old_end = old_addr + len;
 	unsigned long last_addr_mask;
-	pte_t *src_pte, *dst_pte;
 	struct mmu_notifier_range range;
 	bool shared_pmd = false;
+	struct hugetlb_pte src_hpte, dst_hpte;

 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, old_addr,
 				old_end);
@@ -5368,28 +5368,35 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 	/* Prevent race with file truncation */
 	hugetlb_vma_lock_write(vma);
 	i_mmap_lock_write(mapping);
-	for (; old_addr < old_end; old_addr += sz, new_addr += sz) {
-		src_pte = hugetlb_walk(vma, old_addr, sz);
-		if (!src_pte) {
-			old_addr |= last_addr_mask;
-			new_addr |= last_addr_mask;
+	while (old_addr < old_end) {
+		if (hugetlb_full_walk(&src_hpte, vma, old_addr)) {
+			/* The hstate-level PTE wasn't allocated. */
+			old_addr = (old_addr | last_addr_mask) + sz;
+			new_addr = (new_addr | last_addr_mask) + sz;
 			continue;
 		}
-		if (huge_pte_none(huge_ptep_get(src_pte)))
+
+		if (huge_pte_none(huge_ptep_get(src_hpte.ptep))) {
+			old_addr += hugetlb_pte_size(&src_hpte);
+			new_addr += hugetlb_pte_size(&src_hpte);
 			continue;
+		}

-		if (huge_pmd_unshare(mm, vma, old_addr, src_pte)) {
+		if (hugetlb_pte_size(&src_hpte) == sz &&
+		    huge_pmd_unshare(mm, vma, old_addr, src_hpte.ptep)) {
 			shared_pmd = true;
-			old_addr |= last_addr_mask;
-			new_addr |= last_addr_mask;
+			old_addr = (old_addr | last_addr_mask) + sz;
+			new_addr = (new_addr | last_addr_mask) + sz;
 			continue;
 		}

-		dst_pte = huge_pte_alloc(mm, new_vma, new_addr, sz);
-		if (!dst_pte)
+		if (hugetlb_full_walk_alloc(&dst_hpte, new_vma, new_addr,
+					hugetlb_pte_size(&src_hpte)))
 			break;

-		move_huge_pte(vma, old_addr, new_addr, src_pte, dst_pte);
+		move_hugetlb_pte(vma, old_addr, new_addr, &src_hpte, &dst_hpte);
+		old_addr += hugetlb_pte_size(&src_hpte);
+		new_addr += hugetlb_pte_size(&src_hpte);
 	}

 	if (shared_pmd)

From patchwork Thu Jan 5 10:18:26 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39448
Date: Thu, 5 Jan 2023 10:18:26 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-29-jthoughton@google.com>
Subject: [PATCH 28/46] hugetlb: add HGM support for hugetlb_fault and hugetlb_no_page
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Zach O'Keefe, Manish Mishra, Naoya Horiguchi, "Dr .
David Alan Gilbert", Matthew Wilcox (Oracle), Vlastimil Babka,
 Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

Update the page fault handler to support high-granularity page faults.
While handling a page fault on a partially-mapped HugeTLB page, if the
PTE we find with hugetlb_pte_walk is none, then we will replace it with
a leaf-level PTE to map the page. To give some examples:
1. For a completely unmapped 1G page, it will be mapped with a 1G PUD.
2. For a 1G page that has its first 512M mapped, any faults on the
   unmapped sections will result in 2M PMDs mapping each unmapped 2M
   section.
3. For a 1G page that has only its first 4K mapped, a page fault on its
   second 4K section will get a 4K PTE to map it.

Unless high-granularity mappings are created via UFFDIO_CONTINUE, it is
impossible for hugetlb_fault to create high-granularity mappings.

This commit does not handle hugetlb_wp right now, and it doesn't handle
HugeTLB page migration and swap entries.

The BUG_ON in huge_pte_alloc is removed, as it is no longer valid when
HGM is possible. HGM can be disabled if the VMA lock cannot be allocated
after a VMA is split, yet high-granularity mappings may still exist.
Signed-off-by: James Houghton
---
 mm/hugetlb.c | 115 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 81 insertions(+), 34 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 582d14a206b5..8e690a22456a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -117,6 +117,18 @@ enum hugetlb_level hpage_size_to_level(unsigned long sz)
 	return HUGETLB_LEVEL_PGD;
 }

+/*
+ * Find the subpage that corresponds to `addr` in `hpage`.
+ */
+static struct page *hugetlb_find_subpage(struct hstate *h, struct page *hpage,
+					 unsigned long addr)
+{
+	size_t idx = (addr & ~huge_page_mask(h))/PAGE_SIZE;
+
+	BUG_ON(idx >= pages_per_huge_page(h));
+	return &hpage[idx];
+}
+
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
 {
 	if (spool->count)
@@ -5926,14 +5938,14 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 * Recheck pte with pgtable lock.  Returns true if pte didn't change, or
 * false if pte changed or is changing.
 */
-static bool hugetlb_pte_stable(struct hstate *h, struct mm_struct *mm,
-			       pte_t *ptep, pte_t old_pte)
+static bool hugetlb_pte_stable(struct hstate *h, struct hugetlb_pte *hpte,
+			       pte_t old_pte)
 {
 	spinlock_t *ptl;
 	bool same;

-	ptl = huge_pte_lock(h, mm, ptep);
-	same = pte_same(huge_ptep_get(ptep), old_pte);
+	ptl = hugetlb_pte_lock(hpte);
+	same = pte_same(huge_ptep_get(hpte->ptep), old_pte);
 	spin_unlock(ptl);

 	return same;
@@ -5942,17 +5954,18 @@ static bool hugetlb_pte_stable(struct hstate *h, struct mm_struct *mm,
 static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			struct vm_area_struct *vma,
 			struct address_space *mapping, pgoff_t idx,
-			unsigned long address, pte_t *ptep,
+			unsigned long address, struct hugetlb_pte *hpte,
 			pte_t old_pte, unsigned int flags)
 {
 	struct hstate *h = hstate_vma(vma);
 	vm_fault_t ret = VM_FAULT_SIGBUS;
 	int anon_rmap = 0;
 	unsigned long size;
-	struct page *page;
+	struct page *page, *subpage;
 	pte_t new_pte;
 	spinlock_t *ptl;
 	unsigned long haddr = address & huge_page_mask(h);
+	unsigned long haddr_hgm = address & hugetlb_pte_mask(hpte);
 	bool new_page, new_pagecache_page = false;
 	u32 hash = hugetlb_fault_mutex_hash(mapping, idx);

@@ -5997,7 +6010,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		 * never happen on the page after UFFDIO_COPY has
 		 * correctly installed the page and returned.
 		 */
-		if (!hugetlb_pte_stable(h, mm, ptep, old_pte)) {
+		if (!hugetlb_pte_stable(h, hpte, old_pte)) {
 			ret = 0;
 			goto out;
 		}
@@ -6021,7 +6034,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			 * here.  Before returning error, get ptl and make
 			 * sure there really is no pte entry.
 			 */
-			if (hugetlb_pte_stable(h, mm, ptep, old_pte))
+			if (hugetlb_pte_stable(h, hpte, old_pte))
 				ret = vmf_error(PTR_ERR(page));
 			else
 				ret = 0;
@@ -6071,7 +6084,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			unlock_page(page);
 			put_page(page);
 			/* See comment in userfaultfd_missing() block above */
-			if (!hugetlb_pte_stable(h, mm, ptep, old_pte)) {
+			if (!hugetlb_pte_stable(h, hpte, old_pte)) {
 				ret = 0;
 				goto out;
 			}
@@ -6096,30 +6109,43 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			vma_end_reservation(h, vma, haddr);
 	}

-	ptl = huge_pte_lock(h, mm, ptep);
+	ptl = hugetlb_pte_lock(hpte);
 	ret = 0;
-	/* If pte changed from under us, retry */
-	if (!pte_same(huge_ptep_get(ptep), old_pte))
+	/*
+	 * If pte changed from under us, retry.
+	 *
+	 * When dealing with high-granularity-mapped PTEs, it's possible that
+	 * a non-contiguous PTE within our contiguous PTE group gets populated,
+	 * in which case, we need to retry here. This is NOT caught here, and
+	 * will need to be addressed when HGM is supported for architectures
+	 * that support contiguous PTEs.
+	 */
+	if (!pte_same(huge_ptep_get(hpte->ptep), old_pte))
 		goto backout;

 	if (anon_rmap)
 		hugepage_add_new_anon_rmap(page, vma, haddr);
 	else
 		page_dup_file_rmap(page, true);
-	new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
-				&& (vma->vm_flags & VM_SHARED)));
+
+	subpage = hugetlb_find_subpage(h, page, haddr_hgm);
+	new_pte = make_huge_pte_with_shift(vma, subpage,
+			((vma->vm_flags & VM_WRITE)
+			 && (vma->vm_flags & VM_SHARED)),
+			hpte->shift);
 	/*
 	 * If this pte was previously wr-protected, keep it wr-protected even
 	 * if populated.
 	 */
 	if (unlikely(pte_marker_uffd_wp(old_pte)))
 		new_pte = huge_pte_mkuffd_wp(new_pte);
-	set_huge_pte_at(mm, haddr, ptep, new_pte);
+	set_huge_pte_at(mm, haddr_hgm, hpte->ptep, new_pte);

-	hugetlb_count_add(pages_per_huge_page(h), mm);
+	hugetlb_count_add(hugetlb_pte_size(hpte) / PAGE_SIZE, mm);
 	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
+		WARN_ON_ONCE(hugetlb_pte_size(hpte) != huge_page_size(h));
 		/* Optimization, do the COW without a second fault */
-		ret = hugetlb_wp(mm, vma, address, ptep, flags, page, ptl);
+		ret = hugetlb_wp(mm, vma, address, hpte->ptep, flags, page, ptl);
 	}

 	spin_unlock(ptl);
@@ -6176,17 +6202,20 @@ u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx)
 vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long address, unsigned int flags)
 {
-	pte_t *ptep, entry;
+	pte_t entry;
 	spinlock_t *ptl;
 	vm_fault_t ret;
 	u32 hash;
 	pgoff_t idx;
 	struct page *page = NULL;
+	struct page *subpage = NULL;
 	struct page *pagecache_page = NULL;
 	struct hstate *h = hstate_vma(vma);
 	struct address_space *mapping;
 	int need_wait_lock = 0;
 	unsigned long haddr = address & huge_page_mask(h);
+	unsigned long haddr_hgm;
+	struct hugetlb_pte hpte;

 	/*
 	 * Serialize hugepage allocation and instantiation, so that we don't
@@ -6200,26 +6229,26 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,

 	/*
 	 * Acquire vma lock before calling huge_pte_alloc and hold
-	 * until finished with ptep. This prevents huge_pmd_unshare from
-	 * being called elsewhere and making the ptep no longer valid.
+	 * until finished with hpte. This prevents huge_pmd_unshare from
+	 * being called elsewhere and making the hpte no longer valid.
 	 */
 	hugetlb_vma_lock_read(vma);

-	ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
-	if (!ptep) {
+	if (hugetlb_full_walk_alloc(&hpte, vma, address, 0)) {
 		hugetlb_vma_unlock_read(vma);
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		return VM_FAULT_OOM;
 	}

-	entry = huge_ptep_get(ptep);
+	entry = huge_ptep_get(hpte.ptep);
 	/* PTE markers should be handled the same way as none pte */
-	if (huge_pte_none_mostly(entry))
+	if (huge_pte_none_mostly(entry)) {
 		/*
 		 * hugetlb_no_page will drop vma lock and hugetlb fault
 		 * mutex internally, which make us return immediately.
 		 */
-		return hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
+		return hugetlb_no_page(mm, vma, mapping, idx, address, &hpte,
 				      entry, flags);
+	}

 	ret = 0;

@@ -6240,7 +6269,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			 * be released there.
 			 */
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-			migration_entry_wait_huge(vma, ptep);
+			migration_entry_wait_huge(vma, hpte.ptep);
 			return 0;
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			ret = VM_FAULT_HWPOISON_LARGE |
@@ -6248,6 +6277,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		goto out_mutex;
 	}

+	if (!hugetlb_pte_present_leaf(&hpte, entry))
+		/* We raced with someone splitting the entry. */
+		goto out_mutex;
+
 	/*
 	 * If we are going to COW/unshare the mapping later, we examine the
 	 * pending reservations for this page now. This will ensure that any
@@ -6267,14 +6300,17 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		pagecache_page = find_lock_page(mapping, idx);
 	}

-	ptl = huge_pte_lock(h, mm, ptep);
+	ptl = hugetlb_pte_lock(&hpte);

 	/* Check for a racing update before calling hugetlb_wp() */
-	if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
+	if (unlikely(!pte_same(entry, huge_ptep_get(hpte.ptep))))
 		goto out_ptl;

+	/* haddr_hgm is the base address of the region that hpte maps. */
+	haddr_hgm = address & hugetlb_pte_mask(&hpte);
+
 	/* Handle userfault-wp first, before trying to lock more pages */
-	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) &&
+	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(entry) &&
 	    (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) {
 		struct vm_fault vmf = {
 			.vma = vma,
@@ -6298,7 +6334,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * pagecache_page, so here we need take the former one
 	 * when page != pagecache_page or !pagecache_page.
 	 */
-	page = pte_page(entry);
+	subpage = pte_page(entry);
+	page = compound_head(subpage);
 	if (page != pagecache_page)
 		if (!trylock_page(page)) {
 			need_wait_lock = 1;
@@ -6309,7 +6346,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,

 	if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
 		if (!huge_pte_write(entry)) {
-			ret = hugetlb_wp(mm, vma, address, ptep, flags,
+			WARN_ON_ONCE(hugetlb_pte_size(&hpte) !=
+					huge_page_size(h));
+			ret = hugetlb_wp(mm, vma, address, hpte.ptep, flags,
 					 pagecache_page, ptl);
 			goto out_put_page;
 		} else if (likely(flags & FAULT_FLAG_WRITE)) {
@@ -6317,9 +6356,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		}
 	}
 	entry = pte_mkyoung(entry);
-	if (huge_ptep_set_access_flags(vma, haddr, ptep, entry,
+	if (huge_ptep_set_access_flags(vma, haddr_hgm, hpte.ptep, entry,
 						flags & FAULT_FLAG_WRITE))
-		update_mmu_cache(vma, haddr, ptep);
+		update_mmu_cache(vma, haddr_hgm, hpte.ptep);
 out_put_page:
 	if (page != pagecache_page)
 		unlock_page(page);
@@ -7523,6 +7562,9 @@ int hugetlb_full_walk(struct hugetlb_pte *hpte,
 /*
 * hugetlb_full_walk_alloc - do a high-granularity walk, potentially allocate
 *	new PTEs.
+ *
+ * If @target_sz is 0, then only attempt to allocate the hstate-level PTE, and
+ * walk as far as we can go.
 */
 int hugetlb_full_walk_alloc(struct hugetlb_pte *hpte,
 				   struct vm_area_struct *vma,
@@ -7541,6 +7583,12 @@ int hugetlb_full_walk_alloc(struct hugetlb_pte *hpte,
 	if (!ptep)
 		return -ENOMEM;

+	if (!target_sz) {
+		WARN_ON_ONCE(hugetlb_hgm_walk_uninit(hpte, ptep, vma, addr,
+					PAGE_SIZE, false));
+		return 0;
+	}
+
 	return hugetlb_hgm_walk_uninit(hpte, ptep, vma, addr, target_sz,
 			true);
 }
@@ -7569,7 +7617,6 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			pte = (pte_t *)pmd_alloc(mm, pud, addr);
 		}
 	}
-	BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));

 	return pte;
 }

From patchwork Thu Jan 5 10:18:27 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39451
Date: Thu, 5 Jan 2023 10:18:27 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-30-jthoughton@google.com>
Subject: [PATCH 29/46] rmap: in try_to_{migrate,unmap}_one, check head page for page flags
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

The main complication here is that HugeTLB pages have their poison status stored in the head page as the HWPoison page flag. Because HugeTLB high-granularity mapping can create PTEs that point to subpages instead of always the head of a hugepage, we need to check the compound head for page flags.

Signed-off-by: James Houghton
---
 mm/rmap.c | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 076ea77010e5..a6004d6b0415 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1456,10 +1456,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
 	pte_t pteval;
-	struct page *subpage;
+	struct page *subpage, *page_flags_page;
 	bool anon_exclusive, ret = true;
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
+	bool page_poisoned;
 
 	/*
 	 * When racing against e.g. zap_pte_range() on another cpu,
@@ -1512,9 +1513,17 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		subpage = folio_page(folio,
 					pte_pfn(*pvmw.pte) - folio_pfn(folio));
+		/*
+		 * We check the page flags of HugeTLB pages by checking the
+		 * head page.
+		 */
+		page_flags_page = folio_test_hugetlb(folio)
+					? &folio->page
+					: subpage;
+		page_poisoned = PageHWPoison(page_flags_page);
 		address = pvmw.address;
 		anon_exclusive = folio_test_anon(folio) &&
-				 PageAnonExclusive(subpage);
+				 PageAnonExclusive(page_flags_page);
 
 		if (folio_test_hugetlb(folio)) {
 			bool anon = folio_test_anon(folio);
@@ -1523,7 +1532,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 * The try_to_unmap() is only passed a hugetlb page
 			 * in the case where the hugetlb page is poisoned.
 			 */
-			VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
+			VM_BUG_ON_FOLIO(!page_poisoned, folio);
 			/*
 			 * huge_pmd_unshare may unmap an entire PMD page.
 			 * There is no way of knowing exactly which PMDs may
@@ -1606,7 +1615,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		/* Update high watermark before we lower rss */
 		update_hiwater_rss(mm);
 
-		if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) {
+		if (page_poisoned && !(flags & TTU_IGNORE_HWPOISON)) {
 			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
 			if (folio_test_hugetlb(folio)) {
 				hugetlb_count_sub(1UL << pvmw.pte_order, mm);
@@ -1632,7 +1641,9 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			mmu_notifier_invalidate_range(mm, address,
 						      address + PAGE_SIZE);
 		} else if (folio_test_anon(folio)) {
-			swp_entry_t entry = { .val = page_private(subpage) };
+			swp_entry_t entry = {
+				.val = page_private(page_flags_page)
+			};
 			pte_t swp_pte;
 			/*
 			 * Store the swap location in the pte.
@@ -1831,7 +1842,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
 	pte_t pteval;
-	struct page *subpage;
+	struct page *subpage, *page_flags_page;
 	bool anon_exclusive, ret = true;
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
@@ -1911,9 +1922,16 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			subpage = folio_page(folio,
 					pte_pfn(*pvmw.pte) - folio_pfn(folio));
 		}
+		/*
+		 * We check the page flags of HugeTLB pages by checking the
+		 * head page.
+		 */
+		page_flags_page = folio_test_hugetlb(folio)
+					? &folio->page
+					: subpage;
 		address = pvmw.address;
 		anon_exclusive = folio_test_anon(folio) &&
-				 PageAnonExclusive(subpage);
+				 PageAnonExclusive(page_flags_page);
 
 		if (folio_test_hugetlb(folio)) {
 			bool anon = folio_test_anon(folio);
@@ -2032,7 +2050,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			 * No need to invalidate here it will synchronize on
 			 * against the special swap migration pte.
 			 */
-		} else if (PageHWPoison(subpage)) {
+		} else if (PageHWPoison(page_flags_page)) {
 			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
 			if (folio_test_hugetlb(folio)) {
 				hugetlb_count_sub(1L << pvmw.pte_order, mm);

From patchwork Thu Jan 5 10:18:28 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39453
Date: Thu, 5 Jan 2023 10:18:28 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-31-jthoughton@google.com>
Subject: [PATCH 30/46] hugetlb: add high-granularity migration support
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

To prevent queueing a hugepage for migration multiple times, we use last_page to keep track of the last page we saw in queue_pages_hugetlb, and if the page we're looking at is last_page, we skip it. In the non-hugetlb cases, last_page, although unused, is still updated so that it keeps a consistent meaning with the hugetlb case.

This commit also adds a check in hugetlb_fault for high-granularity migration PTEs.
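The last_page bookkeeping works because the page walk visits PTEs in address order, so all the high-granularity PTEs of one hugepage are seen consecutively. A minimal userspace sketch of that dedup logic (the struct page / head-page model below is a hypothetical stand-in for the kernel's compound pages, not the kernel API itself):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: a subpage points at its head page; head pages have head == NULL. */
struct page {
	struct page *head;
};

static struct page *head_of(struct page *p)
{
	/* Plays the role of compound_head() in the patch. */
	return p->head ? p->head : p;
}

/*
 * Walk "PTEs" in address order and queue each hugepage once, skipping a
 * page when it matches the last one queued (mirrors qp->last_page).
 * Returns the number of pages stored into out[].
 */
static int queue_walk(struct page **ptes, int n, struct page **out)
{
	struct page *last_page = NULL;
	int queued = 0;
	int i;

	for (i = 0; i < n; i++) {
		struct page *page = head_of(ptes[i]);

		if (page == last_page)
			continue;	/* already queued via another PTE */
		last_page = page;
		out[queued++] = page;
	}
	return queued;
}
```

Remembering only the most recent page (rather than a set of everything queued) suffices precisely because the walk is ordered: two PTEs of the same hugepage can never be separated by a PTE of a different hugepage.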
Signed-off-by: James Houghton
---
 include/linux/swapops.h |  8 ++++++--
 mm/hugetlb.c            |  2 +-
 mm/mempolicy.c          | 24 +++++++++++++++++++-----
 mm/migrate.c            | 18 ++++++++++--------
 4 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 3a451b7afcb3..6ef80763e629 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -68,6 +68,8 @@ static inline bool is_pfn_swap_entry(swp_entry_t entry);
 
+struct hugetlb_pte;
+
 /* Clear all flags but only keep swp_entry_t related information */
 static inline pte_t pte_swp_clear_flags(pte_t pte)
 {
@@ -339,7 +341,8 @@ extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 #ifdef CONFIG_HUGETLB_PAGE
 extern void __migration_entry_wait_huge(struct vm_area_struct *vma,
 					pte_t *ptep, spinlock_t *ptl);
-extern void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte);
+extern void migration_entry_wait_huge(struct vm_area_struct *vma,
+				      struct hugetlb_pte *hpte);
 #endif /* CONFIG_HUGETLB_PAGE */
 #else /* CONFIG_MIGRATION */
 static inline swp_entry_t make_readable_migration_entry(pgoff_t offset)
@@ -369,7 +372,8 @@ static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 #ifdef CONFIG_HUGETLB_PAGE
 static inline void __migration_entry_wait_huge(struct vm_area_struct *vma,
 					       pte_t *ptep, spinlock_t *ptl) { }
-static inline void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte) { }
+static inline void migration_entry_wait_huge(struct vm_area_struct *vma,
+					     struct hugetlb_pte *hpte) { }
 #endif /* CONFIG_HUGETLB_PAGE */
 static inline int is_writable_migration_entry(swp_entry_t entry)
 {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8e690a22456a..2fb95ecafc63 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6269,7 +6269,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * be released there.
 		 */
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-		migration_entry_wait_huge(vma, hpte.ptep);
+		migration_entry_wait_huge(vma, &hpte);
 		return 0;
 	} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 		ret = VM_FAULT_HWPOISON_LARGE |
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index e5859ed34e90..6c4c3c923fa2 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -424,6 +424,7 @@ struct queue_pages {
 	unsigned long start;
 	unsigned long end;
 	struct vm_area_struct *first;
+	struct page *last_page;
 };
 
 /*
@@ -475,6 +476,7 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
 	flags = qp->flags;
 	/* go to thp migration */
 	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
+		qp->last_page = page;
 		if (!vma_migratable(walk->vma) ||
 		    migrate_page_add(page, qp->pagelist, flags)) {
 			ret = 1;
@@ -532,6 +534,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
 			continue;
 		if (!queue_pages_required(page, qp))
 			continue;
+
 		if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
 			/* MPOL_MF_STRICT must be specified if we get here */
 			if (!vma_migratable(vma)) {
@@ -539,6 +542,8 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
 				break;
 			}
 
+			qp->last_page = page;
+
 			/*
 			 * Do not abort immediately since there may be
 			 * temporary off LRU pages in the range.  Still
@@ -570,15 +575,22 @@ static int queue_pages_hugetlb(struct hugetlb_pte *hpte,
 	spinlock_t *ptl;
 	pte_t entry;
 
-	/* We don't migrate high-granularity HugeTLB mappings for now. */
-	if (hugetlb_hgm_enabled(walk->vma))
-		return -EINVAL;
-
 	ptl = hugetlb_pte_lock(hpte);
 	entry = huge_ptep_get(hpte->ptep);
 	if (!pte_present(entry))
 		goto unlock;
-	page = pte_page(entry);
+
+	if (!hugetlb_pte_present_leaf(hpte, entry)) {
+		ret = -EAGAIN;
+		goto unlock;
+	}
+
+	page = compound_head(pte_page(entry));
+
+	/* We already queued this page with another high-granularity PTE. */
+	if (page == qp->last_page)
+		goto unlock;
+
 	if (!queue_pages_required(page, qp))
 		goto unlock;
@@ -605,6 +617,7 @@ static int queue_pages_hugetlb(struct hugetlb_pte *hpte,
 	/* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
 	if (flags & (MPOL_MF_MOVE_ALL) ||
 	    (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
+		qp->last_page = page;
 		if (isolate_hugetlb(page, qp->pagelist) &&
 		    (flags & MPOL_MF_STRICT))
 			/*
@@ -739,6 +752,7 @@ queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
 		.start = start,
 		.end = end,
 		.first = NULL,
+		.last_page = NULL,
 	};
 
 	err = walk_page_range(mm, start, end, &queue_pages_walk_ops, &qp);
diff --git a/mm/migrate.c b/mm/migrate.c
index 0062689f4878..c30647b75459 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -195,6 +195,9 @@ static bool remove_migration_pte(struct folio *folio,
 		/* pgoff is invalid for ksm pages, but they are never large */
 		if (folio_test_large(folio) && !folio_test_hugetlb(folio))
 			idx = linear_page_index(vma, pvmw.address) - pvmw.pgoff;
+		else if (folio_test_hugetlb(folio))
+			idx = (pvmw.address & ~huge_page_mask(hstate_vma(vma)))/
+				PAGE_SIZE;
 		new = folio_page(folio, idx);
 
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
@@ -244,14 +247,15 @@ static bool remove_migration_pte(struct folio *folio,
 
 #ifdef CONFIG_HUGETLB_PAGE
 		if (folio_test_hugetlb(folio)) {
+			struct page *hpage = folio_page(folio, 0);
 			unsigned int shift = pvmw.pte_order + PAGE_SHIFT;
 
 			pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
 			if (folio_test_anon(folio))
-				hugepage_add_anon_rmap(new, vma, pvmw.address,
+				hugepage_add_anon_rmap(hpage, vma, pvmw.address,
 						       rmap_flags);
 			else
-				page_dup_file_rmap(new, true);
+				page_dup_file_rmap(hpage, true);
 			set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
 		} else
 #endif
@@ -267,7 +271,7 @@ static bool remove_migration_pte(struct folio *folio,
 		mlock_page_drain_local();
 
 		trace_remove_migration_pte(pvmw.address, pte_val(pte),
-					   compound_order(new));
+					   pvmw.pte_order);
 
 		/* No need to invalidate
 		   - it was non-present before */
 		update_mmu_cache(vma, pvmw.address, pvmw.pte);
 
@@ -358,12 +362,10 @@ void __migration_entry_wait_huge(struct vm_area_struct *vma,
 	}
 }
 
-void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte)
+void migration_entry_wait_huge(struct vm_area_struct *vma,
+			       struct hugetlb_pte *hpte)
 {
-	spinlock_t *ptl = huge_pte_lockptr(huge_page_shift(hstate_vma(vma)),
-					   vma->vm_mm, pte);
-
-	__migration_entry_wait_huge(vma, pte, ptl);
+	__migration_entry_wait_huge(vma, hpte->ptep, hpte->ptl);
 }
 #endif

From patchwork Thu Jan 5 10:18:29 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39456
Date: Thu, 5 Jan 2023 10:18:29 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-32-jthoughton@google.com>
Subject: [PATCH 31/46] hugetlb: sort hstates in hugetlb_init_hstates
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

When using HugeTLB high-granularity mapping, we need to go through the supported hugepage sizes in decreasing order so that we pick the largest size that works. Consider the case where we're faulting in a 1G hugepage for the first time: we want hugetlb_fault/hugetlb_no_page to map it with a PUD. By going through the sizes in decreasing order, we will find that PUD_SIZE works before finding out that PMD_SIZE or PAGE_SIZE work too.

This commit also changes bootmem hugepages from storing hstate pointers directly to storing the hstate sizes. The hstate pointers used for boot-time-allocated hugepages become invalid after we sort the hstates. `gather_bootmem_prealloc`, called after the hstates have been sorted, now converts the size to the correct hstate.
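The largest-first ordering described above can be reproduced in userspace with a plain qsort() comparator of the same shape as the patch's compare_hstates_decreasing(). The struct below is a hypothetical stand-in for struct hstate that keeps only the size, and qsort() stands in for the kernel's sort():

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct hstate; only the huge page size matters. */
struct toy_hstate {
	unsigned long size;
};

/* Larger sizes sort first, so a linear scan tries 1G before 2M before 4K. */
static int compare_decreasing(const void *a, const void *b)
{
	unsigned long sz_a = ((const struct toy_hstate *)a)->size;
	unsigned long sz_b = ((const struct toy_hstate *)b)->size;

	if (sz_a < sz_b)
		return 1;
	if (sz_a > sz_b)
		return -1;
	return 0;
}

static void sort_toy_hstates(struct toy_hstate *h, size_t n)
{
	/* The kernel patch uses the in-kernel sort(); qsort() is the
	 * userspace equivalent. */
	qsort(h, n, sizeof(*h), compare_decreasing);
}
```

Once the array is ordered this way, a fault handler can walk it front to back and stop at the first size that fits, and the patch's demote_order setup can read an adjacent array element instead of doing a nested search over all hstates.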
Signed-off-by: James Houghton
---
 include/linux/hugetlb.h |  2 +-
 mm/hugetlb.c            | 49 ++++++++++++++++++++++++++++++++---------
 2 files changed, 40 insertions(+), 11 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index daf993fdbc38..8a664a9dd0a8 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -789,7 +789,7 @@ struct hstate {
 struct huge_bootmem_page {
 	struct list_head list;
-	struct hstate *hstate;
+	unsigned long hstate_sz;
 };
 
 int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2fb95ecafc63..1e9e149587b3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -49,6 +50,10 @@
 int hugetlb_max_hstate __read_mostly;
 unsigned int default_hstate_idx;
+/*
+ * After hugetlb_init_hstates is called, hstates will be sorted from largest
+ * to smallest.
+ */
 struct hstate hstates[HUGE_MAX_HSTATE];
 
 #ifdef CONFIG_CMA
@@ -3347,7 +3352,7 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 	/* Put them into a private list first because mem_map is not up yet */
 	INIT_LIST_HEAD(&m->list);
 	list_add(&m->list, &huge_boot_pages);
-	m->hstate = h;
+	m->hstate_sz = huge_page_size(h);
 	return 1;
 }
@@ -3362,7 +3367,7 @@ static void __init gather_bootmem_prealloc(void)
 	list_for_each_entry(m, &huge_boot_pages, list) {
 		struct page *page = virt_to_page(m);
 		struct folio *folio = page_folio(page);
-		struct hstate *h = m->hstate;
+		struct hstate *h = size_to_hstate(m->hstate_sz);
 
 		VM_BUG_ON(!hstate_is_gigantic(h));
 		WARN_ON(folio_ref_count(folio) != 1);
@@ -3478,9 +3483,38 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 	kfree(node_alloc_noretry);
 }
 
+static int compare_hstates_decreasing(const void *a, const void *b)
+{
+	unsigned long sz_a = huge_page_size((const struct hstate *)a);
+	unsigned long sz_b = huge_page_size((const struct hstate *)b);
+
+	if (sz_a < sz_b)
+		return 1;
+	if (sz_a > sz_b)
+		return -1;
+	return 0;
+}
+
+static void sort_hstates(void)
+{
+	unsigned long default_hstate_sz = huge_page_size(&default_hstate);
+
+	/* Sort from largest to smallest. */
+	sort(hstates, hugetlb_max_hstate, sizeof(*hstates),
+	     compare_hstates_decreasing, NULL);
+
+	/*
+	 * We may have changed the location of the default hstate, so we need to
+	 * update it.
+	 */
+	default_hstate_idx = hstate_index(size_to_hstate(default_hstate_sz));
+}
+
 static void __init hugetlb_init_hstates(void)
 {
-	struct hstate *h, *h2;
+	struct hstate *h;
+
+	sort_hstates();
 
 	for_each_hstate(h) {
 		/* oversize hugepages were init'ed in early boot */
@@ -3499,13 +3533,8 @@ static void __init hugetlb_init_hstates(void)
 			continue;
 		if (hugetlb_cma_size && h->order <= HUGETLB_PAGE_ORDER)
 			continue;
-		for_each_hstate(h2) {
-			if (h2 == h)
-				continue;
-			if (h2->order < h->order &&
-			    h2->order > h->demote_order)
-				h->demote_order = h2->order;
-		}
+		if (h - 1 >= &hstates[0])
+			h->demote_order = huge_page_order(h - 1);
 	}
 }

From patchwork Thu Jan 5 10:18:30 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39454
Date: Thu, 5 Jan 2023 10:18:30 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-33-jthoughton@google.com>
Subject: [PATCH 32/46] hugetlb: add for_each_hgm_shift
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This is a helper macro to loop through all the usable page sizes for a
high-granularity-enabled HugeTLB VMA. Given the VMA's hstate, it will
loop, in descending order, through the page sizes that HugeTLB supports
for this architecture. It always includes PAGE_SIZE.

This is done by looping through the hstates; however, there is no hstate
for PAGE_SIZE. To handle this case, the loop intentionally goes out of
bounds, and the out-of-bounds pointer is mapped to PAGE_SIZE.

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1e9e149587b3..1eef6968b1fa 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7780,6 +7780,24 @@ bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
 {
 	return vma && (vma->vm_flags & VM_HUGETLB_HGM);
 }
+/* Should only be used by the for_each_hgm_shift macro. */
+static unsigned int __shift_for_hstate(struct hstate *h)
+{
+	/* If h is out of bounds, we have reached the end, so give PAGE_SIZE */
+	if (h >= &hstates[hugetlb_max_hstate])
+		return PAGE_SHIFT;
+	return huge_page_shift(h);
+}
+
+/*
+ * Intentionally go out of bounds. An out-of-bounds hstate will be converted to
+ * PAGE_SIZE.
+ */
+#define for_each_hgm_shift(hstate, tmp_h, shift) \
+	for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h), \
+			(tmp_h) <= &hstates[hugetlb_max_hstate]; \
+			(tmp_h)++)
+
 #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */

 /*

From patchwork Thu Jan 5 10:18:31 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39459
Date: Thu, 5 Jan 2023 10:18:31 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-34-jthoughton@google.com>
Subject: [PATCH 33/46] hugetlb: userfaultfd: add support for high-granularity UFFDIO_CONTINUE
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

The changes here are similar to those made for hugetlb_no_page. Pass
vmf->real_address to userfaultfd_huge_must_wait, because vmf->address
may be rounded down to the hugepage size, in which case a
high-granularity page table walk would look up the wrong PTE. Change the
call to userfaultfd_must_wait in the same way for consistency.

This commit introduces hugetlb_alloc_largest_pte, which is used to find
the appropriate PTE size with which to map pages with UFFDIO_CONTINUE.

When MADV_SPLIT is provided, page fault events will report
PAGE_SIZE-aligned addresses instead of huge_page_size(h)-aligned
addresses, regardless of whether UFFD_FEATURE_EXACT_ADDRESS is used.
Signed-off-by: James Houghton
---
 fs/userfaultfd.c        | 14 +++----
 include/linux/hugetlb.h | 18 ++++++++-
 mm/hugetlb.c            | 85 +++++++++++++++++++++++++++++++++--------
 mm/userfaultfd.c        | 40 +++++++++++--------
 4 files changed, 119 insertions(+), 38 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 15a5bf765d43..940ff63096a9 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -252,17 +252,17 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 					 unsigned long flags,
 					 unsigned long reason)
 {
-	pte_t *ptep, pte;
+	pte_t pte;
 	bool ret = true;
+	struct hugetlb_pte hpte;
 
 	mmap_assert_locked(ctx->mm);
 
-	ptep = hugetlb_walk(vma, address, vma_mmu_pagesize(vma));
-	if (!ptep)
+	if (hugetlb_full_walk(&hpte, vma, address))
 		goto out;
 
 	ret = false;
-	pte = huge_ptep_get(ptep);
+	pte = huge_ptep_get(hpte.ptep);
 
 	/*
 	 * Lockless access: we're in a wait_event so it's ok if it
@@ -531,11 +531,11 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 		spin_unlock_irq(&ctx->fault_pending_wqh.lock);
 
 		if (!is_vm_hugetlb_page(vma))
-			must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags,
-							  reason);
+			must_wait = userfaultfd_must_wait(ctx, vmf->real_address,
+							  vmf->flags, reason);
 		else
 			must_wait = userfaultfd_huge_must_wait(ctx, vma,
-							       vmf->address,
+							       vmf->real_address,
 							       vmf->flags, reason);
 		if (is_vm_hugetlb_page(vma))
 			hugetlb_vma_unlock_read(vma);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8a664a9dd0a8..c8524ac49b24 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -224,7 +224,8 @@ unsigned long hugetlb_total_pages(void);
 vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long address, unsigned int flags);
 #ifdef CONFIG_USERFAULTFD
-int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
+int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
+			     struct hugetlb_pte *dst_hpte,
 			     struct vm_area_struct *dst_vma,
 			     unsigned long dst_addr,
 			     unsigned long src_addr,
@@ -1292,16 +1293,31 @@ static inline enum hugetlb_level hpage_size_to_level(unsigned long sz)
 
 #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
 bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
+bool hugetlb_hgm_advised(struct vm_area_struct *vma);
 bool hugetlb_hgm_eligible(struct vm_area_struct *vma);
+int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
+			      struct vm_area_struct *vma, unsigned long start,
+			      unsigned long end);
 #else
 static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
 {
 	return false;
 }
+static inline bool hugetlb_hgm_advised(struct vm_area_struct *vma)
+{
+	return false;
+}
 static inline bool hugetlb_hgm_eligible(struct vm_area_struct *vma)
 {
 	return false;
 }
+static inline
+int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
+			      struct vm_area_struct *vma, unsigned long start,
+			      unsigned long end)
+{
+	return -EINVAL;
+}
 #endif
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1eef6968b1fa..5af6db52f34e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5936,6 +5936,13 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 						  unsigned long addr,
 						  unsigned long reason)
 {
+	/*
+	 * Don't use the hpage-aligned address if the user has explicitly
+	 * enabled HGM.
+	 */
+	if (hugetlb_hgm_advised(vma) && reason == VM_UFFD_MINOR)
+		haddr = address & PAGE_MASK;
+
 	u32 hash;
 	struct vm_fault vmf = {
 		.vma = vma,
@@ -6420,7 +6427,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
  * modifications for huge pages.
  */
 int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
-			     pte_t *dst_pte,
+			     struct hugetlb_pte *dst_hpte,
 			     struct vm_area_struct *dst_vma,
 			     unsigned long dst_addr,
 			     unsigned long src_addr,
@@ -6431,13 +6438,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
 	struct hstate *h = hstate_vma(dst_vma);
 	struct address_space *mapping = dst_vma->vm_file->f_mapping;
-	pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr);
+	unsigned long haddr = dst_addr & huge_page_mask(h);
+	pgoff_t idx = vma_hugecache_offset(h, dst_vma, haddr);
 	unsigned long size;
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
 	pte_t _dst_pte;
 	spinlock_t *ptl;
 	int ret = -ENOMEM;
-	struct page *page;
+	struct page *page, *subpage;
 	int writable;
 	bool page_in_pagecache = false;
 
@@ -6452,12 +6460,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		 * a non-missing case. Return -EEXIST.
 		 */
 		if (vm_shared &&
-		    hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) {
+		    hugetlbfs_pagecache_present(h, dst_vma, haddr)) {
 			ret = -EEXIST;
 			goto out;
 		}
 
-		page = alloc_huge_page(dst_vma, dst_addr, 0);
+		page = alloc_huge_page(dst_vma, haddr, 0);
 		if (IS_ERR(page)) {
 			ret = -ENOMEM;
 			goto out;
@@ -6473,13 +6481,13 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 			/* Free the allocated page which may have
 			 * consumed a reservation.
 			 */
-			restore_reserve_on_error(h, dst_vma, dst_addr, page);
+			restore_reserve_on_error(h, dst_vma, haddr, page);
 			put_page(page);
 
 			/* Allocate a temporary page to hold the copied
 			 * contents.
 			 */
-			page = alloc_huge_page_vma(h, dst_vma, dst_addr);
+			page = alloc_huge_page_vma(h, dst_vma, haddr);
 			if (!page) {
 				ret = -ENOMEM;
 				goto out;
@@ -6493,14 +6501,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		}
 	} else {
 		if (vm_shared &&
-		    hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) {
+		    hugetlbfs_pagecache_present(h, dst_vma, haddr)) {
 			put_page(*pagep);
 			ret = -EEXIST;
 			*pagep = NULL;
 			goto out;
 		}
 
-		page = alloc_huge_page(dst_vma, dst_addr, 0);
+		page = alloc_huge_page(dst_vma, haddr, 0);
 		if (IS_ERR(page)) {
 			put_page(*pagep);
 			ret = -ENOMEM;
@@ -6548,7 +6556,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		page_in_pagecache = true;
 	}
 
-	ptl = huge_pte_lock(h, dst_mm, dst_pte);
+	ptl = hugetlb_pte_lock(dst_hpte);
 
 	ret = -EIO;
 	if (PageHWPoison(page))
@@ -6560,7 +6568,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	 * page backing it, then access the page.
 	 */
 	ret = -EEXIST;
-	if (!huge_pte_none_mostly(huge_ptep_get(dst_pte)))
+	if (!huge_pte_none_mostly(huge_ptep_get(dst_hpte->ptep)))
 		goto out_release_unlock;
 
 	if (page_in_pagecache)
@@ -6577,7 +6585,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	else
 		writable = dst_vma->vm_flags & VM_WRITE;
 
-	_dst_pte = make_huge_pte(dst_vma, page, writable);
+	subpage = hugetlb_find_subpage(h, page, dst_addr);
+
+	_dst_pte = make_huge_pte_with_shift(dst_vma, subpage, writable,
+					    dst_hpte->shift);
 	/*
 	 * Always mark UFFDIO_COPY page dirty; note that this may not be
 	 * extremely important for hugetlbfs for now since swapping is not
@@ -6590,12 +6601,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	if (wp_copy)
 		_dst_pte = huge_pte_mkuffd_wp(_dst_pte);
 
-	set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
+	set_huge_pte_at(dst_mm, dst_addr, dst_hpte->ptep, _dst_pte);
 
-	hugetlb_count_add(pages_per_huge_page(h), dst_mm);
+	hugetlb_count_add(hugetlb_pte_size(dst_hpte) / PAGE_SIZE, dst_mm);
 
 	/* No need to invalidate - it was non-present before */
-	update_mmu_cache(dst_vma, dst_addr, dst_pte);
+	update_mmu_cache(dst_vma, dst_addr, dst_hpte->ptep);
 
 	spin_unlock(ptl);
 
 	if (!is_continue)
@@ -7780,6 +7791,18 @@ bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
 {
 	return vma && (vma->vm_flags & VM_HUGETLB_HGM);
 }
+bool hugetlb_hgm_advised(struct vm_area_struct *vma)
+{
+	/*
+	 * Right now, the only way for HGM to be enabled is if a user
+	 * explicitly enables it via MADV_SPLIT, but in the future, there
+	 * may be cases where it gets enabled automatically.
+	 *
+	 * Provide hugetlb_hgm_advised() now for call sites that care that
+	 * the user explicitly enabled HGM.
+	 */
+	return hugetlb_hgm_enabled(vma);
+}
 /* Should only be used by the for_each_hgm_shift macro. */
 static unsigned int __shift_for_hstate(struct hstate *h)
 {
@@ -7798,6 +7821,38 @@ static unsigned int __shift_for_hstate(struct hstate *h)
 			(tmp_h) <= &hstates[hugetlb_max_hstate]; \
 			(tmp_h)++)
 
+/*
+ * Find the HugeTLB PTE that maps as much of [start, end) as possible with a
+ * single page table entry. It is returned in @hpte.
+ */
+int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
+			      struct vm_area_struct *vma, unsigned long start,
+			      unsigned long end)
+{
+	struct hstate *h = hstate_vma(vma), *tmp_h;
+	unsigned int shift;
+	unsigned long sz;
+	int ret;
+
+	for_each_hgm_shift(h, tmp_h, shift) {
+		sz = 1UL << shift;
+
+		if (!IS_ALIGNED(start, sz) || start + sz > end)
+			continue;
+		goto found;
+	}
+	return -EINVAL;
+found:
+	ret = hugetlb_full_walk_alloc(hpte, vma, start, sz);
+	if (ret)
+		return ret;
+
+	if (hpte->shift > shift)
+		return -EEXIST;
+
+	return 0;
+}
+
 #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
 
 /*
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 65ad172add27..2b233d31be24 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -320,14 +320,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 {
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
 	ssize_t err;
-	pte_t *dst_pte;
 	unsigned long src_addr, dst_addr;
 	long copied;
 	struct page *page;
-	unsigned long vma_hpagesize;
+	unsigned long vma_hpagesize, target_pagesize;
 	pgoff_t idx;
 	u32 hash;
 	struct address_space *mapping;
+	bool use_hgm = hugetlb_hgm_advised(dst_vma) &&
+		mode == MCOPY_ATOMIC_CONTINUE;
+	struct hstate *h = hstate_vma(dst_vma);
 
 	/*
 	 * There is no default zero huge page for all huge page sizes as
@@ -345,12 +347,13 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	copied = 0;
 	page = NULL;
 	vma_hpagesize = vma_kernel_pagesize(dst_vma);
+	target_pagesize = use_hgm ? PAGE_SIZE : vma_hpagesize;
 
 	/*
-	 * Validate alignment based on huge page size
+	 * Validate alignment based on the targeted page size.
 	 */
 	err = -EINVAL;
-	if (dst_start & (vma_hpagesize - 1) || len & (vma_hpagesize - 1))
+	if (dst_start & (target_pagesize - 1) || len & (target_pagesize - 1))
 		goto out_unlock;
 
 retry:
@@ -381,13 +384,14 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	}
 
 	while (src_addr < src_start + len) {
+		struct hugetlb_pte hpte;
 		BUG_ON(dst_addr >= dst_start + len);
 
 		/*
 		 * Serialize via vma_lock and hugetlb_fault_mutex.
-		 * vma_lock ensures the dst_pte remains valid even
-		 * in the case of shared pmds. fault mutex prevents
-		 * races with other faulting threads.
+		 * vma_lock ensures the hpte.ptep remains valid even
+		 * in the case of shared pmds and page table collapsing.
+		 * fault mutex prevents races with other faulting threads.
 		 */
 		idx = linear_page_index(dst_vma, dst_addr);
 		mapping = dst_vma->vm_file->f_mapping;
@@ -395,23 +399,28 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 		hugetlb_vma_lock_read(dst_vma);
 
-		err = -ENOMEM;
-		dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize);
-		if (!dst_pte) {
+		if (use_hgm)
+			err = hugetlb_alloc_largest_pte(&hpte, dst_mm, dst_vma,
+							dst_addr,
+							dst_start + len);
+		else
+			err = hugetlb_full_walk_alloc(&hpte, dst_vma, dst_addr,
+						      vma_hpagesize);
+		if (err) {
 			hugetlb_vma_unlock_read(dst_vma);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			goto out_unlock;
 		}
 
 		if (mode != MCOPY_ATOMIC_CONTINUE &&
-		    !huge_pte_none_mostly(huge_ptep_get(dst_pte))) {
+		    !huge_pte_none_mostly(huge_ptep_get(hpte.ptep))) {
 			err = -EEXIST;
 			hugetlb_vma_unlock_read(dst_vma);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			goto out_unlock;
 		}
 
-		err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
+		err = hugetlb_mcopy_atomic_pte(dst_mm, &hpte, dst_vma,
 					       dst_addr, src_addr, mode, &page,
 					       wp_copy);
 
@@ -423,6 +432,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		if (unlikely(err == -ENOENT)) {
 			mmap_read_unlock(dst_mm);
 			BUG_ON(!page);
+			WARN_ON_ONCE(hpte.shift != huge_page_shift(h));
 
 			err = copy_huge_page_from_user(page,
 						(const void __user *)src_addr,
@@ -440,9 +450,9 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		BUG_ON(page);
 
 		if (!err) {
-			dst_addr += vma_hpagesize;
-			src_addr += vma_hpagesize;
-			copied += vma_hpagesize;
+			dst_addr += hugetlb_pte_size(&hpte);
+			src_addr += hugetlb_pte_size(&hpte);
+			copied += hugetlb_pte_size(&hpte);
 
 			if (fatal_signal_pending(current))
 				err = -EINTR;

From patchwork Thu Jan 5 10:18:32 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 39458
Date: Thu, 5 Jan 2023 10:18:32 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-35-jthoughton@google.com>
Subject: [PATCH 34/46] hugetlb: userfaultfd: when using MADV_SPLIT, round addresses to PAGE_SIZE
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

MADV_SPLIT enables HugeTLB HGM, which allows UFFDIO_CONTINUE in
PAGE_SIZE chunks. If a huge-page-aligned address were reported for minor
faults, userspace would be unable to take advantage of HGM unless it
knew to also pass UFFD_FEATURE_EXACT_ADDRESS. To make that mistake
harder, instead of requiring userspace to provide
UFFD_FEATURE_EXACT_ADDRESS, always report a usable (PAGE_SIZE-rounded)
address.

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5af6db52f34e..5b6215e03fe1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5936,28 +5936,27 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 						  unsigned long addr,
 						  unsigned long reason)
 {
+	u32 hash;
+	struct vm_fault vmf;
+
 	/*
 	 * Don't use the hpage-aligned address if the user has explicitly
 	 * enabled HGM.
*/ if (hugetlb_hgm_advised(vma) && reason == VM_UFFD_MINOR) - haddr = address & PAGE_MASK; - - u32 hash; - struct vm_fault vmf = { - .vma = vma, - .address = haddr, - .real_address = addr, - .flags = flags, + haddr = addr & PAGE_MASK; - /* - * Hard to debug if it ends up being - * used by a callee that assumes - * something about the other - * uninitialized fields... same as in - * memory.c - */ - }; + vmf.vma = vma; + vmf.address = haddr; + vmf.real_address = addr; + vmf.flags = flags; + /* + * Hard to debug if it ends up being + * used by a callee that assumes + * something about the other + * uninitialized fields... same as in + * memory.c + */ /* * vma_lock and hugetlb_fault_mutex must be dropped before handling

From patchwork Thu Jan 5 10:18:33 2023 X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 39460

Date: Thu, 5 Jan 2023 10:18:33 +0000 In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com> References: <20230105101844.1893104-1-jthoughton@google.com> Message-ID: <20230105101844.1893104-36-jthoughton@google.com> Subject: [PATCH 35/46] hugetlb: add MADV_COLLAPSE for hugetlb From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This is a necessary extension to the UFFDIO_CONTINUE changes. When userspace finishes mapping an entire hugepage with UFFDIO_CONTINUE, the kernel has no mechanism to automatically collapse the page table to map the whole hugepage normally. We require userspace to inform us that they would like the mapping to be collapsed; they do this with MADV_COLLAPSE. If userspace has mapped only part of a hugepage with UFFDIO_CONTINUE, hugetlb_collapse will cause the requested range to be mapped as if it had already been UFFDIO_CONTINUE'd. The effects of any UFFDIO_WRITEPROTECT calls may be undone by a call to MADV_COLLAPSE for intersecting address ranges. This commit co-opts the madvise mode that was introduced to synchronously collapse THPs. The function that does THP collapsing has been renamed to madvise_collapse_thp.
As with the rest of the high-granularity mapping support, MADV_COLLAPSE is only supported for shared VMAs right now. MADV_COLLAPSE has the same synchronization as huge_pmd_unshare. Signed-off-by: James Houghton --- include/linux/huge_mm.h | 12 +-- include/linux/hugetlb.h | 8 ++ mm/hugetlb.c | 164 ++++++++++++++++++++++++++++++++++++++++ mm/khugepaged.c | 4 +- mm/madvise.c | 18 ++++- 5 files changed, 197 insertions(+), 9 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index a1341fdcf666..5d1e3c980f74 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -218,9 +218,9 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice); -int madvise_collapse(struct vm_area_struct *vma, - struct vm_area_struct **prev, - unsigned long start, unsigned long end); +int madvise_collapse_thp(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end); void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start, unsigned long end, long adjust_next); spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma); @@ -367,9 +367,9 @@ static inline int hugepage_madvise(struct vm_area_struct *vma, return -EINVAL; } -static inline int madvise_collapse(struct vm_area_struct *vma, - struct vm_area_struct **prev, - unsigned long start, unsigned long end) +static inline int madvise_collapse_thp(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end) { return -EINVAL; } diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index c8524ac49b24..e1baf939afb6 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -1298,6 +1298,8 @@ bool hugetlb_hgm_eligible(struct vm_area_struct *vma); int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm, struct vm_area_struct *vma, unsigned long start, unsigned long end); 
+int hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long start, unsigned long end); #else static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma) { @@ -1318,6 +1320,12 @@ int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm, { return -EINVAL; } +static inline +int hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long start, unsigned long end) +{ + return -EINVAL; +} #endif static inline spinlock_t *huge_pte_lock(struct hstate *h, diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 5b6215e03fe1..388c46c7e77a 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -7852,6 +7852,170 @@ int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm, return 0; } +static bool hugetlb_hgm_collapsable(struct vm_area_struct *vma) +{ + if (!hugetlb_hgm_eligible(vma)) + return false; + if (!vma->vm_private_data) /* vma lock required for collapsing */ + return false; + return true; +} + +/* + * Collapse the address range from @start to @end to be mapped optimally. + * + * This is only valid for shared mappings. The main use case for this function + * is following UFFDIO_CONTINUE. If a user UFFDIO_CONTINUEs an entire hugepage + * by calling UFFDIO_CONTINUE once for each 4K region, the kernel doesn't know + * to collapse the mapping after the final UFFDIO_CONTINUE. Instead, we leave + * it up to userspace to tell us to do so, via MADV_COLLAPSE. + * + * Any holes in the mapping will be filled. If there is no page in the + * pagecache for a region we're collapsing, the PTEs will be cleared. + * + * If high-granularity PTEs are uffd-wp markers, those markers will be dropped. 
+ */ +int hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long start, unsigned long end) +{ + struct hstate *h = hstate_vma(vma); + struct address_space *mapping = vma->vm_file->f_mapping; + struct mmu_notifier_range range; + struct mmu_gather tlb; + unsigned long curr = start; + int ret = 0; + struct page *hpage, *subpage; + pgoff_t idx; + bool writable = vma->vm_flags & VM_WRITE; + bool shared = vma->vm_flags & VM_SHARED; + struct hugetlb_pte hpte; + pte_t entry; + + /* + * This is only supported for shared VMAs, because we need to look up + * the page to use for any PTEs we end up creating. + */ + if (!shared) + return -EINVAL; + + /* If HGM is not enabled, there is nothing to collapse. */ + if (!hugetlb_hgm_enabled(vma)) + return 0; + + /* + * We lost the VMA lock after splitting, so we can't safely collapse. + * We could improve this in the future (like take the mmap_lock for + * writing and try again), but for now just fail with ENOMEM. + */ + if (unlikely(!hugetlb_hgm_collapsable(vma))) + return -ENOMEM; + + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, + start, end); + mmu_notifier_invalidate_range_start(&range); + tlb_gather_mmu(&tlb, mm); + + /* + * Grab the VMA lock and mapping sem for writing. This will prevent + * concurrent high-granularity page table walks, so that we can safely + * collapse and free page tables. + * + * This is the same locking that huge_pmd_unshare requires. + */ + hugetlb_vma_lock_write(vma); + i_mmap_lock_write(vma->vm_file->f_mapping); + + while (curr < end) { + ret = hugetlb_alloc_largest_pte(&hpte, mm, vma, curr, end); + if (ret) + goto out; + + entry = huge_ptep_get(hpte.ptep); + + /* + * There is no work to do if the PTE doesn't point to page + * tables. 
+ */ + if (!pte_present(entry)) + goto next_hpte; + if (hugetlb_pte_present_leaf(&hpte, entry)) + goto next_hpte; + + idx = vma_hugecache_offset(h, vma, curr); + hpage = find_get_page(mapping, idx); + + if (hpage && !HPageMigratable(hpage)) { + /* + * Don't collapse a mapping to a page that is pending + * a migration. Migration swap entries may have been + * placed in the page table. + */ + ret = -EBUSY; + put_page(hpage); + goto out; + } + + if (hpage && PageHWPoison(hpage)) { + /* + * Don't collapse a mapping to a page that is + * hwpoisoned. + */ + ret = -EHWPOISON; + put_page(hpage); + /* + * By setting ret to -EHWPOISON, if nothing else + * happens, we will tell userspace that we couldn't + * fully collapse everything due to poison. + * + * Skip this page, and continue to collapse the rest + * of the mapping. + */ + curr = (curr & huge_page_mask(h)) + huge_page_size(h); + continue; + } + + /* + * Clear all the PTEs, and drop ref/mapcounts + * (on tlb_finish_mmu). + */ + __unmap_hugepage_range(&tlb, vma, curr, + curr + hugetlb_pte_size(&hpte), + NULL, + ZAP_FLAG_DROP_MARKER); + /* Free the PTEs. */ + hugetlb_free_pgd_range(&tlb, + curr, curr + hugetlb_pte_size(&hpte), + curr, curr + hugetlb_pte_size(&hpte)); + if (!hpage) { + huge_pte_clear(mm, curr, hpte.ptep, + hugetlb_pte_size(&hpte)); + goto next_hpte; + } + + page_dup_file_rmap(hpage, true); + + subpage = hugetlb_find_subpage(h, hpage, curr); + entry = make_huge_pte_with_shift(vma, subpage, + writable, hpte.shift); + set_huge_pte_at(mm, curr, hpte.ptep, entry); +next_hpte: + curr += hugetlb_pte_size(&hpte); + + if (curr < end) { + /* Don't hold the VMA lock for too long.
*/ + hugetlb_vma_unlock_write(vma); + cond_resched(); + hugetlb_vma_lock_write(vma); + } + } +out: + i_mmap_unlock_write(vma->vm_file->f_mapping); + hugetlb_vma_unlock_write(vma); + tlb_finish_mmu(&tlb); + mmu_notifier_invalidate_range_end(&range); + return ret; +} + #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ /* diff --git a/mm/khugepaged.c b/mm/khugepaged.c index e1c7c1f357ef..cbeb7f00f1bf 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -2718,8 +2718,8 @@ static int madvise_collapse_errno(enum scan_result r) } } -int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev, - unsigned long start, unsigned long end) +int madvise_collapse_thp(struct vm_area_struct *vma, struct vm_area_struct **prev, + unsigned long start, unsigned long end) { struct collapse_control *cc; struct mm_struct *mm = vma->vm_mm; diff --git a/mm/madvise.c b/mm/madvise.c index 04ee28992e52..fec47e9f845b 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1029,6 +1029,18 @@ static int madvise_split(struct vm_area_struct *vma, return 0; } +static int madvise_collapse(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end) +{ + if (is_vm_hugetlb_page(vma)) { + *prev = vma; + return hugetlb_collapse(vma->vm_mm, vma, start, end); + } + + return madvise_collapse_thp(vma, prev, start, end); +} + /* * Apply an madvise behavior to a region of a vma. 
madvise_update_vma * will handle splitting a vm area into separate areas, each area with its own @@ -1205,6 +1217,9 @@ madvise_behavior_valid(int behavior) #ifdef CONFIG_TRANSPARENT_HUGEPAGE case MADV_HUGEPAGE: case MADV_NOHUGEPAGE: +#endif +#if defined(CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING) || \ + defined(CONFIG_TRANSPARENT_HUGEPAGE) case MADV_COLLAPSE: #endif #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING @@ -1398,7 +1413,8 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, * MADV_NOHUGEPAGE - mark the given range as not worth being backed by * transparent huge pages so the existing pages will not be * coalesced into THP and new pages will not be allocated as THP. - * MADV_COLLAPSE - synchronously coalesce pages into new THP. + * MADV_COLLAPSE - synchronously coalesce pages into new THP, or, for HugeTLB + * pages, collapse the mapping. * MADV_DONTDUMP - the application wants to prevent pages in the given range * from being included in its core dump. * MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump. 
From patchwork Thu Jan 5 10:18:34 2023 X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 39461

Date: Thu, 5 Jan 2023 10:18:34 +0000 In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com> References: <20230105101844.1893104-1-jthoughton@google.com> Message-ID: <20230105101844.1893104-37-jthoughton@google.com> Subject: [PATCH 36/46] hugetlb: remove huge_pte_lock and huge_pte_lockptr From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

They are replaced with hugetlb_pte_lock{,ptr}. All callers that haven't already been replaced don't get called when using HGM, so we handle them by populating hugetlb_ptes with the standard, hstate-sized huge PTEs.

Signed-off-by: James Houghton --- arch/powerpc/mm/pgtable.c | 7 +++++-- include/linux/hugetlb.h | 42 +++++++++++++++------------------------ mm/hugetlb.c | 22 +++++++++++++------- 3 files changed, 36 insertions(+), 35 deletions(-) diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index 035a0df47af0..e20d6aa9a2a6 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -258,11 +258,14 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma, #ifdef CONFIG_PPC_BOOK3S_64 struct hstate *h = hstate_vma(vma); + struct hugetlb_pte hpte; psize = hstate_get_psize(h); #ifdef CONFIG_DEBUG_VM - assert_spin_locked(huge_pte_lockptr(huge_page_shift(h), - vma->vm_mm, ptep)); + /* HGM is not supported for powerpc yet.
*/ + hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h), + hpage_size_to_level(psize)); + assert_spin_locked(hpte.ptl); #endif #else diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index e1baf939afb6..4d318bf2ced9 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -1032,14 +1032,6 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask) return modified_mask; } -static inline spinlock_t *huge_pte_lockptr(unsigned int shift, - struct mm_struct *mm, pte_t *pte) -{ - if (shift == PMD_SHIFT) - return pmd_lockptr(mm, (pmd_t *) pte); - return &mm->page_table_lock; -} - #ifndef hugepages_supported /* * Some platform decide whether they support huge pages at boot @@ -1248,12 +1240,6 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask) return 0; } -static inline spinlock_t *huge_pte_lockptr(unsigned int shift, - struct mm_struct *mm, pte_t *pte) -{ - return &mm->page_table_lock; -} - static inline void hugetlb_count_init(struct mm_struct *mm) { } @@ -1328,16 +1314,6 @@ int hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma, } #endif -static inline spinlock_t *huge_pte_lock(struct hstate *h, - struct mm_struct *mm, pte_t *pte) -{ - spinlock_t *ptl; - - ptl = huge_pte_lockptr(huge_page_shift(h), mm, pte); - spin_lock(ptl); - return ptl; -} - static inline spinlock_t *hugetlb_pte_lockptr(struct hugetlb_pte *hpte) { @@ -1358,8 +1334,22 @@ void hugetlb_pte_populate(struct mm_struct *mm, struct hugetlb_pte *hpte, pte_t *ptep, unsigned int shift, enum hugetlb_level level) { - __hugetlb_pte_populate(hpte, ptep, shift, level, - huge_pte_lockptr(shift, mm, ptep)); + spinlock_t *ptl; + + /* + * For contiguous HugeTLB PTEs that can contain other HugeTLB PTEs + * on the same level, the same PTL for both must be used. + * + * For some architectures that implement hugetlb_walk_step, this + * version of hugetlb_pte_populate() may not be correct to use for + * high-granularity PTEs. 
Instead, call __hugetlb_pte_populate() + * directly. + */ + if (level == HUGETLB_LEVEL_PMD) + ptl = pmd_lockptr(mm, (pmd_t *) ptep); + else + ptl = &mm->page_table_lock; + __hugetlb_pte_populate(hpte, ptep, shift, level, ptl); } #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 388c46c7e77a..d71adc03138d 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5303,9 +5303,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, put_page(hpage); /* Install the new huge page if src pte stable */ - dst_ptl = huge_pte_lock(h, dst, dst_pte); - src_ptl = huge_pte_lockptr(huge_page_shift(h), - src, src_pte); + dst_ptl = hugetlb_pte_lock(&dst_hpte); + src_ptl = hugetlb_pte_lockptr(&src_hpte); spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); entry = huge_ptep_get(src_pte); if (!pte_same(src_pte_old, entry)) { @@ -7383,7 +7382,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long saddr; pte_t *spte = NULL; pte_t *pte; - spinlock_t *ptl; + struct hugetlb_pte hpte; + struct hstate *shstate; i_mmap_lock_read(mapping); vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) { @@ -7404,7 +7404,11 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, if (!spte) goto out; - ptl = huge_pte_lock(hstate_vma(vma), mm, spte); + shstate = hstate_vma(svma); + + hugetlb_pte_populate(mm, &hpte, spte, huge_page_shift(shstate), + hpage_size_to_level(huge_page_size(shstate))); + spin_lock(hpte.ptl); if (pud_none(*pud)) { pud_populate(mm, pud, (pmd_t *)((unsigned long)spte & PAGE_MASK)); @@ -7412,7 +7416,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, } else { put_page(virt_to_page(spte)); } - spin_unlock(ptl); + spin_unlock(hpte.ptl); out: pte = (pte_t *)pmd_alloc(mm, pud, addr); i_mmap_unlock_read(mapping); @@ -8132,6 +8136,7 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) unsigned long address, start, end; spinlock_t *ptl; 
pte_t *ptep; + struct hugetlb_pte hpte; if (!(vma->vm_flags & VM_MAYSHARE)) return; @@ -8156,7 +8161,10 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) ptep = hugetlb_walk(vma, address, sz); if (!ptep) continue; - ptl = huge_pte_lock(h, mm, ptep); + + hugetlb_pte_populate(mm, &hpte, ptep, huge_page_shift(h), + hpage_size_to_level(sz)); + ptl = hugetlb_pte_lock(&hpte); huge_pmd_unshare(mm, vma, address, ptep); spin_unlock(ptl); } From patchwork Thu Jan 5 10:18:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 39457 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4e01:0:0:0:0:0 with SMTP id p1csp228152wrt; Thu, 5 Jan 2023 02:23:51 -0800 (PST) X-Google-Smtp-Source: AMrXdXvv5gw/R46uoV4Ni5Dhz2FJayCEtJ6ABznShnrA5dX4CO7U7NH79BWo4kNbo9HA/LFgiYPF X-Received: by 2002:a17:90a:b395:b0:226:1531:6be6 with SMTP id e21-20020a17090ab39500b0022615316be6mr33041934pjr.48.1672914230727; Thu, 05 Jan 2023 02:23:50 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1672914230; cv=none; d=google.com; s=arc-20160816; b=Wc9lnUpPPpLFi5Ap6Dtupcalw/67OKmk8gPVl3t02GGUY3T4nFN5vEv3A65alrwO0M QwrjdikEuT+SqFvF0KggRSQhW+7ZdjePjOweGNrgztcZtutuw6/SeEHXgn1Lb6qH6Xfu aYusX7y1mjkFMOj7NkyCDlh6XGJvaHJL/Dyp1evC77V5W+o8Yc84WWOCARvTSlyjS5tM ezzR/U4cAq7VMofCtoXYpBscA2kt2vdyCjf4A5TagkmcetRVrWM3XdIg+Crr9opeknjw vSYlI0rnieey8kj39+A6AwagI3V/DfD47PteDGExE8mrST/pDYV/q8KMs1MAuspOLQIN g14Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=hEbQ+jBtLxtopyfN7uZpWk6gsndFSvLkXCtItdMk4sk=; b=HXtxDxPw4t5Qk4bTHF0JZm53iJ/oviCOn2lxpa0s2N4W7PSW5LphCr+Cp0wWK0gClO XKYc2+cVenS2XxmAJjZGTPlgV1c0n9OBtOcI7z5bz+YYrvEXmF9hza/PTjXE9NVdvjCY QDaFllSQodkDE8+YfO1MOgpTWZ/WVe+Jo1eQLbW29W5zDh/LO7o/KbuzDqbNJU559/Fc 
r8IFZMcl4Cyr34Xc9Mr+HxhVMYCv03IrIrbSEDFGSFKJ5ifYFqBVev4Up7JKgK0l5AaA b1LMvQaYJlwiHIs6uQLLB+gHWvooG175riGxOqd41tyBC/ZB9Pl/dwSVCMAkkaQvWlDJ 0UAg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=rzwvJ5O+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s7-20020a6550c7000000b00478595f417dsi34925215pgp.103.2023.01.05.02.23.38; Thu, 05 Jan 2023 02:23:50 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=rzwvJ5O+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233048AbjAEKXB (ORCPT + 99 others); Thu, 5 Jan 2023 05:23:01 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40892 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232995AbjAEKVr (ORCPT ); Thu, 5 Jan 2023 05:21:47 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 06D30559F7 for ; Thu, 5 Jan 2023 02:19:48 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id k18-20020a25e812000000b0077cc9ab9dd9so27183271ybd.8 for ; Thu, 05 Jan 2023 02:19:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; 
Date: Thu, 5 Jan 2023 10:18:35 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-38-jthoughton@google.com>
Subject: [PATCH 37/46] hugetlb: replace make_huge_pte with make_huge_pte_with_shift
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
 Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

This removes the old definition of make_huge_pte; the shift must now
always be given explicitly. All callsites are cleaned up.

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 31 ++++++++++++-------------------
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d71adc03138d..10a323e6bd9c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5069,9 +5069,9 @@ const struct vm_operations_struct hugetlb_vm_ops = {
 	.pagesize = hugetlb_vm_op_pagesize,
 };

-static pte_t make_huge_pte_with_shift(struct vm_area_struct *vma,
-				      struct page *page, int writable,
-				      int shift)
+static pte_t make_huge_pte(struct vm_area_struct *vma,
+			   struct page *page, int writable,
+			   int shift)
 {
 	pte_t entry;

@@ -5087,14 +5087,6 @@ static pte_t make_huge_pte_with_shift(struct vm_area_struct *vma,
 	return entry;
 }

-static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
-			   int writable)
-{
-	unsigned int shift = huge_page_shift(hstate_vma(vma));
-
-	return make_huge_pte_with_shift(vma, page, writable, shift);
-}
-
 static void set_huge_ptep_writable(struct vm_area_struct *vma,
 				   unsigned long address, pte_t *ptep)
 {
@@ -5135,10 +5127,12 @@ static void hugetlb_install_page(struct vm_area_struct *vma, pte_t *ptep,
 				 unsigned long addr, struct page *new_page)
 {
+	struct hstate *h = hstate_vma(vma);
 	__SetPageUptodate(new_page);
 	hugepage_add_new_anon_rmap(new_page, vma, addr);
-	set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, new_page, 1));
-	hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
+	set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, new_page, 1,
+				huge_page_shift(h)));
+	hugetlb_count_add(pages_per_huge_page(h), vma->vm_mm);
 	SetHPageMigratable(new_page);
 }

@@ -5854,7 +5848,8 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 		page_remove_rmap(old_page, vma, true);
 		hugepage_add_new_anon_rmap(new_page, vma, haddr);
 		set_huge_pte_at(mm, haddr, ptep,
-				make_huge_pte(vma, new_page, !unshare));
+				make_huge_pte(vma, new_page, !unshare,
+					      huge_page_shift(h)));
 		SetHPageMigratable(new_page);
 		/* Make the old page be freed below */
 		new_page = old_page;
@@ -6163,7 +6158,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	page_dup_file_rmap(page, true);
 	subpage = hugetlb_find_subpage(h, page, haddr_hgm);

-	new_pte = make_huge_pte_with_shift(vma, subpage,
+	new_pte = make_huge_pte(vma, subpage,
 			((vma->vm_flags & VM_WRITE) && (vma->vm_flags & VM_SHARED)),
 			hpte->shift);
@@ -6585,8 +6580,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	subpage = hugetlb_find_subpage(h, page, dst_addr);

-	_dst_pte = make_huge_pte_with_shift(dst_vma, subpage, writable,
-					    dst_hpte->shift);
+	_dst_pte = make_huge_pte(dst_vma, subpage, writable, dst_hpte->shift);
 	/*
 	 * Always mark UFFDIO_COPY page dirty; note that this may not be
 	 * extremely important for hugetlbfs for now since swapping is not
@@ -7999,8 +7993,7 @@ int hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma,
 		page_dup_file_rmap(hpage, true);
 		subpage = hugetlb_find_subpage(h, hpage, curr);
-		entry = make_huge_pte_with_shift(vma, subpage,
-						 writable, hpte.shift);
+		entry = make_huge_pte(vma, subpage, writable, hpte.shift);
 		set_huge_pte_at(mm, curr, hpte.ptep, entry);
next_hpte:
 		curr += hugetlb_pte_size(&hpte);

From patchwork Thu Jan 5 10:18:36 2023
Date: Thu, 5 Jan 2023 10:18:36 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-39-jthoughton@google.com>
Subject: [PATCH 38/46] mm: smaps: add stats for HugeTLB mapping size
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
 Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

When the kernel is compiled with HUGETLB_HIGH_GRANULARITY_MAPPING,
smaps may provide HugetlbPudMapped, HugetlbPmdMapped, and
HugetlbPteMapped. Page-table levels that are folded are not reported.

Signed-off-by: James Houghton
---
 fs/proc/task_mmu.c | 101 +++++++++++++++++++++++++++++++++------------
 1 file changed, 75 insertions(+), 26 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index c353cab11eee..af31c4d314d2 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -412,6 +412,15 @@ struct mem_size_stats {
 	unsigned long swap;
 	unsigned long shared_hugetlb;
 	unsigned long private_hugetlb;
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+#ifndef __PAGETABLE_PUD_FOLDED
+	unsigned long hugetlb_pud_mapped;
+#endif
+#ifndef __PAGETABLE_PMD_FOLDED
+	unsigned long hugetlb_pmd_mapped;
+#endif
+	unsigned long hugetlb_pte_mapped;
+#endif
 	u64 pss;
 	u64 pss_anon;
 	u64 pss_file;
@@ -731,6 +740,35 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 }

 #ifdef CONFIG_HUGETLB_PAGE
+
+static void smaps_hugetlb_hgm_account(struct mem_size_stats *mss,
+				      struct hugetlb_pte *hpte)
+{
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+	unsigned long size = hugetlb_pte_size(hpte);
+
+	switch (hpte->level) {
+#ifndef __PAGETABLE_PUD_FOLDED
+	case HUGETLB_LEVEL_PUD:
+		mss->hugetlb_pud_mapped += size;
+		break;
+#endif
+#ifndef __PAGETABLE_PMD_FOLDED
+	case HUGETLB_LEVEL_PMD:
+		mss->hugetlb_pmd_mapped += size;
+		break;
+#endif
+	case HUGETLB_LEVEL_PTE:
+		mss->hugetlb_pte_mapped += size;
+		break;
+	default:
+		break;
+	}
+#else
+	return;
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+}
+
 static int smaps_hugetlb_range(struct hugetlb_pte *hpte,
 			       unsigned long addr,
 			       struct mm_walk *walk)
@@ -764,6 +802,8 @@ static int smaps_hugetlb_range(struct hugetlb_pte *hpte,
 			mss->shared_hugetlb += hugetlb_pte_size(hpte);
 		else
 			mss->private_hugetlb += hugetlb_pte_size(hpte);
+
+		smaps_hugetlb_hgm_account(mss, hpte);
 	}
 	return 0;
 }
@@ -833,38 +873,47 @@ static void smap_gather_stats(struct vm_area_struct *vma,
 static void __show_smap(struct seq_file *m, const struct mem_size_stats *mss,
 	bool rollup_mode)
 {
-	SEQ_PUT_DEC("Rss:            ", mss->resident);
-	SEQ_PUT_DEC(" kB\nPss:            ", mss->pss >> PSS_SHIFT);
-	SEQ_PUT_DEC(" kB\nPss_Dirty:      ", mss->pss_dirty >> PSS_SHIFT);
+	SEQ_PUT_DEC("Rss:              ", mss->resident);
+	SEQ_PUT_DEC(" kB\nPss:              ", mss->pss >> PSS_SHIFT);
+	SEQ_PUT_DEC(" kB\nPss_Dirty:        ", mss->pss_dirty >> PSS_SHIFT);
 	if (rollup_mode) {
 		/*
 		 * These are meaningful only for smaps_rollup, otherwise two of
 		 * them are zero, and the other one is the same as Pss.
 		 */
-		SEQ_PUT_DEC(" kB\nPss_Anon:       ",
+		SEQ_PUT_DEC(" kB\nPss_Anon:         ",
 			    mss->pss_anon >> PSS_SHIFT);
-		SEQ_PUT_DEC(" kB\nPss_File:       ",
+		SEQ_PUT_DEC(" kB\nPss_File:         ",
 			    mss->pss_file >> PSS_SHIFT);
-		SEQ_PUT_DEC(" kB\nPss_Shmem:      ",
+		SEQ_PUT_DEC(" kB\nPss_Shmem:        ",
 			    mss->pss_shmem >> PSS_SHIFT);
 	}
-	SEQ_PUT_DEC(" kB\nShared_Clean:   ", mss->shared_clean);
-	SEQ_PUT_DEC(" kB\nShared_Dirty:   ", mss->shared_dirty);
-	SEQ_PUT_DEC(" kB\nPrivate_Clean:  ", mss->private_clean);
-	SEQ_PUT_DEC(" kB\nPrivate_Dirty:  ", mss->private_dirty);
-	SEQ_PUT_DEC(" kB\nReferenced:     ", mss->referenced);
-	SEQ_PUT_DEC(" kB\nAnonymous:      ", mss->anonymous);
-	SEQ_PUT_DEC(" kB\nLazyFree:       ", mss->lazyfree);
-	SEQ_PUT_DEC(" kB\nAnonHugePages:  ", mss->anonymous_thp);
-	SEQ_PUT_DEC(" kB\nShmemPmdMapped: ", mss->shmem_thp);
-	SEQ_PUT_DEC(" kB\nFilePmdMapped:  ", mss->file_thp);
-	SEQ_PUT_DEC(" kB\nShared_Hugetlb: ", mss->shared_hugetlb);
-	seq_put_decimal_ull_width(m, " kB\nPrivate_Hugetlb: ",
+	SEQ_PUT_DEC(" kB\nShared_Clean:     ", mss->shared_clean);
+	SEQ_PUT_DEC(" kB\nShared_Dirty:     ", mss->shared_dirty);
+	SEQ_PUT_DEC(" kB\nPrivate_Clean:    ", mss->private_clean);
+	SEQ_PUT_DEC(" kB\nPrivate_Dirty:    ", mss->private_dirty);
+	SEQ_PUT_DEC(" kB\nReferenced:       ", mss->referenced);
+	SEQ_PUT_DEC(" kB\nAnonymous:        ", mss->anonymous);
+	SEQ_PUT_DEC(" kB\nLazyFree:         ", mss->lazyfree);
+	SEQ_PUT_DEC(" kB\nAnonHugePages:    ", mss->anonymous_thp);
+	SEQ_PUT_DEC(" kB\nShmemPmdMapped:   ", mss->shmem_thp);
+	SEQ_PUT_DEC(" kB\nFilePmdMapped:    ", mss->file_thp);
+	SEQ_PUT_DEC(" kB\nShared_Hugetlb:   ", mss->shared_hugetlb);
+	seq_put_decimal_ull_width(m, " kB\nPrivate_Hugetlb:  ",
 				  mss->private_hugetlb >> 10, 7);
-	SEQ_PUT_DEC(" kB\nSwap:           ", mss->swap);
-	SEQ_PUT_DEC(" kB\nSwapPss:        ",
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+#ifndef __PAGETABLE_PUD_FOLDED
+	SEQ_PUT_DEC(" kB\nHugetlbPudMapped: ", mss->hugetlb_pud_mapped);
+#endif
+#ifndef __PAGETABLE_PMD_FOLDED
+	SEQ_PUT_DEC(" kB\nHugetlbPmdMapped: ", mss->hugetlb_pmd_mapped);
+#endif
+	SEQ_PUT_DEC(" kB\nHugetlbPteMapped: ", mss->hugetlb_pte_mapped);
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+	SEQ_PUT_DEC(" kB\nSwap:             ", mss->swap);
+	SEQ_PUT_DEC(" kB\nSwapPss:          ",
 		    mss->swap_pss >> PSS_SHIFT);
-	SEQ_PUT_DEC(" kB\nLocked:         ",
+	SEQ_PUT_DEC(" kB\nLocked:           ",
 		    mss->pss_locked >> PSS_SHIFT);
 	seq_puts(m, " kB\n");
 }
@@ -880,18 +929,18 @@ static int show_smap(struct seq_file *m, void *v)

 	show_map_vma(m, vma);

-	SEQ_PUT_DEC("Size:           ", vma->vm_end - vma->vm_start);
-	SEQ_PUT_DEC(" kB\nKernelPageSize: ", vma_kernel_pagesize(vma));
-	SEQ_PUT_DEC(" kB\nMMUPageSize:    ", vma_mmu_pagesize(vma));
+	SEQ_PUT_DEC("Size:             ", vma->vm_end - vma->vm_start);
+	SEQ_PUT_DEC(" kB\nKernelPageSize:   ", vma_kernel_pagesize(vma));
+	SEQ_PUT_DEC(" kB\nMMUPageSize:      ", vma_mmu_pagesize(vma));
 	seq_puts(m, " kB\n");

 	__show_smap(m, &mss, false);

-	seq_printf(m, "THPeligible:    %d\n",
+	seq_printf(m, "THPeligible:      %d\n",
 		   hugepage_vma_check(vma, vma->vm_flags, true, false, true));

 	if (arch_pkeys_enabled())
-		seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
+		seq_printf(m, "ProtectionKey:    %8u\n", vma_pkey(vma));
 	show_smap_vma_flags(m, vma);

 	return 0;

From patchwork Thu Jan 5 10:18:37 2023
Date: Thu, 5 Jan 2023 10:18:37 +0000
In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com>
References: <20230105101844.1893104-1-jthoughton@google.com>
Message-ID: <20230105101844.1893104-40-jthoughton@google.com>
Subject: [PATCH 39/46] hugetlb: x86: enable high-granularity mapping
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
 Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

Now that HGM is fully supported for GENERAL_HUGETLB, x86 can enable it.
The x86 KVM MMU already handles HugeTLB HGM pages properly: it walks the
page tables to determine which size to use in the second-stage page
table, instead of, for example, checking vma_mmu_pagesize, as arm64
does.

We could also enable HugeTLB HGM for arm (32-bit) at this point, as it
also uses GENERAL_HUGETLB and I don't see anything else that it would
need. However, I haven't tested on arm at all, so I won't enable it.
Signed-off-by: James Houghton
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3604074a878b..3d08cd45549c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -126,6 +126,7 @@ config X86
 	select ARCH_WANT_GENERAL_HUGETLB
 	select ARCH_WANT_HUGE_PMD_SHARE
 	select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP if X86_64
+	select ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING
 	select ARCH_WANT_LD_ORPHAN_WARN
 	select ARCH_WANTS_THP_SWAP if X86_64
 	select ARCH_HAS_PARANOID_L1D_FLUSH

From patchwork Thu Jan 5 10:18:38 2023
a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=ksh61HdFJKW5FzhQ3xBVqQbws7N9eS43eLhKNpPeqew=; b=MCxyEzTTO0yqg91shqPo6Ni+5EFzxsgTqYc3+pyMOGf0oatdr3BfMXYm2vwczO7xxN 6eXisofnTjW6ihZw5N2TmlX/02IpUvl2Obfz7KJBIRlML8eUYj2RVscTVOTPLqUOfEUl 7ALcPn6iafF/zuFvLcOWLr1Ggm27nMtZb4wuXZmie98LQOT+8Oo6f9LCBYicU9kTcsva a2UPiZZmZL+sjQYo2rL2gE+wftIxhHkEnB5vkmujJzydzSbw5WDTc7LnG3S0nQMS6Apd oD0DG4TxblVTSmtrDjw3jcw1qYcj6bz1oBV8ULyJY3Xpek6MnVH6iGGXHSiY51d0n8u/ xlOw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ksh61HdFJKW5FzhQ3xBVqQbws7N9eS43eLhKNpPeqew=; b=6tZ+z8JBShw/JQJtKaxOK6DrsJqczZOUOe4P2D/WFv9ViqRG8E25m/v4aySebX4rF5 SFCzTh5VqRHy4Xq17CHRk7Emklkc/99kEPOu9efDPQSktxtfM7yLWOcNHc52NfkQ2PhV QK1emZZLKLpP+2IsfAo9wxBjIQ1rcCu6YiPQmr5gOtcckoUs2PgtIStKoeR95vvQ6SBs gLYniUqkcT40oR1L9qZLE0CoD6ryAX81FiIiGb3EfYdIHLeS8cGCt3RrqvXK9YZm06X2 BSjNso+VvF4oN8m4pTTZFwyAzkYV3APK2wN3XQ7gQg0WQn5iDSiCOeKMfsgE9nAOHFUo 88yQ== X-Gm-Message-State: AFqh2kruN8NiX7rPCIcOKZOYJjaXzAChHHjFv3KTj+c3NiY7gj3J4uQ2 I+Xj+R3AJB6aiSfQB+fMujz6lPDqSp7pVZzP X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a81:1c17:0:b0:475:7911:2119 with SMTP id c23-20020a811c17000000b0047579112119mr5474920ywc.359.1672913992847; Thu, 05 Jan 2023 02:19:52 -0800 (PST) Date: Thu, 5 Jan 2023 10:18:38 +0000 In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com> Mime-Version: 1.0 References: <20230105101844.1893104-1-jthoughton@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20230105101844.1893104-41-jthoughton@google.com> Subject: [PATCH 40/46] docs: hugetlb: update hugetlb and userfaultfd admin-guides with HGM info From: James Houghton 
To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1754177761413450314?= X-GMAIL-MSGID: =?utf-8?q?1754177761413450314?= This includes information about how UFFD_FEATURE_MINOR_HUGETLBFS_HGM should be used and when MADV_COLLAPSE should be used with it. Signed-off-by: James Houghton --- Documentation/admin-guide/mm/hugetlbpage.rst | 4 ++++ Documentation/admin-guide/mm/userfaultfd.rst | 16 +++++++++++++++- 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst index 19f27c0d92e0..ca7db15ae768 100644 --- a/Documentation/admin-guide/mm/hugetlbpage.rst +++ b/Documentation/admin-guide/mm/hugetlbpage.rst @@ -454,6 +454,10 @@ errno set to EINVAL or exclude hugetlb pages that extend beyond the length if not hugepage aligned. For example, munmap(2) will fail if memory is backed by a hugetlb page and the length is smaller than the hugepage size. +It is possible for users to map HugeTLB pages at a higher granularity than +normal using HugeTLB high-granularity mapping (HGM). For example, when using 1G +pages on x86, a user could map that page with 4K PTEs, 2M PMDs, a combination of +the two. 
See Documentation/admin-guide/mm/userfaultfd.rst. Examples ======== diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst index 83f31919ebb3..19877aaad61b 100644 --- a/Documentation/admin-guide/mm/userfaultfd.rst +++ b/Documentation/admin-guide/mm/userfaultfd.rst @@ -115,6 +115,14 @@ events, except page fault notifications, may be generated: areas. ``UFFD_FEATURE_MINOR_SHMEM`` is the analogous feature indicating support for shmem virtual memory areas. +- ``UFFD_FEATURE_MINOR_HUGETLBFS_HGM`` indicates that the kernel supports + small-page-aligned regions for ``UFFDIO_CONTINUE`` in HugeTLB-backed + virtual memory areas. ``UFFD_FEATURE_MINOR_HUGETLBFS_HGM`` and + ``UFFD_FEATURE_EXACT_ADDRESS`` must both be specified explicitly to enable + this behavior. If ``UFFD_FEATURE_MINOR_HUGETLBFS_HGM`` is specified but + ``UFFD_FEATURE_EXACT_ADDRESS`` is not, then ``UFFDIO_API`` will fail with + ``EINVAL``. + The userland application should set the feature flags it intends to use when invoking the ``UFFDIO_API`` ioctl, to request that those features be enabled if supported. @@ -169,7 +177,13 @@ like to do to resolve it: the page cache). Userspace has the option of modifying the page's contents before resolving the fault. Once the contents are correct (modified or not), userspace asks the kernel to map the page and let the - faulting thread continue with ``UFFDIO_CONTINUE``. + faulting thread continue with ``UFFDIO_CONTINUE``. If this is done at the + base-page size in a transparent-hugepage-eligible VMA or in a HugeTLB VMA + (requires ``UFFD_FEATURE_MINOR_HUGETLBFS_HGM``), then userspace may want to + use ``MADV_COLLAPSE`` when a hugepage is fully populated to inform the kernel + that it may be able to collapse the mapping. ``MADV_COLLAPSE`` may undo + the effect of any ``UFFDIO_WRITEPROTECT`` calls on the collapsed address + range.
Notes: From patchwork Thu Jan 5 10:18:39 2023 X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 39467
Date: Thu, 5 Jan 2023 10:18:39 +0000 In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com> Mime-Version: 1.0 References: <20230105101844.1893104-1-jthoughton@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20230105101844.1893104-42-jthoughton@google.com> Subject: [PATCH 41/46] docs: proc: include information about HugeTLB HGM From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr .
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton This includes the updates that have been made to smaps, specifically, the addition of Hugetlb[Pud,Pmd,Pte]Mapped. Signed-off-by: James Houghton --- Documentation/filesystems/proc.rst | 56 +++++++++++++++++------------- 1 file changed, 32 insertions(+), 24 deletions(-) diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index e224b6d5b642..1fbb1310cea1 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -447,29 +447,32 @@ Memory Area, or VMA) there is a series of lines such as the following:: 08048000-080bc000 r-xp 00000000 03:02 13130 /bin/bash - Size: 1084 kB - KernelPageSize: 4 kB - MMUPageSize: 4 kB - Rss: 892 kB - Pss: 374 kB - Pss_Dirty: 0 kB - Shared_Clean: 892 kB - Shared_Dirty: 0 kB - Private_Clean: 0 kB - Private_Dirty: 0 kB - Referenced: 892 kB - Anonymous: 0 kB - LazyFree: 0 kB - AnonHugePages: 0 kB - ShmemPmdMapped: 0 kB - Shared_Hugetlb: 0 kB - Private_Hugetlb: 0 kB - Swap: 0 kB - SwapPss: 0 kB - KernelPageSize: 4 kB - MMUPageSize: 4 kB - Locked: 0 kB - THPeligible: 0 + Size: 1084 kB + KernelPageSize: 4 kB + MMUPageSize: 4 kB + Rss: 892 kB + Pss: 374 kB + Pss_Dirty: 0 kB + Shared_Clean: 892 kB + Shared_Dirty: 0 kB + Private_Clean: 0 kB + Private_Dirty: 0 kB + Referenced:
892 kB + Anonymous: 0 kB + LazyFree: 0 kB + AnonHugePages: 0 kB + ShmemPmdMapped: 0 kB + Shared_Hugetlb: 0 kB + Private_Hugetlb: 0 kB + HugetlbPudMapped: 0 kB + HugetlbPmdMapped: 0 kB + HugetlbPteMapped: 0 kB + Swap: 0 kB + SwapPss: 0 kB + KernelPageSize: 4 kB + MMUPageSize: 4 kB + Locked: 0 kB + THPeligible: 0 VmFlags: rd ex mr mw me dw The first of these lines shows the same information as is displayed for the @@ -510,10 +513,15 @@ implementation. If this is not desirable please file a bug report. "ShmemPmdMapped" shows the ammount of shared (shmem/tmpfs) memory backed by huge pages. -"Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by +"Shared_Hugetlb" and "Private_Hugetlb" show the amounts of memory backed by hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical reasons. And these are not included in {Shared,Private}_{Clean,Dirty} field. +If the kernel was compiled with ``CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING``, +"HugetlbPudMapped", "HugetlbPmdMapped", and "HugetlbPteMapped" will appear and +show the amount of HugeTLB memory mapped with PUDs, PMDs, and PTEs respectively. +See Documentation/admin-guide/mm/hugetlbpage.rst. + "Swap" shows how much would-be-anonymous memory is also used, but out on swap. 
For shmem mappings, "Swap" includes also the size of the mapped (and not From patchwork Thu Jan 5 10:18:40 2023 X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 39466
Date: Thu, 5 Jan 2023 10:18:40 +0000 In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com> Mime-Version: 1.0 References: <20230105101844.1893104-1-jthoughton@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20230105101844.1893104-43-jthoughton@google.com> Subject: [PATCH 42/46] selftests/vm: add HugeTLB HGM to userfaultfd selftest From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr .
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton This test case behaves similarly to the regular shared HugeTLB configuration, except that it uses 4K pages instead of hugepages, and that we ignore the UFFDIO_COPY tests, as UFFDIO_CONTINUE is the only ioctl that supports PAGE_SIZE-aligned regions. This doesn't test MADV_COLLAPSE. Other tests are added later to exercise MADV_COLLAPSE.
Signed-off-by: James Houghton --- tools/testing/selftests/vm/userfaultfd.c | 84 +++++++++++++++++++----- 1 file changed, 69 insertions(+), 15 deletions(-) diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c index 7f22844ed704..681c5c5f863b 100644 --- a/tools/testing/selftests/vm/userfaultfd.c +++ b/tools/testing/selftests/vm/userfaultfd.c @@ -73,9 +73,10 @@ static unsigned long nr_cpus, nr_pages, nr_pages_per_cpu, page_size, hpage_size; #define BOUNCE_POLL (1<<3) static int bounces; -#define TEST_ANON 1 -#define TEST_HUGETLB 2 -#define TEST_SHMEM 3 +#define TEST_ANON 1 +#define TEST_HUGETLB 2 +#define TEST_HUGETLB_HGM 3 +#define TEST_SHMEM 4 static int test_type; #define UFFD_FLAGS (O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY) @@ -93,6 +94,8 @@ static volatile bool test_uffdio_zeropage_eexist = true; static bool test_uffdio_wp = true; /* Whether to test uffd minor faults */ static bool test_uffdio_minor = false; +static bool test_uffdio_copy = true; + static bool map_shared; static int mem_fd; static unsigned long long *count_verify; @@ -151,7 +154,7 @@ static void usage(void) fprintf(stderr, "\nUsage: ./userfaultfd " "[hugetlbfs_file]\n\n"); fprintf(stderr, "Supported : anon, hugetlb, " - "hugetlb_shared, shmem\n\n"); + "hugetlb_shared, hugetlb_shared_hgm, shmem\n\n"); fprintf(stderr, "'Test mods' can be joined to the test type string with a ':'. " "Supported mods:\n"); fprintf(stderr, "\tsyscall - Use userfaultfd(2) (default)\n"); @@ -167,6 +170,11 @@ static void usage(void) exit(1); } +static bool test_is_hugetlb(void) +{ + return test_type == TEST_HUGETLB || test_type == TEST_HUGETLB_HGM; +} + #define _err(fmt, ...) 
\ do { \ int ret = errno; \ @@ -381,7 +389,7 @@ static struct uffd_test_ops *uffd_test_ops; static inline uint64_t uffd_minor_feature(void) { - if (test_type == TEST_HUGETLB && map_shared) + if (test_is_hugetlb() && map_shared) return UFFD_FEATURE_MINOR_HUGETLBFS; else if (test_type == TEST_SHMEM) return UFFD_FEATURE_MINOR_SHMEM; @@ -393,7 +401,7 @@ static uint64_t get_expected_ioctls(uint64_t mode) { uint64_t ioctls = UFFD_API_RANGE_IOCTLS; - if (test_type == TEST_HUGETLB) + if (test_is_hugetlb()) ioctls &= ~(1 << _UFFDIO_ZEROPAGE); if (!((mode & UFFDIO_REGISTER_MODE_WP) && test_uffdio_wp)) @@ -500,13 +508,16 @@ static void uffd_test_ctx_clear(void) static void uffd_test_ctx_init(uint64_t features) { unsigned long nr, cpu; + uint64_t enabled_features = features; uffd_test_ctx_clear(); uffd_test_ops->allocate_area((void **)&area_src, true); uffd_test_ops->allocate_area((void **)&area_dst, false); - userfaultfd_open(&features); + userfaultfd_open(&enabled_features); + if ((enabled_features & features) != features) + err("couldn't enable all features"); count_verify = malloc(nr_pages * sizeof(unsigned long long)); if (!count_verify) @@ -726,13 +737,16 @@ static void uffd_handle_page_fault(struct uffd_msg *msg, struct uffd_stats *stats) { unsigned long offset; + unsigned long address; if (msg->event != UFFD_EVENT_PAGEFAULT) err("unexpected msg event %u", msg->event); + address = msg->arg.pagefault.address; + if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP) { /* Write protect page faults */ - wp_range(uffd, msg->arg.pagefault.address, page_size, false); + wp_range(uffd, address, page_size, false); stats->wp_faults++; } else if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_MINOR) { uint8_t *area; @@ -751,11 +765,10 @@ static void uffd_handle_page_fault(struct uffd_msg *msg, */ area = (uint8_t *)(area_dst + - ((char *)msg->arg.pagefault.address - - area_dst_alias)); + ((char *)address - area_dst_alias)); for (b = 0; b < page_size; ++b) area[b] = ~area[b]; - 
continue_range(uffd, msg->arg.pagefault.address, page_size); + continue_range(uffd, address, page_size); stats->minor_faults++; } else { /* @@ -782,7 +795,7 @@ static void uffd_handle_page_fault(struct uffd_msg *msg, if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE) err("unexpected write fault"); - offset = (char *)(unsigned long)msg->arg.pagefault.address - area_dst; + offset = (char *)address - area_dst; offset &= ~(page_size-1); if (copy_page(uffd, offset)) @@ -1192,6 +1205,12 @@ static int userfaultfd_events_test(void) char c; struct uffd_stats stats = { 0 }; + if (!test_uffdio_copy) { + printf("Skipping userfaultfd events test " + "(test_uffdio_copy=false)\n"); + return 0; + } + printf("testing events (fork, remap, remove): "); fflush(stdout); @@ -1245,6 +1264,12 @@ static int userfaultfd_sig_test(void) char c; struct uffd_stats stats = { 0 }; + if (!test_uffdio_copy) { + printf("Skipping userfaultfd signal test " + "(test_uffdio_copy=false)\n"); + return 0; + } + printf("testing signal delivery: "); fflush(stdout); @@ -1329,6 +1354,11 @@ static int userfaultfd_minor_test(void) uffd_test_ctx_init(uffd_minor_feature()); + if (test_type == TEST_HUGETLB_HGM) + /* Enable high-granularity userfaultfd ioctls for HugeTLB */ + if (madvise(area_dst_alias, nr_pages * page_size, MADV_SPLIT)) + err("MADV_SPLIT failed"); + uffdio_register.range.start = (unsigned long)area_dst_alias; uffdio_register.range.len = nr_pages * page_size; uffdio_register.mode = UFFDIO_REGISTER_MODE_MINOR; @@ -1538,6 +1568,12 @@ static int userfaultfd_stress(void) pthread_attr_init(&attr); pthread_attr_setstacksize(&attr, 16*1024*1024); + if (!test_uffdio_copy) { + printf("Skipping userfaultfd stress test " + "(test_uffdio_copy=false)\n"); + bounces = 0; + } + while (bounces--) { printf("bounces: %d, mode:", bounces); if (bounces & BOUNCE_RANDOM) @@ -1696,6 +1732,16 @@ static void set_test_type(const char *type) uffd_test_ops = &hugetlb_uffd_test_ops; /* Minor faults require shared 
hugetlb; only enable here. */ test_uffdio_minor = true; + } else if (!strcmp(type, "hugetlb_shared_hgm")) { + map_shared = true; + test_type = TEST_HUGETLB_HGM; + uffd_test_ops = &hugetlb_uffd_test_ops; + /* + * HugeTLB HGM only changes UFFDIO_CONTINUE, so don't test + * UFFDIO_COPY. + */ + test_uffdio_minor = true; + test_uffdio_copy = false; } else if (!strcmp(type, "shmem")) { map_shared = true; test_type = TEST_SHMEM; @@ -1731,6 +1777,7 @@ static void parse_test_type_arg(const char *raw_type) err("Unsupported test: %s", raw_type); if (test_type == TEST_HUGETLB) + /* TEST_HUGETLB_HGM gets small pages. */ page_size = hpage_size; else page_size = sysconf(_SC_PAGE_SIZE); @@ -1813,22 +1860,29 @@ int main(int argc, char **argv) nr_cpus = x < y ? x : y; } nr_pages_per_cpu = bytes / page_size / nr_cpus; + if (test_type == TEST_HUGETLB_HGM) + /* + * `page_size` refers to the page_size we can use in + * UFFDIO_CONTINUE. We still need nr_pages to be appropriately + * aligned, so align it here. 
+ */ + nr_pages_per_cpu -= nr_pages_per_cpu % (hpage_size / page_size); if (!nr_pages_per_cpu) { _err("invalid MiB"); usage(); } + nr_pages = nr_pages_per_cpu * nr_cpus; bounces = atoi(argv[3]); if (bounces <= 0) { _err("invalid bounces"); usage(); } - nr_pages = nr_pages_per_cpu * nr_cpus; - if (test_type == TEST_SHMEM || test_type == TEST_HUGETLB) { + if (test_type == TEST_SHMEM || test_is_hugetlb()) { unsigned int memfd_flags = 0; - if (test_type == TEST_HUGETLB) + if (test_is_hugetlb()) memfd_flags = MFD_HUGETLB; mem_fd = memfd_create(argv[0], memfd_flags); if (mem_fd < 0) From patchwork Thu Jan 5 10:18:41 2023 X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 39470
Date: Thu, 5 Jan 2023 10:18:41 +0000 In-Reply-To: <20230105101844.1893104-1-jthoughton@google.com> Mime-Version: 1.0 References: <20230105101844.1893104-1-jthoughton@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20230105101844.1893104-44-jthoughton@google.com>
Subject: [PATCH 43/46] selftests/kvm: add HugeTLB HGM to KVM demand paging selftest From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton This test exercises the GUP paths for HGM. MADV_COLLAPSE is not tested.
Signed-off-by: James Houghton
---
 tools/testing/selftests/kvm/demand_paging_test.c   |  2 +-
 tools/testing/selftests/kvm/include/test_util.h    |  2 ++
 .../selftests/kvm/include/userfaultfd_util.h       |  6 +++---
 tools/testing/selftests/kvm/lib/kvm_util.c         |  2 +-
 tools/testing/selftests/kvm/lib/test_util.c        | 14 ++++++++++++++
 tools/testing/selftests/kvm/lib/userfaultfd_util.c | 14 +++++++++++---
 6 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
index b0e1fc4de9e2..e534f9c927bf 100644
--- a/tools/testing/selftests/kvm/demand_paging_test.c
+++ b/tools/testing/selftests/kvm/demand_paging_test.c
@@ -170,7 +170,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 		uffd_descs[i] = uffd_setup_demand_paging(
 			p->uffd_mode, p->uffd_delay, vcpu_hva,
 			vcpu_args->pages * memstress_args.guest_page_size,
-			&handle_uffd_page_request);
+			p->src_type, &handle_uffd_page_request);
 	}
 }

diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index 80d6416f3012..a2106c19a614 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -103,6 +103,7 @@ enum vm_mem_backing_src_type {
 	VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB,
 	VM_MEM_SRC_SHMEM,
 	VM_MEM_SRC_SHARED_HUGETLB,
+	VM_MEM_SRC_SHARED_HUGETLB_HGM,
 	NUM_SRC_TYPES,
 };

@@ -121,6 +122,7 @@ size_t get_def_hugetlb_pagesz(void);
 const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i);
 size_t get_backing_src_pagesz(uint32_t i);
 bool is_backing_src_hugetlb(uint32_t i);
+bool is_backing_src_shared_hugetlb(enum vm_mem_backing_src_type src_type);
 void backing_src_help(const char *flag);
 enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name);
 long get_run_delay(void);

diff --git a/tools/testing/selftests/kvm/include/userfaultfd_util.h b/tools/testing/selftests/kvm/include/userfaultfd_util.h
index 877449c34592..d91528a58245 100644
--- a/tools/testing/selftests/kvm/include/userfaultfd_util.h
+++ b/tools/testing/selftests/kvm/include/userfaultfd_util.h
@@ -26,9 +26,9 @@ struct uffd_desc {
 	pthread_t thread;
 };

-struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay,
-					   void *hva, uint64_t len,
-					   uffd_handler_t handler);
+struct uffd_desc *uffd_setup_demand_paging(
+	int uffd_mode, useconds_t delay, void *hva, uint64_t len,
+	enum vm_mem_backing_src_type src_type, uffd_handler_t handler);

 void uffd_stop_demand_paging(struct uffd_desc *uffd);

diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index c88c3ace16d2..67e7223f054b 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -972,7 +972,7 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
 	region->fd = -1;
 	if (backing_src_is_shared(src_type))
 		region->fd = kvm_memfd_alloc(region->mmap_size,
-			src_type == VM_MEM_SRC_SHARED_HUGETLB);
+			is_backing_src_shared_hugetlb(src_type));

 	region->mmap_start = mmap(NULL, region->mmap_size,
 				  PROT_READ | PROT_WRITE,

diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
index 5c22fa4c2825..712a0878932e 100644
--- a/tools/testing/selftests/kvm/lib/test_util.c
+++ b/tools/testing/selftests/kvm/lib/test_util.c
@@ -271,6 +271,13 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i)
 		 */
 		.flag = MAP_SHARED,
 	},
+	[VM_MEM_SRC_SHARED_HUGETLB_HGM] = {
+		/*
+		 * Identical to shared_hugetlb except for the name.
+		 */
+		.name = "shared_hugetlb_hgm",
+		.flag = MAP_SHARED,
+	},
 };
 _Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES,
 	       "Missing new backing src types?");
@@ -289,6 +296,7 @@ size_t get_backing_src_pagesz(uint32_t i)
 	switch (i) {
 	case VM_MEM_SRC_ANONYMOUS:
 	case VM_MEM_SRC_SHMEM:
+	case VM_MEM_SRC_SHARED_HUGETLB_HGM:
 		return getpagesize();
 	case VM_MEM_SRC_ANONYMOUS_THP:
 		return get_trans_hugepagesz();
@@ -305,6 +313,12 @@ bool is_backing_src_hugetlb(uint32_t i)
 	return !!(vm_mem_backing_src_alias(i)->flag & MAP_HUGETLB);
 }

+bool is_backing_src_shared_hugetlb(enum vm_mem_backing_src_type src_type)
+{
+	return src_type == VM_MEM_SRC_SHARED_HUGETLB ||
+	       src_type == VM_MEM_SRC_SHARED_HUGETLB_HGM;
+}
+
 static void print_available_backing_src_types(const char *prefix)
 {
 	int i;

diff --git a/tools/testing/selftests/kvm/lib/userfaultfd_util.c b/tools/testing/selftests/kvm/lib/userfaultfd_util.c
index 92cef20902f1..3c7178d6c4f4 100644
--- a/tools/testing/selftests/kvm/lib/userfaultfd_util.c
+++ b/tools/testing/selftests/kvm/lib/userfaultfd_util.c
@@ -25,6 +25,10 @@

 #ifdef __NR_userfaultfd

+#ifndef MADV_SPLIT
+#define MADV_SPLIT 26
+#endif
+
 static void *uffd_handler_thread_fn(void *arg)
 {
 	struct uffd_desc *uffd_desc = (struct uffd_desc *)arg;
@@ -108,9 +112,9 @@ static void *uffd_handler_thread_fn(void *arg)
 	return NULL;
 }

-struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay,
-					   void *hva, uint64_t len,
-					   uffd_handler_t handler)
+struct uffd_desc *uffd_setup_demand_paging(
+	int uffd_mode, useconds_t delay, void *hva, uint64_t len,
+	enum vm_mem_backing_src_type src_type, uffd_handler_t handler)
 {
 	struct uffd_desc *uffd_desc;
 	bool is_minor = (uffd_mode == UFFDIO_REGISTER_MODE_MINOR);
@@ -140,6 +144,10 @@ struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay,
 		    "ioctl UFFDIO_API failed: %" PRIu64,
 		    (uint64_t)uffdio_api.api);

+	if (src_type == VM_MEM_SRC_SHARED_HUGETLB_HGM)
+		TEST_ASSERT(!madvise(hva, len, MADV_SPLIT),
+			    "Could not enable HGM");
+
 	uffdio_register.range.start = (uint64_t)hva;
 	uffdio_register.range.len = len;
 	uffdio_register.mode = uffd_mode;

From patchwork Thu Jan 5 10:18:42 2023
Date: Thu, 5 Jan 2023 10:18:42 +0000
Message-ID: <20230105101844.1893104-45-jthoughton@google.com>
Subject: [PATCH 44/46] selftests/vm: add anon and shared hugetlb to migration test
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr.
David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

Shared HugeTLB mappings are migrated best-effort. Sometimes, due to being unable to grab the VMA lock for writing, migration may just randomly fail. To allow for that, we allow retries.

Signed-off-by: James Houghton
---
 tools/testing/selftests/vm/migration.c | 83 ++++++++++++++++++++++++--
 1 file changed, 79 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/vm/migration.c b/tools/testing/selftests/vm/migration.c
index 1cec8425e3ca..21577a84d7e4 100644
--- a/tools/testing/selftests/vm/migration.c
+++ b/tools/testing/selftests/vm/migration.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include

 #define TWOMEG (2<<20)
 #define RUNTIME (60)
@@ -59,11 +60,12 @@ FIXTURE_TEARDOWN(migration)
 	free(self->pids);
 }

-int migrate(uint64_t *ptr, int n1, int n2)
+int migrate(uint64_t *ptr, int n1, int n2, int retries)
 {
 	int ret, tmp;
 	int status = 0;
 	struct timespec ts1, ts2;
+	int failed = 0;

 	if (clock_gettime(CLOCK_MONOTONIC, &ts1))
 		return -1;
@@ -78,6 +80,9 @@ int migrate(uint64_t *ptr, int n1, int n2)
 		ret = move_pages(0, 1, (void **) &ptr, &n2, &status,
 				 MPOL_MF_MOVE_ALL);
 		if (ret) {
+			if (++failed < retries)
+				continue;
+
 			if (ret > 0)
 				printf("Didn't migrate %d pages\n", ret);
 			else
@@ -88,6 +93,7 @@ int migrate(uint64_t *ptr, int n1, int n2)
 		tmp = n2;
 		n2 = n1;
 		n1 = tmp;
+		failed = 0;
 	}

 	return 0;
@@ -128,7 +134,7 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME)
 		if (pthread_create(&self->threads[i], NULL, access_mem, ptr))
 			perror("Couldn't create thread");

-	ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0);
+	ASSERT_EQ(migrate(ptr, self->n1, self->n2, 1), 0);
 	for (i = 0; i < self->nthreads - 1; i++)
 		ASSERT_EQ(pthread_cancel(self->threads[i]), 0);
 }
@@ -158,7 +164,7 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME)
 		self->pids[i] = pid;
 	}

-	ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0);
+	ASSERT_EQ(migrate(ptr, self->n1, self->n2, 1), 0);
 	for (i = 0; i < self->nthreads - 1; i++)
 		ASSERT_EQ(kill(self->pids[i], SIGTERM), 0);
 }
@@ -185,9 +191,78 @@ TEST_F_TIMEOUT(migration, private_anon_thp, 2*RUNTIME)
 		if (pthread_create(&self->threads[i], NULL, access_mem, ptr))
 			perror("Couldn't create thread");

-	ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0);
+	ASSERT_EQ(migrate(ptr, self->n1, self->n2, 1), 0);
+	for (i = 0; i < self->nthreads - 1; i++)
+		ASSERT_EQ(pthread_cancel(self->threads[i]), 0);
+}
+
+/*
+ * Tests the anon hugetlb migration entry paths.
+ */
+TEST_F_TIMEOUT(migration, private_anon_hugetlb, 2*RUNTIME)
+{
+	uint64_t *ptr;
+	int i;
+
+	if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0)
+		SKIP(return, "Not enough threads or NUMA nodes available");
+
+	ptr = mmap(NULL, TWOMEG, PROT_READ | PROT_WRITE,
+		MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
+	if (ptr == MAP_FAILED)
+		SKIP(return, "Could not allocate hugetlb pages");
+
+	memset(ptr, 0xde, TWOMEG);
+	for (i = 0; i < self->nthreads - 1; i++)
+		if (pthread_create(&self->threads[i], NULL, access_mem, ptr))
+			perror("Couldn't create thread");
+
+	ASSERT_EQ(migrate(ptr, self->n1, self->n2, 1), 0);
 	for (i = 0; i < self->nthreads - 1; i++)
 		ASSERT_EQ(pthread_cancel(self->threads[i]), 0);
 }

+/*
+ * Tests the shared hugetlb migration entry paths.
+ */
+TEST_F_TIMEOUT(migration, shared_hugetlb, 2*RUNTIME)
+{
+	uint64_t *ptr;
+	int i;
+	int fd;
+	unsigned long sz;
+	struct statfs filestat;
+
+	if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0)
+		SKIP(return, "Not enough threads or NUMA nodes available");
+
+	fd = memfd_create("tmp_hugetlb", MFD_HUGETLB);
+	if (fd < 0)
+		SKIP(return, "Couldn't create hugetlb memfd");
+
+	if (fstatfs(fd, &filestat) < 0)
+		SKIP(return, "Couldn't fstatfs hugetlb file");
+
+	sz = filestat.f_bsize;
+
+	if (ftruncate(fd, sz))
+		SKIP(return, "Couldn't allocate hugetlb pages");
+
+	ptr = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	if (ptr == MAP_FAILED)
+		SKIP(return, "Could not map hugetlb pages");
+
+	memset(ptr, 0xde, sz);
+	for (i = 0; i < self->nthreads - 1; i++)
+		if (pthread_create(&self->threads[i], NULL, access_mem, ptr))
+			perror("Couldn't create thread");
+
+	ASSERT_EQ(migrate(ptr, self->n1, self->n2, 10), 0);
+	for (i = 0; i < self->nthreads - 1; i++) {
+		ASSERT_EQ(pthread_cancel(self->threads[i]), 0);
+		pthread_join(self->threads[i], NULL);
+	}
+
+	ftruncate(fd, 0);
+	close(fd);
+}
+
 TEST_HARNESS_MAIN

From patchwork Thu Jan 5 10:18:43 2023
Date: Thu, 5 Jan 2023 10:18:43 +0000
Message-ID: <20230105101844.1893104-46-jthoughton@google.com>
Subject: [PATCH 45/46] selftests/vm: add hugetlb HGM test to migration selftest
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr.
David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This is mostly the same as the shared HugeTLB case, but instead of mapping the page with a regular page fault, we map it with lots of UFFDIO_CONTINUE operations. We also verify that the contents haven't changed after the migration, which would be the case if the post-migration PTEs pointed to the wrong page.
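The mapping step issues one UFFDIO_CONTINUE per base page across the whole huge page, so a 2M hugetlb page takes 512 ioctls at 4K granularity. A small sketch of that schedule as pure arithmetic (the function name is illustrative, not from the patch):

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the loop in the patch's map_at_high_granularity(): one
 * UFFDIO_CONTINUE per base page. Returns the number of ioctls that would
 * be issued and writes the start address of the last range via *last_start.
 * Assumes length is a multiple of pagesize and non-zero. */
static size_t continue_schedule(unsigned long long mem, size_t length,
				size_t pagesize,
				unsigned long long *last_start)
{
	size_t i, n = length / pagesize;

	for (i = 0; i < n; ++i)
		*last_start = mem + i * pagesize;	/* range is pagesize bytes */
	return n;
}
```

The last range starts one base page before the end of the mapping, so the whole huge page ends up mapped at high granularity before the migration is attempted.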
Signed-off-by: James Houghton
---
 tools/testing/selftests/vm/migration.c | 146 +++++++++++++++++++++++++
 1 file changed, 146 insertions(+)

diff --git a/tools/testing/selftests/vm/migration.c b/tools/testing/selftests/vm/migration.c
index 21577a84d7e4..1fb3607accab 100644
--- a/tools/testing/selftests/vm/migration.c
+++ b/tools/testing/selftests/vm/migration.c
@@ -14,12 +14,21 @@
 #include
 #include
 #include
+#include
+#include
+#include
+#include
+#include

 #define TWOMEG (2<<20)
 #define RUNTIME (60)

 #define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1)))

+#ifndef MADV_SPLIT
+#define MADV_SPLIT 26
+#endif
+
 FIXTURE(migration)
 {
 	pthread_t *threads;
@@ -265,4 +274,141 @@ TEST_F_TIMEOUT(migration, shared_hugetlb, 2*RUNTIME)
 	close(fd);
 }

+#ifdef __NR_userfaultfd
+static int map_at_high_granularity(char *mem, size_t length)
+{
+	int i;
+	int ret;
+	int uffd = syscall(__NR_userfaultfd, 0);
+	struct uffdio_api api;
+	struct uffdio_register reg;
+	int pagesize = getpagesize();
+
+	if (uffd < 0) {
+		perror("couldn't create uffd");
+		return uffd;
+	}
+
+	api.api = UFFD_API;
+	api.features = 0;
+
+	ret = ioctl(uffd, UFFDIO_API, &api);
+	if (ret || api.api != UFFD_API) {
+		perror("UFFDIO_API failed");
+		goto out;
+	}
+
+	if (madvise(mem, length, MADV_SPLIT) == -1) {
+		perror("MADV_SPLIT failed");
+		goto out;
+	}
+
+	reg.range.start = (unsigned long)mem;
+	reg.range.len = length;
+	reg.mode = UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_MINOR;
+
+	ret = ioctl(uffd, UFFDIO_REGISTER, &reg);
+	if (ret) {
+		perror("UFFDIO_REGISTER failed");
+		goto out;
+	}
+
+	/* UFFDIO_CONTINUE each 4K segment of the 2M page. */
+	for (i = 0; i < length/pagesize; ++i) {
+		struct uffdio_continue cont;
+
+		cont.range.start = (unsigned long long)mem + i * pagesize;
+		cont.range.len = pagesize;
+		cont.mode = 0;
+		ret = ioctl(uffd, UFFDIO_CONTINUE, &cont);
+		if (ret) {
+			fprintf(stderr, "UFFDIO_CONTINUE failed "
+				"for %llx -> %llx: %d\n",
+				cont.range.start,
+				cont.range.start + cont.range.len,
+				errno);
+			goto out;
+		}
+	}
+	ret = 0;
+out:
+	close(uffd);
+	return ret;
+}
+#else
+static int map_at_high_granularity(char *mem, size_t length)
+{
+	fprintf(stderr, "Userfaultfd missing\n");
+	return -1;
+}
+#endif /* __NR_userfaultfd */
+
+/*
+ * Tests the high-granularity hugetlb migration entry paths.
+ */
+TEST_F_TIMEOUT(migration, shared_hugetlb_hgm, 2*RUNTIME)
+{
+	uint64_t *ptr;
+	int i;
+	int fd;
+	unsigned long sz;
+	struct statfs filestat;
+
+	if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0)
+		SKIP(return, "Not enough threads or NUMA nodes available");
+
+	fd = memfd_create("tmp_hugetlb", MFD_HUGETLB);
+	if (fd < 0)
+		SKIP(return, "Couldn't create hugetlb memfd");
+
+	if (fstatfs(fd, &filestat) < 0)
+		SKIP(return, "Couldn't fstatfs hugetlb file");
+
+	sz = filestat.f_bsize;
+
+	if (ftruncate(fd, sz))
+		SKIP(return, "Couldn't allocate hugetlb pages");
+
+	if (fallocate(fd, 0, 0, sz) < 0) {
+		perror("fallocate failed");
+		SKIP(return, "fallocate failed");
+	}
+
+	ptr = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	if (ptr == MAP_FAILED)
+		SKIP(return, "Could not allocate hugetlb pages");
+
+	/*
+	 * We have to map_at_high_granularity before we memset, otherwise
+	 * memset will map everything at the hugepage size.
+	 */
+	if (map_at_high_granularity((char *)ptr, sz) < 0)
+		SKIP(return, "Could not map HugeTLB range at high granularity");
+
+	/* Populate the page we're migrating. */
+	for (i = 0; i < sz/sizeof(*ptr); ++i)
+		ptr[i] = i;
+
+	for (i = 0; i < self->nthreads - 1; i++)
+		if (pthread_create(&self->threads[i], NULL, access_mem, ptr))
+			perror("Couldn't create thread");
+
+	ASSERT_EQ(migrate(ptr, self->n1, self->n2, 10), 0);
+	for (i = 0; i < self->nthreads - 1; i++) {
+		ASSERT_EQ(pthread_cancel(self->threads[i]), 0);
+		pthread_join(self->threads[i], NULL);
+	}
+
+	/* Check that the contents didn't change. */
+	for (i = 0; i < sz/sizeof(*ptr); ++i) {
+		ASSERT_EQ(ptr[i], i);
+		if (ptr[i] != i)
+			break;
+	}
+
+	ftruncate(fd, 0);
+	close(fd);
+}
+
 TEST_HARNESS_MAIN

From patchwork Thu Jan 5 10:18:44 2023
Date: Thu, 5 Jan 2023 10:18:44 +0000
Message-ID: <20230105101844.1893104-47-jthoughton@google.com>
Subject: [PATCH 46/46] selftests/vm: add HGM UFFDIO_CONTINUE and hwpoison tests
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This tests that high-granularity CONTINUEs at all sizes work (exercising contiguous PTE sizes for arm64, when support is added). This also tests that collapse works and hwpoison works correctly (although we aren't yet testing high-granularity poison). This test uses UFFD_FEATURE_EVENT_FORK + UFFD_REGISTER_MODE_WP to force the kernel to copy page tables on fork(), exercising the changes to copy_hugetlb_page_range().
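The size schedule in this test's per-size CONTINUE loop halves each step: len/2 at offset 0, then len/4, and so on down to one base page. That geometric series covers everything except the final base page, which is why the verification step expects the last 4K to read as zero. A quick check of that arithmetic (the helper name is illustrative; `len` and `pagesize` are assumed to be powers of two, as in the test):

```c
#include <assert.h>
#include <stddef.h>

/* Simulates the schedule in the patch's test_continues(): map len/2 at
 * offset 0, then len/4, ..., down to a single base page. Returns the
 * total number of bytes covered. */
static size_t hgm_covered_bytes(size_t len, size_t pagesize)
{
	size_t covered = 0;
	size_t size;

	for (size = len / 2; size >= pagesize; size /= 2)
		covered += size;	/* len/2 + len/4 + ... + pagesize */
	return covered;
}
```

For power-of-two sizes the sum telescopes to `len - pagesize`, leaving exactly one base page untouched at the end of the range.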
Signed-off-by: James Houghton
---
 tools/testing/selftests/vm/Makefile      |   1 +
 tools/testing/selftests/vm/hugetlb-hgm.c | 455 +++++++++++++++++++++++
 2 files changed, 456 insertions(+)
 create mode 100644 tools/testing/selftests/vm/hugetlb-hgm.c

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 89c14e41bd43..4aa4ca75a471 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -32,6 +32,7 @@ TEST_GEN_FILES += compaction_test
 TEST_GEN_FILES += gup_test
 TEST_GEN_FILES += hmm-tests
 TEST_GEN_FILES += hugetlb-madvise
+TEST_GEN_FILES += hugetlb-hgm
 TEST_GEN_FILES += hugepage-mmap
 TEST_GEN_FILES += hugepage-mremap
 TEST_GEN_FILES += hugepage-shm
diff --git a/tools/testing/selftests/vm/hugetlb-hgm.c b/tools/testing/selftests/vm/hugetlb-hgm.c
new file mode 100644
index 000000000000..616bc40164bf
--- /dev/null
+++ b/tools/testing/selftests/vm/hugetlb-hgm.c
@@ -0,0 +1,455 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test uncommon cases in HugeTLB high-granularity mapping:
+ *  1. Test all supported high-granularity page sizes (with MADV_COLLAPSE).
+ *  2. Test MADV_HWPOISON behavior.
+ */
+
+#define _GNU_SOURCE
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define PAGE_MASK ~(4096 - 1)
+
+#ifndef MADV_COLLAPSE
+#define MADV_COLLAPSE 25
+#endif
+
+#ifndef MADV_SPLIT
+#define MADV_SPLIT 26
+#endif
+
+#define PREFIX " ... "
+#define ERROR_PREFIX " !!! "
+
+enum test_status {
+	TEST_PASSED = 0,
+	TEST_FAILED = 1,
+	TEST_SKIPPED = 2,
+};
+
+static char *status_to_str(enum test_status status)
+{
+	switch (status) {
+	case TEST_PASSED:
+		return "TEST_PASSED";
+	case TEST_FAILED:
+		return "TEST_FAILED";
+	case TEST_SKIPPED:
+		return "TEST_SKIPPED";
+	default:
+		return "TEST_???";
+	}
+}
+
+int userfaultfd(int flags)
+{
+	return syscall(__NR_userfaultfd, flags);
+}
+
+int map_range(int uffd, char *addr, uint64_t length)
+{
+	struct uffdio_continue cont = {
+		.range = (struct uffdio_range) {
+			.start = (uint64_t)addr,
+			.len = length,
+		},
+		.mode = 0,
+		.mapped = 0,
+	};
+
+	if (ioctl(uffd, UFFDIO_CONTINUE, &cont) < 0) {
+		perror(ERROR_PREFIX "UFFDIO_CONTINUE failed");
+		return -1;
+	}
+	return 0;
+}
+
+int check_equal(char *mapping, size_t length, char value)
+{
+	size_t i;
+
+	for (i = 0; i < length; ++i)
+		if (mapping[i] != value) {
+			printf(ERROR_PREFIX "mismatch at %p (%d != %d)\n",
+			       &mapping[i], mapping[i], value);
+			return -1;
+		}
+
+	return 0;
+}
+
+int test_continues(int uffd, char *primary_map, char *secondary_map, size_t len,
+		   bool verify)
+{
+	size_t offset = 0;
+	unsigned char iter = 0;
+	unsigned long pagesize = getpagesize();
+	uint64_t size;
+
+	for (size = len/2; size >= pagesize;
+			offset += size, size /= 2) {
+		iter++;
+		memset(secondary_map + offset, iter, size);
+		printf(PREFIX "UFFDIO_CONTINUE: %p -> %p = %d%s\n",
+		       primary_map + offset,
+		       primary_map + offset + size,
+		       iter,
+		       verify ? " (and verify)" : "");
+		if (map_range(uffd, primary_map + offset, size))
+			return -1;
+		if (verify && check_equal(primary_map + offset, size, iter))
+			return -1;
+	}
+	return 0;
+}
+
+int verify_contents(char *map, size_t len, bool last_4k_zero)
+{
+	size_t offset = 0;
+	int i = 0;
+	uint64_t size;
+
+	for (size = len/2; size > 4096; offset += size, size /= 2)
+		if (check_equal(map + offset, size, ++i))
+			return -1;
+
+	if (last_4k_zero)
+		/* expect the last 4K to be zero. */
+		if (check_equal(map + len - 4096, 4096, 0))
+			return -1;
+
+	return 0;
+}
+
+int test_collapse(char *primary_map, size_t len, bool hwpoison)
+{
+	printf(PREFIX "collapsing %p -> %p\n", primary_map, primary_map + len);
+	if (madvise(primary_map, len, MADV_COLLAPSE) < 0) {
+		if (errno == EHWPOISON && hwpoison) {
+			/* this is expected for the hwpoison test. */
+			printf(PREFIX "could not collapse due to poison\n");
+			return 0;
+		}
+		perror(ERROR_PREFIX "collapse failed");
+		return -1;
+	}
+
+	printf(PREFIX "verifying %p -> %p\n", primary_map, primary_map + len);
+	return verify_contents(primary_map, len, true);
+}
+
+static void *sigbus_addr;
+bool was_mceerr;
+bool got_sigbus;
+
+void sigbus_handler(int signo, siginfo_t *info, void *context)
+{
+	got_sigbus = true;
+	was_mceerr = info->si_code == BUS_MCEERR_AR;
+	sigbus_addr = info->si_addr;
+
+	pthread_exit(NULL);
+}
+
+void *access_mem(void *addr)
+{
+	volatile char *ptr = addr;
+
+	*ptr;
+	return NULL;
+}
+
+int test_sigbus(char *addr, bool poison)
+{
+	int ret = 0;
+	pthread_t pthread;
+
+	sigbus_addr = (void *)0xBADBADBAD;
+	was_mceerr = false;
+	got_sigbus = false;
+	ret = pthread_create(&pthread, NULL, &access_mem, addr);
+	if (ret) {
+		printf(ERROR_PREFIX "failed to create thread: %s\n",
+		       strerror(ret));
+		return ret;
+	}
+
+	pthread_join(pthread, NULL);
+	if (!got_sigbus) {
+		printf(ERROR_PREFIX "didn't get a SIGBUS\n");
+		return -1;
+	} else if (sigbus_addr != addr) {
+		printf(ERROR_PREFIX "got incorrect sigbus address: %p vs %p\n",
+		       sigbus_addr, addr);
+		return -1;
+	} else if (poison && !was_mceerr) {
+		printf(ERROR_PREFIX "didn't get an MCEERR?\n");
+		return -1;
+	}
+	return 0;
+}
+
+void *read_from_uffd_thd(void *arg)
+{
+	int uffd = *(int *)arg;
+	struct uffd_msg msg;
+	/* opened without O_NONBLOCK */
+	if (read(uffd, &msg, sizeof(msg)) != sizeof(msg))
+		printf(ERROR_PREFIX "reading uffd failed\n");
+
+	return NULL;
+}
+
+int read_event_from_uffd(int *uffd, pthread_t *pthread)
+{
+	int ret = 0;
+
+ ret = pthread_create(pthread, NULL, &read_from_uffd_thd, (void *)uffd); + if (ret) { + printf(ERROR_PREFIX "failed to create thread: %s\n", + strerror(ret)); + return ret; + } + return 0; +} + +enum test_status test_hwpoison(char *primary_map, size_t len) +{ + const unsigned long pagesize = getpagesize(); + const int num_poison_checks = 512; + unsigned long bytes_per_check = len/num_poison_checks; + int i; + + printf(PREFIX "poisoning %p -> %p\n", primary_map, primary_map + len); + if (madvise(primary_map, len, MADV_HWPOISON) < 0) { + perror(ERROR_PREFIX "MADV_HWPOISON failed"); + return TEST_SKIPPED; + } + + printf(PREFIX "checking that it was poisoned " + "(%d addresses within %p -> %p)\n", + num_poison_checks, primary_map, primary_map + len); + + if (pagesize > bytes_per_check) + bytes_per_check = pagesize; + + for (i = 0; i < len; i += bytes_per_check) + if (test_sigbus(primary_map + i, true) < 0) + return TEST_FAILED; + /* check very last byte, because we left it unmapped */ + if (test_sigbus(primary_map + len - 1, true)) + return TEST_FAILED; + + return TEST_PASSED; +} + +int test_fork(int uffd, char *primary_map, size_t len) +{ + int status; + int ret = 0; + pid_t pid; + pthread_t uffd_thd; + + /* + * UFFD_FEATURE_EVENT_FORK will put fork event on the userfaultfd, + * which we must read, otherwise we block fork(). Setup a thread to + * read that event now. + * + * Page fault events should result in a SIGBUS, so we expect only a + * single event from the uffd (the fork event). + */ + if (read_event_from_uffd(&uffd, &uffd_thd)) + return -1; + + pid = fork(); + + if (!pid) { + /* + * Because we have UFFDIO_REGISTER_MODE_WP and + * UFFD_FEATURE_EVENT_FORK, the page tables should be copied + * exactly. + * + * Check that everything except that last 4K has correct + * contents, and then check that the last 4K gets a SIGBUS. 
+ */ + printf(PREFIX "child validating...\n"); + ret = verify_contents(primary_map, len, false) || + test_sigbus(primary_map + len - 1, false); + ret = 0; + exit(ret ? 1 : 0); + } else { + /* wait for the child to finish. */ + waitpid(pid, &status, 0); + ret = WEXITSTATUS(status); + if (!ret) { + printf(PREFIX "parent validating...\n"); + /* Same check as the child. */ + ret = verify_contents(primary_map, len, false) || + test_sigbus(primary_map + len - 1, false); + ret = 0; + } + } + + pthread_join(uffd_thd, NULL); + return ret; + +} + +enum test_status +test_hgm(int fd, size_t hugepagesize, size_t len, bool hwpoison) +{ + int uffd; + char *primary_map, *secondary_map; + struct uffdio_api api; + struct uffdio_register reg; + struct sigaction new, old; + enum test_status status = TEST_SKIPPED; + + if (ftruncate(fd, len) < 0) { + perror(ERROR_PREFIX "ftruncate failed"); + return status; + } + + uffd = userfaultfd(O_CLOEXEC); + if (uffd < 0) { + perror(ERROR_PREFIX "uffd not created"); + return status; + } + + primary_map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + if (primary_map == MAP_FAILED) { + perror(ERROR_PREFIX "mmap for primary mapping failed"); + goto close_uffd; + } + secondary_map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + if (secondary_map == MAP_FAILED) { + perror(ERROR_PREFIX "mmap for secondary mapping failed"); + goto unmap_primary; + } + + printf(PREFIX "primary mapping: %p\n", primary_map); + printf(PREFIX "secondary mapping: %p\n", secondary_map); + + api.api = UFFD_API; + api.features = UFFD_FEATURE_SIGBUS | UFFD_FEATURE_EXACT_ADDRESS | + UFFD_FEATURE_EVENT_FORK; + if (ioctl(uffd, UFFDIO_API, &api) == -1) { + perror(ERROR_PREFIX "UFFDIO_API failed"); + goto out; + } + + if (madvise(primary_map, len, MADV_SPLIT)) { + perror(ERROR_PREFIX "MADV_SPLIT failed"); + goto out; + } + + reg.range.start = (unsigned long)primary_map; + reg.range.len = len; + /* + * Register with UFFDIO_REGISTER_MODE_WP to force fork() 
to copy page + * tables (also need UFFD_FEATURE_EVENT_FORK, which we have). + */ + reg.mode = UFFDIO_REGISTER_MODE_MINOR | UFFDIO_REGISTER_MODE_MISSING | + UFFDIO_REGISTER_MODE_WP; + reg.ioctls = 0; + if (ioctl(uffd, UFFDIO_REGISTER, ®) == -1) { + perror(ERROR_PREFIX "register failed"); + goto out; + } + + new.sa_sigaction = &sigbus_handler; + new.sa_flags = SA_SIGINFO; + if (sigaction(SIGBUS, &new, &old) < 0) { + perror(ERROR_PREFIX "could not setup SIGBUS handler"); + goto out; + } + + status = TEST_FAILED; + + if (test_continues(uffd, primary_map, secondary_map, len, !hwpoison)) + goto done; + if (hwpoison) { + /* test_hwpoison can fail with TEST_SKIPPED. */ + enum test_status new_status = test_hwpoison(primary_map, len); + + if (new_status != TEST_PASSED) { + status = new_status; + goto done; + } + } else if (test_fork(uffd, primary_map, len)) + goto done; + if (test_collapse(primary_map, len, hwpoison)) + goto done; + + status = TEST_PASSED; + +done: + if (ftruncate(fd, 0) < 0) { + perror(ERROR_PREFIX "ftruncate back to 0 failed"); + status = TEST_FAILED; + } + +out: + munmap(secondary_map, len); +unmap_primary: + munmap(primary_map, len); +close_uffd: + close(uffd); + return status; +} + +int main(void) +{ + int fd; + struct statfs file_stat; + size_t hugepagesize; + size_t len; + + fd = memfd_create("hugetlb_tmp", MFD_HUGETLB); + if (fd < 0) { + perror(ERROR_PREFIX "could not open hugetlbfs file"); + return -1; + } + + memset(&file_stat, 0, sizeof(file_stat)); + if (fstatfs(fd, &file_stat)) { + perror(ERROR_PREFIX "fstatfs failed"); + goto close; + } + if (file_stat.f_type != HUGETLBFS_MAGIC) { + printf(ERROR_PREFIX "not hugetlbfs file\n"); + goto close; + } + + hugepagesize = file_stat.f_bsize; + len = 2 * hugepagesize; + printf("HGM regular test...\n"); + printf("HGM regular test: %s\n", + status_to_str(test_hgm(fd, hugepagesize, len, false))); + printf("HGM hwpoison test...\n"); + printf("HGM hwpoison test: %s\n", + status_to_str(test_hgm(fd, 
hugepagesize, len, true))); +close: + close(fd); + + return 0; +}