From patchwork Sat Feb 18 00:27:34 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58810
Date: Sat, 18 Feb 2023 00:27:34 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-2-jthoughton@google.com>
Subject: [PATCH v2 01/46] hugetlb: don't set PageUptodate for UFFDIO_CONTINUE
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
    "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
    Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

It would be bad if we actually set PageUptodate with UFFDIO_CONTINUE;
PageUptodate indicates that the page has been zeroed, and we don't want
to give a non-zeroed page to the user.

This change is being made now because UFFDIO_CONTINUEs on subpages
definitely shouldn't set this page flag on the head page.

Signed-off-by: James Houghton

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 07abcb6eb203..792cb2e67ce5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6256,7 +6256,16 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	 * preceding stores to the page contents become visible before
 	 * the set_pte_at() write.
 	 */
-	__folio_mark_uptodate(folio);
+	if (!is_continue)
+		__folio_mark_uptodate(folio);
+	else if (!folio_test_uptodate(folio)) {
+		/*
+		 * This should never happen; HugeTLB pages are always Uptodate
+		 * as soon as they are allocated.
+		 */
+		ret = -EFAULT;
+		goto out_release_nounlock;
+	}

 	/* Add shared, newly allocated pages to the page cache.
 	 */
 	if (vm_shared && !is_continue) {

From patchwork Sat Feb 18 00:27:35 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58811
Date: Sat, 18 Feb 2023 00:27:35 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-3-jthoughton@google.com>
Subject: [PATCH v2 02/46] hugetlb: remove mk_huge_pte; it is unused
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
    "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
    Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

mk_huge_pte is unused and not necessary. pte_mkhuge is the appropriate
function to call to create a HugeTLB PTE (see
Documentation/mm/arch_pgtable_helpers.rst).

It is being removed now to avoid complicating the implementation of
HugeTLB high-granularity mapping.
Acked-by: Peter Xu
Acked-by: Mina Almasry
Reviewed-by: Mike Kravetz
Signed-off-by: James Houghton

diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
index ccdbccfde148..c34893719715 100644
--- a/arch/s390/include/asm/hugetlb.h
+++ b/arch/s390/include/asm/hugetlb.h
@@ -77,11 +77,6 @@ static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
 	set_huge_pte_at(mm, addr, ptep, pte_wrprotect(pte));
 }

-static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot)
-{
-	return mk_pte(page, pgprot);
-}
-
 static inline int huge_pte_none(pte_t pte)
 {
 	return pte_none(pte);
diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index d7f6335d3999..be2e763e956f 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -5,11 +5,6 @@
 #include
 #include

-static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot)
-{
-	return mk_pte(page, pgprot);
-}
-
 static inline unsigned long huge_pte_write(pte_t pte)
 {
 	return pte_write(pte);
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index af59cc7bd307..fbbc53113473 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -925,7 +925,7 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args)
 	 * as it was previously derived from a real kernel symbol.
 	 */
 	page = pfn_to_page(args->fixed_pmd_pfn);
-	pte = mk_huge_pte(page, args->page_prot);
+	pte = mk_pte(page, args->page_prot);

 	WARN_ON(!huge_pte_dirty(huge_pte_mkdirty(pte)));
 	WARN_ON(!huge_pte_write(huge_pte_mkwrite(huge_pte_wrprotect(pte))));
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 792cb2e67ce5..540cdf9570d3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4899,11 +4899,10 @@ static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
 	unsigned int shift = huge_page_shift(hstate_vma(vma));

 	if (writable) {
-		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page,
-					 vma->vm_page_prot)));
+		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_pte(page,
+					 vma->vm_page_prot)));
 	} else {
-		entry = huge_pte_wrprotect(mk_huge_pte(page,
-					   vma->vm_page_prot));
+		entry = huge_pte_wrprotect(mk_pte(page, vma->vm_page_prot));
 	}
 	entry = pte_mkyoung(entry);
 	entry = arch_make_huge_pte(entry, shift, vma->vm_flags);

From patchwork Sat Feb 18 00:27:36 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58823
Date: Sat, 18 Feb 2023 00:27:36 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-4-jthoughton@google.com>
Subject: [PATCH v2 03/46] hugetlb: remove redundant pte_mkhuge in migration path
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
    "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
    Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

arch_make_huge_pte, which is called immediately following pte_mkhuge,
already makes the necessary changes to the PTE that pte_mkhuge would
have. The generic implementation of arch_make_huge_pte simply calls
pte_mkhuge.
Acked-by: Peter Xu
Acked-by: Mina Almasry
Reviewed-by: Mike Kravetz
Signed-off-by: James Houghton

diff --git a/mm/migrate.c b/mm/migrate.c
index 37865f85df6d..d3964c414010 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -249,7 +249,6 @@ static bool remove_migration_pte(struct folio *folio,
 		if (folio_test_hugetlb(folio)) {
 			unsigned int shift = huge_page_shift(hstate_vma(vma));

-			pte = pte_mkhuge(pte);
 			pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
 			if (folio_test_anon(folio))
 				hugepage_add_anon_rmap(new, vma, pvmw.address,

From patchwork Sat Feb 18 00:27:37 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58812
Date: Sat, 18 Feb 2023 00:27:37 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-5-jthoughton@google.com>
Subject: [PATCH v2 04/46] hugetlb: only adjust address ranges when VMAs want PMD sharing
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
    "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
    Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

Currently this check is overly aggressive: for some userfaultfd VMAs,
PMD sharing is disabled, yet we still widen the address range that is
used for flushing TLBs and sending MMU notifiers. This is being done
now because HGM VMAs also have sharing disabled, yet would still have
their flush ranges adjusted. Overaggressively flushing TLBs and
triggering MMU notifiers is particularly harmful with lots of
high-granularity operations.
Acked-by: Peter Xu
Reviewed-by: Mike Kravetz
Signed-off-by: James Houghton
Acked-by: Mina Almasry

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 540cdf9570d3..08004371cfed 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6999,22 +6999,31 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
 	return saddr;
 }

-bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
+static bool pmd_sharing_possible(struct vm_area_struct *vma)
 {
-	unsigned long start = addr & PUD_MASK;
-	unsigned long end = start + PUD_SIZE;
-
 #ifdef CONFIG_USERFAULTFD
 	if (uffd_disable_huge_pmd_share(vma))
 		return false;
 #endif
 	/*
-	 * check on proper vm_flags and page table alignment
+	 * Only shared VMAs can share PMDs.
 	 */
 	if (!(vma->vm_flags & VM_MAYSHARE))
 		return false;
 	if (!vma->vm_private_data) /* vma lock required for sharing */
 		return false;
+	return true;
+}
+
+bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
+{
+	unsigned long start = addr & PUD_MASK;
+	unsigned long end = start + PUD_SIZE;
+
+	/*
+	 * check on proper vm_flags and page table alignment
+	 */
+	if (!pmd_sharing_possible(vma))
+		return false;
 	if (!range_in_vma(vma, start, end))
 		return false;
 	return true;
@@ -7035,7 +7044,7 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 	 * vma needs to span at least one aligned PUD size, and the range
 	 * must be at least partially within in.
*/ - if (!(vma->vm_flags & VM_MAYSHARE) || !(v_end > v_start) || + if (!pmd_sharing_possible(vma) || !(v_end > v_start) || (*end <= v_start) || (*start >= v_end)) return; From patchwork Sat Feb 18 00:27:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 58826 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp142738wrn; Fri, 17 Feb 2023 16:31:19 -0800 (PST) X-Google-Smtp-Source: AK7set/Y7dIE/DRMR7QvDf6sqDbsSYBHiIie0zm7HTW5794LGYYIXJC+pFt3Z1s/1QTCMAeUANLk X-Received: by 2002:a17:906:4dce:b0:8a4:e0a2:e77f with SMTP id f14-20020a1709064dce00b008a4e0a2e77fmr2081669ejw.34.1676680278982; Fri, 17 Feb 2023 16:31:18 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676680278; cv=none; d=google.com; s=arc-20160816; b=QVl5NmBg56WLZuALuMotV1MQpSefnF7sgJF3oJ+0I+Y9eWxiwLoKuEgmEWJE91paL5 cY46v6SnyetcInnnc5WswC2H7RGGi9+vJQUSW5Ehq1ecVsC1MEK8mS/V30Wx5CdT5/hT ffo77K98S3n45ycO7UCRgOsY3umHmSOoqQi7jA6nyJG4qZtkxO9gCYhoU3iR61ubsip9 HNbF9H6kq7PzjpxVefBktuMoWr5IvoxdcWSIs4qh8UVAqb3i2JtAOLHLWO+PL1QKs8/6 V7OqjFJ2VbE7sa+dxuqlH40WKhAvvEtsaylm7+kDPD/sh/Zgy2tkRrAJEPG6GEpf9xn5 g6YQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=nelWhEWWDb3ok6PYl3fRG2FGkHP98SSg6vQYnhj1QJk=; b=qRAOWh6SIXYkTC1E2Pvyu1PoYIWl3xI2sc6GkoDavmdb3FSPrYDgk+O8LIOAMB8JtZ zUr2+8kS+IvlXlrGSD2t7wWcjNk0WnKmQNxigZTT9OyJ5DOrCYdYDFRInIpXyNkpQ4Xa f3dboKy6838ieEKZlhiM4HZyShgvVGmSt8cafGFvO3MuZA2FjW3VY9AOVPPhDcLLTMA/ Z+/DIuMSgmpnGV+rjxQ97t/nnBA6iIs2ruNk9x2ohioO9v5cNDuVxI31ID8dFt2tZys7 nsR7t0VPkZcLEAW/XvbKNKty0SX0+35OVypqIbKWPzvodJYWQqNevvhXmcReilkKErUr LMvQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=UvLBkxS6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org 
From patchwork Sat Feb 18 00:27:38 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58826
Date: Sat, 18 Feb 2023 00:27:38 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-6-jthoughton@google.com>
Subject: [PATCH v2 05/46] rmap: hugetlb: switch from page_dup_file_rmap to
 page_add_file_rmap
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert",
 "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

This only applies to file-backed HugeTLB, and it should be a no-op until
high-granularity mapping is possible. Also update page_remove_rmap to
support the eventual case where !compound && folio_test_hugetlb().

HugeTLB doesn't use LRU or mlock, so we avoid those bits. This also means
we don't need to use subpage_mapcount; if we did, it would overflow with
only a few mappings.

There is still one caller of page_dup_file_rmap left: copy_present_pte,
and it is always called with compound=false in this case.

Signed-off-by: James Houghton

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 08004371cfed..6c008c9de80e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5077,7 +5077,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		 * sleep during the process.
 		 */
 		if (!PageAnon(ptepage)) {
-			page_dup_file_rmap(ptepage, true);
+			page_add_file_rmap(ptepage, src_vma, true);
 		} else if (page_try_dup_anon_rmap(ptepage, true, src_vma)) {
 			pte_t src_pte_old = entry;
@@ -5910,7 +5910,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	if (anon_rmap)
 		hugepage_add_new_anon_rmap(folio, vma, haddr);
 	else
-		page_dup_file_rmap(&folio->page, true);
+		page_add_file_rmap(&folio->page, vma, true);
 	new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
 				&& (vma->vm_flags & VM_SHARED)));
 	/*
@@ -6301,7 +6301,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		goto out_release_unlock;
 
 	if (folio_in_pagecache)
-		page_dup_file_rmap(&folio->page, true);
+		page_add_file_rmap(&folio->page, dst_vma, true);
 	else
 		hugepage_add_new_anon_rmap(folio, dst_vma, dst_addr);
diff --git a/mm/migrate.c b/mm/migrate.c
index d3964c414010..b0f87f19b536 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -254,7 +254,7 @@ static bool remove_migration_pte(struct folio *folio,
 			hugepage_add_anon_rmap(new, vma, pvmw.address,
 					       rmap_flags);
 		else
-			page_dup_file_rmap(new, true);
+			page_add_file_rmap(new, vma, true);
 		set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
 	} else
 #endif
diff --git a/mm/rmap.c b/mm/rmap.c
index 15ae24585fc4..c010d0af3a82 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1318,21 +1318,21 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
 	int nr = 0, nr_pmdmapped = 0;
 	bool first;
 
-	VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
+	VM_BUG_ON_PAGE(compound && !PageTransHuge(page)
+		       && !folio_test_hugetlb(folio), page);
 
 	/* Is page being mapped by PTE? Is this its first map to be added? */
 	if (likely(!compound)) {
 		first = atomic_inc_and_test(&page->_mapcount);
 		nr = first;
-		if (first && folio_test_large(folio)) {
+		if (first && folio_test_large(folio)
+		    && !folio_test_hugetlb(folio)) {
 			nr = atomic_inc_return_relaxed(mapped);
 			nr = (nr < COMPOUND_MAPPED);
 		}
-	} else if (folio_test_pmd_mappable(folio)) {
-		/* That test is redundant: it's for safety or to optimize out */
-
+	} else {
 		first = atomic_inc_and_test(&folio->_entire_mapcount);
-		if (first) {
+		if (first && !folio_test_hugetlb(folio)) {
 			nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
 			if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
 				nr_pmdmapped = folio_nr_pages(folio);
@@ -1347,6 +1347,9 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
 		}
 	}
 
+	if (folio_test_hugetlb(folio))
+		return;
+
 	if (nr_pmdmapped)
 		__lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ?
 			NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr_pmdmapped);
@@ -1376,8 +1379,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
 	VM_BUG_ON_PAGE(compound && !PageHead(page), page);
 
 	/* Hugetlb pages are not counted in NR_*MAPPED */
-	if (unlikely(folio_test_hugetlb(folio))) {
-		/* hugetlb pages are always mapped with pmds */
+	if (unlikely(folio_test_hugetlb(folio)) && compound) {
 		atomic_dec(&folio->_entire_mapcount);
 		return;
 	}
@@ -1386,15 +1388,14 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
 	if (likely(!compound)) {
 		last = atomic_add_negative(-1, &page->_mapcount);
 		nr = last;
-		if (last && folio_test_large(folio)) {
+		if (last && folio_test_large(folio)
+		    && !folio_test_hugetlb(folio)) {
 			nr = atomic_dec_return_relaxed(mapped);
 			nr = (nr < COMPOUND_MAPPED);
 		}
-	} else if (folio_test_pmd_mappable(folio)) {
-		/* That test is redundant: it's for safety or to optimize out */
-
+	} else {
 		last = atomic_add_negative(-1, &folio->_entire_mapcount);
-		if (last) {
+		if (last && !folio_test_hugetlb(folio)) {
 			nr = atomic_sub_return_relaxed(COMPOUND_MAPPED, mapped);
 			if (likely(nr < COMPOUND_MAPPED)) {
 				nr_pmdmapped = folio_nr_pages(folio);
@@ -1409,6 +1410,9 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
 		}
 	}
 
+	if (folio_test_hugetlb(folio))
+		return;
+
 	if (nr_pmdmapped) {
 		if (folio_test_anon(folio))
 			idx = NR_ANON_THPS;
From patchwork Sat Feb 18 00:27:39 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58829
Date: Sat, 18 Feb 2023 00:27:39 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-7-jthoughton@google.com>
Subject: [PATCH v2 06/46] hugetlb: add CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert",
 "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

This adds the Kconfig to enable or disable high-granularity mapping.
Each architecture must explicitly opt-in to it (via
ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING), but when opted in, HGM
will be enabled by default if HUGETLB_PAGE is enabled.

Signed-off-by: James Houghton

diff --git a/fs/Kconfig b/fs/Kconfig
index 2685a4d0d353..a072bbe3439a 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -246,6 +246,18 @@ config HUGETLBFS
 config HUGETLB_PAGE
 	def_bool HUGETLBFS
 
+config ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING
+	bool
+
+config HUGETLB_HIGH_GRANULARITY_MAPPING
+	bool "HugeTLB high-granularity mapping support"
+	default n
+	depends on ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING
+	help
+	  HugeTLB high-granularity mapping (HGM) allows userspace to issue
+	  UFFDIO_CONTINUE on HugeTLB mappings in PAGE_SIZE chunks.
+	  HGM is incompatible with the HugeTLB Vmemmap Optimization (HVO).
+
 #
 # Select this config option from the architecture Kconfig, if it is preferred
 # to enable the feature of HugeTLB Vmemmap Optimization (HVO).
@@ -257,6 +269,7 @@ config HUGETLB_PAGE_OPTIMIZE_VMEMMAP
 	def_bool HUGETLB_PAGE
 	depends on ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
 	depends on SPARSEMEM_VMEMMAP
+	depends on !HUGETLB_HIGH_GRANULARITY_MAPPING
 
 config HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON
 	bool "HugeTLB Vmemmap Optimization (HVO) defaults to on"
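[Editorial note] An architecture opts in by selecting the new symbol from its own Kconfig, the same pattern used for ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP. A hypothetical arm64 hunk (not part of this series) would look like:

```
# arch/arm64/Kconfig -- hypothetical opt-in sketch
config ARM64
	def_bool y
	select ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING
```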
From patchwork Sat Feb 18 00:27:40 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58813
Date: Sat, 18 Feb 2023 00:27:40 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-8-jthoughton@google.com>
Subject: [PATCH v2 07/46] mm: add VM_HUGETLB_HGM VMA flag
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert",
 "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

VM_HUGETLB_HGM indicates that a HugeTLB VMA may contain high-granularity
mappings. Its VmFlags string is "hm".

Signed-off-by: James Houghton
Acked-by: Mike Kravetz

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 6a96e1713fd5..77b72f42556a 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -711,6 +711,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
 		[ilog2(VM_UFFD_MINOR)]	= "ui",
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+		[ilog2(VM_HUGETLB_HGM)]	= "hm",
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
 	};
 	size_t i;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2992a2d55aee..9d3216b4284a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -383,6 +383,13 @@ extern unsigned int kobjsize(const void *objp);
 # define VM_UFFD_MINOR		VM_NONE
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
 
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+# define VM_HUGETLB_HGM_BIT	38
+# define VM_HUGETLB_HGM		BIT(VM_HUGETLB_HGM_BIT)	/* HugeTLB high-granularity mapping */
+#else /* !CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+# define VM_HUGETLB_HGM		VM_NONE
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+
 /* Bits set in the VMA until the stack is in its final location */
 #define VM_STACK_INCOMPLETE_SETUP (VM_RAND_READ | VM_SEQ_READ)
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 9db52bc4ce19..bceb960dbada 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -162,6 +162,12 @@ IF_HAVE_PG_SKIP_KASAN_POISON(PG_skip_kasan_poison, "skip_kasan_poison")
 # define IF_HAVE_UFFD_MINOR(flag, name)
 #endif
 
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+# define IF_HAVE_HUGETLB_HGM(flag, name) {flag, name},
+#else
+# define IF_HAVE_HUGETLB_HGM(flag, name)
+#endif
+
 #define __def_vmaflag_names						\
 	{VM_READ,			"read"		},		\
 	{VM_WRITE,			"write"		},		\
@@ -186,6 +192,7 @@ IF_HAVE_UFFD_MINOR(VM_UFFD_MINOR,	"uffd_minor"	)		\
 	{VM_ACCOUNT,			"account"	},		\
 	{VM_NORESERVE,			"noreserve"	},		\
 	{VM_HUGETLB,			"hugetlb"	},		\
+IF_HAVE_HUGETLB_HGM(VM_HUGETLB_HGM,	"hugetlb_hgm"	)		\
 	{VM_SYNC,			"sync"		},		\
 	__VM_ARCH_SPECIFIC_1	,					\
 	{VM_WIPEONFORK,			"wipeonfork"	},		\
Date: Sat, 18 Feb 2023 00:27:41 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-9-jthoughton@google.com>
Subject: [PATCH v2 08/46] hugetlb: add HugeTLB HGM enablement helpers
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
    Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org

hugetlb_hgm_eligible indicates that a VMA is eligible to have HGM
explicitly enabled via MADV_SPLIT, and hugetlb_hgm_enabled indicates that
HGM has been enabled.

Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mina Almasry

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 7c977d234aba..efd2635a87f5 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1211,6 +1211,20 @@ static inline void hugetlb_unregister_node(struct node *node)
 }
 #endif	/* CONFIG_HUGETLB_PAGE */
 
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
+bool hugetlb_hgm_eligible(struct vm_area_struct *vma);
+#else
+static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
+{
+	return false;
+}
+static inline bool hugetlb_hgm_eligible(struct vm_area_struct *vma)
+{
+	return false;
+}
+#endif
+
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
 					struct mm_struct *mm, pte_t *pte)
 {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6c008c9de80e..0576dcc98044 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7004,6 +7004,10 @@ static bool pmd_sharing_possible(struct vm_area_struct *vma)
 #ifdef CONFIG_USERFAULTFD
 	if (uffd_disable_huge_pmd_share(vma))
 		return false;
+#endif
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+	if (hugetlb_hgm_enabled(vma))
+		return false;
 #endif
 	/*
 	 * Only shared VMAs can share PMDs.
@@ -7267,6 +7271,18 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h)
 
 #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
 
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+bool hugetlb_hgm_eligible(struct vm_area_struct *vma)
+{
+	/* All shared VMAs may have HGM. */
+	return vma && (vma->vm_flags & VM_MAYSHARE);
+}
+bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
+{
+	return vma && (vma->vm_flags & VM_HUGETLB_HGM);
+}
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+
 /*
  * These functions are overwritable if your architecture needs its own
  * behavior.

From patchwork Sat Feb 18 00:27:42 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58815
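The two predicates that patch 08 introduces reduce to flag tests on the VMA: "eligible" means the mapping is shared, "enabled" means MADV_SPLIT has set the HGM flag. A rough userspace model of that logic (the flag bit values here are stand-ins, not the kernel's actual VM_* values):

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag bits; the real VM_MAYSHARE/VM_HUGETLB_HGM values differ. */
#define VM_MAYSHARE	0x1UL
#define VM_HUGETLB_HGM	0x2UL

struct vma_model {
	unsigned long vm_flags;
};

/* Eligible: any shared HugeTLB VMA may have HGM enabled. */
bool hgm_eligible(const struct vma_model *vma)
{
	return vma && (vma->vm_flags & VM_MAYSHARE);
}

/* Enabled: MADV_SPLIT has already set the HGM flag on this VMA. */
bool hgm_enabled(const struct vma_model *vma)
{
	return vma && (vma->vm_flags & VM_HUGETLB_HGM);
}
```

A shared VMA is thus eligible but not enabled until MADV_SPLIT sets the flag; both helpers deliberately tolerate a NULL VMA.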
Date: Sat, 18 Feb 2023 00:27:42 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
X-Mailer: git-send-email
 2.39.2.637.g21b0678d19-goog
Message-ID: <20230218002819.1486479-10-jthoughton@google.com>
Subject: [PATCH v2 09/46] mm: add MADV_SPLIT to enable HugeTLB HGM
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
    Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org

Issuing madvise(MADV_SPLIT) on a HugeTLB address range will enable
HugeTLB HGM. The name MADV_SPLIT was chosen so that this API can be
applied to non-HugeTLB memory in the future, should such a use case
arise.

MADV_SPLIT changes the behavior of several syscalls on HugeTLB address
ranges:

  1. UFFDIO_CONTINUE is allowed for MAP_SHARED VMAs at PAGE_SIZE
     alignment.
  2. read()ing a page fault event from a userfaultfd will yield a
     PAGE_SIZE-rounded address, instead of a huge-page-size-rounded
     address (unless UFFD_FEATURE_EXACT_ADDRESS is used).

There is no way to disable the API changes that come with issuing
MADV_SPLIT. MADV_COLLAPSE can be used to collapse the high-granularity
page table mappings created via the functionality that MADV_SPLIT
enables.

For post-copy live migration, the expected use-case is:

  1. mmap(MAP_SHARED, some_fd) primary mapping
  2. mmap(MAP_SHARED, some_fd) alias mapping
  3. MADV_SPLIT the primary mapping
  4. UFFDIO_REGISTER/etc. the primary mapping
  5. Copy memory contents into the alias mapping and UFFDIO_CONTINUE the
     corresponding PAGE_SIZE sections in the primary mapping.

More API changes may be added in the future.

Signed-off-by: James Houghton <jthoughton@google.com>

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 763929e814e9..7a26f3648b90 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -78,6 +78,8 @@
 #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
 
+#define MADV_SPLIT	26		/* Enable hugepage high-granularity APIs */
+
 /* compatibility flags */
 #define MAP_FILE	0
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index c6e1fc77c996..f8a74a3a0928 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -105,6 +105,8 @@
 #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
 
+#define MADV_SPLIT	26		/* Enable hugepage high-granularity APIs */
+
 /* compatibility flags */
 #define MAP_FILE	0
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index 68c44f99bc93..a6dc6a56c941 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -72,6 +72,8 @@
 #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
 
+#define MADV_SPLIT	74		/* Enable hugepage high-granularity APIs */
+
 #define MADV_HWPOISON	100	/* poison a page for testing */
 #define MADV_SOFT_OFFLINE 101	/* soft offline page for testing */
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 1ff0c858544f..f98a77c430a9 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -113,6 +113,8 @@
 #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
 
+#define MADV_SPLIT	26		/* Enable hugepage high-granularity APIs */
+
 /* compatibility flags */
 #define MAP_FILE	0
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 6ce1f1ceb432..996e8ded092f 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -79,6 +79,8 @@
 #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
 
+#define MADV_SPLIT	26		/* Enable hugepage high-granularity APIs */
+
 /* compatibility flags */
 #define MAP_FILE	0
diff --git a/mm/madvise.c b/mm/madvise.c
index c2202f51e9dd..8c004c678262 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1006,6 +1006,28 @@ static long madvise_remove(struct vm_area_struct *vma,
 	return error;
 }
 
+static int madvise_split(struct vm_area_struct *vma,
+			 unsigned long *new_flags)
+{
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+	if (!is_vm_hugetlb_page(vma) || !hugetlb_hgm_eligible(vma))
+		return -EINVAL;
+
+	/*
+	 * PMD sharing doesn't work with HGM. If this MADV_SPLIT is on part
+	 * of a VMA, then we will split the VMA. Here, we're unsharing before
+	 * splitting because it's simpler, although we may be unsharing more
+	 * than we need.
+	 */
+	hugetlb_unshare_all_pmds(vma);
+
+	*new_flags |= VM_HUGETLB_HGM;
+	return 0;
+#else
+	return -EINVAL;
+#endif
+}
+
 /*
  * Apply an madvise behavior to a region of a vma.  madvise_update_vma
  * will handle splitting a vm area into separate areas, each area with its own
@@ -1084,6 +1106,11 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
 		break;
 	case MADV_COLLAPSE:
 		return madvise_collapse(vma, prev, start, end);
+	case MADV_SPLIT:
+		error = madvise_split(vma, &new_flags);
+		if (error)
+			goto out;
+		break;
 	}
 
 	anon_name = anon_vma_name(vma);
@@ -1178,6 +1205,9 @@ madvise_behavior_valid(int behavior)
 	case MADV_HUGEPAGE:
 	case MADV_NOHUGEPAGE:
 	case MADV_COLLAPSE:
+#endif
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+	case MADV_SPLIT:
 #endif
 	case MADV_DONTDUMP:
 	case MADV_DODUMP:
@@ -1368,6 +1398,8 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
  *		transparent huge pages so the existing pages will not be
  *		coalesced into THP and new pages will not be allocated as THP.
  *  MADV_COLLAPSE - synchronously coalesce pages into new THP.
+ *  MADV_SPLIT - allow HugeTLB pages to be mapped at PAGE_SIZE. This allows
+ *		UFFDIO_CONTINUE to accept PAGE_SIZE-aligned regions.
  *  MADV_DONTDUMP - the application wants to prevent pages in the given range
  *		from being included in its core dump.
  *  MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump.
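API change 2 above alters how a fault address is rounded before being reported through the userfaultfd. A small userspace model of that rounding, assuming a 4K base page and a 2M hugepage hstate (the sizes are illustrative, not queried from the kernel):

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE		0x1000ULL	/* assumed 4K base page */
#define MODEL_HPAGE_SIZE	0x200000ULL	/* assumed 2M hugepage */

/*
 * Model of the address a userfaultfd reader sees for a fault at 'addr':
 * without MADV_SPLIT the address is rounded down to the hugepage
 * boundary; with MADV_SPLIT (HGM enabled) it is only rounded down to
 * PAGE_SIZE. (UFFD_FEATURE_EXACT_ADDRESS, which suppresses rounding
 * entirely, is not modeled here.)
 */
uint64_t uffd_msg_address(uint64_t addr, int hgm_enabled)
{
	uint64_t align = hgm_enabled ? MODEL_PAGE_SIZE : MODEL_HPAGE_SIZE;

	return addr & ~(align - 1);
}
```

For a fault at 0x205123 inside a 2M page starting at 0x200000, the reader sees 0x200000 without HGM but 0x205000 with HGM, which is what lets post-copy migration service faults one base page at a time.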
From patchwork Sat Feb 18 00:27:43 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58816
Date: Sat, 18 Feb 2023 00:27:43 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-11-jthoughton@google.com>
Subject: [PATCH v2 10/46] hugetlb: make huge_pte_lockptr take an explicit
 shift argument
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr.
    David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka,
    Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org

This is needed to handle PTL locking with high-granularity mapping. We
won't always be using the PMD-level PTL even if we're using the 2M
hugepage hstate. It's possible that we're dealing with 4K PTEs, in which
case we need to lock the PTL for the 4K PTE.

Reviewed-by: Mina Almasry
Acked-by: Mike Kravetz
Signed-off-by: James Houghton <jthoughton@google.com>

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cb2dcdb18f8e..035a0df47af0 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -261,7 +261,8 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 
 		psize = hstate_get_psize(h);
 #ifdef CONFIG_DEBUG_VM
-		assert_spin_locked(huge_pte_lockptr(h, vma->vm_mm, ptep));
+		assert_spin_locked(huge_pte_lockptr(huge_page_shift(h),
+						    vma->vm_mm, ptep));
 #endif
 
 #else
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index efd2635a87f5..a1ceb9417f01 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -958,12 +958,11 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
 	return modified_mask;
 }
 
-static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
+static inline spinlock_t *huge_pte_lockptr(unsigned int shift,
 					   struct mm_struct *mm, pte_t *pte)
 {
-	if (huge_page_size(h) == PMD_SIZE)
+	if (shift == PMD_SHIFT)
 		return pmd_lockptr(mm, (pmd_t *) pte);
-	VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
 	return &mm->page_table_lock;
 }
 
@@ -1173,7 +1172,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
 	return 0;
 }
 
-static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
+static inline spinlock_t *huge_pte_lockptr(unsigned int shift,
 					   struct mm_struct *mm, pte_t *pte)
 {
 	return &mm->page_table_lock;
@@ -1230,7 +1229,7 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h,
 {
 	spinlock_t *ptl;
 
-	ptl = huge_pte_lockptr(h, mm, pte);
+	ptl = huge_pte_lockptr(huge_page_shift(h), mm, pte);
 	spin_lock(ptl);
 	return ptl;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0576dcc98044..5ca9eae0ac42 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5017,7 +5017,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		}
 
 		dst_ptl = huge_pte_lock(h, dst, dst_pte);
-		src_ptl = huge_pte_lockptr(h, src, src_pte);
+		src_ptl = huge_pte_lockptr(huge_page_shift(h), src, src_pte);
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 		entry = huge_ptep_get(src_pte);
again:
@@ -5098,7 +5098,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			/* Install the new hugetlb folio if src pte stable */
 			dst_ptl = huge_pte_lock(h, dst, dst_pte);
-			src_ptl = huge_pte_lockptr(h, src, src_pte);
+			src_ptl = huge_pte_lockptr(huge_page_shift(h),
+						   src, src_pte);
 			spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 			entry = huge_ptep_get(src_pte);
 			if (!pte_same(src_pte_old, entry)) {
@@ -5152,7 +5153,7 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
 	pte_t pte;
 
 	dst_ptl = huge_pte_lock(h, mm, dst_pte);
-	src_ptl = huge_pte_lockptr(h, mm, src_pte);
+	src_ptl = huge_pte_lockptr(huge_page_shift(h), mm, src_pte);
 
 	/*
 	 * We don't have to worry about the ordering of src and dst ptlocks
diff --git a/mm/migrate.c b/mm/migrate.c
index b0f87f19b536..9b4a7e75f6e6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -363,7 +363,8 @@ void __migration_entry_wait_huge(struct vm_area_struct *vma,
 
 void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte)
 {
-	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma->vm_mm, pte);
+	spinlock_t *ptl = huge_pte_lockptr(huge_page_shift(hstate_vma(vma)),
+					   vma->vm_mm, pte);
 
 	__migration_entry_wait_huge(vma, pte, ptl);
 }

From patchwork Sat Feb 18 00:27:44 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58817
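The lock-selection rule that patch 10 introduces keys on the entry's shift rather than on the hstate: only genuinely PMD-sized entries get the per-PMD split lock, while 4K PTEs mapping part of a hugepage fall back to mm->page_table_lock. A rough userspace model of that decision (shift values assumed for x86-64; the lock kinds are stand-in labels):

```c
/* Assumed x86-64 shifts: 4K base pages, 2M PMD entries. */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21

enum lock_kind {
	MM_PAGE_TABLE_LOCK,	/* coarse per-mm lock */
	PMD_SPLIT_LOCK,		/* fine-grained per-PMD lock */
};

/*
 * Model of the post-patch huge_pte_lockptr() choice: the caller now
 * passes the shift of the entry it actually holds, so a 4K PTE inside
 * a 2M hugepage no longer picks the PMD lock by mistake.
 */
enum lock_kind huge_pte_lock_choice(unsigned int shift)
{
	if (shift == PMD_SHIFT)
		return PMD_SPLIT_LOCK;
	return MM_PAGE_TABLE_LOCK;
}
```

Before the patch the choice was derived from huge_page_size(h), which was wrong once a 2M hstate could be mapped with 4K PTEs; passing the shift explicitly makes the lock match the entry.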
Date: Sat, 18 Feb 2023 00:27:44 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-12-jthoughton@google.com>
Subject: [PATCH v2 11/46] hugetlb: add hugetlb_pte to track HugeTLB page
 table entries
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach
    O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
    Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org

With high-granularity mapping, page table entries for HugeTLB pages can
be of any size/type. (For example, we can have a 1G page mapped with a
mix of PMDs and PTEs.) This struct helps keep track of a HugeTLB PTE
after we have done a page table walk. Without it, we'd have to pass
around the "size" of the PTE everywhere. We effectively did this before;
it could be fetched from the hstate, which we pass around pretty much
everywhere.

hugetlb_pte_present_leaf is included here as a helper function that will
be used frequently later on.

Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mina Almasry
Acked-by: Mike Kravetz

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a1ceb9417f01..eeacadf3272b 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -26,6 +26,25 @@ typedef struct { unsigned long pd; } hugepd_t;
 #define __hugepd(x) ((hugepd_t) { (x) })
 #endif
 
+enum hugetlb_level {
+	HUGETLB_LEVEL_PTE = 1,
+	/*
+	 * We always include PMD, PUD, and P4D in this enum definition so that,
+	 * when logged as an integer, we can easily tell which level it is.
+	 */
+	HUGETLB_LEVEL_PMD,
+	HUGETLB_LEVEL_PUD,
+	HUGETLB_LEVEL_P4D,
+	HUGETLB_LEVEL_PGD,
+};
+
+struct hugetlb_pte {
+	pte_t *ptep;
+	unsigned int shift;
+	enum hugetlb_level level;
+	spinlock_t *ptl;
+};
+
 #ifdef CONFIG_HUGETLB_PAGE
 
 #include
@@ -39,6 +58,20 @@ typedef struct { unsigned long pd; } hugepd_t;
  */
 #define __NR_USED_SUBPAGE 3
 
+static inline
+unsigned long hugetlb_pte_size(const struct hugetlb_pte *hpte)
+{
+	return 1UL << hpte->shift;
+}
+
+static inline
+unsigned long hugetlb_pte_mask(const struct hugetlb_pte *hpte)
+{
+	return ~(hugetlb_pte_size(hpte) - 1);
+}
+
+bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte, pte_t pte);
+
 struct hugepage_subpool {
 	spinlock_t lock;
 	long count;
@@ -1234,6 +1267,45 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h,
 	return ptl;
 }
 
+static inline
+spinlock_t *hugetlb_pte_lockptr(struct hugetlb_pte *hpte)
+{
+	return hpte->ptl;
+}
+
+static inline
+spinlock_t *hugetlb_pte_lock(struct hugetlb_pte *hpte)
+{
+	spinlock_t *ptl = hugetlb_pte_lockptr(hpte);
+
+	spin_lock(ptl);
+	return ptl;
+}
+
+static inline
+void __hugetlb_pte_init(struct hugetlb_pte *hpte, pte_t *ptep,
+			unsigned int shift, enum hugetlb_level level,
+			spinlock_t *ptl)
+{
+	/*
+	 * If 'shift' indicates that this PTE is contiguous, then @ptep must
+	 * be the first pte of the contiguous bunch.
+	 */
+	hpte->ptl = ptl;
+	hpte->ptep = ptep;
+	hpte->shift = shift;
+	hpte->level = level;
+}
+
+static inline
+void hugetlb_pte_init(struct mm_struct *mm, struct hugetlb_pte *hpte,
+		      pte_t *ptep, unsigned int shift,
+		      enum hugetlb_level level)
+{
+	__hugetlb_pte_init(hpte, ptep, shift, level,
+			   huge_pte_lockptr(shift, mm, ptep));
+}
+
 #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA)
 extern void __init hugetlb_cma_reserve(int order);
 #else
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5ca9eae0ac42..6c74adff43b6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1269,6 +1269,35 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
 	return false;
 }
 
+bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte, pte_t pte)
+{
+	pgd_t pgd;
+	p4d_t p4d;
+	pud_t pud;
+	pmd_t pmd;
+
+	switch (hpte->level) {
+	case HUGETLB_LEVEL_PGD:
+		pgd = __pgd(pte_val(pte));
+		return pgd_present(pgd) && pgd_leaf(pgd);
+	case HUGETLB_LEVEL_P4D:
+		p4d = __p4d(pte_val(pte));
+		return p4d_present(p4d) && p4d_leaf(p4d);
+	case HUGETLB_LEVEL_PUD:
+		pud = __pud(pte_val(pte));
+		return pud_present(pud) && pud_leaf(pud);
+	case HUGETLB_LEVEL_PMD:
+		pmd = __pmd(pte_val(pte));
+		return pmd_present(pmd) && pmd_leaf(pmd);
+	case HUGETLB_LEVEL_PTE:
+		return pte_present(pte);
+	default:
+		WARN_ON_ONCE(1);
+		return false;
+	}
+}
+
 static void enqueue_hugetlb_folio(struct hstate *h, struct folio *folio)
 {
 	int nid = folio_nid(folio);

From patchwork Sat Feb 18 00:27:45 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58818
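The hugetlb_pte helpers from patch 11 derive a mapping's size and alignment mask purely from the stored shift, so callers no longer need the hstate to know what a walk returned. A userspace model of that arithmetic (struct and function names here are stand-ins for the kernel's hugetlb_pte_size()/hugetlb_pte_mask()):

```c
#include <stdint.h>

/* Minimal stand-in for struct hugetlb_pte: only the shift matters here. */
struct hugetlb_pte_model {
	unsigned int shift;	/* log2 of the size this entry maps */
};

/* Number of bytes the entry maps: 1 << shift. */
uint64_t hpte_size(const struct hugetlb_pte_model *hpte)
{
	return 1ULL << hpte->shift;
}

/* Mask that rounds an address down to this entry's boundary. */
uint64_t hpte_mask(const struct hugetlb_pte_model *hpte)
{
	return ~(hpte_size(hpte) - 1);
}
```

With shift = 21 (a PMD-level entry) the size is 2M and the mask rounds a faulting address to the 2M boundary; with shift = 12 the same code describes a 4K PTE, which is exactly why the struct makes mixed-granularity walks uniform.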
Date: Sat, 18 Feb 2023 00:27:45 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-13-jthoughton@google.com>
Subject: [PATCH v2 12/46] hugetlb: add hugetlb_alloc_pmd and hugetlb_alloc_pte
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
 Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

These functions are used to allocate new PTEs below the hstate PTE. This
will be used by hugetlb_walk_step, which implements stepping forwards in
a HugeTLB high-granularity page table walk.

The reasons that we don't use the standard pmd_alloc/pte_alloc*
functions are:
 1) This prevents us from accidentally overwriting swap entries or
    attempting to use swap entries as present non-leaf PTEs (see
    pmd_alloc(); we assume that !pte_none means pte_present and
    non-leaf).
 2) Locking hugetlb PTEs can be different from locking regular PTEs.
    (Although, as implemented right now, locking is the same.)
 3) We can maintain compatibility with CONFIG_HIGHPTE. That is, HugeTLB
    HGM won't use HIGHPTE, but the kernel can still be built with it,
    and other mm code will use it.

When GENERAL_HUGETLB supports P4D-based hugepages, we will need to
implement hugetlb_pud_alloc to implement hugetlb_walk_step.
Signed-off-by: James Houghton

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index eeacadf3272b..9d839519c875 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -72,6 +72,11 @@ unsigned long hugetlb_pte_mask(const struct hugetlb_pte *hpte)

 bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte, pte_t pte);

+pmd_t *hugetlb_alloc_pmd(struct mm_struct *mm, struct hugetlb_pte *hpte,
+		unsigned long addr);
+pte_t *hugetlb_alloc_pte(struct mm_struct *mm, struct hugetlb_pte *hpte,
+		unsigned long addr);
+
 struct hugepage_subpool {
 	spinlock_t lock;
 	long count;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6c74adff43b6..bb424cdf79e4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -483,6 +483,120 @@ static bool has_same_uncharge_info(struct file_region *rg,
 #endif
 }

+/*
+ * hugetlb_alloc_pmd -- Allocate or find a PMD beneath a PUD-level hpte.
+ *
+ * This is meant to be used to implement hugetlb_walk_step when one must
+ * step down to a PMD. Different architectures may implement
+ * hugetlb_walk_step differently, but hugetlb_alloc_pmd and
+ * hugetlb_alloc_pte are architecture-independent.
+ *
+ * Returns:
+ *	On success: the pointer to the PMD. This should be placed into a
+ *		    hugetlb_pte. @hpte is not changed.
+ *	ERR_PTR(-EINVAL): hpte is not PUD-level
+ *	ERR_PTR(-EEXIST): there is a non-leaf and non-empty PUD in @hpte
+ *	ERR_PTR(-ENOMEM): could not allocate the new PMD
+ */
+pmd_t *hugetlb_alloc_pmd(struct mm_struct *mm, struct hugetlb_pte *hpte,
+		unsigned long addr)
+{
+	spinlock_t *ptl = hugetlb_pte_lockptr(hpte);
+	pmd_t *new;
+	pud_t *pudp;
+	pud_t pud;
+
+	if (hpte->level != HUGETLB_LEVEL_PUD)
+		return ERR_PTR(-EINVAL);
+
+	pudp = (pud_t *)hpte->ptep;
+retry:
+	pud = READ_ONCE(*pudp);
+	if (likely(pud_present(pud)))
+		return unlikely(pud_leaf(pud))
+			? ERR_PTR(-EEXIST)
+			: pmd_offset(pudp, addr);
+	else if (!pud_none(pud))
+		/*
+		 * Not present and not none means that a swap entry lives here,
+		 * and we can't get rid of it.
+		 */
+		return ERR_PTR(-EEXIST);
+
+	new = pmd_alloc_one(mm, addr);
+	if (!new)
+		return ERR_PTR(-ENOMEM);
+
+	spin_lock(ptl);
+	if (!pud_same(pud, *pudp)) {
+		spin_unlock(ptl);
+		pmd_free(mm, new);
+		goto retry;
+	}
+
+	mm_inc_nr_pmds(mm);
+	smp_wmb(); /* See comment in pmd_install() */
+	pud_populate(mm, pudp, new);
+	spin_unlock(ptl);
+	return pmd_offset(pudp, addr);
+}
+
+/*
+ * hugetlb_alloc_pte -- Allocate a PTE beneath a pmd_none PMD-level hpte.
+ *
+ * See the comment above hugetlb_alloc_pmd.
+ */
+pte_t *hugetlb_alloc_pte(struct mm_struct *mm, struct hugetlb_pte *hpte,
+		unsigned long addr)
+{
+	spinlock_t *ptl = hugetlb_pte_lockptr(hpte);
+	pgtable_t new;
+	pmd_t *pmdp;
+	pmd_t pmd;
+
+	if (hpte->level != HUGETLB_LEVEL_PMD)
+		return ERR_PTR(-EINVAL);
+
+	pmdp = (pmd_t *)hpte->ptep;
+retry:
+	pmd = READ_ONCE(*pmdp);
+	if (likely(pmd_present(pmd)))
+		return unlikely(pmd_leaf(pmd))
+			? ERR_PTR(-EEXIST)
+			: pte_offset_kernel(pmdp, addr);
+	else if (!pmd_none(pmd))
+		/*
+		 * Not present and not none means that a swap entry lives here,
+		 * and we can't get rid of it.
+		 */
+		return ERR_PTR(-EEXIST);
+
+	/*
+	 * With CONFIG_HIGHPTE, calling `pte_alloc_one` directly may result
+	 * in page tables being allocated in high memory, needing a kmap to
+	 * access. Instead, we call __pte_alloc_one directly with
+	 * GFP_PGTABLE_USER to prevent these PTEs being allocated in high
+	 * memory.
+	 */
+	new = __pte_alloc_one(mm, GFP_PGTABLE_USER);
+	if (!new)
+		return ERR_PTR(-ENOMEM);
+
+	spin_lock(ptl);
+	if (!pmd_same(pmd, *pmdp)) {
+		spin_unlock(ptl);
+		pgtable_pte_page_dtor(new);
+		__free_page(new);
+		goto retry;
+	}
+
+	mm_inc_nr_ptes(mm);
+	smp_wmb(); /* See comment in pmd_install() */
+	pmd_populate(mm, pmdp, new);
+	spin_unlock(ptl);
+	return pte_offset_kernel(pmdp, addr);
+}
+
 static void coalesce_file_region(struct resv_map *resv, struct file_region *rg)
 {
 	struct file_region *nrg, *prg;

From patchwork Sat Feb 18 00:27:46 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58819
Date: Sat, 18 Feb 2023 00:27:46 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-14-jthoughton@google.com>
Subject: [PATCH v2 13/46] hugetlb: add hugetlb_hgm_walk and hugetlb_walk_step
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
 Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

hugetlb_hgm_walk implements high-granularity page table walks for
HugeTLB. It is safe to call on non-HGM enabled VMAs; it will return
immediately.

hugetlb_walk_step implements how we step forwards in the walk.
Architectures that don't use GENERAL_HUGETLB will need to provide their
own implementation.

The broader API that should be used is hugetlb_full_walk,
hugetlb_full_walk_alloc, and hugetlb_full_walk_continue.
Signed-off-by: James Houghton

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 9d839519c875..726d581158b1 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -223,6 +223,14 @@ u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 		unsigned long addr, pud_t *pud);

+int hugetlb_full_walk(struct hugetlb_pte *hpte, struct vm_area_struct *vma,
+		      unsigned long addr);
+void hugetlb_full_walk_continue(struct hugetlb_pte *hpte,
+				struct vm_area_struct *vma, unsigned long addr);
+int hugetlb_full_walk_alloc(struct hugetlb_pte *hpte,
+			    struct vm_area_struct *vma, unsigned long addr,
+			    unsigned long target_sz);
+
 struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage);

 extern int sysctl_hugetlb_shm_group;
@@ -272,6 +280,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 pte_t *huge_pte_offset(struct mm_struct *mm,
 		       unsigned long addr, unsigned long sz);
 unsigned long hugetlb_mask_last_page(struct hstate *h);
+int hugetlb_walk_step(struct mm_struct *mm, struct hugetlb_pte *hpte,
+		      unsigned long addr, unsigned long sz);
 int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 		unsigned long addr, pte_t *ptep);
 void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
@@ -1054,6 +1064,8 @@ void hugetlb_register_node(struct node *node);
 void hugetlb_unregister_node(struct node *node);
 #endif

+enum hugetlb_level hpage_size_to_level(unsigned long sz);
+
 #else	/* CONFIG_HUGETLB_PAGE */
 struct hstate {};
@@ -1246,6 +1258,11 @@ static inline void hugetlb_register_node(struct node *node)
 static inline void hugetlb_unregister_node(struct node *node)
 {
 }
+
+static inline enum hugetlb_level hpage_size_to_level(unsigned long sz)
+{
+	return HUGETLB_LEVEL_PTE;
+}
 #endif	/* CONFIG_HUGETLB_PAGE */

 #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bb424cdf79e4..810c05feb41f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -97,6 +97,29 @@ static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);
 static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
 		unsigned long start, unsigned long end);

+/*
+ * hpage_size_to_level() - convert @sz to the corresponding page table level
+ *
+ * @sz must be less than or equal to a valid hugepage size.
+ */
+enum hugetlb_level hpage_size_to_level(unsigned long sz)
+{
+	/*
+	 * We order the conditionals from smallest to largest to pick the
+	 * smallest level when multiple levels have the same size (i.e.,
+	 * when levels are folded).
+	 */
+	if (sz < PMD_SIZE)
+		return HUGETLB_LEVEL_PTE;
+	if (sz < PUD_SIZE)
+		return HUGETLB_LEVEL_PMD;
+	if (sz < P4D_SIZE)
+		return HUGETLB_LEVEL_PUD;
+	if (sz < PGDIR_SIZE)
+		return HUGETLB_LEVEL_P4D;
+	return HUGETLB_LEVEL_PGD;
+}
+
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
 {
 	if (spool->count)
@@ -7315,6 +7338,154 @@ bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
 }
 #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */

+/*
+ * __hugetlb_hgm_walk - walks a high-granularity HugeTLB page table to resolve
+ * the page table entry for @addr. We might allocate new PTEs.
+ *
+ * @hpte must always be pointing at an hstate-level PTE or deeper.
+ *
+ * This function will never walk further if it encounters a PTE of a size
+ * less than or equal to @sz.
+ *
+ * @alloc determines what we do when we encounter an empty PTE. If false,
+ * we stop walking. If true and @sz is less than the current PTE's size,
+ * we make that PTE point to the next level down, going until @sz is the same
+ * as our current PTE.
+ *
+ * If @alloc is false and @sz is PAGE_SIZE, this function will always
+ * succeed, but that does not guarantee that hugetlb_pte_size(hpte) is @sz.
+ *
+ * Return:
+ *	-ENOMEM if we couldn't allocate new PTEs.
+ *	-EEXIST if the caller wanted to walk further than a migration PTE,
+ *		poison PTE, or a PTE marker. The caller needs to manually
+ *		deal with this scenario.
+ *	-EINVAL if called with invalid arguments (@sz invalid, @hpte not
+ *		initialized).
+ *	0 otherwise.
+ *
+ * Even if this function fails, @hpte is guaranteed to always remain
+ * valid.
+ */
+static int __hugetlb_hgm_walk(struct mm_struct *mm, struct vm_area_struct *vma,
+			      struct hugetlb_pte *hpte, unsigned long addr,
+			      unsigned long sz, bool alloc)
+{
+	int ret = 0;
+	pte_t pte;
+
+	if (WARN_ON_ONCE(sz < PAGE_SIZE))
+		return -EINVAL;
+
+	if (WARN_ON_ONCE(!hpte->ptep))
+		return -EINVAL;
+
+	while (hugetlb_pte_size(hpte) > sz && !ret) {
+		pte = huge_ptep_get(hpte->ptep);
+		if (!pte_present(pte)) {
+			if (!alloc)
+				return 0;
+			if (unlikely(!huge_pte_none(pte)))
+				return -EEXIST;
+		} else if (hugetlb_pte_present_leaf(hpte, pte))
+			return 0;
+		ret = hugetlb_walk_step(mm, hpte, addr, sz);
+	}
+
+	return ret;
+}
+
+/*
+ * hugetlb_hgm_walk - Has the same behavior as __hugetlb_hgm_walk but will
+ * initialize @hpte with hstate-level PTE pointer @ptep.
+ */
+static int hugetlb_hgm_walk(struct hugetlb_pte *hpte,
+			    pte_t *ptep,
+			    struct vm_area_struct *vma,
+			    unsigned long addr,
+			    unsigned long target_sz,
+			    bool alloc)
+{
+	struct hstate *h = hstate_vma(vma);
+
+	hugetlb_pte_init(vma->vm_mm, hpte, ptep, huge_page_shift(h),
+			 hpage_size_to_level(huge_page_size(h)));
+	return __hugetlb_hgm_walk(vma->vm_mm, vma, hpte, addr, target_sz,
+				  alloc);
+}
+
+/*
+ * hugetlb_full_walk_continue - continue a high-granularity page-table walk.
+ *
+ * If a user has a valid @hpte but knows that @hpte is not a leaf, they can
+ * attempt to continue walking by calling this function.
+ *
+ * This function will never fail, but @hpte might not change.
+ *
+ * If @hpte hasn't been initialized, then this function's behavior is
+ * undefined.
+ */
+void hugetlb_full_walk_continue(struct hugetlb_pte *hpte,
+				struct vm_area_struct *vma,
+				unsigned long addr)
+{
+	/* __hugetlb_hgm_walk will never fail with these arguments. */
+	WARN_ON_ONCE(__hugetlb_hgm_walk(vma->vm_mm, vma, hpte, addr,
+					PAGE_SIZE, false));
+}
+
+/*
+ * hugetlb_full_walk - do a high-granularity page-table walk; never allocate.
+ *
+ * This function can only fail if we find that the hstate-level PTE is not
+ * allocated. Callers can take advantage of this fact to skip address regions
+ * that cannot be mapped in that case.
+ *
+ * If this function succeeds, @hpte is guaranteed to be valid.
+ */
+int hugetlb_full_walk(struct hugetlb_pte *hpte,
+		      struct vm_area_struct *vma,
+		      unsigned long addr)
+{
+	struct hstate *h = hstate_vma(vma);
+	unsigned long sz = huge_page_size(h);
+	/*
+	 * We must mask the address appropriately so that we pick up the
+	 * first PTE in a contiguous group.
+	 */
+	pte_t *ptep = hugetlb_walk(vma, addr & huge_page_mask(h), sz);
+
+	if (!ptep)
+		return -ENOMEM;
+
+	/* hugetlb_hgm_walk will never fail with these arguments. */
+	WARN_ON_ONCE(hugetlb_hgm_walk(hpte, ptep, vma, addr, PAGE_SIZE,
+				      false));
+	return 0;
+}
+
+/*
+ * hugetlb_full_walk_alloc - do a high-granularity walk, potentially allocate
+ * new PTEs.
+ */
+int hugetlb_full_walk_alloc(struct hugetlb_pte *hpte,
+			    struct vm_area_struct *vma,
+			    unsigned long addr,
+			    unsigned long target_sz)
+{
+	struct hstate *h = hstate_vma(vma);
+	unsigned long sz = huge_page_size(h);
+	/*
+	 * We must mask the address appropriately so that we pick up the
+	 * first PTE in a contiguous group.
+	 */
+	pte_t *ptep = huge_pte_alloc(vma->vm_mm, vma, addr & huge_page_mask(h),
+				     sz);
+
+	if (!ptep)
+		return -ENOMEM;
+
+	return hugetlb_hgm_walk(hpte, ptep, vma, addr, target_sz, true);
+}
+
 #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
 pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz)
@@ -7382,6 +7553,48 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 	return (pte_t *)pmd;
 }

+/*
+ * hugetlb_walk_step() - Walk the page table one step to resolve the page
+ * (hugepage or subpage) entry at address @addr.
+ *
+ * @sz always points at the final target PTE size (e.g. PAGE_SIZE for the
+ * lowest-level PTE).
+ *
+ * @hpte will always remain valid, even if this function fails.
+ *
+ * Architectures that implement this function must ensure that if @hpte does
+ * not change levels, then its PTL must also stay the same.
+ */
+int hugetlb_walk_step(struct mm_struct *mm, struct hugetlb_pte *hpte,
+		      unsigned long addr, unsigned long sz)
+{
+	pte_t *ptep;
+	spinlock_t *ptl;
+
+	switch (hpte->level) {
+	case HUGETLB_LEVEL_PUD:
+		ptep = (pte_t *)hugetlb_alloc_pmd(mm, hpte, addr);
+		if (IS_ERR(ptep))
+			return PTR_ERR(ptep);
+		hugetlb_pte_init(mm, hpte, ptep, PMD_SHIFT,
+				 HUGETLB_LEVEL_PMD);
+		break;
+	case HUGETLB_LEVEL_PMD:
+		ptep = hugetlb_alloc_pte(mm, hpte, addr);
+		if (IS_ERR(ptep))
+			return PTR_ERR(ptep);
+		ptl = pte_lockptr(mm, (pmd_t *)hpte->ptep);
+		__hugetlb_pte_init(hpte, ptep, PAGE_SHIFT,
+				   HUGETLB_LEVEL_PTE, ptl);
+		break;
+	default:
+		WARN_ONCE(1, "%s: got invalid level: %d (shift: %d)\n",
+			  __func__, hpte->level, hpte->shift);
+		return -EINVAL;
+	}
+	return 0;
+}
+
 /*
  * Return a mask that can be used to update an address to the last huge
  * page in a page table page mapping size. Used to skip non-present

From patchwork Sat Feb 18 00:27:47 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58821
Date: Sat, 18 Feb 2023 00:27:47 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-15-jthoughton@google.com>
Subject: [PATCH v2 14/46] hugetlb: split PTE markers when doing HGM walks
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
    "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
    Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

Fix how UFFDIO_CONTINUE and UFFDIO_WRITEPROTECT interact in these two
ways:
 - UFFDIO_WRITEPROTECT no longer prevents a high-granularity
   UFFDIO_CONTINUE.
 - UFFD-WP PTE markers installed with UFFDIO_WRITEPROTECT will be
   properly propagated when high-granularity UFFDIO_CONTINUEs are
   performed.

Note: UFFDIO_WRITEPROTECT is not yet permitted at PAGE_SIZE granularity.

Signed-off-by: James Houghton
Acked-by: Mike Kravetz

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 810c05feb41f..f74183acc521 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -506,6 +506,30 @@ static bool has_same_uncharge_info(struct file_region *rg,
 #endif
 }
 
+static void hugetlb_install_markers_pmd(pmd_t *pmdp, pte_marker marker)
+{
+	int i;
+
+	for (i = 0; i < PTRS_PER_PMD; ++i)
+		/*
+		 * WRITE_ONCE not needed because the pud hasn't been
+		 * installed yet.
+		 */
+		pmdp[i] = __pmd(pte_val(make_pte_marker(marker)));
+}
+
+static void hugetlb_install_markers_pte(pte_t *ptep, pte_marker marker)
+{
+	int i;
+
+	for (i = 0; i < PTRS_PER_PTE; ++i)
+		/*
+		 * WRITE_ONCE not needed because the pmd hasn't been
+		 * installed yet.
+		 */
+		ptep[i] = make_pte_marker(marker);
+}
+
 /*
  * hugetlb_alloc_pmd -- Allocate or find a PMD beneath a PUD-level hpte.
  *
@@ -528,23 +552,32 @@ pmd_t *hugetlb_alloc_pmd(struct mm_struct *mm, struct hugetlb_pte *hpte,
 	pmd_t *new;
 	pud_t *pudp;
 	pud_t pud;
+	bool is_marker;
+	pte_marker marker;
 
 	if (hpte->level != HUGETLB_LEVEL_PUD)
 		return ERR_PTR(-EINVAL);
 
 	pudp = (pud_t *)hpte->ptep;
 retry:
+	is_marker = false;
 	pud = READ_ONCE(*pudp);
 	if (likely(pud_present(pud)))
 		return unlikely(pud_leaf(pud))
 			? ERR_PTR(-EEXIST)
 			: pmd_offset(pudp, addr);
-	else if (!pud_none(pud))
+	else if (!pud_none(pud)) {
 		/*
-		 * Not present and not none means that a swap entry lives here,
-		 * and we can't get rid of it.
+		 * Not present and not none means that a swap entry lives here.
+		 * If it's a PTE marker, we can deal with it. If it's another
+		 * swap entry, we don't attempt to split it.
 		 */
-		return ERR_PTR(-EEXIST);
+		is_marker = is_pte_marker(__pte(pud_val(pud)));
+		if (!is_marker)
+			return ERR_PTR(-EEXIST);
+
+		marker = pte_marker_get(pte_to_swp_entry(__pte(pud_val(pud))));
+	}
 
 	new = pmd_alloc_one(mm, addr);
 	if (!new)
@@ -557,6 +590,13 @@ pmd_t *hugetlb_alloc_pmd(struct mm_struct *mm, struct hugetlb_pte *hpte,
 		goto retry;
 	}
 
+	/*
+	 * Install markers before PUD to avoid races with other
+	 * page tables walks.
+	 */
+	if (is_marker)
+		hugetlb_install_markers_pmd(new, marker);
+
 	mm_inc_nr_pmds(mm);
 	smp_wmb(); /* See comment in pmd_install() */
 	pud_populate(mm, pudp, new);
@@ -576,23 +616,32 @@ pte_t *hugetlb_alloc_pte(struct mm_struct *mm, struct hugetlb_pte *hpte,
 	pgtable_t new;
 	pmd_t *pmdp;
 	pmd_t pmd;
+	bool is_marker;
+	pte_marker marker;
 
 	if (hpte->level != HUGETLB_LEVEL_PMD)
 		return ERR_PTR(-EINVAL);
 
 	pmdp = (pmd_t *)hpte->ptep;
 retry:
+	is_marker = false;
 	pmd = READ_ONCE(*pmdp);
 	if (likely(pmd_present(pmd)))
 		return unlikely(pmd_leaf(pmd))
 			? ERR_PTR(-EEXIST)
 			: pte_offset_kernel(pmdp, addr);
-	else if (!pmd_none(pmd))
+	else if (!pmd_none(pmd)) {
 		/*
-		 * Not present and not none means that a swap entry lives here,
-		 * and we can't get rid of it.
+		 * Not present and not none means that a swap entry lives here.
+		 * If it's a PTE marker, we can deal with it. If it's another
+		 * swap entry, we don't attempt to split it.
 		 */
-		return ERR_PTR(-EEXIST);
+		is_marker = is_pte_marker(__pte(pmd_val(pmd)));
+		if (!is_marker)
+			return ERR_PTR(-EEXIST);
+
+		marker = pte_marker_get(pte_to_swp_entry(__pte(pmd_val(pmd))));
+	}
 
 	/*
	 * With CONFIG_HIGHPTE, calling `pte_alloc_one` directly may result
@@ -613,6 +662,9 @@ pte_t *hugetlb_alloc_pte(struct mm_struct *mm, struct hugetlb_pte *hpte,
 		goto retry;
 	}
 
+	if (is_marker)
+		hugetlb_install_markers_pte(page_address(new), marker);
+
 	mm_inc_nr_ptes(mm);
 	smp_wmb(); /* See comment in pmd_install() */
 	pmd_populate(mm, pmdp, new);
@@ -7384,7 +7436,12 @@ static int __hugetlb_hgm_walk(struct mm_struct *mm, struct vm_area_struct *vma,
 		if (!pte_present(pte)) {
 			if (!alloc)
 				return 0;
-			if (unlikely(!huge_pte_none(pte)))
+			/*
+			 * In hugetlb_alloc_pmd and hugetlb_alloc_pte,
+			 * we split PTE markers, so we can tolerate
+			 * PTE markers here.
+			 */
+			if (unlikely(!huge_pte_none_mostly(pte)))
 				return -EEXIST;
 		} else if (hugetlb_pte_present_leaf(hpte, pte))
 			return 0;
Date: Sat, 18 Feb 2023 00:27:48 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-16-jthoughton@google.com>
Subject: [PATCH v2 15/46] hugetlb: add make_huge_pte_with_shift
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
    "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
    Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This allows us to make huge PTEs at shifts other than the hstate shift,
which will be necessary for high-granularity mappings.

Acked-by: Mike Kravetz
Signed-off-by: James Houghton
Reviewed-by: Mina Almasry

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f74183acc521..ed1d806020de 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5110,11 +5110,11 @@ const struct vm_operations_struct hugetlb_vm_ops = {
 	.pagesize = hugetlb_vm_op_pagesize,
 };
 
-static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
-			   int writable)
+static pte_t make_huge_pte_with_shift(struct vm_area_struct *vma,
+				      struct page *page, int writable,
+				      int shift)
 {
 	pte_t entry;
-	unsigned int shift = huge_page_shift(hstate_vma(vma));
 
 	if (writable) {
 		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_pte(page,
@@ -5128,6 +5128,14 @@ static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
 	return entry;
 }
 
+static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
+			   int writable)
+{
+	unsigned int shift = huge_page_shift(hstate_vma(vma));
+
+	return make_huge_pte_with_shift(vma, page, writable, shift);
+}
+
 static void set_huge_ptep_writable(struct vm_area_struct *vma,
 				   unsigned long address, pte_t *ptep)
 {
Date: Sat, 18 Feb 2023 00:27:49 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-17-jthoughton@google.com>
Subject: [PATCH v2 16/46] hugetlb: make default arch_make_huge_pte understand small mappings
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
    "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
    Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This is a simple change: don't create a "huge" PTE if we are making a
regular, PAGE_SIZE PTE. All architectures that want to implement HGM
likely need to be changed in a similar way if they implement their own
version of arch_make_huge_pte.

Signed-off-by: James Houghton
Reviewed-by: Mike Kravetz

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 726d581158b1..b767b6889dea 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -899,7 +899,7 @@ static inline void arch_clear_hugepage_flags(struct page *page) { }
 static inline pte_t arch_make_huge_pte(pte_t entry, unsigned int shift,
 				       vm_flags_t flags)
 {
-	return pte_mkhuge(entry);
+	return shift > PAGE_SHIFT ? pte_mkhuge(entry) : entry;
 }
 #endif
Date: Sat, 18 Feb 2023 00:27:50 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-18-jthoughton@google.com>
Subject: [PATCH v2 17/46] hugetlbfs: do a full walk to check if vma maps a page
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
    "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
    Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

Because it is safe to do so, do a full high-granularity page table walk
to check if the page is mapped.

Signed-off-by: James Houghton

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index cfd09f95551b..c0ee69f0418e 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -386,17 +386,24 @@ static void hugetlb_delete_from_page_cache(struct folio *folio)
 static bool hugetlb_vma_maps_page(struct vm_area_struct *vma,
 				  unsigned long addr, struct page *page)
 {
-	pte_t *ptep, pte;
+	pte_t pte;
+	struct hugetlb_pte hpte;
 
-	ptep = hugetlb_walk(vma, addr, huge_page_size(hstate_vma(vma)));
-	if (!ptep)
+	if (hugetlb_full_walk(&hpte, vma, addr))
 		return false;
 
-	pte = huge_ptep_get(ptep);
+	pte = huge_ptep_get(hpte.ptep);
 	if (huge_pte_none(pte) || !pte_present(pte))
 		return false;
 
-	if (pte_page(pte) == page)
+	if (unlikely(!hugetlb_pte_present_leaf(&hpte, pte)))
+		/*
+		 * We raced with someone splitting us, and the only case
+		 * where this is impossible is when the pte was none.
+		 */
+		return false;
+
+	if (compound_head(pte_page(pte)) == page)
 		return true;
 
 	return false;
Date: Sat, 18 Feb 2023 00:27:51 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-19-jthoughton@google.com>
Subject: [PATCH v2 18/46] hugetlb: add HGM support to __unmap_hugepage_range
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
    "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
    Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

Enlighten __unmap_hugepage_range to deal with high-granularity mappings.
This doesn't change its API; it still must be called with hugepage
alignment, but it will correctly unmap hugepages that have been mapped
at high granularity.

Eventually, functionality here can be expanded to allow users to call
MADV_DONTNEED on PAGE_SIZE-aligned sections of a hugepage, but that is
not done here.

Introduce hugetlb_remove_rmap to properly decrement mapcount for
high-granularity-mapped HugeTLB pages.
Signed-off-by: James Houghton

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index b46617207c93..31267471760e 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -598,9 +598,9 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
 		__tlb_remove_tlb_entry(tlb, ptep, address);	\
 	} while (0)
 
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
+#define tlb_remove_huge_tlb_entry(tlb, hpte, address)	\
 	do {							\
-		unsigned long _sz = huge_page_size(h);		\
+		unsigned long _sz = hugetlb_pte_size(&hpte);	\
 		if (_sz >= P4D_SIZE)				\
 			tlb_flush_p4d_range(tlb, address, _sz);	\
 		else if (_sz >= PUD_SIZE)			\
@@ -609,7 +609,7 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
 			tlb_flush_pmd_range(tlb, address, _sz);	\
 		else						\
 			tlb_flush_pte_range(tlb, address, _sz);	\
-		__tlb_remove_tlb_entry(tlb, ptep, address);	\
+		__tlb_remove_tlb_entry(tlb, hpte.ptep, address);\
 	} while (0)
 
 /**
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b767b6889dea..1a1a71868dfd 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -160,6 +160,9 @@ struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
 						long min_hpages);
 void hugepage_put_subpool(struct hugepage_subpool *spool);
 
+void hugetlb_remove_rmap(struct page *subpage, unsigned long shift,
+			 struct hstate *h, struct vm_area_struct *vma);
+
 void hugetlb_dup_vma_private(struct vm_area_struct *vma);
 void clear_vma_resv_huge_pages(struct vm_area_struct *vma);
 int hugetlb_sysctl_handler(struct ctl_table *, int, void *, size_t *, loff_t *);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ed1d806020de..ecf1a28dbaaa 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -120,6 +120,28 @@ enum hugetlb_level hpage_size_to_level(unsigned long sz)
 	return HUGETLB_LEVEL_PGD;
 }
 
+void hugetlb_remove_rmap(struct page *subpage, unsigned long shift,
+			 struct hstate *h, struct vm_area_struct *vma)
+{
+	struct page *hpage = compound_head(subpage);
+
+	if (shift == huge_page_shift(h)) {
+		VM_BUG_ON_PAGE(subpage != hpage, subpage);
+		page_remove_rmap(hpage, vma, true);
+	} else {
+		unsigned long nr_subpages = 1UL << (shift - PAGE_SHIFT);
+		struct page *final_page = &subpage[nr_subpages];
+
+		VM_BUG_ON_PAGE(HPageVmemmapOptimized(hpage), hpage);
+		/*
+		 * Decrement the mapcount on each page that is getting
+		 * unmapped.
+		 */
+		for (; subpage < final_page; ++subpage)
+			page_remove_rmap(subpage, vma, false);
+	}
+}
+
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
 {
 	if (spool->count)
@@ -5466,10 +5488,10 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
-	pte_t *ptep;
+	struct hugetlb_pte hpte;
 	pte_t pte;
 	spinlock_t *ptl;
-	struct page *page;
+	struct page *hpage, *subpage;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
 	unsigned long last_addr_mask;
@@ -5479,35 +5501,33 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 	BUG_ON(start & ~huge_page_mask(h));
 	BUG_ON(end & ~huge_page_mask(h));
 
-	/*
-	 * This is a hugetlb vma, all the pte entries should point
-	 * to huge page.
-	 */
-	tlb_change_page_size(tlb, sz);
 	tlb_start_vma(tlb, vma);
 	last_addr_mask = hugetlb_mask_last_page(h);
 	address = start;
-	for (; address < end; address += sz) {
-		ptep = hugetlb_walk(vma, address, sz);
-		if (!ptep) {
-			address |= last_addr_mask;
+
+	while (address < end) {
+		if (hugetlb_full_walk(&hpte, vma, address)) {
+			address = (address | last_addr_mask) + sz;
 			continue;
 		}
 
-		ptl = huge_pte_lock(h, mm, ptep);
-		if (huge_pmd_unshare(mm, vma, address, ptep)) {
+		ptl = hugetlb_pte_lock(&hpte);
+		if (hugetlb_pte_size(&hpte) == sz &&
+		    huge_pmd_unshare(mm, vma, address, hpte.ptep)) {
 			spin_unlock(ptl);
 			tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
 			force_flush = true;
 			address |= last_addr_mask;
+			address += sz;
 			continue;
 		}
 
-		pte = huge_ptep_get(ptep);
+		pte = huge_ptep_get(hpte.ptep);
+
 		if (huge_pte_none(pte)) {
 			spin_unlock(ptl);
-			continue;
+			goto next_hpte;
 		}
 
 		/*
@@ -5523,24 +5543,35 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 			 */
 			if (pte_swp_uffd_wp_any(pte) &&
			    !(zap_flags & ZAP_FLAG_DROP_MARKER))
-				set_huge_pte_at(mm, address, ptep,
+				set_huge_pte_at(mm, address, hpte.ptep,
						make_pte_marker(PTE_MARKER_UFFD_WP));
			else
-				huge_pte_clear(mm, address, ptep, sz);
+				huge_pte_clear(mm, address, hpte.ptep,
+					       hugetlb_pte_size(&hpte));
+			spin_unlock(ptl);
+			goto next_hpte;
+		}
+
+		if (unlikely(!hugetlb_pte_present_leaf(&hpte, pte))) {
+			/*
+			 * We raced with someone splitting out from under us.
+			 * Retry the walk.
+			 */
			spin_unlock(ptl);
			continue;
		}
 
-		page = pte_page(pte);
+		subpage = pte_page(pte);
+		hpage = compound_head(subpage);
 
		/*
		 * If a reference page is supplied, it is because a specific
		 * page is being unmapped, not a range. Ensure the page we
		 * are about to unmap is the actual page of interest.
		 */
		if (ref_page) {
-			if (page != ref_page) {
+			if (hpage != ref_page) {
				spin_unlock(ptl);
-				continue;
+				goto next_hpte;
			}
			/*
			 * Mark the VMA as having unmapped its page so that
@@ -5550,25 +5581,32 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
			set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED);
		}
 
-		pte = huge_ptep_get_and_clear(mm, address, ptep);
-		tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
+		pte = huge_ptep_get_and_clear(mm, address, hpte.ptep);
+		tlb_change_page_size(tlb, hugetlb_pte_size(&hpte));
+		tlb_remove_huge_tlb_entry(tlb, hpte, address);
		if (huge_pte_dirty(pte))
-			set_page_dirty(page);
+			set_page_dirty(hpage);
		/* Leave a uffd-wp pte marker if needed */
		if (huge_pte_uffd_wp(pte) &&
		    !(zap_flags & ZAP_FLAG_DROP_MARKER))
-			set_huge_pte_at(mm, address, ptep,
+			set_huge_pte_at(mm, address, hpte.ptep,
					make_pte_marker(PTE_MARKER_UFFD_WP));
-		hugetlb_count_sub(pages_per_huge_page(h), mm);
-		page_remove_rmap(page, vma, true);
+		hugetlb_count_sub(hugetlb_pte_size(&hpte)/PAGE_SIZE, mm);
+		hugetlb_remove_rmap(subpage, hpte.shift, h, vma);
 
		spin_unlock(ptl);
-		tlb_remove_page_size(tlb, page, huge_page_size(h));
		/*
-		 * Bail out after unmapping reference page if supplied
+		 * Lower the reference count on the head page.
+		 */
+		tlb_remove_page_size(tlb, hpage, sz);
+		/*
+		 * Bail out after unmapping reference page if supplied,
+		 * and there's only one PTE mapping this page.
		 */
-		if (ref_page)
+		if (ref_page && hugetlb_pte_size(&hpte) == sz)
			break;
+next_hpte:
+		address += hugetlb_pte_size(&hpte);
	}
	tlb_end_vma(tlb, vma);
@@ -5846,7 +5884,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
		/* Break COW or unshare */
		huge_ptep_clear_flush(vma, haddr, ptep);
		mmu_notifier_invalidate_range(mm, range.start, range.end);
-		page_remove_rmap(old_page, vma, true);
+		hugetlb_remove_rmap(old_page, huge_page_shift(h), h, vma);
		hugepage_add_new_anon_rmap(new_folio, vma, haddr);
		set_huge_pte_at(mm, haddr, ptep,
				make_huge_pte(vma, &new_folio->page, !unshare));

From patchwork Sat Feb 18 00:27:52 2023
Date: Sat, 18 Feb 2023 00:27:52 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-20-jthoughton@google.com>
Subject: [PATCH v2 19/46] hugetlb: add HGM support to hugetlb_change_protection
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
 Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org

The main change here is to do a high-granularity walk and to pull the
shift from the walk (not from the hstate).
Signed-off-by: James Houghton

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ecf1a28dbaaa..7321c6602d6f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6900,15 +6900,15 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
 {
	struct mm_struct *mm = vma->vm_mm;
	unsigned long start = address;
-	pte_t *ptep;
	pte_t pte;
	struct hstate *h = hstate_vma(vma);
-	long pages = 0, psize = huge_page_size(h);
+	long base_pages = 0, psize = huge_page_size(h);
	bool shared_pmd = false;
	struct mmu_notifier_range range;
	unsigned long last_addr_mask;
	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
+	struct hugetlb_pte hpte;
 
	/*
	 * In the case of shared PMDs, the area to flush could be beyond
@@ -6926,39 +6926,43 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
	hugetlb_vma_lock_write(vma);
	i_mmap_lock_write(vma->vm_file->f_mapping);
	last_addr_mask = hugetlb_mask_last_page(h);
-	for (; address < end; address += psize) {
+	while (address < end) {
		spinlock_t *ptl;
-		ptep = hugetlb_walk(vma, address, psize);
-		if (!ptep) {
+
+		if (hugetlb_full_walk(&hpte, vma, address)) {
			if (!uffd_wp) {
-				address |= last_addr_mask;
+				address = (address | last_addr_mask) + psize;
				continue;
			}
			/*
			 * Userfaultfd wr-protect requires pgtable
			 * pre-allocations to install pte markers.
+			 *
+			 * Use hugetlb_full_walk_alloc to allocate
+			 * the hstate-level PTE.
			 */
-			ptep = huge_pte_alloc(mm, vma, address, psize);
-			if (!ptep) {
-				pages = -ENOMEM;
+			if (hugetlb_full_walk_alloc(&hpte, vma,
+						    address, psize)) {
+				base_pages = -ENOMEM;
				break;
			}
		}
-		ptl = huge_pte_lock(h, mm, ptep);
-		if (huge_pmd_unshare(mm, vma, address, ptep)) {
+
+		ptl = hugetlb_pte_lock(&hpte);
+		if (hugetlb_pte_size(&hpte) == psize &&
+		    huge_pmd_unshare(mm, vma, address, hpte.ptep)) {
			/*
			 * When uffd-wp is enabled on the vma, unshare
			 * shouldn't happen at all.  Warn about it if it
			 * happened due to some reason.
			 */
			WARN_ON_ONCE(uffd_wp || uffd_wp_resolve);
-			pages++;
+			base_pages += psize / PAGE_SIZE;
			spin_unlock(ptl);
			shared_pmd = true;
-			address |= last_addr_mask;
+			address = (address | last_addr_mask) + psize;
			continue;
		}
-		pte = huge_ptep_get(ptep);
+		pte = huge_ptep_get(hpte.ptep);
		if (unlikely(is_hugetlb_entry_hwpoisoned(pte))) {
			/* Nothing to do. */
		} else if (unlikely(is_hugetlb_entry_migration(pte))) {
@@ -6974,7 +6978,7 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
				entry = make_readable_migration_entry(
							swp_offset(entry));
				newpte = swp_entry_to_pte(entry);
-				pages++;
+				base_pages += hugetlb_pte_size(&hpte) / PAGE_SIZE;
			}
 
			if (uffd_wp)
@@ -6982,34 +6986,49 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
			else if (uffd_wp_resolve)
				newpte = pte_swp_clear_uffd_wp(newpte);
			if (!pte_same(pte, newpte))
-				set_huge_pte_at(mm, address, ptep, newpte);
+				set_huge_pte_at(mm, address, hpte.ptep, newpte);
		} else if (unlikely(is_pte_marker(pte))) {
			/* No other markers apply for now. */
			WARN_ON_ONCE(!pte_marker_uffd_wp(pte));
			if (uffd_wp_resolve)
				/* Safe to modify directly (non-present->none). */
-				huge_pte_clear(mm, address, ptep, psize);
+				huge_pte_clear(mm, address, hpte.ptep,
+					       hugetlb_pte_size(&hpte));
		} else if (!huge_pte_none(pte)) {
			pte_t old_pte;
-			unsigned int shift = huge_page_shift(hstate_vma(vma));
+			unsigned int shift = hpte.shift;
+
+			if (unlikely(!hugetlb_pte_present_leaf(&hpte, pte))) {
+				/*
+				 * Someone split the PTE from under us, so retry
+				 * the walk,
+				 */
+				spin_unlock(ptl);
+				continue;
+			}
 
-			old_pte = huge_ptep_modify_prot_start(vma, address, ptep);
+			old_pte = huge_ptep_modify_prot_start(
+					vma, address, hpte.ptep);
			pte = huge_pte_modify(old_pte, newprot);
-			pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
+			pte = arch_make_huge_pte(
+					pte, shift, vma->vm_flags);
			if (uffd_wp)
				pte = huge_pte_mkuffd_wp(pte);
			else if (uffd_wp_resolve)
				pte = huge_pte_clear_uffd_wp(pte);
-			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
-			pages++;
+			huge_ptep_modify_prot_commit(
+					vma, address, hpte.ptep,
+					old_pte, pte);
+			base_pages += hugetlb_pte_size(&hpte) / PAGE_SIZE;
		} else {
			/* None pte */
			if (unlikely(uffd_wp))
				/* Safe to modify directly (none->non-present). */
-				set_huge_pte_at(mm, address, ptep,
+				set_huge_pte_at(mm, address, hpte.ptep,
						make_pte_marker(PTE_MARKER_UFFD_WP));
		}
		spin_unlock(ptl);
+		address += hugetlb_pte_size(&hpte);
	}
	/*
	 * Must flush TLB before releasing i_mmap_rwsem: x86's huge_pmd_unshare
@@ -7032,7 +7051,7 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
	hugetlb_vma_unlock_write(vma);
	mmu_notifier_invalidate_range_end(&range);
 
-	return pages > 0 ? (pages << h->order) : pages;
+	return base_pages;
 }
 
 /* Return true if reservation was successful, false otherwise.
 */

From patchwork Sat Feb 18 00:27:53 2023
Date: Sat, 18 Feb 2023 00:27:53 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-21-jthoughton@google.com>
Subject: [PATCH v2 20/46] hugetlb: add HGM support to follow_hugetlb_page
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Zach O'Keefe, Manish Mishra, Naoya Horiguchi, "Dr .
David Alan Gilbert", Matthew Wilcox (Oracle), Vlastimil Babka,
 Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org

Enable high-granularity mapping support in GUP.

In case it is confusing, pfn_offset is the offset (in PAGE_SIZE units)
that vaddr points to within the subpage that hpte points to.

Signed-off-by: James Houghton

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7321c6602d6f..c26b040f4fb5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6634,11 +6634,9 @@ static void record_subpages_vmas(struct page *page, struct vm_area_struct *vma,
 }
 
 static inline bool __follow_hugetlb_must_fault(struct vm_area_struct *vma,
-					       unsigned int flags, pte_t *pte,
+					       unsigned int flags, pte_t pteval,
					       bool *unshare)
 {
-	pte_t pteval = huge_ptep_get(pte);
-
	*unshare = false;
	if (is_swap_pte(pteval))
		return true;
@@ -6713,11 +6711,13 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
	int err = -EFAULT, refs;
 
	while (vaddr < vma->vm_end && remainder) {
-		pte_t *pte;
+		pte_t *ptep, pte;
		spinlock_t *ptl = NULL;
		bool unshare = false;
		int absent;
-		struct page *page;
+		unsigned long pages_per_hpte;
+		struct page *page, *subpage;
+		struct hugetlb_pte hpte;
 
		/*
		 * If we have a pending SIGKILL, don't keep faulting pages and
@@ -6734,13 +6734,19 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
		 * each hugepage.  We have to make sure we get the
		 * first, for the page indexing below to work.
		 *
-		 * Note that page table lock is not held when pte is null.
+		 * hugetlb_full_walk will mask the address appropriately.
+		 *
+		 * Note that page table lock is not held when ptep is null.
		 */
-		pte = hugetlb_walk(vma, vaddr & huge_page_mask(h),
-				   huge_page_size(h));
-		if (pte)
-			ptl = huge_pte_lock(h, mm, pte);
-		absent = !pte || huge_pte_none(huge_ptep_get(pte));
+		if (hugetlb_full_walk(&hpte, vma, vaddr)) {
+			ptep = NULL;
+			absent = true;
+		} else {
+			ptl = hugetlb_pte_lock(&hpte);
+			ptep = hpte.ptep;
+			pte = huge_ptep_get(ptep);
+			absent = huge_pte_none(pte);
+		}
 
		/*
		 * When coredumping, it suits get_dump_page if we just return
@@ -6751,13 +6757,21 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
		 */
		if (absent && (flags & FOLL_DUMP) &&
		    !hugetlbfs_pagecache_present(h, vma, vaddr)) {
-			if (pte)
+			if (ptep)
				spin_unlock(ptl);
			hugetlb_vma_unlock_read(vma);
			remainder = 0;
			break;
		}
 
+		if (!absent && pte_present(pte) &&
+		    !hugetlb_pte_present_leaf(&hpte, pte)) {
+			/* We raced with someone splitting the PTE, so retry. */
+			spin_unlock(ptl);
+			hugetlb_vma_unlock_read(vma);
+			continue;
+		}
+
		/*
		 * We need call hugetlb_fault for both hugepages under migration
		 * (in which case hugetlb_fault waits for the migration,) and
@@ -6773,7 +6787,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
			vm_fault_t ret;
			unsigned int fault_flags = 0;
 
-			if (pte)
+			if (ptep)
				spin_unlock(ptl);
			hugetlb_vma_unlock_read(vma);
@@ -6822,8 +6836,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
			continue;
		}
 
-		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
-		page = pte_page(huge_ptep_get(pte));
+		pfn_offset = (vaddr & ~hugetlb_pte_mask(&hpte)) >> PAGE_SHIFT;
+		subpage = pte_page(pte);
+		pages_per_hpte = hugetlb_pte_size(&hpte) / PAGE_SIZE;
+		page = compound_head(subpage);
 
		VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
			       !PageAnonExclusive(page), page);
@@ -6833,22 +6849,22 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
		 * and skip the same_page loop below.
		 */
		if (!pages && !vmas && !pfn_offset &&
-		    (vaddr + huge_page_size(h) < vma->vm_end) &&
-		    (remainder >= pages_per_huge_page(h))) {
-			vaddr += huge_page_size(h);
-			remainder -= pages_per_huge_page(h);
-			i += pages_per_huge_page(h);
+		    (vaddr + hugetlb_pte_size(&hpte) < vma->vm_end) &&
+		    (remainder >= pages_per_hpte)) {
+			vaddr += hugetlb_pte_size(&hpte);
+			remainder -= pages_per_hpte;
+			i += pages_per_hpte;
			spin_unlock(ptl);
			hugetlb_vma_unlock_read(vma);
			continue;
		}
 
		/* vaddr may not be aligned to PAGE_SIZE */
-		refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
+		refs = min3(pages_per_hpte - pfn_offset, remainder,
			(vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);
 
		if (pages || vmas)
-			record_subpages_vmas(nth_page(page, pfn_offset),
+			record_subpages_vmas(nth_page(subpage, pfn_offset),
					     vma, refs,
					     likely(pages) ? pages + i : NULL,
					     vmas ?
vmas + i : NULL); From patchwork Sat Feb 18 00:27:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 58830 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp142784wrn; Fri, 17 Feb 2023 16:31:25 -0800 (PST) X-Google-Smtp-Source: AK7set8xxx5sFiaownuEm9dv1b4mX81kxxIXQP3DqepTPIoJDl1p6q01vO2+VRS6ZJER/hL/2+jd X-Received: by 2002:a17:906:9407:b0:8ac:8f3c:7f65 with SMTP id q7-20020a170906940700b008ac8f3c7f65mr231614ejx.48.1676680285595; Fri, 17 Feb 2023 16:31:25 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676680285; cv=none; d=google.com; s=arc-20160816; b=hVwJIllLSU7i3bGQUPmF49xMJ/1cpKpQNqwPgmL4w1RZTVBQyx6/ZFZrRtAJCE21ck FMb3CfycG+mzB47QCrOZ6LjoBU7Dh8qpvkWXEP4FNPgtMkToDH41GtHz/qURCVRC4cvP r8xdRe/opT+DjSSfCPRBLJvUHba9gGWEMcL/EwN+Ik+WpIlMZ1frEk59RgS03ajm6pf5 8HwfYHb6/RSK2hXqefLkbOTf0PEBgn/L839X91CT6Wb2jxY4cbxXyoCZ736EdO33a8g8 tw+NhBepf3DqB5zW5qRwL0Du+GX2wXJItScAopLpPF13PV2keBqi857on2o4F0S2l1L1 bY4A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=xFEWCFEVqNzp10q8weB+17bzU0TWKtM8UIz/Xx7DMio=; b=um3gOxAyuiZapruwJo601UHzuQQDXVKAZyTQzV4Ltt2bxL7K6dOFS6IqJWsNInKm5w sybMTX9g0vsMhDSVK5X66gNYpzrU1e+WUxzdSGVbGfTPW/bk+LqloDj8bQGJUu7DY3mN ImrYdepggVbYfhsvNfm4ZsgQWqS2k1FBRRcMcJn7Rm1XFX9RabC8LAmGZl3O5Ef5CwNl TabYLYTOPYIX2hzZHT3T70d7Q6HMVU24ebxw3kLHNn56Vzr/2iCOwkyG7d4ccNoRTDrI tkkl8srqK9KdLmzs2Pm/586OwkCcyDbrjlORz0aa+oykpykoZR7h2IQIaXSqGKiOIBH6 xgsA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=jsCsMDMV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) 
Date: Sat, 18 Feb 2023 00:27:54 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-22-jthoughton@google.com>
Subject: [PATCH v2 21/46] hugetlb: add HGM support to hugetlb_follow_page_mask
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert",
 "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton
X-Mailing-List: linux-kernel@vger.kernel.org

The change here is very simple: do a high-granularity walk.

Signed-off-by: James Houghton

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c26b040f4fb5..693332b7e186 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6655,11 +6655,10 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 				unsigned long address, unsigned int flags)
 {
 	struct hstate *h = hstate_vma(vma);
-	struct mm_struct *mm = vma->vm_mm;
-	unsigned long haddr = address & huge_page_mask(h);
 	struct page *page = NULL;
 	spinlock_t *ptl;
-	pte_t *pte, entry;
+	pte_t entry;
+	struct hugetlb_pte hpte;
 
 	/*
 	 * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
@@ -6669,13 +6668,24 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 		return NULL;
 
 	hugetlb_vma_lock_read(vma);
-	pte = hugetlb_walk(vma, haddr, huge_page_size(h));
-	if (!pte)
+
+	if (hugetlb_full_walk(&hpte, vma, address))
 		goto out_unlock;
 
-	ptl = huge_pte_lock(h, mm, pte);
-	entry = huge_ptep_get(pte);
+retry:
+	ptl = hugetlb_pte_lock(&hpte);
+	entry = huge_ptep_get(hpte.ptep);
 	if (pte_present(entry)) {
+		if (unlikely(!hugetlb_pte_present_leaf(&hpte, entry))) {
+			/*
+			 * We raced with someone splitting from under us.
+			 * Keep walking to get to the real leaf.
+			 */
+			spin_unlock(ptl);
+			hugetlb_full_walk_continue(&hpte, vma, address);
+			goto retry;
+		}
+
 		page = pte_page(entry) +
 			((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
 		/*

From patchwork Sat Feb 18 00:27:55 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58828
Date: Sat, 18 Feb 2023 00:27:55 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-23-jthoughton@google.com>
Subject: [PATCH v2 22/46] hugetlb: add HGM support to copy_hugetlb_page_range
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert",
 "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton
X-Mailing-List: linux-kernel@vger.kernel.org

This allows fork() to work with high-granularity mappings. The page table
structure is copied such that partially mapped regions remain partially
mapped in the same way for the new process.

A page's reference count is incremented for *each* portion of it that is
mapped in the page table. For example, if you have a PMD-mapped 1G page, the
reference count will be incremented by 512.

mapcount is handled similarly to THPs: if you're completely mapping a
hugepage, the compound_mapcount is incremented. If you're mapping only part
of it, the subpages that are getting mapped will have their mapcounts
incremented.
Signed-off-by: James Houghton

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 1a1a71868dfd..2fe1eb6897d4 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -162,6 +162,8 @@ void hugepage_put_subpool(struct hugepage_subpool *spool);
 
 void hugetlb_remove_rmap(struct page *subpage, unsigned long shift,
 			 struct hstate *h, struct vm_area_struct *vma);
+void hugetlb_add_file_rmap(struct page *subpage, unsigned long shift,
+			   struct hstate *h, struct vm_area_struct *vma);
 
 void hugetlb_dup_vma_private(struct vm_area_struct *vma);
 void clear_vma_resv_huge_pages(struct vm_area_struct *vma);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 693332b7e186..210c6f2b16a5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -141,6 +141,37 @@ void hugetlb_remove_rmap(struct page *subpage, unsigned long shift,
 		page_remove_rmap(subpage, vma, false);
 	}
 }
+
+/*
+ * hugetlb_add_file_rmap() - increment the mapcounts for file-backed hugetlb
+ * pages appropriately.
+ *
+ * For pages that are being mapped with their hstate-level PTE (e.g., a 1G page
+ * being mapped with a 1G PUD), then we increment the compound_mapcount for the
+ * head page.
+ *
+ * For pages that are being mapped with high-granularity, we increment the
+ * mapcounts for the individual subpages that are getting mapped.
+ */
+void hugetlb_add_file_rmap(struct page *subpage, unsigned long shift,
+			   struct hstate *h, struct vm_area_struct *vma)
+{
+	struct page *hpage = compound_head(subpage);
+
+	if (shift == huge_page_shift(h)) {
+		VM_BUG_ON_PAGE(subpage != hpage, subpage);
+		page_add_file_rmap(hpage, vma, true);
+	} else {
+		unsigned long nr_subpages = 1UL << (shift - PAGE_SHIFT);
+		struct page *final_page = &subpage[nr_subpages];
+
+		VM_BUG_ON_PAGE(HPageVmemmapOptimized(hpage), hpage);
+		/*
+		 * Increment the mapcount on each page that is getting mapped.
+		 */
+		for (; subpage < final_page; ++subpage)
+			page_add_file_rmap(subpage, vma, false);
+	}
+}
 
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
 {
@@ -5210,7 +5241,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			    struct vm_area_struct *src_vma)
 {
 	pte_t *src_pte, *dst_pte, entry;
-	struct page *ptepage;
+	struct hugetlb_pte src_hpte, dst_hpte;
+	struct page *ptepage, *hpage;
 	unsigned long addr;
 	bool cow = is_cow_mapping(src_vma->vm_flags);
 	struct hstate *h = hstate_vma(src_vma);
@@ -5238,18 +5270,24 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	}
 
 	last_addr_mask = hugetlb_mask_last_page(h);
-	for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) {
+	addr = src_vma->vm_start;
+	while (addr < src_vma->vm_end) {
 		spinlock_t *src_ptl, *dst_ptl;
-		src_pte = hugetlb_walk(src_vma, addr, sz);
-		if (!src_pte) {
-			addr |= last_addr_mask;
+		unsigned long hpte_sz;
+
+		if (hugetlb_full_walk(&src_hpte, src_vma, addr)) {
+			addr = (addr | last_addr_mask) + sz;
 			continue;
 		}
-		dst_pte = huge_pte_alloc(dst, dst_vma, addr, sz);
-		if (!dst_pte) {
-			ret = -ENOMEM;
+
+		ret = hugetlb_full_walk_alloc(&dst_hpte, dst_vma, addr,
+					      hugetlb_pte_size(&src_hpte));
+		if (ret)
 			break;
-		}
+
+		src_pte = src_hpte.ptep;
+		dst_pte = dst_hpte.ptep;
+
+		hpte_sz = hugetlb_pte_size(&src_hpte);
 
 		/*
 		 * If the pagetables are shared don't copy or take references.
@@ -5259,13 +5297,14 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		 * another vma. So page_count of ptep page is checked instead
 		 * to reliably determine whether pte is shared.
 		 */
-		if (page_count(virt_to_page(dst_pte)) > 1) {
-			addr |= last_addr_mask;
+		if (hugetlb_pte_size(&dst_hpte) == sz &&
+		    page_count(virt_to_page(dst_pte)) > 1) {
+			addr = (addr | last_addr_mask) + sz;
 			continue;
 		}
 
-		dst_ptl = huge_pte_lock(h, dst, dst_pte);
-		src_ptl = huge_pte_lockptr(huge_page_shift(h), src, src_pte);
+		dst_ptl = hugetlb_pte_lock(&dst_hpte);
+		src_ptl = hugetlb_pte_lockptr(&src_hpte);
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 		entry = huge_ptep_get(src_pte);
again:
@@ -5309,10 +5348,15 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			 */
 			if (userfaultfd_wp(dst_vma))
 				set_huge_pte_at(dst, addr, dst_pte, entry);
+		} else if (!hugetlb_pte_present_leaf(&src_hpte, entry)) {
+			/* Retry the walk. */
+			spin_unlock(src_ptl);
+			spin_unlock(dst_ptl);
+			continue;
 		} else {
-			entry = huge_ptep_get(src_pte);
 			ptepage = pte_page(entry);
-			get_page(ptepage);
+			hpage = compound_head(ptepage);
+			get_page(hpage);
 
 			/*
 			 * Failing to duplicate the anon rmap is a rare case
@@ -5324,13 +5368,34 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			 * need to be without the pgtable locks since we could
 			 * sleep during the process.
 			 */
-			if (!PageAnon(ptepage)) {
-				page_add_file_rmap(ptepage, src_vma, true);
-			} else if (page_try_dup_anon_rmap(ptepage, true,
+			if (!PageAnon(hpage)) {
+				hugetlb_add_file_rmap(ptepage,
+						src_hpte.shift, h, src_vma);
+			}
+			/*
+			 * It is currently impossible to get anonymous HugeTLB
+			 * high-granularity mappings, so we use 'hpage' here.
+			 *
+			 * This will need to be changed when HGM support for
+			 * anon mappings is added.
+			 */
+			else if (page_try_dup_anon_rmap(hpage, true,
							src_vma)) {
 				pte_t src_pte_old = entry;
 				struct folio *new_folio;
 
+				/*
+				 * If we are mapped at high granularity, we
+				 * may end up allocating lots and lots of
+				 * hugepages when we only need one. Bail out
+				 * now.
+				 */
+				if (hugetlb_pte_size(&src_hpte) != sz) {
+					put_page(hpage);
+					ret = -EINVAL;
+					break;
+				}
+
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
 				/* Do not use reserve as it's private owned */
@@ -5342,7 +5407,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				}
 				copy_user_huge_page(&new_folio->page, ptepage,
 						    addr, dst_vma, npages);
-				put_page(ptepage);
+				put_page(hpage);
 
 				/* Install the new hugetlb folio if src pte stable */
 				dst_ptl = huge_pte_lock(h, dst, dst_pte);
@@ -5360,6 +5425,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				hugetlb_install_folio(dst_vma, dst_pte, addr, new_folio);
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
+				addr += hugetlb_pte_size(&src_hpte);
 				continue;
 			}
@@ -5376,10 +5442,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			}
 			set_huge_pte_at(dst, addr, dst_pte, entry);
-			hugetlb_count_add(npages, dst);
+			hugetlb_count_add(
+					hugetlb_pte_size(&dst_hpte) / PAGE_SIZE,
+					dst);
 		}
 		spin_unlock(src_ptl);
 		spin_unlock(dst_ptl);
+		addr += hugetlb_pte_size(&src_hpte);
 	}
 
 	if (cow) {

From patchwork Sat Feb 18 00:27:56 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58831
Date: Sat, 18 Feb 2023 00:27:56 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-24-jthoughton@google.com>
Subject: [PATCH v2 23/46] hugetlb: add HGM support to move_hugetlb_page_tables
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert",
 "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton
X-Mailing-List: linux-kernel@vger.kernel.org

This is very similar to the support that was added to
copy_hugetlb_page_range. We simply do a high-granularity walk now, and most
of the rest of the code stays the same.

Signed-off-by: James Houghton

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 210c6f2b16a5..6c4678b7a07d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5461,16 +5461,16 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	return ret;
 }
 
-static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
-			  unsigned long new_addr, pte_t *src_pte, pte_t *dst_pte)
+static void move_hugetlb_pte(struct vm_area_struct *vma, unsigned long old_addr,
+			     unsigned long new_addr, struct hugetlb_pte *src_hpte,
+			     struct hugetlb_pte *dst_hpte)
 {
-	struct hstate *h = hstate_vma(vma);
 	struct mm_struct *mm = vma->vm_mm;
 	spinlock_t *src_ptl, *dst_ptl;
 	pte_t pte;
 
-	dst_ptl = huge_pte_lock(h, mm, dst_pte);
-	src_ptl = huge_pte_lockptr(huge_page_shift(h), mm, src_pte);
+	dst_ptl = hugetlb_pte_lock(dst_hpte);
+	src_ptl = hugetlb_pte_lockptr(src_hpte);
 
 	/*
 	 * We don't have to worry about the ordering of src and dst ptlocks
@@ -5479,8 +5479,8 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
 	if (src_ptl != dst_ptl)
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 
-	pte = huge_ptep_get_and_clear(mm, old_addr, src_pte);
-	set_huge_pte_at(mm, new_addr, dst_pte, pte);
+	pte = huge_ptep_get_and_clear(mm, old_addr, src_hpte->ptep);
+	set_huge_pte_at(mm, new_addr, dst_hpte->ptep, pte);
 
 	if (src_ptl != dst_ptl)
 		spin_unlock(src_ptl);
@@ -5498,9 +5498,9 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long old_end = old_addr + len;
 	unsigned long last_addr_mask;
-	pte_t *src_pte, *dst_pte;
 	struct mmu_notifier_range range;
 	bool shared_pmd = false;
+	struct hugetlb_pte src_hpte, dst_hpte;
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, old_addr,
 				old_end);
@@ -5516,28 +5516,35 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 	/* Prevent race with file truncation */
 	hugetlb_vma_lock_write(vma);
 	i_mmap_lock_write(mapping);
-	for (; old_addr < old_end; old_addr += sz, new_addr += sz) {
-		src_pte = hugetlb_walk(vma, old_addr, sz);
-		if (!src_pte) {
-			old_addr |= last_addr_mask;
-			new_addr |= last_addr_mask;
+	while (old_addr < old_end) {
+		if (hugetlb_full_walk(&src_hpte, vma, old_addr)) {
+			/* The hstate-level PTE wasn't allocated.
+			 */
+			old_addr = (old_addr | last_addr_mask) + sz;
+			new_addr = (new_addr | last_addr_mask) + sz;
 			continue;
 		}
-		if (huge_pte_none(huge_ptep_get(src_pte)))
+
+		if (huge_pte_none(huge_ptep_get(src_hpte.ptep))) {
+			old_addr += hugetlb_pte_size(&src_hpte);
+			new_addr += hugetlb_pte_size(&src_hpte);
 			continue;
+		}
 
-		if (huge_pmd_unshare(mm, vma, old_addr, src_pte)) {
+		if (hugetlb_pte_size(&src_hpte) == sz &&
+		    huge_pmd_unshare(mm, vma, old_addr, src_hpte.ptep)) {
 			shared_pmd = true;
-			old_addr |= last_addr_mask;
-			new_addr |= last_addr_mask;
+			old_addr = (old_addr | last_addr_mask) + sz;
+			new_addr = (new_addr | last_addr_mask) + sz;
 			continue;
 		}
 
-		dst_pte = huge_pte_alloc(mm, new_vma, new_addr, sz);
-		if (!dst_pte)
+		if (hugetlb_full_walk_alloc(&dst_hpte, new_vma, new_addr,
+					    hugetlb_pte_size(&src_hpte)))
 			break;
 
-		move_huge_pte(vma, old_addr, new_addr, src_pte, dst_pte);
+		move_hugetlb_pte(vma, old_addr, new_addr, &src_hpte, &dst_hpte);
+		old_addr += hugetlb_pte_size(&src_hpte);
+		new_addr += hugetlb_pte_size(&src_hpte);
 	}
 
 	if (shared_pmd)

From patchwork Sat Feb 18 00:27:57 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58832
Date: Sat, 18 Feb 2023 00:27:57 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-25-jthoughton@google.com>
Subject: [PATCH v2 24/46] hugetlb: add HGM support to hugetlb_fault and hugetlb_no_page
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr.
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Frank van der Linden , Jiaqi Yan , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758126722266776579?= X-GMAIL-MSGID: =?utf-8?q?1758126722266776579?= Update the page fault handler to support high-granularity page faults. While handling a page fault on a partially-mapped HugeTLB page, if the PTE we find with hugetlb_pte_walk is none, then we will replace it with a leaf-level PTE to map the page. To give some examples: 1. For a completely unmapped 1G page, it will be mapped with a 1G PUD. 2. For a 1G page that has its first 512M mapped, any faults on the unmapped sections will result in 2M PMDs mapping each unmapped 2M section. 3. For a 1G page that has only its first 4K mapped, a page fault on its second 4K section will get a 4K PTE to map it. Unless high-granularity mappings are created via UFFDIO_CONTINUE, it is impossible for hugetlb_fault to create high-granularity mappings. This commit does not handle hugetlb_wp right now, and it doesn't handle HugeTLB page migration and swap entries. The BUG_ON in huge_pte_alloc is removed, as it is not longer valid when HGM is possible. HGM can be disabled if the VMA lock cannot be allocated after a VMA is split, yet high-granularity mappings may still exist. 
Signed-off-by: James Houghton diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 6c4678b7a07d..86cd51beb02c 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -173,6 +173,18 @@ void hugetlb_add_file_rmap(struct page *subpage, unsigned long shift, } } +/* + * Find the subpage that corresponds to `addr` in `folio`. + */ +static struct page *hugetlb_find_subpage(struct hstate *h, struct folio *folio, + unsigned long addr) +{ + size_t idx = (addr & ~huge_page_mask(h))/PAGE_SIZE; + + BUG_ON(idx >= pages_per_huge_page(h)); + return folio_page(folio, idx); +} + static inline bool subpool_is_free(struct hugepage_subpool *spool) { if (spool->count) @@ -6072,14 +6084,14 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma, * Recheck pte with pgtable lock. Returns true if pte didn't change, or * false if pte changed or is changing. */ -static bool hugetlb_pte_stable(struct hstate *h, struct mm_struct *mm, - pte_t *ptep, pte_t old_pte) +static bool hugetlb_pte_stable(struct hstate *h, struct hugetlb_pte *hpte, + pte_t old_pte) { spinlock_t *ptl; bool same; - ptl = huge_pte_lock(h, mm, ptep); - same = pte_same(huge_ptep_get(ptep), old_pte); + ptl = hugetlb_pte_lock(hpte); + same = pte_same(huge_ptep_get(hpte->ptep), old_pte); spin_unlock(ptl); return same; @@ -6088,7 +6100,7 @@ static bool hugetlb_pte_stable(struct hstate *h, struct mm_struct *mm, static vm_fault_t hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma, struct address_space *mapping, pgoff_t idx, - unsigned long address, pte_t *ptep, + unsigned long address, struct hugetlb_pte *hpte, pte_t old_pte, unsigned int flags) { struct hstate *h = hstate_vma(vma); @@ -6096,10 +6108,12 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, int anon_rmap = 0; unsigned long size; struct folio *folio; + struct page *subpage; pte_t new_pte; spinlock_t *ptl; unsigned long haddr = address & huge_page_mask(h); bool new_folio, new_pagecache_folio = false; + unsigned long haddr_hgm = address & 
hugetlb_pte_mask(hpte); u32 hash = hugetlb_fault_mutex_hash(mapping, idx); /* @@ -6143,7 +6157,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, * never happen on the page after UFFDIO_COPY has * correctly installed the page and returned. */ - if (!hugetlb_pte_stable(h, mm, ptep, old_pte)) { + if (!hugetlb_pte_stable(h, hpte, old_pte)) { ret = 0; goto out; } @@ -6167,7 +6181,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, * here. Before returning error, get ptl and make * sure there really is no pte entry. */ - if (hugetlb_pte_stable(h, mm, ptep, old_pte)) + if (hugetlb_pte_stable(h, hpte, old_pte)) ret = vmf_error(PTR_ERR(folio)); else ret = 0; @@ -6217,7 +6231,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, folio_unlock(folio); folio_put(folio); /* See comment in userfaultfd_missing() block above */ - if (!hugetlb_pte_stable(h, mm, ptep, old_pte)) { + if (!hugetlb_pte_stable(h, hpte, old_pte)) { ret = 0; goto out; } @@ -6242,30 +6256,46 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, vma_end_reservation(h, vma, haddr); } - ptl = huge_pte_lock(h, mm, ptep); + ptl = hugetlb_pte_lock(hpte); ret = 0; - /* If pte changed from under us, retry */ - if (!pte_same(huge_ptep_get(ptep), old_pte)) + /* + * If pte changed from under us, retry. + * + * When dealing with high-granularity-mapped PTEs, it's possible that + * a non-contiguous PTE within our contiguous PTE group gets populated, + * in which case, we need to retry here. This is NOT caught here, and + * will need to be addressed when HGM is supported for architectures + * that support contiguous PTEs. 
+ */ + if (!pte_same(huge_ptep_get(hpte->ptep), old_pte)) goto backout; - if (anon_rmap) + subpage = hugetlb_find_subpage(h, folio, haddr_hgm); + + if (anon_rmap) { + VM_BUG_ON(&folio->page != subpage); hugepage_add_new_anon_rmap(folio, vma, haddr); + } else - page_add_file_rmap(&folio->page, vma, true); - new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE) - && (vma->vm_flags & VM_SHARED))); + hugetlb_add_file_rmap(subpage, hpte->shift, h, vma); + + new_pte = make_huge_pte_with_shift(vma, subpage, + ((vma->vm_flags & VM_WRITE) + && (vma->vm_flags & VM_SHARED)), + hpte->shift); /* * If this pte was previously wr-protected, keep it wr-protected even * if populated. */ if (unlikely(pte_marker_uffd_wp(old_pte))) new_pte = huge_pte_mkuffd_wp(new_pte); - set_huge_pte_at(mm, haddr, ptep, new_pte); + set_huge_pte_at(mm, haddr_hgm, hpte->ptep, new_pte); - hugetlb_count_add(pages_per_huge_page(h), mm); + hugetlb_count_add(hugetlb_pte_size(hpte) / PAGE_SIZE, mm); if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) { + WARN_ON_ONCE(hugetlb_pte_size(hpte) != huge_page_size(h)); /* Optimization, do the COW without a second fault */ - ret = hugetlb_wp(mm, vma, address, ptep, flags, folio, ptl); + ret = hugetlb_wp(mm, vma, address, hpte->ptep, flags, folio, ptl); } spin_unlock(ptl); @@ -6322,17 +6352,19 @@ u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx) vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, unsigned int flags) { - pte_t *ptep, entry; + pte_t entry; spinlock_t *ptl; vm_fault_t ret; u32 hash; pgoff_t idx; - struct page *page = NULL; - struct folio *pagecache_folio = NULL; + struct page *subpage = NULL; + struct folio *pagecache_folio = NULL, *folio = NULL; struct hstate *h = hstate_vma(vma); struct address_space *mapping; int need_wait_lock = 0; unsigned long haddr = address & huge_page_mask(h); + unsigned long haddr_hgm; + struct hugetlb_pte hpte; /* * Serialize 
hugepage allocation and instantiation, so that we don't @@ -6346,26 +6378,26 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, /* * Acquire vma lock before calling huge_pte_alloc and hold - * until finished with ptep. This prevents huge_pmd_unshare from - * being called elsewhere and making the ptep no longer valid. + * until finished with hpte. This prevents huge_pmd_unshare from + * being called elsewhere and making the hpte no longer valid. */ hugetlb_vma_lock_read(vma); - ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h)); - if (!ptep) { + if (hugetlb_full_walk_alloc(&hpte, vma, address, 0)) { hugetlb_vma_unlock_read(vma); mutex_unlock(&hugetlb_fault_mutex_table[hash]); return VM_FAULT_OOM; } - entry = huge_ptep_get(ptep); + entry = huge_ptep_get(hpte.ptep); /* PTE markers should be handled the same way as none pte */ - if (huge_pte_none_mostly(entry)) + if (huge_pte_none_mostly(entry)) { /* * hugetlb_no_page will drop vma lock and hugetlb fault * mutex internally, which make us return immediately. */ - return hugetlb_no_page(mm, vma, mapping, idx, address, ptep, + return hugetlb_no_page(mm, vma, mapping, idx, address, &hpte, entry, flags); + } ret = 0; @@ -6386,7 +6418,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, * be released there. */ mutex_unlock(&hugetlb_fault_mutex_table[hash]); - migration_entry_wait_huge(vma, ptep); + migration_entry_wait_huge(vma, hpte.ptep); return 0; } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) ret = VM_FAULT_HWPOISON_LARGE | @@ -6394,6 +6426,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, goto out_mutex; } + if (!hugetlb_pte_present_leaf(&hpte, entry)) + /* We raced with someone splitting the entry. */ + goto out_mutex; + /* * If we are going to COW/unshare the mapping later, we examine the * pending reservations for this page now. 
This will ensure that any @@ -6413,14 +6449,17 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, pagecache_folio = filemap_lock_folio(mapping, idx); } - ptl = huge_pte_lock(h, mm, ptep); + ptl = hugetlb_pte_lock(&hpte); /* Check for a racing update before calling hugetlb_wp() */ - if (unlikely(!pte_same(entry, huge_ptep_get(ptep)))) + if (unlikely(!pte_same(entry, huge_ptep_get(hpte.ptep)))) goto out_ptl; + /* haddr_hgm is the base address of the region that hpte maps. */ + haddr_hgm = address & hugetlb_pte_mask(&hpte); + /* Handle userfault-wp first, before trying to lock more pages */ - if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) && + if (userfaultfd_wp(vma) && huge_pte_uffd_wp(entry) && (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) { struct vm_fault vmf = { .vma = vma, @@ -6444,18 +6483,21 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, * pagecache_folio, so here we need take the former one * when page != pagecache_folio or !pagecache_folio. 
*/ - page = pte_page(entry); - if (page_folio(page) != pagecache_folio) - if (!trylock_page(page)) { + subpage = pte_page(entry); + folio = page_folio(subpage); + if (folio != pagecache_folio) + if (!trylock_page(&folio->page)) { need_wait_lock = 1; goto out_ptl; } - get_page(page); + folio_get(folio); if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) { if (!huge_pte_write(entry)) { - ret = hugetlb_wp(mm, vma, address, ptep, flags, + WARN_ON_ONCE(hugetlb_pte_size(&hpte) != + huge_page_size(h)); + ret = hugetlb_wp(mm, vma, address, hpte.ptep, flags, pagecache_folio, ptl); goto out_put_page; } else if (likely(flags & FAULT_FLAG_WRITE)) { @@ -6463,13 +6505,13 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, } } entry = pte_mkyoung(entry); - if (huge_ptep_set_access_flags(vma, haddr, ptep, entry, + if (huge_ptep_set_access_flags(vma, haddr_hgm, hpte.ptep, entry, flags & FAULT_FLAG_WRITE)) - update_mmu_cache(vma, haddr, ptep); + update_mmu_cache(vma, haddr_hgm, hpte.ptep); out_put_page: - if (page_folio(page) != pagecache_folio) - unlock_page(page); - put_page(page); + if (folio != pagecache_folio) + folio_unlock(folio); + folio_put(folio); out_ptl: spin_unlock(ptl); @@ -6488,7 +6530,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, * here without taking refcount. */ if (need_wait_lock) - wait_on_page_locked(page); + wait_on_page_locked(&folio->page); return ret; } @@ -7689,6 +7731,9 @@ int hugetlb_full_walk(struct hugetlb_pte *hpte, /* * hugetlb_full_walk_alloc - do a high-granularity walk, potentially allocate * new PTEs. + * + * If @target_sz is 0, then only attempt to allocate the hstate-level PTE and + * walk as far as we can go. 
*/ int hugetlb_full_walk_alloc(struct hugetlb_pte *hpte, struct vm_area_struct *vma, @@ -7707,6 +7752,12 @@ int hugetlb_full_walk_alloc(struct hugetlb_pte *hpte, if (!ptep) return -ENOMEM; + if (!target_sz) { + WARN_ON_ONCE(hugetlb_hgm_walk(hpte, ptep, vma, addr, + PAGE_SIZE, false)); + return 0; + } + return hugetlb_hgm_walk(hpte, ptep, vma, addr, target_sz, true); } @@ -7735,7 +7786,6 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, pte = (pte_t *)pmd_alloc(mm, pud, addr); } } - BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte)); return pte; }

From patchwork Sat Feb 18 00:27:58 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58833
Date: Sat, 18 Feb 2023 00:27:58 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-26-jthoughton@google.com>
Subject: [PATCH v2 25/46] hugetlb: use struct hugetlb_pte for walk_hugetlb_range
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

The main change in this commit is to walk_hugetlb_range, to support walking HGM mappings; all walk_hugetlb_range callers must be updated to use the new API and take the correct action. Listing the changes to the callers:

For s390, we simply BUILD_BUG_ON if HGM is enabled.

For smaps, shared_hugetlb (and private_hugetlb, although private mappings don't support HGM) may now not be divisible by the hugepage size. The appropriate changes have been made to support analyzing HGM PTEs.

For pagemap, we ignore non-leaf PTEs by treating them as if they were none PTEs. We can only end up with non-leaf PTEs if they had just been updated from a none PTE.

For show_numa_map, the challenge is that, if any of a hugepage is mapped, we have to count that entire page exactly once, as the results are given in units of hugepages. To support HGM mappings, we keep track of the last page that we looked at.
If the hugepage we are currently looking at is the same as the last one, then we must be looking at an HGM-mapped page that has been mapped at high granularity, and we've already accounted for it.

For DAMON, we treat non-leaf PTEs as if they were blank, for the same reason as pagemap.

For hwpoison, we proactively update the logic to support the case when hpte points to a subpage within the poisoned hugepage.

For queue_pages_hugetlb/migration, we ignore all HGM-enabled VMAs for now.

For mincore, we ignore non-leaf PTEs for the same reason as pagemap.

For mprotect/prot_none_hugetlb_entry, we retry the walk when we get a non-leaf PTE.

Signed-off-by: James Houghton

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c index 5a716bdcba05..e1d41caa8504 100644 --- a/arch/s390/mm/gmap.c +++ b/arch/s390/mm/gmap.c @@ -2629,14 +2629,20 @@ static int __s390_enable_skey_pmd(pmd_t *pmd, unsigned long addr, return 0; } -static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr, - unsigned long hmask, unsigned long next, +static int __s390_enable_skey_hugetlb(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk) { - pmd_t *pmd = (pmd_t *)pte; + pmd_t *pmd = (pmd_t *)hpte->ptep; unsigned long start, end; struct page *page = pmd_page(*pmd); + /* + * We don't support high-granularity mappings yet. If we did, the + * pmd_page() call above would be unsafe. + */ + BUILD_BUG_ON(IS_ENABLED(CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING)); + /* * The write check makes sure we do not set a key on shared * memory.
This is needed as the walker does not differentiate diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 77b72f42556a..2f293b5dabc0 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -731,27 +731,39 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) } #ifdef CONFIG_HUGETLB_PAGE -static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, - struct mm_walk *walk) +static int smaps_hugetlb_range(struct hugetlb_pte *hpte, + unsigned long addr, + struct mm_walk *walk) { struct mem_size_stats *mss = walk->private; struct vm_area_struct *vma = walk->vma; struct page *page = NULL; + pte_t pte = huge_ptep_get(hpte->ptep); - if (pte_present(*pte)) { - page = vm_normal_page(vma, addr, *pte); - } else if (is_swap_pte(*pte)) { - swp_entry_t swpent = pte_to_swp_entry(*pte); + if (pte_present(pte)) { + /* We only care about leaf-level PTEs. */ + if (!hugetlb_pte_present_leaf(hpte, pte)) + /* + * The only case where hpte is not a leaf is that + * it was originally none, but it was split from + * under us. It was originally none, so exclude it. 
+ */ + return 0; + + page = vm_normal_page(vma, addr, pte); + } else if (is_swap_pte(pte)) { + swp_entry_t swpent = pte_to_swp_entry(pte); if (is_pfn_swap_entry(swpent)) page = pfn_swap_entry_to_page(swpent); } if (page) { - if (page_mapcount(page) >= 2 || hugetlb_pmd_shared(pte)) - mss->shared_hugetlb += huge_page_size(hstate_vma(vma)); + unsigned long sz = hugetlb_pte_size(hpte); + + if (page_mapcount(page) >= 2 || hugetlb_pmd_shared(hpte->ptep)) + mss->shared_hugetlb += sz; else - mss->private_hugetlb += huge_page_size(hstate_vma(vma)); + mss->private_hugetlb += sz; } return 0; } @@ -1569,22 +1581,31 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, #ifdef CONFIG_HUGETLB_PAGE /* This function walks within one hugetlb entry in the single call */ -static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask, - unsigned long addr, unsigned long end, +static int pagemap_hugetlb_range(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk) { struct pagemapread *pm = walk->private; struct vm_area_struct *vma = walk->vma; u64 flags = 0, frame = 0; int err = 0; - pte_t pte; + unsigned long hmask = hugetlb_pte_mask(hpte); + unsigned long end = addr + hugetlb_pte_size(hpte); + pte_t pte = huge_ptep_get(hpte->ptep); + struct page *page; if (vma->vm_flags & VM_SOFTDIRTY) flags |= PM_SOFT_DIRTY; - pte = huge_ptep_get(ptep); if (pte_present(pte)) { - struct page *page = pte_page(pte); + /* + * We raced with this PTE being split, which can only happen if + * it was blank before. Treat it is as if it were blank. 
+ */ + if (!hugetlb_pte_present_leaf(hpte, pte)) + return 0; + + page = pte_page(pte); if (!PageAnon(page)) flags |= PM_FILE; @@ -1865,10 +1886,16 @@ static struct page *can_gather_numa_stats_pmd(pmd_t pmd, } #endif +struct show_numa_map_private { + struct numa_maps *md; + struct page *last_page; +}; + static int gather_pte_stats(pmd_t *pmd, unsigned long addr, unsigned long end, struct mm_walk *walk) { - struct numa_maps *md = walk->private; + struct show_numa_map_private *priv = walk->private; + struct numa_maps *md = priv->md; struct vm_area_struct *vma = walk->vma; spinlock_t *ptl; pte_t *orig_pte; @@ -1880,6 +1907,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr, struct page *page; page = can_gather_numa_stats_pmd(*pmd, vma, addr); + priv->last_page = page; if (page) gather_stats(page, md, pmd_dirty(*pmd), HPAGE_PMD_SIZE/PAGE_SIZE); @@ -1893,6 +1921,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr, orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl); do { struct page *page = can_gather_numa_stats(*pte, vma, addr); + priv->last_page = page; if (!page) continue; gather_stats(page, md, pte_dirty(*pte), 1); @@ -1903,19 +1932,25 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr, return 0; } #ifdef CONFIG_HUGETLB_PAGE -static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, struct mm_walk *walk) +static int gather_hugetlb_stats(struct hugetlb_pte *hpte, unsigned long addr, + struct mm_walk *walk) { - pte_t huge_pte = huge_ptep_get(pte); + struct show_numa_map_private *priv = walk->private; + pte_t huge_pte = huge_ptep_get(hpte->ptep); struct numa_maps *md; struct page *page; - if (!pte_present(huge_pte)) + if (!hugetlb_pte_present_leaf(hpte, huge_pte)) + return 0; + + page = compound_head(pte_page(huge_pte)); + if (priv->last_page == page) + /* we've already accounted for this page */ return 0; - page = pte_page(huge_pte); + priv->last_page = page; - md = 
walk->private; + md = priv->md; gather_stats(page, md, pte_dirty(huge_pte), 1); return 0; } @@ -1945,9 +1980,15 @@ static int show_numa_map(struct seq_file *m, void *v) struct file *file = vma->vm_file; struct mm_struct *mm = vma->vm_mm; struct mempolicy *pol; + char buffer[64]; int nid; + struct show_numa_map_private numa_map_private; + + numa_map_private.md = md; + numa_map_private.last_page = NULL; + if (!mm) return 0; @@ -1977,7 +2018,7 @@ static int show_numa_map(struct seq_file *m, void *v) seq_puts(m, " huge"); /* mmap_lock is held by m_start */ - walk_page_vma(vma, &show_numa_ops, md); + walk_page_vma(vma, &show_numa_ops, &numa_map_private); if (!md->pages) goto out; diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h index 27a6df448ee5..f4bddad615c2 100644 --- a/include/linux/pagewalk.h +++ b/include/linux/pagewalk.h @@ -3,6 +3,7 @@ #define _LINUX_PAGEWALK_H #include +#include struct mm_walk; @@ -31,6 +32,10 @@ struct mm_walk; * ptl after dropping the vma lock, or else revalidate * those items after re-acquiring the vma lock and before * accessing them. + * In the presence of high-granularity hugetlb entries, + * @hugetlb_entry is called only for leaf-level entries + * (hstate-level entries are ignored if they are not + * leaves). * @test_walk: caller specific callback function to determine whether * we walk over the current vma or not. 
Returning 0 means * "do page table walk over the current vma", returning @@ -58,9 +63,8 @@ struct mm_walk_ops { unsigned long next, struct mm_walk *walk); int (*pte_hole)(unsigned long addr, unsigned long next, int depth, struct mm_walk *walk); - int (*hugetlb_entry)(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long next, - struct mm_walk *walk); + int (*hugetlb_entry)(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk); int (*test_walk)(unsigned long addr, unsigned long next, struct mm_walk *walk); int (*pre_vma)(unsigned long start, unsigned long end, diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c index 1fec16d7263e..0f001950498a 100644 --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -330,11 +330,11 @@ static int damon_mkold_pmd_entry(pmd_t *pmd, unsigned long addr, } #ifdef CONFIG_HUGETLB_PAGE -static void damon_hugetlb_mkold(pte_t *pte, struct mm_struct *mm, +static void damon_hugetlb_mkold(struct hugetlb_pte *hpte, pte_t entry, + struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr) { bool referenced = false; - pte_t entry = huge_ptep_get(pte); struct folio *folio = pfn_folio(pte_pfn(entry)); folio_get(folio); @@ -342,12 +342,12 @@ static void damon_hugetlb_mkold(pte_t *pte, struct mm_struct *mm, if (pte_young(entry)) { referenced = true; entry = pte_mkold(entry); - set_huge_pte_at(mm, addr, pte, entry); + set_huge_pte_at(mm, addr, hpte->ptep, entry); } #ifdef CONFIG_MMU_NOTIFIER if (mmu_notifier_clear_young(mm, addr, - addr + huge_page_size(hstate_vma(vma)))) + addr + hugetlb_pte_size(hpte))) referenced = true; #endif /* CONFIG_MMU_NOTIFIER */ @@ -358,20 +358,26 @@ static void damon_hugetlb_mkold(pte_t *pte, struct mm_struct *mm, folio_put(folio); } -static int damon_mkold_hugetlb_entry(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, +static int damon_mkold_hugetlb_entry(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk) { - struct hstate *h = 
hstate_vma(walk->vma); spinlock_t *ptl; pte_t entry; - ptl = huge_pte_lock(h, walk->mm, pte); - entry = huge_ptep_get(pte); + ptl = hugetlb_pte_lock(hpte); + entry = huge_ptep_get(hpte->ptep); if (!pte_present(entry)) goto out; - damon_hugetlb_mkold(pte, walk->mm, walk->vma, addr); + if (!hugetlb_pte_present_leaf(hpte, entry)) + /* + * We raced with someone splitting a blank PTE. Treat this PTE + * as if it were blank. + */ + goto out; + + damon_hugetlb_mkold(hpte, entry, walk->mm, walk->vma, addr); out: spin_unlock(ptl); @@ -483,8 +489,8 @@ static int damon_young_pmd_entry(pmd_t *pmd, unsigned long addr, } #ifdef CONFIG_HUGETLB_PAGE -static int damon_young_hugetlb_entry(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, +static int damon_young_hugetlb_entry(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk) { struct damon_young_walk_private *priv = walk->private; @@ -493,11 +499,18 @@ static int damon_young_hugetlb_entry(pte_t *pte, unsigned long hmask, spinlock_t *ptl; pte_t entry; - ptl = huge_pte_lock(h, walk->mm, pte); - entry = huge_ptep_get(pte); + ptl = hugetlb_pte_lock(hpte); + entry = huge_ptep_get(hpte->ptep); if (!pte_present(entry)) goto out; + if (!hugetlb_pte_present_leaf(hpte, entry)) + /* + * We raced with someone splitting a blank PTE. Treat this PTE + * as if it were blank. 
+ */ + goto out; + folio = pfn_folio(pte_pfn(entry)); folio_get(folio); diff --git a/mm/hmm.c b/mm/hmm.c index 6a151c09de5e..d3e40cfdd4cb 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -468,8 +468,8 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end, #endif #ifdef CONFIG_HUGETLB_PAGE -static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask, - unsigned long start, unsigned long end, +static int hmm_vma_walk_hugetlb_entry(struct hugetlb_pte *hpte, + unsigned long start, struct mm_walk *walk) { unsigned long addr = start, i, pfn; @@ -479,16 +479,24 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask, unsigned int required_fault; unsigned long pfn_req_flags; unsigned long cpu_flags; + unsigned long hmask = hugetlb_pte_mask(hpte); + unsigned int order = hpte->shift - PAGE_SHIFT; + unsigned long end = start + hugetlb_pte_size(hpte); spinlock_t *ptl; pte_t entry; - ptl = huge_pte_lock(hstate_vma(vma), walk->mm, pte); - entry = huge_ptep_get(pte); + ptl = hugetlb_pte_lock(hpte); + entry = huge_ptep_get(hpte->ptep); + + if (!hugetlb_pte_present_leaf(hpte, entry)) { + spin_unlock(ptl); + return -EAGAIN; + } i = (start - range->start) >> PAGE_SHIFT; pfn_req_flags = range->hmm_pfns[i]; cpu_flags = pte_to_hmm_pfn_flags(range, entry) | - hmm_pfn_flags_order(huge_page_order(hstate_vma(vma))); + hmm_pfn_flags_order(order); required_fault = hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags); if (required_fault) { @@ -605,7 +613,7 @@ int hmm_range_fault(struct hmm_range *range) * in pfns. All entries < last in the pfn array are set to their * output, and all >= are still at their input values. 
*/ - } while (ret == -EBUSY); + } while (ret == -EBUSY || ret == -EAGAIN); return ret; } EXPORT_SYMBOL(hmm_range_fault); diff --git a/mm/memory-failure.c b/mm/memory-failure.c index a1ede7bdce95..0b37cbc6e8ae 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -676,6 +676,7 @@ static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift, unsigned long poisoned_pfn, struct to_kill *tk) { unsigned long pfn = 0; + unsigned long base_pages_poisoned = (1UL << shift) / PAGE_SIZE; if (pte_present(pte)) { pfn = pte_pfn(pte); @@ -686,7 +687,8 @@ static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift, pfn = swp_offset_pfn(swp); } - if (!pfn || pfn != poisoned_pfn) + if (!pfn || pfn < poisoned_pfn || + pfn >= poisoned_pfn + base_pages_poisoned) return 0; set_to_kill(tk, addr, shift); @@ -752,16 +754,15 @@ static int hwpoison_pte_range(pmd_t *pmdp, unsigned long addr, } #ifdef CONFIG_HUGETLB_PAGE -static int hwpoison_hugetlb_range(pte_t *ptep, unsigned long hmask, - unsigned long addr, unsigned long end, - struct mm_walk *walk) +static int hwpoison_hugetlb_range(struct hugetlb_pte *hpte, + unsigned long addr, + struct mm_walk *walk) { struct hwp_walk *hwp = walk->private; - pte_t pte = huge_ptep_get(ptep); - struct hstate *h = hstate_vma(walk->vma); + pte_t pte = huge_ptep_get(hpte->ptep); - return check_hwpoisoned_entry(pte, addr, huge_page_shift(h), - hwp->pfn, &hwp->tk); + return check_hwpoisoned_entry(pte, addr & hugetlb_pte_mask(hpte), + hpte->shift, hwp->pfn, &hwp->tk); } #else #define hwpoison_hugetlb_range NULL diff --git a/mm/mempolicy.c b/mm/mempolicy.c index a256a241fd1d..0f91be88392b 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -558,8 +558,8 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr, return addr != end ? 
-EIO : 0; } -static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, +static int queue_folios_hugetlb(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk) { int ret = 0; @@ -570,8 +570,12 @@ static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask, spinlock_t *ptl; pte_t entry; - ptl = huge_pte_lock(hstate_vma(walk->vma), walk->mm, pte); - entry = huge_ptep_get(pte); + /* We don't migrate high-granularity HugeTLB mappings for now. */ + if (hugetlb_hgm_enabled(walk->vma)) + return -EINVAL; + + ptl = hugetlb_pte_lock(hpte); + entry = huge_ptep_get(hpte->ptep); if (!pte_present(entry)) goto unlock; folio = pfn_folio(pte_pfn(entry)); @@ -608,7 +612,7 @@ static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask, */ if (flags & (MPOL_MF_MOVE_ALL) || (flags & MPOL_MF_MOVE && folio_estimated_sharers(folio) == 1 && - !hugetlb_pmd_shared(pte))) { + !hugetlb_pmd_shared(hpte->ptep))) { if (!isolate_hugetlb(folio, qp->pagelist) && (flags & MPOL_MF_STRICT)) /* diff --git a/mm/mincore.c b/mm/mincore.c index a085a2aeabd8..0894965b3944 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -22,18 +22,29 @@ #include #include "swap.h" -static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr, - unsigned long end, struct mm_walk *walk) +static int mincore_hugetlb(struct hugetlb_pte *hpte, unsigned long addr, + struct mm_walk *walk) { #ifdef CONFIG_HUGETLB_PAGE unsigned char present; + unsigned long end = addr + hugetlb_pte_size(hpte); unsigned char *vec = walk->private; + pte_t pte = huge_ptep_get(hpte->ptep); /* * Hugepages under user process are always in RAM and never * swapped out, but theoretically it needs to be checked. */ - present = pte && !huge_pte_none(huge_ptep_get(pte)); + present = !huge_pte_none(pte); + + /* + * If the pte is present but not a leaf, we raced with someone + * splitting it. 
For someone to have split it, it must have been + * huge_pte_none before, so treat it as such. + */ + if (pte_present(pte) && !hugetlb_pte_present_leaf(hpte, pte)) + present = false; + for (; addr != end; vec++, addr += PAGE_SIZE) *vec = present; walk->private = vec; diff --git a/mm/mprotect.c b/mm/mprotect.c index 1d4843c97c2a..61263ce9d925 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -564,12 +564,16 @@ static int prot_none_pte_entry(pte_t *pte, unsigned long addr, 0 : -EACCES; } -static int prot_none_hugetlb_entry(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long next, +static int prot_none_hugetlb_entry(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk) { - return pfn_modify_allowed(pte_pfn(*pte), *(pgprot_t *)(walk->private)) ? - 0 : -EACCES; + pte_t pte = huge_ptep_get(hpte->ptep); + + if (!hugetlb_pte_present_leaf(hpte, pte)) + return -EAGAIN; + return pfn_modify_allowed(pte_pfn(pte), + *(pgprot_t *)(walk->private)) ? 0 : -EACCES; } static int prot_none_test(unsigned long addr, unsigned long next, @@ -612,8 +616,10 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb, (newflags & VM_ACCESS_FLAGS) == 0) { pgprot_t new_pgprot = vm_get_page_prot(newflags); - error = walk_page_range(current->mm, start, end, - &prot_none_walk_ops, &new_pgprot); + do { + error = walk_page_range(current->mm, start, end, + &prot_none_walk_ops, &new_pgprot); + } while (error == -EAGAIN); if (error) return error; } diff --git a/mm/pagewalk.c b/mm/pagewalk.c index cb23f8a15c13..05ce242f8b7e 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -3,6 +3,7 @@ #include #include #include +#include /* * We want to know the real level where a entry is located ignoring any @@ -296,20 +297,21 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end, struct vm_area_struct *vma = walk->vma; struct hstate *h = hstate_vma(vma); unsigned long next; - unsigned long hmask = huge_page_mask(h); - unsigned long sz = huge_page_size(h); - 
pte_t *pte; const struct mm_walk_ops *ops = walk->ops; int err = 0; + struct hugetlb_pte hpte; hugetlb_vma_lock_read(vma); do { - next = hugetlb_entry_end(h, addr, end); - pte = hugetlb_walk(vma, addr & hmask, sz); - if (pte) - err = ops->hugetlb_entry(pte, hmask, addr, next, walk); - else if (ops->pte_hole) - err = ops->pte_hole(addr, next, -1, walk); + if (hugetlb_full_walk(&hpte, vma, addr)) { + next = hugetlb_entry_end(h, addr, end); + if (ops->pte_hole) + err = ops->pte_hole(addr, next, -1, walk); + } else { + err = ops->hugetlb_entry( + &hpte, addr, walk); + next = min(addr + hugetlb_pte_size(&hpte), end); + } if (err) break; } while (addr = next, addr != end);

From patchwork Sat Feb 18 00:27:59 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58836
Date: Sat, 18 Feb 2023 00:27:59 +0000
Message-ID: <20230218002819.1486479-27-jthoughton@google.com>
Subject: [PATCH v2 26/46] mm: rmap: provide pte_order in page_vma_mapped_walk
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

page_vma_mapped_walk callers will need this information to know how HugeTLB pages are mapped. pte_order only applies if pte is not NULL.
Signed-off-by: James Houghton

diff --git a/include/linux/rmap.h b/include/linux/rmap.h index a4570da03e58..87a2c7f422bf 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -387,6 +387,7 @@ struct page_vma_mapped_walk { pmd_t *pmd; pte_t *pte; spinlock_t *ptl; + unsigned int pte_order; unsigned int flags; }; diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 4e448cfbc6ef..08295b122ad6 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -16,6 +16,7 @@ static inline bool not_found(struct page_vma_mapped_walk *pvmw) static bool map_pte(struct page_vma_mapped_walk *pvmw) { pvmw->pte = pte_offset_map(pvmw->pmd, pvmw->address); + pvmw->pte_order = 0; if (!(pvmw->flags & PVMW_SYNC)) { if (pvmw->flags & PVMW_MIGRATION) { if (!is_swap_pte(*pvmw->pte)) @@ -177,6 +178,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (!pvmw->pte) return false; + pvmw->pte_order = huge_page_order(hstate); pvmw->ptl = huge_pte_lock(hstate, mm, pvmw->pte); if (!check_pte(pvmw)) return not_found(pvmw); @@ -272,6 +274,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) } pte_unmap(pvmw->pte); pvmw->pte = NULL; + pvmw->pte_order = 0; goto restart; } pvmw->pte++;

From patchwork Sat Feb 18 00:28:00 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58837
Date: Sat, 18 Feb 2023 00:28:00 +0000
Message-ID: <20230218002819.1486479-28-jthoughton@google.com>
Subject: [PATCH v2 27/46] mm: rmap: update try_to_{migrate,unmap} to handle mapcount for HGM
From: James Houghton
Make use of the new pvmw->pte_order field to determine the size of the PTE we're unmapping/migrating.

Signed-off-by: James Houghton

diff --git a/mm/migrate.c b/mm/migrate.c index 9b4a7e75f6e6..616afcc40fdc 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -247,7 +247,7 @@ static bool remove_migration_pte(struct folio *folio, #ifdef CONFIG_HUGETLB_PAGE if (folio_test_hugetlb(folio)) { - unsigned int shift = huge_page_shift(hstate_vma(vma)); + unsigned int shift = pvmw.pte_order + PAGE_SHIFT; pte = arch_make_huge_pte(pte, shift, vma->vm_flags); if (folio_test_anon(folio)) diff --git a/mm/rmap.c b/mm/rmap.c index c010d0af3a82..0a019ae32f04 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1609,7 +1609,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) { pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); if (folio_test_hugetlb(folio)) { - hugetlb_count_sub(folio_nr_pages(folio), mm); + hugetlb_count_sub(1UL << pvmw.pte_order, mm); set_huge_pte_at(mm, address, pvmw.pte, pteval); } else { dec_mm_counter(mm, mm_counter(&folio->page)); @@ -1757,7 +1757,13 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, * * See
Documentation/mm/mmu_notifier.rst */ - page_remove_rmap(subpage, vma, folio_test_hugetlb(folio)); + if (folio_test_hugetlb(folio)) + hugetlb_remove_rmap(subpage, + pvmw.pte_order + PAGE_SHIFT, + hstate_vma(vma), vma); + else + page_remove_rmap(subpage, vma, false); + if (vma->vm_flags & VM_LOCKED) mlock_drain_local(); folio_put(folio); @@ -2020,7 +2026,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, } else if (PageHWPoison(subpage)) { pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); if (folio_test_hugetlb(folio)) { - hugetlb_count_sub(folio_nr_pages(folio), mm); + hugetlb_count_sub(1L << pvmw.pte_order, mm); set_huge_pte_at(mm, address, pvmw.pte, pteval); } else { dec_mm_counter(mm, mm_counter(&folio->page)); @@ -2112,7 +2118,12 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, * * See Documentation/mm/mmu_notifier.rst */ - page_remove_rmap(subpage, vma, folio_test_hugetlb(folio)); + if (folio_test_hugetlb(folio)) + hugetlb_remove_rmap(subpage, + pvmw.pte_order + PAGE_SHIFT, + hstate_vma(vma), vma); + else + page_remove_rmap(subpage, vma, false); if (vma->vm_flags & VM_LOCKED) mlock_drain_local(); folio_put(folio); @@ -2196,6 +2207,8 @@ static bool page_make_device_exclusive_one(struct folio *folio, args->owner); mmu_notifier_invalidate_range_start(&range); + VM_BUG_ON_FOLIO(folio_test_hugetlb(folio), folio); + while (page_vma_mapped_walk(&pvmw)) { /* Unexpected PMD-mapped THP? 
*/ VM_BUG_ON_FOLIO(!pvmw.pte, folio);

From patchwork Sat Feb 18 00:28:01 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58839
Date: Sat, 18 Feb 2023 00:28:01 +0000
Message-ID: <20230218002819.1486479-29-jthoughton@google.com>
Subject: [PATCH v2 28/46] mm: rmap: in try_to_{migrate,unmap}, check head page for hugetlb page flags
From: James Houghton
The main complication here is that HugeTLB pages have their poison status stored in the head page as the HWPoison page flag. Because HugeTLB high-granularity mapping can create PTEs that point to subpages instead of always the head of a hugepage, we need to check the compound_head for page flags.

Signed-off-by: James Houghton

diff --git a/mm/rmap.c b/mm/rmap.c index 0a019ae32f04..4908ede83173 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1456,10 +1456,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, struct mm_struct *mm = vma->vm_mm; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); pte_t pteval; - struct page *subpage; + struct page *subpage, *page_flags_page; bool anon_exclusive, ret = true; struct mmu_notifier_range range; enum ttu_flags flags = (enum ttu_flags)(long)arg; + bool page_poisoned; /* * When racing against e.g. zap_pte_range() on another cpu, @@ -1512,9 +1513,17 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, subpage = folio_page(folio, pte_pfn(*pvmw.pte) - folio_pfn(folio)); + /* + * We check the page flags of HugeTLB pages by checking the + * head page. + */ + page_flags_page = folio_test_hugetlb(folio) + ?
&folio->page + : subpage; + page_poisoned = PageHWPoison(page_flags_page); address = pvmw.address; anon_exclusive = folio_test_anon(folio) && - PageAnonExclusive(subpage); + PageAnonExclusive(page_flags_page); if (folio_test_hugetlb(folio)) { bool anon = folio_test_anon(folio); @@ -1523,7 +1532,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, * The try_to_unmap() is only passed a hugetlb page * in the case where the hugetlb page is poisoned. */ - VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage); + VM_BUG_ON_FOLIO(!page_poisoned, folio); /* * huge_pmd_unshare may unmap an entire PMD page. * There is no way of knowing exactly which PMDs may @@ -1606,7 +1615,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, /* Update high watermark before we lower rss */ update_hiwater_rss(mm); - if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) { + if (page_poisoned && !(flags & TTU_IGNORE_HWPOISON)) { pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); if (folio_test_hugetlb(folio)) { hugetlb_count_sub(1UL << pvmw.pte_order, mm); @@ -1632,7 +1641,9 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE); } else if (folio_test_anon(folio)) { - swp_entry_t entry = { .val = page_private(subpage) }; + swp_entry_t entry = { + .val = page_private(page_flags_page) + }; pte_t swp_pte; /* * Store the swap location in the pte. 
@@ -1822,7 +1833,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, struct mm_struct *mm = vma->vm_mm; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); pte_t pteval; - struct page *subpage; + struct page *subpage, *page_flags_page; bool anon_exclusive, ret = true; struct mmu_notifier_range range; enum ttu_flags flags = (enum ttu_flags)(long)arg; @@ -1902,9 +1913,16 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, subpage = folio_page(folio, pte_pfn(*pvmw.pte) - folio_pfn(folio)); } + /* + * We check the page flags of HugeTLB pages by checking the + * head page. + */ + page_flags_page = folio_test_hugetlb(folio) + ? &folio->page + : subpage; address = pvmw.address; anon_exclusive = folio_test_anon(folio) && - PageAnonExclusive(subpage); + PageAnonExclusive(page_flags_page); if (folio_test_hugetlb(folio)) { bool anon = folio_test_anon(folio); @@ -2023,7 +2041,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, * No need to invalidate here it will synchronize on * against the special swap migration pte. 
 			 */
-		} else if (PageHWPoison(subpage)) {
+		} else if (PageHWPoison(page_flags_page)) {
 			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
 			if (folio_test_hugetlb(folio)) {
 				hugetlb_count_sub(1L << pvmw.pte_order, mm);

From patchwork Sat Feb 18 00:28:02 2023
X-Patchwork-Id: 58834
Date: Sat, 18 Feb 2023 00:28:02 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-30-jthoughton@google.com>
Subject: [PATCH v2 29/46] hugetlb: update page_vma_mapped to do high-granularity walks
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

Update the HugeTLB logic to look a lot more like the PTE-mapped THP
logic. When a user calls us in a loop, we will update pvmw->address to
walk to each page table entry that could possibly map the hugepage
containing pvmw->pfn.

Make use of the new pte_order so callers know what size PTE they're
getting.

The !pte failure case is changed to call not_found() instead of just
returning false. This should be a no-op, but if somehow the hstate-level
PTE were deallocated between iterations, not_found() should be called to
drop locks.

Signed-off-by: James Houghton <jthoughton@google.com>

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 08295b122ad6..03e8a4987272 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -133,7 +133,8 @@ static void step_forward(struct page_vma_mapped_walk *pvmw, unsigned long size)
  *
  * Returns true if the page is mapped in the vma. @pvmw->pmd and @pvmw->pte point
  * to relevant page table entries. @pvmw->ptl is locked. @pvmw->address is
- * adjusted if needed (for PTE-mapped THPs).
+ * adjusted if needed (for PTE-mapped THPs and high-granularity-mapped HugeTLB
+ * pages).
  *
  * If @pvmw->pmd is set but @pvmw->pte is not, you have found PMD-mapped page
  * (usually THP). For PTE-mapped THP, you should run page_vma_mapped_walk() in
@@ -165,23 +166,47 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 	if (unlikely(is_vm_hugetlb_page(vma))) {
 		struct hstate *hstate = hstate_vma(vma);
-		unsigned long size = huge_page_size(hstate);
-		/* The only possible mapping was handled on last iteration */
-		if (pvmw->pte)
-			return not_found(pvmw);
-		/*
-		 * All callers that get here will already hold the
-		 * i_mmap_rwsem. Therefore, no additional locks need to be
-		 * taken before calling hugetlb_walk().
-		 */
-		pvmw->pte = hugetlb_walk(vma, pvmw->address, size);
-		if (!pvmw->pte)
-			return false;
+		struct hugetlb_pte hpte;
+		pte_t pteval;
+
+		end = (pvmw->address & huge_page_mask(hstate)) +
+			huge_page_size(hstate);
+
+		do {
+			if (pvmw->pte) {
+				if (pvmw->ptl)
+					spin_unlock(pvmw->ptl);
+				pvmw->ptl = NULL;
+				pvmw->address += PAGE_SIZE << pvmw->pte_order;
+				if (pvmw->address >= end)
+					return not_found(pvmw);
+			}

-		pvmw->pte_order = huge_page_order(hstate);
-		pvmw->ptl = huge_pte_lock(hstate, mm, pvmw->pte);
-		if (!check_pte(pvmw))
-			return not_found(pvmw);
+			/*
+			 * All callers that get here will already hold the
+			 * i_mmap_rwsem. Therefore, no additional locks need to
+			 * be taken before calling hugetlb_walk().
+			 */
+			if (hugetlb_full_walk(&hpte, vma, pvmw->address))
+				return not_found(pvmw);
+
+retry:
+			pvmw->pte = hpte.ptep;
+			pvmw->pte_order = hpte.shift - PAGE_SHIFT;
+			pvmw->ptl = hugetlb_pte_lock(&hpte);
+			pteval = huge_ptep_get(hpte.ptep);
+			if (pte_present(pteval) && !hugetlb_pte_present_leaf(
+						&hpte, pteval)) {
+				/*
+				 * Someone split from under us, so keep
+				 * walking.
+				 */
+				spin_unlock(pvmw->ptl);
+				hugetlb_full_walk_continue(&hpte, vma,
+						pvmw->address);
+				goto retry;
+			}
+		} while (!check_pte(pvmw));
 		return true;
 	}

From patchwork Sat Feb 18 00:28:03 2023
X-Patchwork-Id: 58840
Date: Sat, 18 Feb 2023 00:28:03 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-31-jthoughton@google.com>
Subject: [PATCH v2 30/46] hugetlb: add high-granularity migration support
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

To prevent queueing a hugepage for migration multiple times, we use
last_folio to keep track of the last folio we saw in
queue_pages_hugetlb, and if the folio we're looking at is last_folio,
then we skip it.

For the non-hugetlb cases, last_folio, although unused, is still
updated so that it has a consistent meaning with the hugetlb case.
Signed-off-by: James Houghton <jthoughton@google.com>

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 3a451b7afcb3..6ef80763e629 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -68,6 +68,8 @@ static inline bool is_pfn_swap_entry(swp_entry_t entry);

+struct hugetlb_pte;
+
 /* Clear all flags but only keep swp_entry_t related information */
 static inline pte_t pte_swp_clear_flags(pte_t pte)
 {
@@ -339,7 +341,8 @@ extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 #ifdef CONFIG_HUGETLB_PAGE
 extern void __migration_entry_wait_huge(struct vm_area_struct *vma,
 					pte_t *ptep, spinlock_t *ptl);
-extern void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte);
+extern void migration_entry_wait_huge(struct vm_area_struct *vma,
+				      struct hugetlb_pte *hpte);
 #endif /* CONFIG_HUGETLB_PAGE */
 #else /* CONFIG_MIGRATION */
 static inline swp_entry_t make_readable_migration_entry(pgoff_t offset)
@@ -369,7 +372,8 @@ static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 #ifdef CONFIG_HUGETLB_PAGE
 static inline void __migration_entry_wait_huge(struct vm_area_struct *vma,
 					       pte_t *ptep, spinlock_t *ptl) { }
-static inline void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte) { }
+static inline void migration_entry_wait_huge(struct vm_area_struct *vma,
+					     struct hugetlb_pte *hpte) { }
 #endif /* CONFIG_HUGETLB_PAGE */
 static inline int is_writable_migration_entry(swp_entry_t entry)
 {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 86cd51beb02c..39f541b4a0a8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6418,7 +6418,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * be released there.
 		 */
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
-		migration_entry_wait_huge(vma, hpte.ptep);
+		migration_entry_wait_huge(vma, &hpte);
 		return 0;
 	} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 		ret = VM_FAULT_HWPOISON_LARGE |
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 0f91be88392b..43e210181cce 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -424,6 +424,7 @@ struct queue_pages {
 	unsigned long start;
 	unsigned long end;
 	struct vm_area_struct *first;
+	struct folio *last_folio;
 };

 /*
@@ -475,6 +476,7 @@ static int queue_folios_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
 	flags = qp->flags;
 	/* go to folio migration */
 	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
+		qp->last_folio = folio;
 		if (!vma_migratable(walk->vma) ||
 		    migrate_folio_add(folio, qp->pagelist, flags)) {
 			ret = 1;
@@ -539,6 +541,8 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
 			break;
 		}

+		qp->last_folio = folio;
+
 		/*
 		 * Do not abort immediately since there may be
 		 * temporary off LRU pages in the range. Still
@@ -570,15 +574,22 @@ static int queue_folios_hugetlb(struct hugetlb_pte *hpte,
 	spinlock_t *ptl;
 	pte_t entry;

-	/* We don't migrate high-granularity HugeTLB mappings for now. */
-	if (hugetlb_hgm_enabled(walk->vma))
-		return -EINVAL;
-
 	ptl = hugetlb_pte_lock(hpte);
 	entry = huge_ptep_get(hpte->ptep);
 	if (!pte_present(entry))
 		goto unlock;
-	folio = pfn_folio(pte_pfn(entry));
+
+	if (!hugetlb_pte_present_leaf(hpte, entry)) {
+		ret = -EAGAIN;
+		goto unlock;
+	}
+
+	folio = page_folio(pte_page(entry));
+
+	/* We already queued this page with another high-granularity PTE. */
+	if (folio == qp->last_folio)
+		goto unlock;
+
 	if (!queue_folio_required(folio, qp))
 		goto unlock;
@@ -747,6 +758,7 @@ queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
 		.start = start,
 		.end = end,
 		.first = NULL,
+		.last_folio = NULL,
 	};

 	err = walk_page_range(mm, start, end, &queue_pages_walk_ops, &qp);
diff --git a/mm/migrate.c b/mm/migrate.c
index 616afcc40fdc..b26169990532 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -196,6 +196,9 @@ static bool remove_migration_pte(struct folio *folio,
 		/* pgoff is invalid for ksm pages, but they are never large */
 		if (folio_test_large(folio) && !folio_test_hugetlb(folio))
 			idx = linear_page_index(vma, pvmw.address) - pvmw.pgoff;
+		else if (folio_test_hugetlb(folio))
+			idx = (pvmw.address & ~huge_page_mask(hstate_vma(vma))) /
+				PAGE_SIZE;
 		new = folio_page(folio, idx);

 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
@@ -247,14 +250,16 @@ static bool remove_migration_pte(struct folio *folio,

 #ifdef CONFIG_HUGETLB_PAGE
 		if (folio_test_hugetlb(folio)) {
+			struct page *hpage = folio_page(folio, 0);
 			unsigned int shift = pvmw.pte_order + PAGE_SHIFT;

 			pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
 			if (folio_test_anon(folio))
-				hugepage_add_anon_rmap(new, vma, pvmw.address,
+				hugepage_add_anon_rmap(hpage, vma, pvmw.address,
						       rmap_flags);
 			else
-				page_add_file_rmap(new, vma, true);
+				hugetlb_add_file_rmap(new, shift,
						hstate_vma(vma), vma);
 			set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
 		} else
 #endif
@@ -270,7 +275,7 @@ static bool remove_migration_pte(struct folio *folio,
 			mlock_drain_local();

 		trace_remove_migration_pte(pvmw.address, pte_val(pte),
-					   compound_order(new));
+					   pvmw.pte_order);

 		/* No need to invalidate - it was non-present before */
 		update_mmu_cache(vma, pvmw.address, pvmw.pte);
@@ -361,12 +366,10 @@ void __migration_entry_wait_huge(struct vm_area_struct *vma,
 	}
 }

-void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte)
+void migration_entry_wait_huge(struct vm_area_struct *vma,
+			       struct hugetlb_pte *hpte)
 {
-	spinlock_t *ptl = huge_pte_lockptr(huge_page_shift(hstate_vma(vma)),
-					   vma->vm_mm, pte);
-
-	__migration_entry_wait_huge(vma, pte, ptl);
+	__migration_entry_wait_huge(vma, hpte->ptep, hpte->ptl);
 }
 #endif

From patchwork Sat Feb 18 00:28:04 2023
X-Patchwork-Id: 58841
Date: Sat, 18 Feb 2023 00:28:04 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-32-jthoughton@google.com>
Subject: [PATCH v2 31/46] hugetlb: sort hstates in hugetlb_init_hstates
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

When using HugeTLB high-granularity mapping, we need to go through the
supported hugepage sizes in decreasing order so that we pick the
largest size that works. Consider the case where we're faulting in a 1G
hugepage for the first time: we want hugetlb_fault/hugetlb_no_page to
map it with a PUD. By going through the sizes in decreasing order, we
will find that PUD_SIZE works before finding out that PMD_SIZE or
PAGE_SIZE work too.

This commit also changes bootmem hugepages from storing hstate pointers
directly to storing the hstate sizes. The hstate pointers used for
boot-time-allocated hugepages become invalid after we sort the hstates.
gather_bootmem_prealloc(), called after the hstates have been sorted,
now converts the size to the correct hstate.
Signed-off-by: James Houghton <jthoughton@google.com>

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2fe1eb6897d4..a344f9d9eba1 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -766,7 +766,7 @@ struct hstate {

 struct huge_bootmem_page {
 	struct list_head list;
-	struct hstate *hstate;
+	unsigned long hstate_sz;
 };

 int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 39f541b4a0a8..e20df8f6216e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -49,6 +50,10 @@ int hugetlb_max_hstate __read_mostly;
 unsigned int default_hstate_idx;
+/*
+ * After hugetlb_init_hstates is called, hstates will be sorted from largest
+ * to smallest.
+ */
 struct hstate hstates[HUGE_MAX_HSTATE];

 #ifdef CONFIG_CMA
@@ -3464,7 +3469,7 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
 	/* Put them into a private list first because mem_map is not up yet */
 	INIT_LIST_HEAD(&m->list);
 	list_add(&m->list, &huge_boot_pages);
-	m->hstate = h;
+	m->hstate_sz = huge_page_size(h);
 	return 1;
 }

@@ -3479,7 +3484,7 @@ static void __init gather_bootmem_prealloc(void)
 	list_for_each_entry(m, &huge_boot_pages, list) {
 		struct page *page = virt_to_page(m);
 		struct folio *folio = page_folio(page);
-		struct hstate *h = m->hstate;
+		struct hstate *h = size_to_hstate(m->hstate_sz);

 		VM_BUG_ON(!hstate_is_gigantic(h));
 		WARN_ON(folio_ref_count(folio) != 1);
@@ -3595,9 +3600,38 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
 	kfree(node_alloc_noretry);
 }

+static int compare_hstates_decreasing(const void *a, const void *b)
+{
+	unsigned long sz_a = huge_page_size((const struct hstate *)a);
+	unsigned long sz_b = huge_page_size((const struct hstate *)b);
+
+	if (sz_a < sz_b)
+		return 1;
+	if (sz_a > sz_b)
+		return -1;
+	return 0;
+}
+
+static void sort_hstates(void)
+{
+	unsigned long default_hstate_sz = huge_page_size(&default_hstate);
+
+	/* Sort from largest to smallest. */
+	sort(hstates, hugetlb_max_hstate, sizeof(*hstates),
+	     compare_hstates_decreasing, NULL);
+
+	/*
+	 * We may have changed the location of the default hstate, so we need to
+	 * update it.
+	 */
+	default_hstate_idx = hstate_index(size_to_hstate(default_hstate_sz));
+}
+
 static void __init hugetlb_init_hstates(void)
 {
-	struct hstate *h, *h2;
+	struct hstate *h;
+
+	sort_hstates();

 	for_each_hstate(h) {
 		/* oversize hugepages were init'ed in early boot */
@@ -3616,13 +3650,8 @@ static void __init hugetlb_init_hstates(void)
 			continue;
 		if (hugetlb_cma_size && h->order <= HUGETLB_PAGE_ORDER)
 			continue;
-		for_each_hstate(h2) {
-			if (h2 == h)
-				continue;
-			if (h2->order < h->order &&
-			    h2->order > h->demote_order)
-				h->demote_order = h2->order;
-		}
+		if (h - 1 >= &hstates[0])
+			h->demote_order = huge_page_order(h - 1);
 	}
 }

From patchwork Sat Feb 18 00:28:05 2023
X-Patchwork-Id: 58842
Date: Sat, 18 Feb 2023 00:28:05 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-33-jthoughton@google.com>
Subject: [PATCH v2 32/46] hugetlb: add for_each_hgm_shift
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
 Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

This is a helper macro to loop through all the usable page sizes for a
high-granularity-enabled HugeTLB VMA. Given the VMA's hstate, it will
loop, in descending order, through the page sizes that HugeTLB supports
for this architecture. It always includes PAGE_SIZE.

This is done by looping through the hstates; however, there is no hstate
for PAGE_SIZE. To handle this case, the loop intentionally goes out of
bounds, and the out-of-bounds pointer is mapped to PAGE_SIZE.
Signed-off-by: James Houghton

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e20df8f6216e..667e82b7a0ff 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7941,6 +7941,24 @@ bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
 {
 	return vma && (vma->vm_flags & VM_HUGETLB_HGM);
 }
+/* Should only be used by the for_each_hgm_shift macro. */
+static unsigned int __shift_for_hstate(struct hstate *h)
+{
+	/* If h is out of bounds, we have reached the end, so give PAGE_SIZE */
+	if (h >= &hstates[hugetlb_max_hstate])
+		return PAGE_SHIFT;
+	return huge_page_shift(h);
+}
+
+/*
+ * Intentionally go out of bounds. An out-of-bounds hstate will be converted to
+ * PAGE_SIZE.
+ */
+#define for_each_hgm_shift(hstate, tmp_h, shift)			\
+	for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h),	\
+	     (tmp_h) <= &hstates[hugetlb_max_hstate];			\
+	     (tmp_h)++)
+
 #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
 
 /*

From patchwork Sat Feb 18 00:28:06 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58835
Date: Sat, 18 Feb 2023 00:28:06 +0000
In-Reply-To:
<20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-34-jthoughton@google.com>
Subject: [PATCH v2 33/46] hugetlb: userfaultfd: add support for
 high-granularity UFFDIO_CONTINUE
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
 Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton

Changes here are similar to the changes made for hugetlb_no_page.

Pass vmf->real_address to userfaultfd_huge_must_wait because
vmf->address may be rounded down to the hugepage size, and a
high-granularity page table walk would look up the wrong PTE. Also
change the call to userfaultfd_must_wait in the same way for
consistency.

This commit introduces hugetlb_alloc_largest_pte, which is used to find
the appropriate PTE size to map pages with UFFDIO_CONTINUE.

When MADV_SPLIT is provided, page fault events will report
PAGE_SIZE-aligned addresses instead of huge_page_size(h)-aligned
addresses, regardless of whether UFFD_FEATURE_EXACT_ADDRESS is used.
Signed-off-by: James Houghton

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 44d1ee429eb0..bb30001b63ba 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -252,17 +252,17 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 					      unsigned long flags,
 					      unsigned long reason)
 {
-	pte_t *ptep, pte;
+	pte_t pte;
 	bool ret = true;
+	struct hugetlb_pte hpte;
 
 	mmap_assert_locked(ctx->mm);
 
-	ptep = hugetlb_walk(vma, address, vma_mmu_pagesize(vma));
-	if (!ptep)
+	if (hugetlb_full_walk(&hpte, vma, address))
 		goto out;
 
 	ret = false;
-	pte = huge_ptep_get(ptep);
+	pte = huge_ptep_get(hpte.ptep);
 
 	/*
 	 * Lockless access: we're in a wait_event so it's ok if it
@@ -531,11 +531,11 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	spin_unlock_irq(&ctx->fault_pending_wqh.lock);
 
 	if (!is_vm_hugetlb_page(vma))
-		must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags,
-						  reason);
+		must_wait = userfaultfd_must_wait(ctx, vmf->real_address,
+						  vmf->flags, reason);
 	else
 		must_wait = userfaultfd_huge_must_wait(ctx, vma,
-						       vmf->address,
+						       vmf->real_address,
 						       vmf->flags, reason);
 	if (is_vm_hugetlb_page(vma))
 		hugetlb_vma_unlock_read(vma);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a344f9d9eba1..e0e51bb06112 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -201,7 +201,8 @@ unsigned long hugetlb_total_pages(void);
 vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long address, unsigned int flags);
 #ifdef CONFIG_USERFAULTFD
-int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
+int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
+			     struct hugetlb_pte *dst_hpte,
 			     struct vm_area_struct *dst_vma,
 			     unsigned long dst_addr,
 			     unsigned long src_addr,
@@ -1272,16 +1273,31 @@ static inline enum hugetlb_level hpage_size_to_level(unsigned long sz)
 #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
 bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
+bool hugetlb_hgm_advised(struct vm_area_struct *vma);
 bool hugetlb_hgm_eligible(struct vm_area_struct *vma);
+int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
+			      struct vm_area_struct *vma, unsigned long start,
+			      unsigned long end);
 #else
 static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
 {
 	return false;
 }
+static inline bool hugetlb_hgm_advised(struct vm_area_struct *vma)
+{
+	return false;
+}
 static inline bool hugetlb_hgm_eligible(struct vm_area_struct *vma)
 {
 	return false;
 }
+static inline
+int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
+			      struct vm_area_struct *vma, unsigned long start,
+			      unsigned long end)
+{
+	return -EINVAL;
+}
 #endif
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 667e82b7a0ff..a00b4ac07046 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6083,9 +6083,15 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 						  unsigned long reason)
 {
 	u32 hash;
+	/*
+	 * Don't use the hpage-aligned address if the user has explicitly
+	 * enabled HGM.
+	 */
+	bool round_to_pagesize = hugetlb_hgm_advised(vma) &&
+				 reason == VM_UFFD_MINOR;
 	struct vm_fault vmf = {
 		.vma = vma,
-		.address = haddr,
+		.address = round_to_pagesize ? addr & PAGE_MASK : haddr,
 		.real_address = addr,
 		.flags = flags,
@@ -6569,7 +6575,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
  * modifications for huge pages.
  */
 int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
-			     pte_t *dst_pte,
+			     struct hugetlb_pte *dst_hpte,
 			     struct vm_area_struct *dst_vma,
 			     unsigned long dst_addr,
 			     unsigned long src_addr,
@@ -6580,13 +6586,15 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
 	struct hstate *h = hstate_vma(dst_vma);
 	struct address_space *mapping = dst_vma->vm_file->f_mapping;
-	pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr);
+	unsigned long haddr = dst_addr & huge_page_mask(h);
+	pgoff_t idx = vma_hugecache_offset(h, dst_vma, haddr);
 	unsigned long size;
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
 	pte_t _dst_pte;
 	spinlock_t *ptl;
 	int ret = -ENOMEM;
 	struct folio *folio;
+	struct page *subpage;
 	int writable;
 	bool folio_in_pagecache = false;
@@ -6601,12 +6609,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	 * a non-missing case. Return -EEXIST.
 	 */
 	if (vm_shared &&
-	    hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) {
+	    hugetlbfs_pagecache_present(h, dst_vma, haddr)) {
 		ret = -EEXIST;
 		goto out;
 	}
 
-	folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0);
+	folio = alloc_hugetlb_folio(dst_vma, haddr, 0);
 	if (IS_ERR(folio)) {
 		ret = -ENOMEM;
 		goto out;
@@ -6622,13 +6630,13 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 			/* Free the allocated folio which may have
 			 * consumed a reservation.
 			 */
-			restore_reserve_on_error(h, dst_vma, dst_addr, folio);
+			restore_reserve_on_error(h, dst_vma, haddr, folio);
 			folio_put(folio);
 
 			/* Allocate a temporary folio to hold the copied
 			 * contents.
 			 */
-			folio = alloc_hugetlb_folio_vma(h, dst_vma, dst_addr);
+			folio = alloc_hugetlb_folio_vma(h, dst_vma, haddr);
 			if (!folio) {
 				ret = -ENOMEM;
 				goto out;
@@ -6642,14 +6650,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		}
 	} else {
 		if (vm_shared &&
-		    hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) {
+		    hugetlbfs_pagecache_present(h, dst_vma, haddr)) {
 			put_page(*pagep);
 			ret = -EEXIST;
 			*pagep = NULL;
 			goto out;
 		}
 
-		folio = alloc_hugetlb_folio(dst_vma, dst_addr, 0);
+		folio = alloc_hugetlb_folio(dst_vma, haddr, 0);
 		if (IS_ERR(folio)) {
 			put_page(*pagep);
 			ret = -ENOMEM;
@@ -6697,7 +6705,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		folio_in_pagecache = true;
 	}
 
-	ptl = huge_pte_lock(h, dst_mm, dst_pte);
+	ptl = hugetlb_pte_lock(dst_hpte);
 
 	ret = -EIO;
 	if (folio_test_hwpoison(folio))
@@ -6709,11 +6717,13 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	 * page backing it, then access the page.
 	 */
 	ret = -EEXIST;
-	if (!huge_pte_none_mostly(huge_ptep_get(dst_pte)))
+	if (!huge_pte_none_mostly(huge_ptep_get(dst_hpte->ptep)))
 		goto out_release_unlock;
 
+	subpage = hugetlb_find_subpage(h, folio, dst_addr);
+
 	if (folio_in_pagecache)
-		page_add_file_rmap(&folio->page, dst_vma, true);
+		hugetlb_add_file_rmap(subpage, dst_hpte->shift, h, dst_vma);
 	else
 		hugepage_add_new_anon_rmap(folio, dst_vma, dst_addr);
@@ -6726,7 +6736,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	else
 		writable = dst_vma->vm_flags & VM_WRITE;
 
-	_dst_pte = make_huge_pte(dst_vma, &folio->page, writable);
+	_dst_pte = make_huge_pte_with_shift(dst_vma, subpage, writable,
+					    dst_hpte->shift);
 	/*
 	 * Always mark UFFDIO_COPY page dirty; note that this may not be
 	 * extremely important for hugetlbfs for now since swapping is not
@@ -6739,12 +6750,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	if (wp_copy)
 		_dst_pte = huge_pte_mkuffd_wp(_dst_pte);
 
-	set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
+	set_huge_pte_at(dst_mm, dst_addr, dst_hpte->ptep, _dst_pte);
 
-	hugetlb_count_add(pages_per_huge_page(h), dst_mm);
+	hugetlb_count_add(hugetlb_pte_size(dst_hpte) / PAGE_SIZE, dst_mm);
 
 	/* No need to invalidate - it was non-present before */
-	update_mmu_cache(dst_vma, dst_addr, dst_pte);
+	update_mmu_cache(dst_vma, dst_addr, dst_hpte->ptep);
 
 	spin_unlock(ptl);
 
 	if (!is_continue)
@@ -7941,6 +7952,18 @@ bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
 {
 	return vma && (vma->vm_flags & VM_HUGETLB_HGM);
 }
+bool hugetlb_hgm_advised(struct vm_area_struct *vma)
+{
+	/*
+	 * Right now, the only way for HGM to be enabled is if a user
+	 * explicitly enables it via MADV_SPLIT, but in the future, there
+	 * may be cases where it gets enabled automatically.
+	 *
+	 * Provide hugetlb_hgm_advised() now for call sites that care that
+	 * the user explicitly enabled HGM.
+	 */
+	return hugetlb_hgm_enabled(vma);
+}
 /* Should only be used by the for_each_hgm_shift macro. */
 static unsigned int __shift_for_hstate(struct hstate *h)
 {
@@ -7959,6 +7982,38 @@ static unsigned int __shift_for_hstate(struct hstate *h)
 	     (tmp_h) <= &hstates[hugetlb_max_hstate];	\
 	     (tmp_h)++)
 
+/*
+ * Find the HugeTLB PTE that maps as much of [start, end) as possible with a
+ * single page table entry. It is returned in @hpte.
+ */
+int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
+			      struct vm_area_struct *vma, unsigned long start,
+			      unsigned long end)
+{
+	struct hstate *h = hstate_vma(vma), *tmp_h;
+	unsigned int shift;
+	unsigned long sz;
+	int ret;
+
+	for_each_hgm_shift(h, tmp_h, shift) {
+		sz = 1UL << shift;
+
+		if (!IS_ALIGNED(start, sz) || start + sz > end)
+			continue;
+		goto found;
+	}
+	return -EINVAL;
+found:
+	ret = hugetlb_full_walk_alloc(hpte, vma, start, sz);
+	if (ret)
+		return ret;
+
+	if (hpte->shift > shift)
+		return -EEXIST;
+
+	return 0;
+}
+
 #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
 
 /*
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 53c3d916ff66..b56bc12f600e 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -320,14 +320,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 {
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
 	ssize_t err;
-	pte_t *dst_pte;
 	unsigned long src_addr, dst_addr;
 	long copied;
 	struct page *page;
-	unsigned long vma_hpagesize;
+	unsigned long vma_hpagesize, target_pagesize;
 	pgoff_t idx;
 	u32 hash;
 	struct address_space *mapping;
+	bool use_hgm = hugetlb_hgm_advised(dst_vma) &&
+		       mode == MCOPY_ATOMIC_CONTINUE;
+	struct hstate *h = hstate_vma(dst_vma);
 
 	/*
 	 * There is no default zero huge page for all huge page sizes as
@@ -345,12 +347,13 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	copied = 0;
 	page = NULL;
 	vma_hpagesize = vma_kernel_pagesize(dst_vma);
+	target_pagesize = use_hgm ? PAGE_SIZE : vma_hpagesize;
 
 	/*
-	 * Validate alignment based on huge page size
+	 * Validate alignment based on the targeted page size.
 	 */
 	err = -EINVAL;
-	if (dst_start & (vma_hpagesize - 1) || len & (vma_hpagesize - 1))
+	if (dst_start & (target_pagesize - 1) || len & (target_pagesize - 1))
 		goto out_unlock;
 
retry:
@@ -381,13 +384,14 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	}
 
 	while (src_addr < src_start + len) {
+		struct hugetlb_pte hpte;
 		BUG_ON(dst_addr >= dst_start + len);
 
 		/*
 		 * Serialize via vma_lock and hugetlb_fault_mutex.
-		 * vma_lock ensures the dst_pte remains valid even
-		 * in the case of shared pmds. fault mutex prevents
-		 * races with other faulting threads.
+		 * vma_lock ensures the hpte.ptep remains valid even
+		 * in the case of shared pmds and page table collapsing.
+		 * fault mutex prevents races with other faulting threads.
 		 */
 		idx = linear_page_index(dst_vma, dst_addr);
 		mapping = dst_vma->vm_file->f_mapping;
@@ -395,23 +399,28 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 		hugetlb_vma_lock_read(dst_vma);
 
-		err = -ENOMEM;
-		dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize);
-		if (!dst_pte) {
+		if (use_hgm)
+			err = hugetlb_alloc_largest_pte(&hpte, dst_mm, dst_vma,
+							dst_addr,
+							dst_start + len);
+		else
+			err = hugetlb_full_walk_alloc(&hpte, dst_vma, dst_addr,
+						      vma_hpagesize);
+		if (err) {
 			hugetlb_vma_unlock_read(dst_vma);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			goto out_unlock;
 		}
 
 		if (mode != MCOPY_ATOMIC_CONTINUE &&
-		    !huge_pte_none_mostly(huge_ptep_get(dst_pte))) {
+		    !huge_pte_none_mostly(huge_ptep_get(hpte.ptep))) {
 			err = -EEXIST;
 			hugetlb_vma_unlock_read(dst_vma);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			goto out_unlock;
 		}
 
-		err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
+		err = hugetlb_mcopy_atomic_pte(dst_mm, &hpte, dst_vma,
 					       dst_addr, src_addr, mode, &page,
 					       wp_copy);
@@ -423,6 +432,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		if (unlikely(err == -ENOENT)) {
 			mmap_read_unlock(dst_mm);
 			BUG_ON(!page);
+			WARN_ON_ONCE(hpte.shift != huge_page_shift(h));
 
 			err = copy_huge_page_from_user(page,
 						       (const void __user *)src_addr,
@@ -440,9 +450,9 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		BUG_ON(page);
 
 		if (!err) {
-			dst_addr += vma_hpagesize;
-			src_addr += vma_hpagesize;
-			copied += vma_hpagesize;
+			dst_addr += hugetlb_pte_size(&hpte);
+			src_addr += hugetlb_pte_size(&hpte);
+			copied += hugetlb_pte_size(&hpte);
 
 			if (fatal_signal_pending(current))
 				err = -EINTR;

From patchwork Sat Feb 18 00:28:07 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58844
Date: Sat, 18 Feb 2023 00:28:07 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-35-jthoughton@google.com>
Subject: [PATCH v2 34/46] hugetlb: add MADV_COLLAPSE for hugetlb
From: James Houghton
To: Mike
Kravetz , Muchun Song , Peter Xu , Andrew Morton Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Frank van der Linden , Jiaqi Yan , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1758126779470260026?= X-GMAIL-MSGID: =?utf-8?q?1758126779470260026?= This is a necessary extension to the UFFDIO_CONTINUE changes. When userspace finishes mapping an entire hugepage with UFFDIO_CONTINUE, the kernel has no mechanism to automatically collapse the page table to map the whole hugepage normally. We require userspace to inform us that they would like the mapping to be collapsed; they do this with MADV_COLLAPSE. If userspace has not mapped all of a hugepage with UFFDIO_CONTINUE, but only some, hugetlb_collapse will cause the requested range to be mapped as if it were UFFDIO_CONTINUE'd already. The effects of any UFFDIO_WRITEPROTECT calls may be undone by a call to MADV_COLLAPSE for intersecting address ranges. This commit is co-opting the same madvise mode that has been introduced to synchronously collapse THPs. The function that does THP collapsing has been renamed to madvise_collapse_thp. As with the rest of the high-granularity mapping support, MADV_COLLAPSE is only supported for shared VMAs right now. MADV_COLLAPSE for HugeTLB takes the mmap_lock for writing. 
It is important that we check PageHWPoison before checking
!HPageMigratable, as PageHWPoison implies !HPageMigratable.
!PageHWPoison && !HPageMigratable means that the page has been isolated
for migration.

Signed-off-by: James Houghton <jthoughton@google.com>

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 70bd867eba94..fa63a56ebaf0 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -218,9 +218,9 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
 int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags,
 		     int advice);
-int madvise_collapse(struct vm_area_struct *vma,
-		     struct vm_area_struct **prev,
-		     unsigned long start, unsigned long end);
+int madvise_collapse_thp(struct vm_area_struct *vma,
+			 struct vm_area_struct **prev,
+			 unsigned long start, unsigned long end);
 void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start,
 			   unsigned long end, long adjust_next);
 spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma);
@@ -358,9 +358,9 @@ static inline int hugepage_madvise(struct vm_area_struct *vma,
 	return -EINVAL;
 }
-static inline int madvise_collapse(struct vm_area_struct *vma,
-				   struct vm_area_struct **prev,
-				   unsigned long start, unsigned long end)
+static inline int madvise_collapse_thp(struct vm_area_struct *vma,
+				       struct vm_area_struct **prev,
+				       unsigned long start, unsigned long end)
 {
 	return -EINVAL;
 }
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index e0e51bb06112..6cd4ae08d84d 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1278,6 +1278,8 @@ bool hugetlb_hgm_eligible(struct vm_area_struct *vma);
 int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
 			      struct vm_area_struct *vma,
 			      unsigned long start, unsigned long end);
+int hugetlb_collapse(struct mm_struct *mm, unsigned long start,
+		     unsigned long end);
 #else
 static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
 {
@@ -1298,6 +1300,12 @@ int
hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
{
	return -EINVAL;
}
+static inline
+int hugetlb_collapse(struct mm_struct *mm, unsigned long start,
+		     unsigned long end)
+{
+	return -EINVAL;
+}
 #endif

 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a00b4ac07046..c4d189e5f1fd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -8014,6 +8014,158 @@ int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm,
 	return 0;
 }

+/*
+ * Collapse the address range from @start to @end to be mapped optimally.
+ *
+ * This is only valid for shared mappings. The main use case for this function
+ * is following UFFDIO_CONTINUE. If a user UFFDIO_CONTINUEs an entire hugepage
+ * by calling UFFDIO_CONTINUE once for each 4K region, the kernel doesn't know
+ * to collapse the mapping after the final UFFDIO_CONTINUE. Instead, we leave
+ * it up to userspace to tell us to do so, via MADV_COLLAPSE.
+ *
+ * Any holes in the mapping will be filled. If there is no page in the
+ * pagecache for a region we're collapsing, the PTEs will be cleared.
+ *
+ * If high-granularity PTEs are uffd-wp markers, those markers will be dropped.
+ */
+static int __hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma,
+			      unsigned long start, unsigned long end)
+{
+	struct hstate *h = hstate_vma(vma);
+	struct address_space *mapping = vma->vm_file->f_mapping;
+	struct mmu_notifier_range range;
+	struct mmu_gather tlb;
+	unsigned long curr = start;
+	int ret = 0;
+	struct folio *folio;
+	struct page *subpage;
+	pgoff_t idx;
+	bool writable = vma->vm_flags & VM_WRITE;
+	struct hugetlb_pte hpte;
+	pte_t entry;
+	spinlock_t *ptl;
+
+	/*
+	 * This is only supported for shared VMAs, because we need to look up
+	 * the page to use for any PTEs we end up creating.
+	 */
+	if (!(vma->vm_flags & VM_MAYSHARE))
+		return -EINVAL;
+
+	/* If HGM is not enabled, there is nothing to collapse.
+	 */
+	if (!hugetlb_hgm_enabled(vma))
+		return 0;
+
+	tlb_gather_mmu(&tlb, mm);
+
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, start, end);
+	mmu_notifier_invalidate_range_start(&range);
+
+	while (curr < end) {
+		ret = hugetlb_alloc_largest_pte(&hpte, mm, vma, curr, end);
+		if (ret)
+			goto out;
+
+		entry = huge_ptep_get(hpte.ptep);
+
+		/*
+		 * There is no work to do if the PTE doesn't point to page
+		 * tables.
+		 */
+		if (!pte_present(entry))
+			goto next_hpte;
+		if (hugetlb_pte_present_leaf(&hpte, entry))
+			goto next_hpte;
+
+		idx = vma_hugecache_offset(h, vma, curr);
+		folio = filemap_get_folio(mapping, idx);
+
+		if (folio && folio_test_hwpoison(folio)) {
+			/*
+			 * Don't collapse a mapping to a page that is
+			 * hwpoisoned. The entire page will be poisoned.
+			 *
+			 * When HugeTLB supports poisoning PAGE_SIZE bits of
+			 * the hugepage, the logic here can be improved.
+			 *
+			 * Skip this page, and continue to collapse the rest
+			 * of the mapping.
+			 */
+			folio_put(folio);
+			curr = (curr & huge_page_mask(h)) + huge_page_size(h);
+			continue;
+		}
+
+		if (folio && !folio_test_hugetlb_migratable(folio)) {
+			/*
+			 * Don't collapse a mapping to a page that is pending
+			 * a migration. Migration swap entries may have been
+			 * placed in the page table.
+			 */
+			ret = -EBUSY;
+			folio_put(folio);
+			goto out;
+		}
+
+		/*
+		 * Clear all the PTEs, and drop ref/mapcounts
+		 * (on tlb_finish_mmu).
+		 */
+		__unmap_hugepage_range(&tlb, vma, curr,
+				       curr + hugetlb_pte_size(&hpte),
+				       NULL,
+				       ZAP_FLAG_DROP_MARKER);
+		/* Free the PTEs.
+		 */
+		hugetlb_free_pgd_range(&tlb,
+				       curr, curr + hugetlb_pte_size(&hpte),
+				       curr, curr + hugetlb_pte_size(&hpte));
+
+		ptl = hugetlb_pte_lock(&hpte);
+
+		if (!folio) {
+			huge_pte_clear(mm, curr, hpte.ptep,
+				       hugetlb_pte_size(&hpte));
+			spin_unlock(ptl);
+			goto next_hpte;
+		}
+
+		subpage = hugetlb_find_subpage(h, folio, curr);
+		entry = make_huge_pte_with_shift(vma, subpage,
+						 writable, hpte.shift);
+		hugetlb_add_file_rmap(subpage, hpte.shift, h, vma);
+		set_huge_pte_at(mm, curr, hpte.ptep, entry);
+		spin_unlock(ptl);
+next_hpte:
+		curr += hugetlb_pte_size(&hpte);
+	}
+out:
+	mmu_notifier_invalidate_range_end(&range);
+	tlb_finish_mmu(&tlb);
+
+	return ret;
+}
+
+int hugetlb_collapse(struct mm_struct *mm, unsigned long start,
+		     unsigned long end)
+{
+	int ret = 0;
+	struct vm_area_struct *vma;
+
+	mmap_write_lock(mm);
+	while (start < end && !ret) {
+		vma = find_vma(mm, start);
+		if (!vma || !is_vm_hugetlb_page(vma)) {
+			ret = -EINVAL;
+			break;
+		}
+		ret = __hugetlb_collapse(mm, vma, start,
+				end < vma->vm_end ?
end : vma->vm_end); + start = vma->vm_end; + } + mmap_write_unlock(mm); + return ret; +} + #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ /* diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 8dbc39896811..58cda5020537 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -2750,8 +2750,8 @@ static int madvise_collapse_errno(enum scan_result r) } } -int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev, - unsigned long start, unsigned long end) +int madvise_collapse_thp(struct vm_area_struct *vma, struct vm_area_struct **prev, + unsigned long start, unsigned long end) { struct collapse_control *cc; struct mm_struct *mm = vma->vm_mm; diff --git a/mm/madvise.c b/mm/madvise.c index 8c004c678262..e121d135252a 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1028,6 +1028,24 @@ static int madvise_split(struct vm_area_struct *vma, #endif } +static int madvise_collapse(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end) +{ + if (is_vm_hugetlb_page(vma)) { + struct mm_struct *mm = vma->vm_mm; + int ret; + + *prev = NULL; /* tell sys_madvise we dropped the mmap lock */ + mmap_read_unlock(mm); + ret = hugetlb_collapse(mm, start, end); + mmap_read_lock(mm); + return ret; + } + + return madvise_collapse_thp(vma, prev, start, end); +} + /* * Apply an madvise behavior to a region of a vma. 
 * madvise_update_vma
 * will handle splitting a vm area into separate areas, each area with its own
@@ -1204,6 +1222,9 @@ madvise_behavior_valid(int behavior)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	case MADV_HUGEPAGE:
 	case MADV_NOHUGEPAGE:
+#endif
+#if defined(CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING) || \
+	defined(CONFIG_TRANSPARENT_HUGEPAGE)
 	case MADV_COLLAPSE:
 #endif
 #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
@@ -1397,7 +1418,8 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
 *  MADV_NOHUGEPAGE - mark the given range as not worth being backed by
 *		transparent huge pages so the existing pages will not be
 *		coalesced into THP and new pages will not be allocated as THP.
- *  MADV_COLLAPSE - synchronously coalesce pages into new THP.
+ *  MADV_COLLAPSE - synchronously coalesce pages into new THP, or, for HugeTLB
+ *		pages, collapse the mapping.
 *  MADV_SPLIT - allow HugeTLB pages to be mapped at PAGE_SIZE. This allows
 *		UFFDIO_CONTINUE to accept PAGE_SIZE-aligned regions.
 *  MADV_DONTDUMP - the application wants to prevent pages in the given range

From patchwork Sat Feb 18 00:28:08 2023
Date: Sat, 18 Feb 2023 00:28:08 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-36-jthoughton@google.com>
Subject: [PATCH v2 35/46] hugetlb: add check to prevent refcount overflow via HGM
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
    "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
    Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org

With high-granularity mappings, it becomes quite trivial for userspace
to overflow a page's refcount or mapcount. It can be done like so:

1. Create a 1G hugetlbfs file with a single 1G page.
2. Create 8192 mappings of the file.
3. Use UFFDIO_CONTINUE to map every mapping entirely at 4K.

Each time step 3 is done for a mapping, the refcount and mapcount will
increase by 2^18 (512 * 512). Do that 2^13 times (8192), and you reach
2^31.

To avoid this, WARN_ON_ONCE when the refcount goes negative. If this
happens as a result of a page fault, return VM_FAULT_SIGBUS, and if it
happens as a result of a UFFDIO_CONTINUE, return EFAULT.

We can also create too many mappings by fork()ing a lot with VMAs set
up such that page tables must be copied at fork()-time (like if we have
VM_UFFD_WP). Use try_get_page() in copy_hugetlb_page_range() to deal
with this.
Signed-off-by: James Houghton <jthoughton@google.com>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c4d189e5f1fd..34368072dabe 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5397,7 +5397,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		} else {
 			ptepage = pte_page(entry);
 			hpage = compound_head(ptepage);
-			get_page(hpage);
+			if (!try_get_page(hpage)) {
+				ret = -EFAULT;
+				break;
+			}

 			/*
 			 * Failing to duplicate the anon rmap is a rare case
@@ -6132,6 +6135,30 @@ static bool hugetlb_pte_stable(struct hstate *h, struct hugetlb_pte *hpte,
 	return same;
 }

+/*
+ * Like filemap_lock_folio, but check the refcount of the page afterwards to
+ * check if we are at risk of overflowing refcount back to 0.
+ *
+ * This should be used in places that can be used to easily overflow refcount,
+ * like places that create high-granularity mappings.
+ */
+static struct folio *hugetlb_try_find_lock_folio(struct address_space *mapping,
+						 pgoff_t idx)
+{
+	struct folio *folio = filemap_lock_folio(mapping, idx);
+
+	/*
+	 * This check is very similar to the one in try_get_page().
+	 *
+	 * This check is inherently racy, so WARN_ON_ONCE() if this condition
+	 * ever occurs.
+	 */
+	if (WARN_ON_ONCE(folio && folio_ref_count(folio) <= 0))
+		return ERR_PTR(-EFAULT);
+
+	return folio;
+}
+
 static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			struct vm_area_struct *vma,
 			struct address_space *mapping, pgoff_t idx,
@@ -6168,7 +6195,15 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	 * before we get page_table_lock.
 	 */
 	new_folio = false;
-	folio = filemap_lock_folio(mapping, idx);
+	folio = hugetlb_try_find_lock_folio(mapping, idx);
+	if (IS_ERR(folio)) {
+		/*
+		 * We don't want to invoke the OOM killer here, as we aren't
+		 * actually OOMing.
+		 */
+		ret = VM_FAULT_SIGBUS;
+		goto out;
+	}
 	if (!folio) {
 		size = i_size_read(mapping->host) >> huge_page_shift(h);
 		if (idx >= size)
@@ -6600,8 +6635,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,

 	if (is_continue) {
 		ret = -EFAULT;
-		folio = filemap_lock_folio(mapping, idx);
-		if (!folio)
+		folio = hugetlb_try_find_lock_folio(mapping, idx);
+		if (IS_ERR_OR_NULL(folio))
 			goto out;
 		folio_in_pagecache = true;
 	} else if (!*pagep) {

From patchwork Sat Feb 18 00:28:09 2023
Date: Sat, 18 Feb 2023 00:28:09 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-37-jthoughton@google.com>
Subject: [PATCH v2 36/46] hugetlb: remove huge_pte_lock and huge_pte_lockptr
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
    "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
    Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org

They are replaced with hugetlb_pte_lock{,ptr}. All callers that haven't
already been replaced don't get called when using HGM, so we handle them
by populating hugetlb_ptes with the standard, hstate-sized huge PTEs.

Signed-off-by: James Houghton <jthoughton@google.com>

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 035a0df47af0..c90ac06dc8d9 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -258,11 +258,14 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 #ifdef CONFIG_PPC_BOOK3S_64
 	struct hstate *h = hstate_vma(vma);
+	struct hugetlb_pte hpte;

 	psize = hstate_get_psize(h);
 #ifdef CONFIG_DEBUG_VM
-	assert_spin_locked(huge_pte_lockptr(huge_page_shift(h),
-					    vma->vm_mm, ptep));
+	/* HGM is not supported for powerpc yet.
+	 */
+	hugetlb_pte_init(vma->vm_mm, &hpte, ptep, huge_page_shift(h),
+			 hpage_size_to_level(psize));
+	assert_spin_locked(hpte.ptl);
 #endif
 #else
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6cd4ae08d84d..742e7f2cb170 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1012,14 +1012,6 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
 	return modified_mask;
 }

-static inline spinlock_t *huge_pte_lockptr(unsigned int shift,
-					   struct mm_struct *mm, pte_t *pte)
-{
-	if (shift == PMD_SHIFT)
-		return pmd_lockptr(mm, (pmd_t *) pte);
-	return &mm->page_table_lock;
-}
-
 #ifndef hugepages_supported
 /*
  * Some platform decide whether they support huge pages at boot
@@ -1228,12 +1220,6 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
 	return 0;
 }

-static inline spinlock_t *huge_pte_lockptr(unsigned int shift,
-					   struct mm_struct *mm, pte_t *pte)
-{
-	return &mm->page_table_lock;
-}
-
 static inline void hugetlb_count_init(struct mm_struct *mm)
 {
 }
@@ -1308,16 +1294,6 @@ int hugetlb_collapse(struct mm_struct *mm, unsigned long start,
 }
 #endif

-static inline spinlock_t *huge_pte_lock(struct hstate *h,
-					struct mm_struct *mm, pte_t *pte)
-{
-	spinlock_t *ptl;
-
-	ptl = huge_pte_lockptr(huge_page_shift(h), mm, pte);
-	spin_lock(ptl);
-	return ptl;
-}
-
 static inline
 spinlock_t *hugetlb_pte_lockptr(struct hugetlb_pte *hpte)
 {
@@ -1353,8 +1329,22 @@ void hugetlb_pte_init(struct mm_struct *mm, struct hugetlb_pte *hpte,
 		      pte_t *ptep, unsigned int shift,
 		      enum hugetlb_level level)
 {
-	__hugetlb_pte_init(hpte, ptep, shift, level,
-			   huge_pte_lockptr(shift, mm, ptep));
+	spinlock_t *ptl;
+
+	/*
+	 * For contiguous HugeTLB PTEs that can contain other HugeTLB PTEs
+	 * on the same level, the same PTL for both must be used.
+	 *
+	 * For some architectures that implement hugetlb_walk_step, this
+	 * version of hugetlb_pte_populate() may not be correct to use for
+	 * high-granularity PTEs.
+	 * Instead, call __hugetlb_pte_populate() directly.
+	 */
+	if (level == HUGETLB_LEVEL_PMD)
+		ptl = pmd_lockptr(mm, (pmd_t *) ptep);
+	else
+		ptl = &mm->page_table_lock;
+	__hugetlb_pte_init(hpte, ptep, shift, level, ptl);
 }

 #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 34368072dabe..e0a92e7c1755 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5454,9 +5454,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			put_page(hpage);

 			/* Install the new hugetlb folio if src pte stable */
-			dst_ptl = huge_pte_lock(h, dst, dst_pte);
-			src_ptl = huge_pte_lockptr(huge_page_shift(h),
-						   src, src_pte);
+			dst_ptl = hugetlb_pte_lock(&dst_hpte);
+			src_ptl = hugetlb_pte_lockptr(&src_hpte);
 			spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 			entry = huge_ptep_get(src_pte);
 			if (!pte_same(src_pte_old, entry)) {
@@ -7582,7 +7581,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 	unsigned long saddr;
 	pte_t *spte = NULL;
 	pte_t *pte;
-	spinlock_t *ptl;
+	struct hugetlb_pte hpte;
+	struct hstate *shstate;

 	i_mmap_lock_read(mapping);
 	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
@@ -7603,7 +7603,11 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (!spte)
 		goto out;

-	ptl = huge_pte_lock(hstate_vma(vma), mm, spte);
+	shstate = hstate_vma(svma);
+
+	hugetlb_pte_init(mm, &hpte, spte, huge_page_shift(shstate),
+			 hpage_size_to_level(huge_page_size(shstate)));
+	spin_lock(hpte.ptl);
 	if (pud_none(*pud)) {
 		pud_populate(mm, pud,
 			(pmd_t *)((unsigned long)spte & PAGE_MASK));
@@ -7611,7 +7615,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 	} else {
 		put_page(virt_to_page(spte));
 	}
-	spin_unlock(ptl);
+	spin_unlock(hpte.ptl);
 out:
 	pte = (pte_t *)pmd_alloc(mm, pud, addr);
 	i_mmap_unlock_read(mapping);
@@ -8315,6 +8319,7 @@ static void hugetlb_unshare_pmds(struct vm_area_struct *vma,
 	unsigned long address;
 	spinlock_t *ptl;
 	pte_t *ptep;
+ struct hugetlb_pte hpte; if (!(vma->vm_flags & VM_MAYSHARE)) return; @@ -8336,7 +8341,10 @@ static void hugetlb_unshare_pmds(struct vm_area_struct *vma, ptep = hugetlb_walk(vma, address, sz); if (!ptep) continue; - ptl = huge_pte_lock(h, mm, ptep); + + hugetlb_pte_init(mm, &hpte, ptep, huge_page_shift(h), + hpage_size_to_level(sz)); + ptl = hugetlb_pte_lock(&hpte); huge_pmd_unshare(mm, vma, address, ptep); spin_unlock(ptl); } From patchwork Sat Feb 18 00:28:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 58846 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp143353wrn; Fri, 17 Feb 2023 16:32:40 -0800 (PST) X-Google-Smtp-Source: AK7set8oHSF2yVbbb5G3Xj0ZbCuY2pLSvVTOfEIeFFjBUz5btcDNp2sg4x1YnpNB0RkOxopW6SYh X-Received: by 2002:a17:903:110e:b0:19a:8811:5dee with SMTP id n14-20020a170903110e00b0019a88115deemr3083832plh.35.1676680360408; Fri, 17 Feb 2023 16:32:40 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676680360; cv=none; d=google.com; s=arc-20160816; b=pFAgdaDeF48Jgu8Pp5Tc2yJVHAYIi3LR0piEIkaE5iELeyDdgY5U9P39PvC9fAF0Sb u1eSRq5l73DDteDYup+q7yOPqbrTPc3/eBq1W66DDLbhaf546fEXjzhxUpcfsL4tmY8o LJhfFpG9tf1SXCAv0jNNtVHRoZUaruyXC2Fz5PbrK4r+G0WOBj+2uniDnCSbei7M0F34 2e9M2Hoscjof6KLQxx+FHQNugGTt6SIPkgFz9X20mOm9MUlyNtI2MYUBW7Nhk5gHdDYf l+ag9BHFtle4SLIln+GyCNXQVFEVHpnX07neb4oTPLHBsBSN8mMjFlp2hTaQncHn0U1b /aZg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=GaGw9/EzNbJa/6t4pJ7yTdVKyp/NxSZHrGSlaKMUK+E=; b=sgP2jLiVev4tOxKNCBygEt77lXnldC8U+LK/L2uDGp/ztjuiBG+7wwAVWmNMt7YuZn rpV9AnkCnEp+vb8wauPJqIMtTClRvJWNLTrbbepFncaHLWxX7NmEEeDsIgaqZfnOB0Pf 7uOKGgepC4isUfvbgbASV72znEuMjRpjBmwihPX1EyV08rUQHCs70BEeEDspBDwti8G8 
Date: Sat, 18 Feb 2023 00:28:10 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-38-jthoughton@google.com>
Subject: [PATCH v2 37/46] hugetlb: replace make_huge_pte with make_huge_pte_with_shift
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org

This removes the old two-argument definition of make_huge_pte; the shift
must now always be given explicitly. All callsites are cleaned up.
Signed-off-by: James Houghton

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e0a92e7c1755..4c9b3c5379b2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5204,9 +5204,9 @@ const struct vm_operations_struct hugetlb_vm_ops = {
 	.pagesize = hugetlb_vm_op_pagesize,
 };
 
-static pte_t make_huge_pte_with_shift(struct vm_area_struct *vma,
-				      struct page *page, int writable,
-				      int shift)
+static pte_t make_huge_pte(struct vm_area_struct *vma,
+			   struct page *page, int writable,
+			   int shift)
 {
 	pte_t entry;
 
@@ -5222,14 +5222,6 @@ static pte_t make_huge_pte_with_shift(struct vm_area_struct *vma,
 	return entry;
 }
 
-static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
-			   int writable)
-{
-	unsigned int shift = huge_page_shift(hstate_vma(vma));
-
-	return make_huge_pte_with_shift(vma, page, writable, shift);
-}
-
 static void set_huge_ptep_writable(struct vm_area_struct *vma,
 				   unsigned long address, pte_t *ptep)
 {
@@ -5272,7 +5264,9 @@ hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long add
 {
 	__folio_mark_uptodate(new_folio);
 	hugepage_add_new_anon_rmap(new_folio, vma, addr);
-	set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, &new_folio->page, 1));
+	set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(
+				vma, &new_folio->page, 1,
+				huge_page_shift(hstate_vma(vma))));
 	hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm);
 	folio_set_hugetlb_migratable(new_folio);
 }
@@ -6006,7 +6000,8 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 		hugetlb_remove_rmap(old_page, huge_page_shift(h), h, vma);
 		hugepage_add_new_anon_rmap(new_folio, vma, haddr);
 		set_huge_pte_at(mm, haddr, ptep,
-				make_huge_pte(vma, &new_folio->page, !unshare));
+				make_huge_pte(vma, &new_folio->page, !unshare,
+					      huge_page_shift(h)));
 		folio_set_hugetlb_migratable(new_folio);
 		/* Make the old page be freed below */
 		new_folio = page_folio(old_page);
@@ -6348,7 +6343,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 	else
 		hugetlb_add_file_rmap(subpage, hpte->shift, h, vma);
 
-	new_pte = make_huge_pte_with_shift(vma, subpage,
+	new_pte = make_huge_pte(vma, subpage,
			((vma->vm_flags & VM_WRITE)
			 && (vma->vm_flags & VM_SHARED)),
			hpte->shift);
@@ -6770,8 +6765,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	else
 		writable = dst_vma->vm_flags & VM_WRITE;
 
-	_dst_pte = make_huge_pte_with_shift(dst_vma, subpage, writable,
-					    dst_hpte->shift);
+	_dst_pte = make_huge_pte(dst_vma, subpage, writable, dst_hpte->shift);
 	/*
 	 * Always mark UFFDIO_COPY page dirty; note that this may not be
 	 * extremely important for hugetlbfs for now since swapping is not
@@ -8169,8 +8163,7 @@ static int __hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma,
 	}
 	subpage = hugetlb_find_subpage(h, folio, curr);
 
-	entry = make_huge_pte_with_shift(vma, subpage,
-					 writable, hpte.shift);
+	entry = make_huge_pte(vma, subpage, writable, hpte.shift);
 	hugetlb_add_file_rmap(subpage, hpte.shift, h, vma);
 	set_huge_pte_at(mm, curr, hpte.ptep, entry);
 	spin_unlock(ptl);

From patchwork Sat Feb 18 00:28:11 2023
Date: Sat, 18 Feb 2023 00:28:11 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-39-jthoughton@google.com>
Subject: [PATCH v2 38/46] mm: smaps: add stats for HugeTLB mapping size
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org

When the kernel is compiled with CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING,
smaps may report HugetlbPudMapped, HugetlbPmdMapped, and HugetlbPteMapped.
Levels that are folded will not be output.

Signed-off-by: James Houghton

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 2f293b5dabc0..1ced7300f8cd 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -412,6 +412,15 @@ struct mem_size_stats {
 	unsigned long swap;
 	unsigned long shared_hugetlb;
 	unsigned long private_hugetlb;
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+#ifndef __PAGETABLE_PUD_FOLDED
+	unsigned long hugetlb_pud_mapped;
+#endif
+#ifndef __PAGETABLE_PMD_FOLDED
+	unsigned long hugetlb_pmd_mapped;
+#endif
+	unsigned long hugetlb_pte_mapped;
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
 	u64 pss;
 	u64 pss_anon;
 	u64 pss_file;
@@ -731,6 +740,33 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
+
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+static void smaps_hugetlb_hgm_account(struct mem_size_stats *mss,
+				      struct hugetlb_pte *hpte)
+{
+	unsigned long size = hugetlb_pte_size(hpte);
+
+	switch (hpte->level) {
+#ifndef __PAGETABLE_PUD_FOLDED
+	case HUGETLB_LEVEL_PUD:
+		mss->hugetlb_pud_mapped += size;
+		break;
+#endif
+#ifndef __PAGETABLE_PMD_FOLDED
+	case HUGETLB_LEVEL_PMD:
+		mss->hugetlb_pmd_mapped += size;
+		break;
+#endif
+	case HUGETLB_LEVEL_PTE:
+		mss->hugetlb_pte_mapped += size;
+		break;
+	default:
+		break;
+	}
+}
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+
 static int smaps_hugetlb_range(struct hugetlb_pte *hpte,
 				unsigned long addr,
 				struct mm_walk *walk)
@@ -764,6 +800,9 @@ static int smaps_hugetlb_range(struct hugetlb_pte *hpte,
 			mss->shared_hugetlb += sz;
 		else
 			mss->private_hugetlb += sz;
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+		smaps_hugetlb_hgm_account(mss, hpte);
+#endif
 	}
 	return 0;
 }
@@ -833,38 +872,47 @@ static void smap_gather_stats(struct vm_area_struct *vma,
 static void __show_smap(struct seq_file *m, const struct mem_size_stats *mss,
 	bool rollup_mode)
 {
-	SEQ_PUT_DEC("Rss:            ", mss->resident);
-	SEQ_PUT_DEC(" kB\nPss:            ", mss->pss >> PSS_SHIFT);
-	SEQ_PUT_DEC(" kB\nPss_Dirty:      ", mss->pss_dirty >> PSS_SHIFT);
+	SEQ_PUT_DEC("Rss:              ", mss->resident);
+	SEQ_PUT_DEC(" kB\nPss:              ", mss->pss >> PSS_SHIFT);
+	SEQ_PUT_DEC(" kB\nPss_Dirty:        ", mss->pss_dirty >> PSS_SHIFT);
 	if (rollup_mode) {
 		/*
 		 * These are meaningful only for smaps_rollup, otherwise two of
 		 * them are zero, and the other one is the same as Pss.
 		 */
-		SEQ_PUT_DEC(" kB\nPss_Anon:       ",
+		SEQ_PUT_DEC(" kB\nPss_Anon:         ",
 			mss->pss_anon >> PSS_SHIFT);
-		SEQ_PUT_DEC(" kB\nPss_File:       ",
+		SEQ_PUT_DEC(" kB\nPss_File:         ",
 			mss->pss_file >> PSS_SHIFT);
-		SEQ_PUT_DEC(" kB\nPss_Shmem:      ",
+		SEQ_PUT_DEC(" kB\nPss_Shmem:        ",
 			mss->pss_shmem >> PSS_SHIFT);
 	}
-	SEQ_PUT_DEC(" kB\nShared_Clean:   ", mss->shared_clean);
-	SEQ_PUT_DEC(" kB\nShared_Dirty:   ", mss->shared_dirty);
-	SEQ_PUT_DEC(" kB\nPrivate_Clean:  ", mss->private_clean);
-	SEQ_PUT_DEC(" kB\nPrivate_Dirty:  ", mss->private_dirty);
-	SEQ_PUT_DEC(" kB\nReferenced:     ", mss->referenced);
-	SEQ_PUT_DEC(" kB\nAnonymous:      ", mss->anonymous);
-	SEQ_PUT_DEC(" kB\nLazyFree:       ", mss->lazyfree);
-	SEQ_PUT_DEC(" kB\nAnonHugePages:  ", mss->anonymous_thp);
-	SEQ_PUT_DEC(" kB\nShmemPmdMapped: ", mss->shmem_thp);
-	SEQ_PUT_DEC(" kB\nFilePmdMapped:  ", mss->file_thp);
-	SEQ_PUT_DEC(" kB\nShared_Hugetlb: ", mss->shared_hugetlb);
-	seq_put_decimal_ull_width(m, " kB\nPrivate_Hugetlb: ",
+	SEQ_PUT_DEC(" kB\nShared_Clean:     ", mss->shared_clean);
+	SEQ_PUT_DEC(" kB\nShared_Dirty:     ", mss->shared_dirty);
+	SEQ_PUT_DEC(" kB\nPrivate_Clean:    ", mss->private_clean);
+	SEQ_PUT_DEC(" kB\nPrivate_Dirty:    ", mss->private_dirty);
+	SEQ_PUT_DEC(" kB\nReferenced:       ", mss->referenced);
+	SEQ_PUT_DEC(" kB\nAnonymous:        ", mss->anonymous);
+	SEQ_PUT_DEC(" kB\nLazyFree:         ", mss->lazyfree);
+	SEQ_PUT_DEC(" kB\nAnonHugePages:    ", mss->anonymous_thp);
+	SEQ_PUT_DEC(" kB\nShmemPmdMapped:   ", mss->shmem_thp);
+	SEQ_PUT_DEC(" kB\nFilePmdMapped:    ", mss->file_thp);
+	SEQ_PUT_DEC(" kB\nShared_Hugetlb:   ", mss->shared_hugetlb);
+	seq_put_decimal_ull_width(m, " kB\nPrivate_Hugetlb:  ",
 			mss->private_hugetlb >> 10, 7);
-	SEQ_PUT_DEC(" kB\nSwap:           ", mss->swap);
-	SEQ_PUT_DEC(" kB\nSwapPss:        ",
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+#ifndef __PAGETABLE_PUD_FOLDED
+	SEQ_PUT_DEC(" kB\nHugetlbPudMapped: ", mss->hugetlb_pud_mapped);
+#endif
+#ifndef __PAGETABLE_PMD_FOLDED
+	SEQ_PUT_DEC(" kB\nHugetlbPmdMapped: ", mss->hugetlb_pmd_mapped);
+#endif
+	SEQ_PUT_DEC(" kB\nHugetlbPteMapped: ", mss->hugetlb_pte_mapped);
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+	SEQ_PUT_DEC(" kB\nSwap:             ", mss->swap);
+	SEQ_PUT_DEC(" kB\nSwapPss:          ",
 			mss->swap_pss >> PSS_SHIFT);
-	SEQ_PUT_DEC(" kB\nLocked:         ",
+	SEQ_PUT_DEC(" kB\nLocked:           ",
 			mss->pss_locked >> PSS_SHIFT);
 	seq_puts(m, " kB\n");
 }
@@ -880,18 +928,18 @@ static int show_smap(struct seq_file *m, void *v)
 
 	show_map_vma(m, vma);
 
-	SEQ_PUT_DEC("Size:           ", vma->vm_end - vma->vm_start);
-	SEQ_PUT_DEC(" kB\nKernelPageSize: ", vma_kernel_pagesize(vma));
-	SEQ_PUT_DEC(" kB\nMMUPageSize:    ", vma_mmu_pagesize(vma));
+	SEQ_PUT_DEC("Size:             ", vma->vm_end - vma->vm_start);
+	SEQ_PUT_DEC(" kB\nKernelPageSize:   ", vma_kernel_pagesize(vma));
+	SEQ_PUT_DEC(" kB\nMMUPageSize:      ", vma_mmu_pagesize(vma));
 	seq_puts(m, " kB\n");
 
 	__show_smap(m, &mss, false);
 
-	seq_printf(m, "THPeligible:    %d\n",
+	seq_printf(m, "THPeligible:      %d\n",
 		   hugepage_vma_check(vma, vma->vm_flags, true, false, true));
 
 	if (arch_pkeys_enabled())
-		seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
+		seq_printf(m, "ProtectionKey:    %8u\n", vma_pkey(vma));
 	show_smap_vma_flags(m, vma);
 
 	return 0;

From patchwork Sat Feb 18 00:28:12 2023
Date: Sat, 18 Feb 2023 00:28:12 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-40-jthoughton@google.com>
Subject: [PATCH v2 39/46] hugetlb: x86: enable high-granularity mapping for x86_64
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org

Now that HGM is fully supported for GENERAL_HUGETLB, we can enable it
for x86_64. We can only enable it for 64-bit architectures because the
vm flag VM_HUGETLB_HGM uses a high bit.

The x86 KVM MMU already properly handles HugeTLB HGM pages (it does a
page table walk to determine which size to use in the second-stage page
table instead of, for example, checking vma_mmu_pagesize, like arm64
does).
Signed-off-by: James Houghton

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3604074a878b..fde9ba1dd8d7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -126,6 +126,7 @@ config X86
 	select ARCH_WANT_GENERAL_HUGETLB
 	select ARCH_WANT_HUGE_PMD_SHARE
 	select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP	if X86_64
+	select ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING if X86_64
 	select ARCH_WANT_LD_ORPHAN_WARN
 	select ARCH_WANTS_THP_SWAP	if X86_64
 	select ARCH_HAS_PARANOID_L1D_FLUSH

From patchwork Sat Feb 18 00:28:13 2023
h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=meim6QzwmgVJzeI2+Wk1IxU1uXO2GoWG7kArQ5zE/oE=; b=jBDVTyUiFjSinjlesXO7Wsr29fC05xWRfrl7nAZ2Vi6vctWsz/7+NsXy6Dx9zE02oD Zi9rU6CuILwSQ9MmYCjyB8BnkobSNAXN9wk5uSKN5uHSYEobIzebiBr+QJLAoe2BO2h7 BvQ5nLWnyJ34pOp7tHtrWAltvM5a5A6wWV9ntVMJs7Np7CHhXIh92iq7fs97WdSGdx7k chX9KZzHAChL7Krh0NrLbnAxcDgyvNutktvSzeK5qogN04D9EfgapHMPtcUexRrTONeu Lv9LpAtAQc7gW1fQL5eLIT4qqKfM5pCyZOYdI8D3vWXHR6ii/aIgrxnOVziUFMcqw1D2 IOGw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=meim6QzwmgVJzeI2+Wk1IxU1uXO2GoWG7kArQ5zE/oE=; b=r9kBBNbu5C7zF9kg9HlCbryiCKEMfgSmCMiS9p17srDDiOLXWw0JCWMaaBm7HkUcEk JiEqsGuNuPcyjI7wVqhebUTmNuHCTpzR3AGF6SoDMlrtBApOC+H+tzgm+4Y0Ym/PG8X1 3hl7nWSwToKJYOR7ip6ybr/o4nQReZP1LDPK9SlsD9d5mo4V0/d5AoI8RMKczKbnaOHB /ypS+HKWaQIHQM2NCklGtC0XA8/+HmemPYX9EGKp9a5Tq/dGIj932jlWXUSCmR8oYtyG Lx0IVyk6bVGldLk4t3e57vMocycQhE9P1DiH19J5BKynT9Ge5yLp7wAy+MLQx9b8DWv/ IEqw== X-Gm-Message-State: AO0yUKVLfGWNzaR72MBJRWN42cQ5keEcpIq6yFdv6Nj879acuYCCCJh5 5XlX+hv58eiAg+mwVZHuOJUZomf1QNpkhM7+ X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a5b:144:0:b0:91c:90b6:f48a with SMTP id c4-20020a5b0144000000b0091c90b6f48amr1373069ybp.580.1676680164340; Fri, 17 Feb 2023 16:29:24 -0800 (PST) Date: Sat, 18 Feb 2023 00:28:13 +0000 In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com> Mime-Version: 1.0 References: <20230218002819.1486479-1-jthoughton@google.com> X-Mailer: git-send-email 2.39.2.637.g21b0678d19-goog Message-ID: <20230218002819.1486479-41-jthoughton@google.com> Subject: [PATCH v2 40/46] docs: hugetlb: update hugetlb and userfaultfd admin-guides with HGM info From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu , Andrew Morton 
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

Include information about how MADV_SPLIT should be used to enable
high-granularity UFFDIO_CONTINUE operations, and include information about
how MADV_COLLAPSE should be used to collapse the mappings at the end.

Signed-off-by: James Houghton
---

diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
index a969a2c742b2..c6eaef785609 100644
--- a/Documentation/admin-guide/mm/hugetlbpage.rst
+++ b/Documentation/admin-guide/mm/hugetlbpage.rst
@@ -454,6 +454,10 @@ errno set to EINVAL or exclude hugetlb pages that extend beyond the length if
 not hugepage aligned.  For example, munmap(2) will fail if memory is
 backed by a hugetlb page and the length is smaller than the hugepage size.
 
+It is possible for users to map HugeTLB pages at a higher granularity than
+normal using HugeTLB high-granularity mapping (HGM). For example, when using 1G
+pages on x86, a user could map that page with 4K PTEs, 2M PMDs, or a combination
+of the two. See Documentation/admin-guide/mm/userfaultfd.rst.
+
 Examples
 ========

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 83f31919ebb3..cc496a307ea2 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -169,7 +169,13 @@ like to do to resolve it:
   the page cache). Userspace has the option of modifying the page's
   contents before resolving the fault. Once the contents are correct
   (modified or not), userspace asks the kernel to map the page and let the
-  faulting thread continue with ``UFFDIO_CONTINUE``.
+  faulting thread continue with ``UFFDIO_CONTINUE``. If this is done at the
+  base-page size in a transparent-hugepage-eligible VMA or in a HugeTLB VMA
+  (requires ``MADV_SPLIT``), then userspace may want to use
+  ``MADV_COLLAPSE`` when a hugepage is fully populated to inform the kernel
+  that it may be able to collapse the mapping. ``MADV_COLLAPSE`` will undo
+  the effect of any ``UFFDIO_WRITEPROTECT`` calls on the collapsed address
+  range.
 Notes:

From patchwork Sat Feb 18 00:28:14 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58856
Date: Sat, 18 Feb 2023 00:28:14 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-42-jthoughton@google.com>
Subject: [PATCH v2 41/46] docs: proc: include information about HugeTLB HGM
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr.
David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

Include the updates that have been made to smaps, specifically, the addition
of Hugetlb[Pud,Pmd,Pte]Mapped.

Signed-off-by: James Houghton
---

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index e224b6d5b642..1d2a1cd1fe6a 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -447,29 +447,32 @@ Memory Area, or VMA) there is a series of lines such as the following::

    08048000-080bc000 r-xp 00000000 03:02 13130   /bin/bash

-   Size:               1084 kB
-   KernelPageSize:        4 kB
-   MMUPageSize:           4 kB
-   Rss:                 892 kB
-   Pss:                 374 kB
-   Pss_Dirty:             0 kB
-   Shared_Clean:        892 kB
-   Shared_Dirty:          0 kB
-   Private_Clean:         0 kB
-   Private_Dirty:         0 kB
-   Referenced:          892 kB
-   Anonymous:             0 kB
-   LazyFree:              0 kB
-   AnonHugePages:         0 kB
-   ShmemPmdMapped:        0 kB
-   Shared_Hugetlb:        0 kB
-   Private_Hugetlb:       0 kB
-   Swap:                  0 kB
-   SwapPss:               0 kB
-   KernelPageSize:        4 kB
-   MMUPageSize:           4 kB
-   Locked:                0 kB
-   THPeligible:           0
+   Size:                1084 kB
+   KernelPageSize:         4 kB
+   MMUPageSize:            4 kB
+   Rss:                  892 kB
+   Pss:                  374 kB
+   Pss_Dirty:              0 kB
+   Shared_Clean:         892 kB
+   Shared_Dirty:           0 kB
+   Private_Clean:          0 kB
+   Private_Dirty:          0 kB
+   Referenced:           892 kB
+   Anonymous:              0 kB
+   LazyFree:               0 kB
+   AnonHugePages:          0 kB
+   ShmemPmdMapped:         0 kB
+   Shared_Hugetlb:         0 kB
+   Private_Hugetlb:        0 kB
+   HugetlbPudMapped:       0 kB
+   HugetlbPmdMapped:       0 kB
+   HugetlbPteMapped:       0 kB
+   Swap:                   0 kB
+   SwapPss:                0 kB
+   KernelPageSize:         4 kB
+   MMUPageSize:            4 kB
+   Locked:                 0 kB
+   THPeligible:            0
    VmFlags: rd ex mr mw me dw

 The first of these lines shows the same information as is displayed for the
@@ -510,10 +513,15 @@ implementation. If this is not desirable please file a bug report.

 "ShmemPmdMapped" shows the ammount of shared (shmem/tmpfs) memory backed by
 huge pages.

-"Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by
+"Shared_Hugetlb" and "Private_Hugetlb" show the amounts of memory backed by
 hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical
 reasons. And these are not included in {Shared,Private}_{Clean,Dirty} field.

+If the kernel was compiled with ``CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING``,
+"HugetlbPudMapped", "HugetlbPmdMapped", and "HugetlbPteMapped" may appear and
+show the amount of HugeTLB memory mapped with PUDs, PMDs, and PTEs respectively.
+Folded levels won't appear. See Documentation/admin-guide/mm/hugetlbpage.rst.
+
 "Swap" shows how much would-be-anonymous memory is also used, but out on swap.
 For shmem mappings, "Swap" includes also the size of the mapped (and not

From patchwork Sat Feb 18 00:28:15 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58854
Date: Sat, 18 Feb 2023 00:28:15 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-43-jthoughton@google.com>
Subject: [PATCH v2 42/46] selftests/mm: add HugeTLB HGM to userfaultfd selftest
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr.
David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This test case behaves similarly to the regular shared HugeTLB configuration,
except that it uses 4K pages instead of hugepages, and that we ignore the
UFFDIO_COPY tests, as UFFDIO_CONTINUE is the only ioctl that supports
PAGE_SIZE-aligned regions.

This doesn't test MADV_COLLAPSE. Other tests are added later to exercise
MADV_COLLAPSE.
Signed-off-by: James Houghton
---

diff --git a/tools/testing/selftests/mm/userfaultfd.c b/tools/testing/selftests/mm/userfaultfd.c
index 7f22844ed704..681c5c5f863b 100644
--- a/tools/testing/selftests/mm/userfaultfd.c
+++ b/tools/testing/selftests/mm/userfaultfd.c
@@ -73,9 +73,10 @@ static unsigned long nr_cpus, nr_pages, nr_pages_per_cpu, page_size, hpage_size;
 #define BOUNCE_POLL		(1<<3)
 static int bounces;
 
-#define TEST_ANON	1
-#define TEST_HUGETLB	2
-#define TEST_SHMEM	3
+#define TEST_ANON		1
+#define TEST_HUGETLB		2
+#define TEST_HUGETLB_HGM	3
+#define TEST_SHMEM		4
 static int test_type;
 
 #define UFFD_FLAGS	(O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY)
@@ -93,6 +94,8 @@ static volatile bool test_uffdio_zeropage_eexist = true;
 static bool test_uffdio_wp = true;
 /* Whether to test uffd minor faults */
 static bool test_uffdio_minor = false;
+static bool test_uffdio_copy = true;
+
 static bool map_shared;
 static int mem_fd;
 static unsigned long long *count_verify;
@@ -151,7 +154,7 @@ static void usage(void)
	fprintf(stderr, "\nUsage: ./userfaultfd <test type> <MiB> <bounces> "
		"[hugetlbfs_file]\n\n");
	fprintf(stderr, "Supported <test type>: anon, hugetlb, "
-		"hugetlb_shared, shmem\n\n");
+		"hugetlb_shared, hugetlb_shared_hgm, shmem\n\n");
	fprintf(stderr, "'Test mods' can be joined to the test type string with a ':'. "
		"Supported mods:\n");
	fprintf(stderr, "\tsyscall - Use userfaultfd(2) (default)\n");
@@ -167,6 +170,11 @@ static void usage(void)
	exit(1);
 }
 
+static bool test_is_hugetlb(void)
+{
+	return test_type == TEST_HUGETLB || test_type == TEST_HUGETLB_HGM;
+}
+
 #define _err(fmt, ...)						\
	do {							\
		int ret = errno;				\

@@ -381,7 +389,7 @@ static struct uffd_test_ops *uffd_test_ops;
 
 static inline uint64_t uffd_minor_feature(void)
 {
-	if (test_type == TEST_HUGETLB && map_shared)
+	if (test_is_hugetlb() && map_shared)
 		return UFFD_FEATURE_MINOR_HUGETLBFS;
 	else if (test_type == TEST_SHMEM)
 		return UFFD_FEATURE_MINOR_SHMEM;
@@ -393,7 +401,7 @@ static uint64_t get_expected_ioctls(uint64_t mode)
 {
 	uint64_t ioctls = UFFD_API_RANGE_IOCTLS;
 
-	if (test_type == TEST_HUGETLB)
+	if (test_is_hugetlb())
 		ioctls &= ~(1 << _UFFDIO_ZEROPAGE);
 
 	if (!((mode & UFFDIO_REGISTER_MODE_WP) && test_uffdio_wp))
@@ -500,13 +508,16 @@ static void uffd_test_ctx_clear(void)
 static void uffd_test_ctx_init(uint64_t features)
 {
	unsigned long nr, cpu;
+	uint64_t enabled_features = features;
 
	uffd_test_ctx_clear();
 
	uffd_test_ops->allocate_area((void **)&area_src, true);
	uffd_test_ops->allocate_area((void **)&area_dst, false);
 
-	userfaultfd_open(&features);
+	userfaultfd_open(&enabled_features);
+	if ((enabled_features & features) != features)
+		err("couldn't enable all features");
 
	count_verify = malloc(nr_pages * sizeof(unsigned long long));
	if (!count_verify)
@@ -726,13 +737,16 @@ static void uffd_handle_page_fault(struct uffd_msg *msg,
				   struct uffd_stats *stats)
 {
	unsigned long offset;
+	unsigned long address;
 
	if (msg->event != UFFD_EVENT_PAGEFAULT)
		err("unexpected msg event %u", msg->event);
 
+	address = msg->arg.pagefault.address;
+
	if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP) {
		/* Write protect page faults */
-		wp_range(uffd, msg->arg.pagefault.address, page_size, false);
+		wp_range(uffd, address, page_size, false);
		stats->wp_faults++;
	} else if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_MINOR) {
		uint8_t *area;
@@ -751,11 +765,10 @@ static void uffd_handle_page_fault(struct uffd_msg *msg,
		 */
		area = (uint8_t *)(area_dst +
-				   ((char *)msg->arg.pagefault.address -
-				    area_dst_alias));
+				   ((char *)address - area_dst_alias));
		for (b = 0; b < page_size; ++b)
			area[b] = ~area[b];
-		continue_range(uffd, msg->arg.pagefault.address, page_size);
+		continue_range(uffd, address, page_size);
		stats->minor_faults++;
	} else {
		/*
@@ -782,7 +795,7 @@ static void uffd_handle_page_fault(struct uffd_msg *msg,
		if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE)
			err("unexpected write fault");
 
-		offset = (char *)(unsigned long)msg->arg.pagefault.address - area_dst;
+		offset = (char *)address - area_dst;
		offset &= ~(page_size-1);
 
		if (copy_page(uffd, offset))
@@ -1192,6 +1205,12 @@ static int userfaultfd_events_test(void)
	char c;
	struct uffd_stats stats = { 0 };
 
+	if (!test_uffdio_copy) {
+		printf("Skipping userfaultfd events test "
+		       "(test_uffdio_copy=false)\n");
+		return 0;
+	}
+
	printf("testing events (fork, remap, remove): ");
	fflush(stdout);
@@ -1245,6 +1264,12 @@ static int userfaultfd_sig_test(void)
	char c;
	struct uffd_stats stats = { 0 };
 
+	if (!test_uffdio_copy) {
+		printf("Skipping userfaultfd signal test "
+		       "(test_uffdio_copy=false)\n");
+		return 0;
+	}
+
	printf("testing signal delivery: ");
	fflush(stdout);
@@ -1329,6 +1354,11 @@ static int userfaultfd_minor_test(void)
 
	uffd_test_ctx_init(uffd_minor_feature());
 
+	if (test_type == TEST_HUGETLB_HGM)
+		/* Enable high-granularity userfaultfd ioctls for HugeTLB */
+		if (madvise(area_dst_alias, nr_pages * page_size, MADV_SPLIT))
+			err("MADV_SPLIT failed");
+
	uffdio_register.range.start = (unsigned long)area_dst_alias;
	uffdio_register.range.len = nr_pages * page_size;
	uffdio_register.mode = UFFDIO_REGISTER_MODE_MINOR;
@@ -1538,6 +1568,12 @@ static int userfaultfd_stress(void)
	pthread_attr_init(&attr);
	pthread_attr_setstacksize(&attr, 16*1024*1024);
 
+	if (!test_uffdio_copy) {
+		printf("Skipping userfaultfd stress test "
+		       "(test_uffdio_copy=false)\n");
+		bounces = 0;
+	}
+
	while (bounces--) {
		printf("bounces: %d, mode:", bounces);
		if (bounces & BOUNCE_RANDOM)
@@ -1696,6 +1732,16 @@ static void set_test_type(const char *type)
		uffd_test_ops = &hugetlb_uffd_test_ops;
		/* Minor faults require shared
		   hugetlb; only enable here. */
		test_uffdio_minor = true;
+	} else if (!strcmp(type, "hugetlb_shared_hgm")) {
+		map_shared = true;
+		test_type = TEST_HUGETLB_HGM;
+		uffd_test_ops = &hugetlb_uffd_test_ops;
+		/*
+		 * HugeTLB HGM only changes UFFDIO_CONTINUE, so don't test
+		 * UFFDIO_COPY.
+		 */
+		test_uffdio_minor = true;
+		test_uffdio_copy = false;
	} else if (!strcmp(type, "shmem")) {
		map_shared = true;
		test_type = TEST_SHMEM;
@@ -1731,6 +1777,7 @@ static void parse_test_type_arg(const char *raw_type)
		err("Unsupported test: %s", raw_type);
 
	if (test_type == TEST_HUGETLB)
+		/* TEST_HUGETLB_HGM gets small pages. */
		page_size = hpage_size;
	else
		page_size = sysconf(_SC_PAGE_SIZE);
@@ -1813,22 +1860,29 @@ int main(int argc, char **argv)
		nr_cpus = x < y ? x : y;
	}
	nr_pages_per_cpu = bytes / page_size / nr_cpus;
+	if (test_type == TEST_HUGETLB_HGM)
+		/*
+		 * `page_size` refers to the page_size we can use in
+		 * UFFDIO_CONTINUE. We still need nr_pages to be appropriately
+		 * aligned, so align it here.
+		 */
+		nr_pages_per_cpu -= nr_pages_per_cpu % (hpage_size / page_size);
	if (!nr_pages_per_cpu) {
		_err("invalid MiB");
		usage();
	}
+	nr_pages = nr_pages_per_cpu * nr_cpus;
 
	bounces = atoi(argv[3]);
	if (bounces <= 0) {
		_err("invalid bounces");
		usage();
	}
-	nr_pages = nr_pages_per_cpu * nr_cpus;
 
-	if (test_type == TEST_SHMEM || test_type == TEST_HUGETLB) {
+	if (test_type == TEST_SHMEM || test_is_hugetlb()) {
		unsigned int memfd_flags = 0;
 
-		if (test_type == TEST_HUGETLB)
+		if (test_is_hugetlb())
			memfd_flags = MFD_HUGETLB;
 
		mem_fd = memfd_create(argv[0], memfd_flags);
		if (mem_fd < 0)

From patchwork Sat Feb 18 00:28:16 2023
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 58849
Date: Sat, 18 Feb 2023 00:28:16 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-44-jthoughton@google.com>
Subject: [PATCH v2 43/46] KVM: selftests: add HugeTLB HGM to KVM demand paging selftest
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This test exercises the GUP paths for HGM. MADV_COLLAPSE is not tested.
Signed-off-by: James Houghton

diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
index b0e1fc4de9e2..e534f9c927bf 100644
--- a/tools/testing/selftests/kvm/demand_paging_test.c
+++ b/tools/testing/selftests/kvm/demand_paging_test.c
@@ -170,7 +170,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 			uffd_descs[i] = uffd_setup_demand_paging(
 				p->uffd_mode, p->uffd_delay, vcpu_hva,
 				vcpu_args->pages * memstress_args.guest_page_size,
-				&handle_uffd_page_request);
+				p->src_type, &handle_uffd_page_request);
 		}
 	}
diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index 80d6416f3012..a2106c19a614 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -103,6 +103,7 @@ enum vm_mem_backing_src_type {
 	VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB,
 	VM_MEM_SRC_SHMEM,
 	VM_MEM_SRC_SHARED_HUGETLB,
+	VM_MEM_SRC_SHARED_HUGETLB_HGM,
 	NUM_SRC_TYPES,
 };
@@ -121,6 +122,7 @@ size_t get_def_hugetlb_pagesz(void);
 const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i);
 size_t get_backing_src_pagesz(uint32_t i);
 bool is_backing_src_hugetlb(uint32_t i);
+bool is_backing_src_shared_hugetlb(enum vm_mem_backing_src_type src_type);
 void backing_src_help(const char *flag);
 enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name);
 long get_run_delay(void);
diff --git a/tools/testing/selftests/kvm/include/userfaultfd_util.h b/tools/testing/selftests/kvm/include/userfaultfd_util.h
index 877449c34592..d91528a58245 100644
--- a/tools/testing/selftests/kvm/include/userfaultfd_util.h
+++ b/tools/testing/selftests/kvm/include/userfaultfd_util.h
@@ -26,9 +26,9 @@ struct uffd_desc {
 	pthread_t thread;
 };
 
-struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay,
-					   void *hva, uint64_t len,
-					   uffd_handler_t handler);
+struct uffd_desc *uffd_setup_demand_paging(
+	int uffd_mode, useconds_t delay, void *hva, uint64_t len,
+	enum vm_mem_backing_src_type src_type, uffd_handler_t handler);
 
 void uffd_stop_demand_paging(struct uffd_desc *uffd);
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 56d5ea949cbb..b9c398dc295d 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -981,7 +981,7 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm,
 	region->fd = -1;
 	if (backing_src_is_shared(src_type))
 		region->fd = kvm_memfd_alloc(region->mmap_size,
-			src_type == VM_MEM_SRC_SHARED_HUGETLB);
+			is_backing_src_shared_hugetlb(src_type));
 
 	region->mmap_start = mmap(NULL, region->mmap_size,
 				  PROT_READ | PROT_WRITE,
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
index 5c22fa4c2825..712a0878932e 100644
--- a/tools/testing/selftests/kvm/lib/test_util.c
+++ b/tools/testing/selftests/kvm/lib/test_util.c
@@ -271,6 +271,13 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i)
 		 */
 		.flag = MAP_SHARED,
 	},
+	[VM_MEM_SRC_SHARED_HUGETLB_HGM] = {
+		/*
+		 * Identical to shared_hugetlb except for the name.
+		 */
+		.name = "shared_hugetlb_hgm",
+		.flag = MAP_SHARED,
+	},
 };
 _Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES,
 	       "Missing new backing src types?");
@@ -289,6 +296,7 @@ size_t get_backing_src_pagesz(uint32_t i)
 	switch (i) {
 	case VM_MEM_SRC_ANONYMOUS:
 	case VM_MEM_SRC_SHMEM:
+	case VM_MEM_SRC_SHARED_HUGETLB_HGM:
 		return getpagesize();
 	case VM_MEM_SRC_ANONYMOUS_THP:
 		return get_trans_hugepagesz();
@@ -305,6 +313,12 @@ bool is_backing_src_hugetlb(uint32_t i)
 	return !!(vm_mem_backing_src_alias(i)->flag & MAP_HUGETLB);
 }
 
+bool is_backing_src_shared_hugetlb(enum vm_mem_backing_src_type src_type)
+{
+	return src_type == VM_MEM_SRC_SHARED_HUGETLB ||
+		src_type == VM_MEM_SRC_SHARED_HUGETLB_HGM;
+}
+
 static void print_available_backing_src_types(const char *prefix)
 {
 	int i;
diff --git a/tools/testing/selftests/kvm/lib/userfaultfd_util.c b/tools/testing/selftests/kvm/lib/userfaultfd_util.c
index 92cef20902f1..3c7178d6c4f4 100644
--- a/tools/testing/selftests/kvm/lib/userfaultfd_util.c
+++ b/tools/testing/selftests/kvm/lib/userfaultfd_util.c
@@ -25,6 +25,10 @@
 
 #ifdef __NR_userfaultfd
 
+#ifndef MADV_SPLIT
+#define MADV_SPLIT 26
+#endif
+
 static void *uffd_handler_thread_fn(void *arg)
 {
 	struct uffd_desc *uffd_desc = (struct uffd_desc *)arg;
@@ -108,9 +112,9 @@ static void *uffd_handler_thread_fn(void *arg)
 	return NULL;
 }
 
-struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay,
-					   void *hva, uint64_t len,
-					   uffd_handler_t handler)
+struct uffd_desc *uffd_setup_demand_paging(
+	int uffd_mode, useconds_t delay, void *hva, uint64_t len,
+	enum vm_mem_backing_src_type src_type, uffd_handler_t handler)
 {
 	struct uffd_desc *uffd_desc;
 	bool is_minor = (uffd_mode == UFFDIO_REGISTER_MODE_MINOR);
@@ -140,6 +144,10 @@ struct uffd_desc *uffd_setup_demand_paging(int uffd_mode, useconds_t delay,
 		    "ioctl UFFDIO_API failed: %" PRIu64,
 		    (uint64_t)uffdio_api.api);
 
+	if (src_type == VM_MEM_SRC_SHARED_HUGETLB_HGM)
+		TEST_ASSERT(!madvise(hva, len, MADV_SPLIT),
+			    "Could not enable HGM");
+
 	uffdio_register.range.start = (uint64_t)hva;
 	uffdio_register.range.len = len;
 	uffdio_register.mode = uffd_mode;

From patchwork Sat Feb 18 00:28:17 2023
Date: Sat, 18 Feb 2023 00:28:17 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-45-jthoughton@google.com>
Subject: [PATCH v2 44/46] selftests/mm: add anon and shared hugetlb to migration test
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

Shared HugeTLB mappings are migrated best-effort. Sometimes, when the
VMA lock cannot be grabbed for writing, migration can fail spuriously.
To allow for that, we allow retries.

Signed-off-by: James Houghton

diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c
index 1cec8425e3ca..21577a84d7e4 100644
--- a/tools/testing/selftests/mm/migration.c
+++ b/tools/testing/selftests/mm/migration.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 
 #define TWOMEG (2<<20)
 #define RUNTIME (60)
@@ -59,11 +60,12 @@ FIXTURE_TEARDOWN(migration)
 	free(self->pids);
 }
 
-int migrate(uint64_t *ptr, int n1, int n2)
+int migrate(uint64_t *ptr, int n1, int n2, int retries)
 {
 	int ret, tmp;
 	int status = 0;
 	struct timespec ts1, ts2;
+	int failed = 0;
 
 	if (clock_gettime(CLOCK_MONOTONIC, &ts1))
 		return -1;
@@ -78,6 +80,9 @@ int migrate(uint64_t *ptr, int n1, int n2)
 		ret = move_pages(0, 1, (void **) &ptr, &n2, &status,
 				 MPOL_MF_MOVE_ALL);
 		if (ret) {
+			if (++failed < retries)
+				continue;
+
 			if (ret > 0)
 				printf("Didn't migrate %d pages\n", ret);
 			else
@@ -88,6 +93,7 @@ int migrate(uint64_t *ptr, int n1, int n2)
 		tmp = n2;
 		n2 = n1;
 		n1 = tmp;
+		failed = 0;
 	}
 
 	return 0;
@@ -128,7 +134,7 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME)
 		if (pthread_create(&self->threads[i], NULL, access_mem, ptr))
 			perror("Couldn't create thread");
 
-	ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0);
+	ASSERT_EQ(migrate(ptr, self->n1, self->n2, 1), 0);
 	for (i = 0; i < self->nthreads - 1; i++)
 		ASSERT_EQ(pthread_cancel(self->threads[i]), 0);
 }
@@ -158,7 +164,7 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME)
 		self->pids[i] = pid;
 	}
 
-	ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0);
+	ASSERT_EQ(migrate(ptr, self->n1, self->n2, 1), 0);
 	for (i = 0; i < self->nthreads - 1; i++)
 		ASSERT_EQ(kill(self->pids[i], SIGTERM), 0);
 }
@@ -185,9 +191,78 @@ TEST_F_TIMEOUT(migration, private_anon_thp, 2*RUNTIME)
 		if (pthread_create(&self->threads[i], NULL, access_mem, ptr))
 			perror("Couldn't create thread");
 
-	ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0);
+	ASSERT_EQ(migrate(ptr, self->n1, self->n2, 1), 0);
+	for (i = 0; i < self->nthreads - 1; i++)
+		ASSERT_EQ(pthread_cancel(self->threads[i]), 0);
+}
+
+/*
+ * Tests the anon hugetlb migration entry paths.
+ */
+TEST_F_TIMEOUT(migration, private_anon_hugetlb, 2*RUNTIME)
+{
+	uint64_t *ptr;
+	int i;
+
+	if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0)
+		SKIP(return, "Not enough threads or NUMA nodes available");
+
+	ptr = mmap(NULL, TWOMEG, PROT_READ | PROT_WRITE,
+		MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
+	if (ptr == MAP_FAILED)
+		SKIP(return, "Could not allocate hugetlb pages");
+
+	memset(ptr, 0xde, TWOMEG);
+	for (i = 0; i < self->nthreads - 1; i++)
+		if (pthread_create(&self->threads[i], NULL, access_mem, ptr))
+			perror("Couldn't create thread");
+
+	ASSERT_EQ(migrate(ptr, self->n1, self->n2, 1), 0);
 	for (i = 0; i < self->nthreads - 1; i++)
 		ASSERT_EQ(pthread_cancel(self->threads[i]), 0);
 }
 
+/*
+ * Tests the shared hugetlb migration entry paths.
+ */
+TEST_F_TIMEOUT(migration, shared_hugetlb, 2*RUNTIME)
+{
+	uint64_t *ptr;
+	int i;
+	int fd;
+	unsigned long sz;
+	struct statfs filestat;
+
+	if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0)
+		SKIP(return, "Not enough threads or NUMA nodes available");
+
+	fd = memfd_create("tmp_hugetlb", MFD_HUGETLB);
+	if (fd < 0)
+		SKIP(return, "Couldn't create hugetlb memfd");
+
+	if (fstatfs(fd, &filestat) < 0)
+		SKIP(return, "Couldn't fstatfs hugetlb file");
+
+	sz = filestat.f_bsize;
+
+	if (ftruncate(fd, sz))
+		SKIP(return, "Couldn't allocate hugetlb pages");
+
+	ptr = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	if (ptr == MAP_FAILED)
+		SKIP(return, "Could not map hugetlb pages");
+
+	memset(ptr, 0xde, sz);
+	for (i = 0; i < self->nthreads - 1; i++)
+		if (pthread_create(&self->threads[i], NULL, access_mem, ptr))
+			perror("Couldn't create thread");
+
+	ASSERT_EQ(migrate(ptr, self->n1, self->n2, 10), 0);
+	for (i = 0; i < self->nthreads - 1; i++) {
+		ASSERT_EQ(pthread_cancel(self->threads[i]), 0);
+		pthread_join(self->threads[i], NULL);
+	}
+
+	ftruncate(fd, 0);
+	close(fd);
+}
+
 TEST_HARNESS_MAIN

From patchwork Sat Feb 18 00:28:18 2023
Date: Sat, 18 Feb 2023 00:28:18 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-46-jthoughton@google.com>
Subject: [PATCH v2 45/46] selftests/mm: add hugetlb HGM test to migration selftest
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This is mostly the same as the shared HugeTLB case, but instead of
mapping the page with a regular page fault, we map it with lots of
UFFDIO_CONTINUE operations. We also verify that the contents haven't
changed after the migration; they would change if the post-migration
PTEs pointed to the wrong page.
Signed-off-by: James Houghton

diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c
index 21577a84d7e4..1fb3607accab 100644
--- a/tools/testing/selftests/mm/migration.c
+++ b/tools/testing/selftests/mm/migration.c
@@ -14,12 +14,21 @@
 #include
 #include
 #include
+#include
+#include
+#include
+#include
+#include
 
 #define TWOMEG (2<<20)
 #define RUNTIME (60)
 
 #define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1)))
 
+#ifndef MADV_SPLIT
+#define MADV_SPLIT 26
+#endif
+
 FIXTURE(migration)
 {
 	pthread_t *threads;
@@ -265,4 +274,141 @@ TEST_F_TIMEOUT(migration, shared_hugetlb, 2*RUNTIME)
 	close(fd);
 }
 
+#ifdef __NR_userfaultfd
+static int map_at_high_granularity(char *mem, size_t length)
+{
+	int i;
+	int ret;
+	int uffd = syscall(__NR_userfaultfd, 0);
+	struct uffdio_api api;
+	struct uffdio_register reg;
+	int pagesize = getpagesize();
+
+	if (uffd < 0) {
+		perror("couldn't create uffd");
+		return uffd;
+	}
+
+	api.api = UFFD_API;
+	api.features = 0;
+
+	ret = ioctl(uffd, UFFDIO_API, &api);
+	if (ret || api.api != UFFD_API) {
+		perror("UFFDIO_API failed");
+		goto out;
+	}
+
+	if (madvise(mem, length, MADV_SPLIT) == -1) {
+		perror("MADV_SPLIT failed");
+		goto out;
+	}
+
+	reg.range.start = (unsigned long)mem;
+	reg.range.len = length;
+	reg.mode = UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_MINOR;
+
+	ret = ioctl(uffd, UFFDIO_REGISTER, &reg);
+	if (ret) {
+		perror("UFFDIO_REGISTER failed");
+		goto out;
+	}
+
+	/* UFFDIO_CONTINUE each 4K segment of the 2M page. */
+	for (i = 0; i < length/pagesize; ++i) {
+		struct uffdio_continue cont;
+
+		cont.range.start = (unsigned long long)mem + i * pagesize;
+		cont.range.len = pagesize;
+		cont.mode = 0;
+		ret = ioctl(uffd, UFFDIO_CONTINUE, &cont);
+		if (ret) {
+			fprintf(stderr, "UFFDIO_CONTINUE failed "
+				"for %llx -> %llx: %d\n",
+				cont.range.start,
+				cont.range.start + cont.range.len,
+				errno);
+			goto out;
+		}
+	}
+	ret = 0;
+out:
+	close(uffd);
+	return ret;
+}
+#else
+static int map_at_high_granularity(char *mem, size_t length)
+{
+	fprintf(stderr, "Userfaultfd missing\n");
+	return -1;
+}
+#endif /* __NR_userfaultfd */
+
+/*
+ * Tests the high-granularity hugetlb migration entry paths.
+ */
+TEST_F_TIMEOUT(migration, shared_hugetlb_hgm, 2*RUNTIME)
+{
+	uint64_t *ptr;
+	int i;
+	int fd;
+	unsigned long sz;
+	struct statfs filestat;
+
+	if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0)
+		SKIP(return, "Not enough threads or NUMA nodes available");
+
+	fd = memfd_create("tmp_hugetlb", MFD_HUGETLB);
+	if (fd < 0)
+		SKIP(return, "Couldn't create hugetlb memfd");
+
+	if (fstatfs(fd, &filestat) < 0)
+		SKIP(return, "Couldn't fstatfs hugetlb file");
+
+	sz = filestat.f_bsize;
+
+	if (ftruncate(fd, sz))
+		SKIP(return, "Couldn't allocate hugetlb pages");
+
+	if (fallocate(fd, 0, 0, sz) < 0) {
+		perror("fallocate failed");
+		SKIP(return, "fallocate failed");
+	}
+
+	ptr = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	if (ptr == MAP_FAILED)
+		SKIP(return, "Could not allocate hugetlb pages");
+
+	/*
+	 * We have to map_at_high_granularity before we memset, otherwise
+	 * memset will map everything at the hugepage size.
+	 */
+	if (map_at_high_granularity((char *)ptr, sz) < 0)
+		SKIP(return, "Could not map HugeTLB range at high granularity");
+
+	/* Populate the page we're migrating. */
+	for (i = 0; i < sz/sizeof(*ptr); ++i)
+		ptr[i] = i;
+
+	for (i = 0; i < self->nthreads - 1; i++)
+		if (pthread_create(&self->threads[i], NULL, access_mem, ptr))
+			perror("Couldn't create thread");
+
+	ASSERT_EQ(migrate(ptr, self->n1, self->n2, 10), 0);
+	for (i = 0; i < self->nthreads - 1; i++) {
+		ASSERT_EQ(pthread_cancel(self->threads[i]), 0);
+		pthread_join(self->threads[i], NULL);
+	}
+
+	/* Check that the contents didn't change. */
+	for (i = 0; i < sz/sizeof(*ptr); ++i) {
+		ASSERT_EQ(ptr[i], i);
+		if (ptr[i] != i)
+			break;
+	}
+
+	ftruncate(fd, 0);
+	close(fd);
+}
+
 TEST_HARNESS_MAIN

From patchwork Sat Feb 18 00:28:19 2023
Date: Sat, 18 Feb 2023 00:28:19 +0000
In-Reply-To: <20230218002819.1486479-1-jthoughton@google.com>
References: <20230218002819.1486479-1-jthoughton@google.com>
Message-ID: <20230218002819.1486479-47-jthoughton@google.com>
Subject: [PATCH v2 46/46] selftests/mm: add HGM UFFDIO_CONTINUE and hwpoison tests
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Frank van der Linden, Jiaqi Yan, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

Test that high-granularity CONTINUEs at all sizes work (exercising
contiguous PTE sizes for arm64, when support is added). Also test that
collapse works and hwpoison works correctly (although we aren't yet
testing high-granularity poison).

This test uses UFFD_FEATURE_EVENT_FORK + UFFD_REGISTER_MODE_WP to force
the kernel to copy page tables on fork(), exercising the changes to
copy_hugetlb_page_range().

Also test that UFFDIO_WRITEPROTECT doesn't prevent UFFDIO_CONTINUE from
behaving properly (in other words, that HGM walks treat UFFD-WP markers
like blank PTEs in the appropriate cases). We also test that the uffd-wp
PTE markers are preserved properly.
Signed-off-by: James Houghton <jthoughton@google.com>
---
 create mode 100644 tools/testing/selftests/mm/hugetlb-hgm.c

diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index d90cdc06aa59..920baccccb9e 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -36,6 +36,7 @@ TEST_GEN_FILES += compaction_test
 TEST_GEN_FILES += gup_test
 TEST_GEN_FILES += hmm-tests
 TEST_GEN_FILES += hugetlb-madvise
+TEST_GEN_FILES += hugetlb-hgm
 TEST_GEN_FILES += hugepage-mmap
 TEST_GEN_FILES += hugepage-mremap
 TEST_GEN_FILES += hugepage-shm
diff --git a/tools/testing/selftests/mm/hugetlb-hgm.c b/tools/testing/selftests/mm/hugetlb-hgm.c
new file mode 100644
index 000000000000..4c27a6a11818
--- /dev/null
+++ b/tools/testing/selftests/mm/hugetlb-hgm.c
@@ -0,0 +1,608 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test uncommon cases in HugeTLB high-granularity mapping:
+ * 1. Test all supported high-granularity page sizes (with MADV_COLLAPSE).
+ * 2. Test MADV_HWPOISON behavior.
+ * 3. Test interaction with UFFDIO_WRITEPROTECT.
+ */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <pthread.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <linux/magic.h>
+#include <linux/userfaultfd.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <sys/statfs.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#define PAGE_SIZE 4096
+#define PAGE_MASK ~(PAGE_SIZE - 1)
+
+#ifndef MADV_COLLAPSE
+#define MADV_COLLAPSE 25
+#endif
+
+#ifndef MADV_SPLIT
+#define MADV_SPLIT 26
+#endif
+
+#define PREFIX " ... "
+#define ERROR_PREFIX " !!! 
" + +static void *sigbus_addr; +bool was_mceerr; +bool got_sigbus; +bool expecting_sigbus; + +enum test_status { + TEST_PASSED = 0, + TEST_FAILED = 1, + TEST_SKIPPED = 2, +}; + +static char *status_to_str(enum test_status status) +{ + switch (status) { + case TEST_PASSED: + return "TEST_PASSED"; + case TEST_FAILED: + return "TEST_FAILED"; + case TEST_SKIPPED: + return "TEST_SKIPPED"; + default: + return "TEST_???"; + } +} + +static int userfaultfd(int flags) +{ + return syscall(__NR_userfaultfd, flags); +} + +static int map_range(int uffd, char *addr, uint64_t length) +{ + struct uffdio_continue cont = { + .range = (struct uffdio_range) { + .start = (uint64_t)addr, + .len = length, + }, + .mode = 0, + .mapped = 0, + }; + + if (ioctl(uffd, UFFDIO_CONTINUE, &cont) < 0) { + perror(ERROR_PREFIX "UFFDIO_CONTINUE failed"); + return -1; + } + return 0; +} + +static int userfaultfd_writeprotect(int uffd, char *addr, uint64_t length, + bool protect) +{ + struct uffdio_writeprotect wp = { + .range = (struct uffdio_range) { + .start = (uint64_t)addr, + .len = length, + }, + .mode = UFFDIO_WRITEPROTECT_MODE_DONTWAKE, + }; + + if (protect) + wp.mode = UFFDIO_WRITEPROTECT_MODE_WP; + + printf(PREFIX "UFFDIO_WRITEPROTECT: %p -> %p (%sprotected)\n", addr, + addr + length, protect ? 
"" : "un"); + + if (ioctl(uffd, UFFDIO_WRITEPROTECT, &wp) < 0) { + perror(ERROR_PREFIX "UFFDIO_WRITEPROTECT failed"); + return -1; + } + return 0; +} + +static int check_equal(char *mapping, size_t length, char value) +{ + size_t i; + + for (i = 0; i < length; ++i) + if (mapping[i] != value) { + printf(ERROR_PREFIX "mismatch at %p (%d != %d)\n", + &mapping[i], mapping[i], value); + return -1; + } + + return 0; +} + +static int test_continues(int uffd, char *primary_map, char *secondary_map, + size_t len, bool verify) +{ + size_t offset = 0; + unsigned char iter = 0; + unsigned long pagesize = getpagesize(); + uint64_t size; + + for (size = len/2; size >= pagesize; + offset += size, size /= 2) { + iter++; + memset(secondary_map + offset, iter, size); + printf(PREFIX "UFFDIO_CONTINUE: %p -> %p = %d%s\n", + primary_map + offset, + primary_map + offset + size, + iter, + verify ? " (and verify)" : ""); + if (map_range(uffd, primary_map + offset, size)) + return -1; + if (verify && check_equal(primary_map + offset, size, iter)) + return -1; + } + return 0; +} + +static int verify_contents(char *map, size_t len, bool last_page_zero) +{ + size_t offset = 0; + int i = 0; + uint64_t size; + + for (size = len/2; size > PAGE_SIZE; offset += size, size /= 2) + if (check_equal(map + offset, size, ++i)) + return -1; + + if (last_page_zero) + if (check_equal(map + len - PAGE_SIZE, PAGE_SIZE, 0)) + return -1; + + return 0; +} + +static int test_collapse(char *primary_map, size_t len, bool verify) +{ + int ret = 0; + + printf(PREFIX "collapsing %p -> %p\n", primary_map, primary_map + len); + if (madvise(primary_map, len, MADV_COLLAPSE) < 0) { + perror(ERROR_PREFIX "collapse failed"); + return -1; + } + + if (verify) { + printf(PREFIX "verifying %p -> %p\n", primary_map, + primary_map + len); + ret = verify_contents(primary_map, len, true); + } + return ret; +} + +static void sigbus_handler(int signo, siginfo_t *info, void *context) +{ + if (!expecting_sigbus) + printf(ERROR_PREFIX 
"unexpected sigbus: %p\n", info->si_addr); + + got_sigbus = true; + was_mceerr = info->si_code == BUS_MCEERR_AR; + sigbus_addr = info->si_addr; + + pthread_exit(NULL); +} + +static void *access_mem(void *addr) +{ + volatile char *ptr = addr; + + /* + * Do a write without changing memory contents, as other routines will + * need to verify that mapping contents haven't changed. + * + * We do a write so that we trigger uffd-wp SIGBUSes. To test that we + * get HWPOISON SIGBUSes, we would only need to read. + */ + *ptr = *ptr; + return NULL; +} + +static int test_sigbus(char *addr, bool poison) +{ + int ret; + pthread_t pthread; + + sigbus_addr = (void *)0xBADBADBAD; + was_mceerr = false; + got_sigbus = false; + expecting_sigbus = true; + ret = pthread_create(&pthread, NULL, &access_mem, addr); + if (ret) { + printf(ERROR_PREFIX "failed to create thread: %s\n", + strerror(ret)); + goto out; + } + + pthread_join(pthread, NULL); + + ret = -1; + if (!got_sigbus) + printf(ERROR_PREFIX "didn't get a SIGBUS: %p\n", addr); + else if (sigbus_addr != addr) + printf(ERROR_PREFIX "got incorrect sigbus address: %p vs %p\n", + sigbus_addr, addr); + else if (poison && !was_mceerr) + printf(ERROR_PREFIX "didn't get an MCEERR?\n"); + else + ret = 0; +out: + expecting_sigbus = false; + return ret; +} + +static void *read_from_uffd_thd(void *arg) +{ + int uffd = *(int *)arg; + struct uffd_msg msg; + /* opened without O_NONBLOCK */ + if (read(uffd, &msg, sizeof(msg)) != sizeof(msg)) + printf(ERROR_PREFIX "reading uffd failed\n"); + + return NULL; +} + +static int read_event_from_uffd(int *uffd, pthread_t *pthread) +{ + int ret = 0; + + ret = pthread_create(pthread, NULL, &read_from_uffd_thd, (void *)uffd); + if (ret) { + printf(ERROR_PREFIX "failed to create thread: %s\n", + strerror(ret)); + return ret; + } + return 0; +} + +static int test_sigbus_range(char *primary_map, size_t len, bool hwpoison) +{ + const unsigned long pagesize = getpagesize(); + const int num_checks = 512; + 
unsigned long bytes_per_check = len/num_checks;
+	int i;
+
+	printf(PREFIX "checking that we can't access "
+	       "(%d addresses within %p -> %p)\n",
+	       num_checks, primary_map, primary_map + len);
+
+	if (pagesize > bytes_per_check)
+		bytes_per_check = pagesize;
+
+	for (i = 0; i < len; i += bytes_per_check)
+		if (test_sigbus(primary_map + i, hwpoison) < 0)
+			return 1;
+	/* Check the very last byte, because we left it unmapped. */
+	if (test_sigbus(primary_map + len - 1, hwpoison))
+		return 1;
+
+	return 0;
+}
+
+static enum test_status test_hwpoison(char *primary_map, size_t len)
+{
+	printf(PREFIX "poisoning %p -> %p\n", primary_map, primary_map + len);
+	if (madvise(primary_map, len, MADV_HWPOISON) < 0) {
+		perror(ERROR_PREFIX "MADV_HWPOISON failed");
+		return TEST_SKIPPED;
+	}
+
+	return test_sigbus_range(primary_map, len, true)
+		? TEST_FAILED : TEST_PASSED;
+}
+
+static int test_fork(int uffd, char *primary_map, size_t len)
+{
+	int status;
+	int ret = 0;
+	pid_t pid;
+	pthread_t uffd_thd;
+
+	/*
+	 * UFFD_FEATURE_EVENT_FORK will put a fork event on the userfaultfd,
+	 * which we must read, otherwise we block fork(). Set up a thread to
+	 * read that event now.
+	 *
+	 * Page fault events should result in a SIGBUS, so we expect only a
+	 * single event from the uffd (the fork event).
+	 */
+	if (read_event_from_uffd(&uffd, &uffd_thd))
+		return -1;
+
+	pid = fork();
+
+	if (!pid) {
+		/*
+		 * Because we have UFFDIO_REGISTER_MODE_WP and
+		 * UFFD_FEATURE_EVENT_FORK, the page tables should be copied
+		 * exactly.
+		 *
+		 * Check that everything except the last 4K has correct
+		 * contents, and then check that the last 4K gets a SIGBUS.
+		 */
+		printf(PREFIX "child validating...\n");
+		ret = verify_contents(primary_map, len, false) ||
+			test_sigbus(primary_map + len - 1, false);
+		exit(ret ? 1 : 0);
+	} else {
+		/* wait for the child to finish. 
*/
+		waitpid(pid, &status, 0);
+		ret = WEXITSTATUS(status);
+		if (!ret) {
+			printf(PREFIX "parent validating...\n");
+			/* Same check as the child. */
+			ret = verify_contents(primary_map, len, false) ||
+				test_sigbus(primary_map + len - 1, false);
+		}
+	}
+
+	pthread_join(uffd_thd, NULL);
+	return ret;
+}
+
+static int uffd_register(int uffd, char *primary_map, unsigned long len,
+			 int mode)
+{
+	struct uffdio_register reg;
+
+	reg.range.start = (unsigned long)primary_map;
+	reg.range.len = len;
+	reg.mode = mode;
+
+	reg.ioctls = 0;
+	return ioctl(uffd, UFFDIO_REGISTER, &reg);
+}
+
+enum test_type {
+	TEST_DEFAULT,
+	TEST_UFFDWP,
+	TEST_HWPOISON
+};
+
+static enum test_status
+test_hgm(int fd, size_t hugepagesize, size_t len, enum test_type type)
+{
+	int uffd;
+	char *primary_map, *secondary_map;
+	struct uffdio_api api;
+	struct sigaction new, old;
+	enum test_status status = TEST_SKIPPED;
+	bool hwpoison = type == TEST_HWPOISON;
+	bool uffd_wp = type == TEST_UFFDWP;
+	bool verify = type == TEST_DEFAULT;
+	int register_args;
+
+	if (ftruncate(fd, len) < 0) {
+		perror(ERROR_PREFIX "ftruncate failed");
+		return status;
+	}
+
+	uffd = userfaultfd(O_CLOEXEC);
+	if (uffd < 0) {
+		perror(ERROR_PREFIX "uffd not created");
+		return status;
+	}
+
+	primary_map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	if (primary_map == MAP_FAILED) {
+		perror(ERROR_PREFIX "mmap for primary mapping failed");
+		goto close_uffd;
+	}
+	secondary_map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	if (secondary_map == MAP_FAILED) {
+		perror(ERROR_PREFIX "mmap for secondary mapping failed");
+		goto unmap_primary;
+	}
+
+	printf(PREFIX "primary mapping: %p\n", primary_map);
+	printf(PREFIX "secondary mapping: %p\n", secondary_map);
+
+	api.api = UFFD_API;
+	api.features = UFFD_FEATURE_SIGBUS | UFFD_FEATURE_EXACT_ADDRESS |
+		UFFD_FEATURE_EVENT_FORK;
+	if (ioctl(uffd, UFFDIO_API, &api) == -1) {
+		perror(ERROR_PREFIX "UFFDIO_API failed");
+		goto out;
+	}
+
+	if (madvise(primary_map, len, MADV_SPLIT)) {
+		perror(ERROR_PREFIX "MADV_SPLIT failed");
+		goto out;
+	}
+
+	/*
+	 * Register with UFFDIO_REGISTER_MODE_WP to force fork() to copy page
+	 * tables (also need UFFD_FEATURE_EVENT_FORK, which we have).
+	 */
+	register_args = UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP;
+	if (!uffd_wp)
+		/*
+		 * If we're testing UFFDIO_WRITEPROTECT, then we don't want
+		 * minor faults. With minor faults enabled, we'll get SIGBUSes
+		 * for any minor fault, whereas without minor faults enabled,
+		 * writes will verify that uffd-wp PTE markers were installed
+		 * properly.
+		 */
+		register_args |= UFFDIO_REGISTER_MODE_MINOR;
+
+	if (uffd_register(uffd, primary_map, len, register_args)) {
+		perror(ERROR_PREFIX "UFFDIO_REGISTER failed");
+		goto out;
+	}
+
+	new.sa_sigaction = &sigbus_handler;
+	new.sa_flags = SA_SIGINFO;
+	if (sigaction(SIGBUS, &new, &old) < 0) {
+		perror(ERROR_PREFIX "could not setup SIGBUS handler");
+		goto out;
+	}
+
+	status = TEST_FAILED;
+
+	if (uffd_wp) {
+		/*
+		 * Install uffd-wp PTE markers now. They should be preserved
+		 * as we split the mappings with UFFDIO_CONTINUE later.
+		 */
+		if (userfaultfd_writeprotect(uffd, primary_map, len, true))
+			goto done;
+		/* Verify that we really are write-protected. */
+		if (test_sigbus(primary_map, false))
+			goto done;
+	}
+
+	/*
+	 * Main piece of the test: map primary_map at all the possible
+	 * page sizes, starting at the hugepage size and going down to
+	 * PAGE_SIZE. This leaves the final PAGE_SIZE piece of the mapping
+	 * unmapped.
+	 */
+	if (test_continues(uffd, primary_map, secondary_map, len, verify))
+		goto done;
+
+	/*
+	 * Verify that MADV_HWPOISON is able to properly poison the entire
+	 * mapping. 
+	 */
+	if (hwpoison) {
+		enum test_status new_status = test_hwpoison(primary_map, len);
+
+		if (new_status != TEST_PASSED) {
+			status = new_status;
+			goto done;
+		}
+	}
+
+	if (uffd_wp) {
+		/*
+		 * Check that the uffd-wp marker we installed initially still
+		 * exists in the unmapped 4K piece at the end of the mapping.
+		 *
+		 * test_sigbus() will do a write. When this happens:
+		 * 1. The page fault handler will find the uffd-wp marker and
+		 *    create a read-only PTE.
+		 * 2. The memory access is retried, and the page fault handler
+		 *    will find that a write was attempted in a UFFD_WP VMA
+		 *    where a RO mapping exists, so SIGBUS
+		 *    (we have UFFD_FEATURE_SIGBUS).
+		 *
+		 * We only check the final page because UFFDIO_CONTINUE will
+		 * have cleared the write-protection on all the other pieces
+		 * of the mapping.
+		 */
+		printf(PREFIX "verifying that we can't write to final page\n");
+		if (test_sigbus(primary_map + len - 1, false))
+			goto done;
+	}
+
+	if (!hwpoison)
+		/*
+		 * test_fork() will verify memory contents. We can't do
+		 * that if memory has been poisoned.
+		 */
+		if (test_fork(uffd, primary_map, len))
+			goto done;
+
+	/*
+	 * Check that MADV_COLLAPSE functions properly. That is:
+	 *  - the PAGE_SIZE hole we had is no longer unmapped.
+	 *  - poisoned regions are still poisoned.
+	 *
+	 * Verify the data is correct if we haven't poisoned.
+	 */
+	if (test_collapse(primary_map, len, !hwpoison))
+		goto done;
+	/*
+	 * Verify that memory is still poisoned. 
+	 */
+	if (hwpoison && test_sigbus_range(primary_map, len, true))
+		goto done;
+
+	status = TEST_PASSED;
+
+done:
+	if (ftruncate(fd, 0) < 0) {
+		perror(ERROR_PREFIX "ftruncate back to 0 failed");
+		status = TEST_FAILED;
+	}
+
+out:
+	munmap(secondary_map, len);
+unmap_primary:
+	munmap(primary_map, len);
+close_uffd:
+	close(uffd);
+	return status;
+}
+
+int main(void)
+{
+	int fd;
+	struct statfs file_stat;
+	size_t hugepagesize;
+	size_t len;
+	enum test_status status;
+	int ret = 0;
+
+	fd = memfd_create("hugetlb_tmp", MFD_HUGETLB);
+	if (fd < 0) {
+		perror(ERROR_PREFIX "could not open hugetlbfs file");
+		return -1;
+	}
+
+	memset(&file_stat, 0, sizeof(file_stat));
+	if (fstatfs(fd, &file_stat)) {
+		perror(ERROR_PREFIX "fstatfs failed");
+		ret = -1;
+		goto close;
+	}
+	if (file_stat.f_type != HUGETLBFS_MAGIC) {
+		printf(ERROR_PREFIX "not hugetlbfs file\n");
+		ret = -1;
+		goto close;
+	}
+
+	hugepagesize = file_stat.f_bsize;
+	len = 2 * hugepagesize;
+
+	printf("HGM regular test...\n");
+	status = test_hgm(fd, hugepagesize, len, TEST_DEFAULT);
+	printf("HGM regular test: %s\n", status_to_str(status));
+	if (status == TEST_FAILED)
+		ret = -1;
+
+	printf("HGM uffd-wp test...\n");
+	status = test_hgm(fd, hugepagesize, len, TEST_UFFDWP);
+	printf("HGM uffd-wp test: %s\n", status_to_str(status));
+	if (status == TEST_FAILED)
+		ret = -1;
+
+	printf("HGM hwpoison test...\n");
+	status = test_hgm(fd, hugepagesize, len, TEST_HWPOISON);
+	printf("HGM hwpoison test: %s\n", status_to_str(status));
+	if (status == TEST_FAILED)
+		ret = -1;
+close:
+	close(fd);
+
+	return ret;
+}