From patchwork Fri Oct 21 16:36:17 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 6821
Date: Fri, 21 Oct 2022 16:36:17 +0000
Message-ID: <20221021163703.3218176-2-jthoughton@google.com>
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Subject: [RFC PATCH v2 01/47] hugetlb: don't set PageUptodate for UFFDIO_CONTINUE
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert",
    "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    James Houghton
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315882674959927?= X-GMAIL-MSGID: =?utf-8?q?1747315882674959927?= This is how it should have been to begin with. It would be very bad if we actually set PageUptodate with a UFFDIO_CONTINUE, as UFFDIO_CONTINUE doesn't actually set/update the contents of the page, so we would be exposing a non-zeroed page to the user. The reason this change is being made now is because UFFDIO_CONTINUEs on subpages definitely shouldn't set this page flag on the head page. Signed-off-by: James Houghton --- mm/hugetlb.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 1a7dc7b2e16c..650761cdd2f6 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6097,7 +6097,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, * preceding stores to the page contents become visible before * the set_pte_at() write. */ - __SetPageUptodate(page); + if (!is_continue) + __SetPageUptodate(page); + else + VM_WARN_ON_ONCE_PAGE(!PageUptodate(page), page); /* Add shared, newly allocated pages to the page cache. */ if (vm_shared && !is_continue) { From patchwork Fri Oct 21 16:36:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6820 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp794917wrr; Fri, 21 Oct 2022 09:37:57 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5Tb0cW8I6ItNtO5mpsTc9jg0BNudgGRAWb6ghrVnQpJCO3/8j0C3PApP0e0BdJZ+E6BI5r X-Received: by 2002:a17:902:cecd:b0:185:46d3:8cad with SMTP id d13-20020a170902cecd00b0018546d38cadmr19926470plg.83.1666370276869; Fri, 21 Oct 2022 09:37:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370276; cv=none; d=google.com; s=arc-20160816; b=I7SAuRX0xqfu69oKbGs0iyqKIZiGohDMLkUgOqMBCoDJjUwb1gHlYtHsgqhGEBbVqI gOsyD4m8D5OntxshMpvqZWgFrn/8kxofG4MCzjy1/1iUOHkns4qJf/vTEe57x8s13jj9 gf4YvYgWV5Q9/rzBItpeFeSYjdM5MAa/CbHCwFYj/zsSja/VJJLHR5CS7EGKfgSPyGeV lWylGGcZQ5/d4RqixNOyp8tk+mJFCloycSKKMOYIUTAuRUmb3xblnAFwxqreDAcCrjfL AErxYLcYxrdQk2DfjBg/owN9vVQDA0mv8snaQq7s+iPxDNRXmrWYMTQFWblW+oDAmhMP VuoA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=esgdcnz6KGc3B9V+LZ2N8Ds4/nBZJJcBbCbHMbBwHjk=; b=TEA5X5t8CCpqH80z0Rauzo85oBkeEWk7KoyQDRnQ5Vqjh/DLz9tCmmEuhGpQ1N4WDI i/sVgFF6UPvnImRAq9B3DrdOUHA031lReOM2885gspXJg75/DixOrwtF7CCuziVhmWcT phS7nPBv5aLZoF06Kcodt/3n7velDG5+T/Jl8WtL6i9TuzriP89s0I581aEpyI8+lvEh ktljW8HvLr9TahQGegzgZxF/g5ScwEzb31jBCFkon60/qhrJVkfigHIOSYECHTuE/TFf JwQToyYaXDibsDC8abfuqJZnghl+dADoAEMgVSU+fBRrFgzTG3t0Zs873OHcjV1opa0A 6LIQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=pIJH5olO; spf=pass (google.com: domain of 

From patchwork Fri Oct 21 16:36:18 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 6820
Date: Fri, 21 Oct 2022 16:36:18 +0000
Message-ID: <20221021163703.3218176-3-jthoughton@google.com>
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Subject: [RFC PATCH v2 02/47] hugetlb: remove mk_huge_pte; it is unused
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert",
    "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    James Houghton

mk_huge_pte is unused and not necessary. pte_mkhuge is the appropriate
function to call to create a HugeTLB PTE (see
Documentation/mm/arch_pgtable_helpers.rst). It is being removed now to
avoid complicating the implementation of HugeTLB high-granularity
mapping.

Signed-off-by: James Houghton
Acked-by: Peter Xu
Acked-by: Mina Almasry
Reviewed-by: Mike Kravetz
---
 arch/s390/include/asm/hugetlb.h | 5 -----
 include/asm-generic/hugetlb.h   | 5 -----
 mm/debug_vm_pgtable.c           | 2 +-
 mm/hugetlb.c                    | 7 +++----
 4 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
index ccdbccfde148..c34893719715 100644
--- a/arch/s390/include/asm/hugetlb.h
+++ b/arch/s390/include/asm/hugetlb.h
@@ -77,11 +77,6 @@ static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
 	set_huge_pte_at(mm, addr, ptep, pte_wrprotect(pte));
 }
 
-static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot)
-{
-	return mk_pte(page, pgprot);
-}
-
 static inline int huge_pte_none(pte_t pte)
 {
 	return pte_none(pte);
diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index a57d667addd2..aab9e46fa628 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -5,11 +5,6 @@
 #include <linux/swap.h>
 #include <linux/swapops.h>
 
-static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot)
-{
-	return mk_pte(page, pgprot);
-}
-
 static inline unsigned long huge_pte_write(pte_t pte)
 {
 	return pte_write(pte);
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 2b61fde8c38c..10573a283a12 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -929,7 +929,7 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args)
 	 * as it was previously derived from a real kernel symbol.
 	 */
 	page = pfn_to_page(args->fixed_pmd_pfn);
-	pte = mk_huge_pte(page, args->page_prot);
+	pte = mk_pte(page, args->page_prot);
 
 	WARN_ON(!huge_pte_dirty(huge_pte_mkdirty(pte)));
 	WARN_ON(!huge_pte_write(huge_pte_mkwrite(huge_pte_wrprotect(pte))));
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 650761cdd2f6..20a111b532aa 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4728,11 +4728,10 @@ static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
 	unsigned int shift = huge_page_shift(hstate_vma(vma));
 
 	if (writable) {
-		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page,
-					 vma->vm_page_prot)));
+		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_pte(page,
+					 vma->vm_page_prot)));
 	} else {
-		entry = huge_pte_wrprotect(mk_huge_pte(page,
-					   vma->vm_page_prot));
+		entry = huge_pte_wrprotect(mk_pte(page, vma->vm_page_prot));
 	}
 	entry = pte_mkyoung(entry);
 	entry = arch_make_huge_pte(entry, shift, vma->vm_flags);

From patchwork Fri Oct 21 16:36:19 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 6824
Date: Fri, 21 Oct 2022 16:36:19 +0000
Message-ID: <20221021163703.3218176-4-jthoughton@google.com>
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Subject: [RFC PATCH v2 03/47] hugetlb: remove redundant pte_mkhuge in migration path
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert",
    "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    James Houghton
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315902354869125?= X-GMAIL-MSGID: =?utf-8?q?1747315902354869125?= arch_make_huge_pte, which is called immediately following pte_mkhuge, already makes the necessary changes to the PTE that pte_mkhuge would have. The generic implementation of arch_make_huge_pte simply calls pte_mkhuge. Signed-off-by: James Houghton Acked-by: Peter Xu Acked-by: Mina Almasry Reviewed-by: Mike Kravetz --- mm/migrate.c | 1 - 1 file changed, 1 deletion(-) diff --git a/mm/migrate.c b/mm/migrate.c index 8e5eb6ed9da2..1457cdbb7828 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -237,7 +237,6 @@ static bool remove_migration_pte(struct folio *folio, if (folio_test_hugetlb(folio)) { unsigned int shift = huge_page_shift(hstate_vma(vma)); - pte = pte_mkhuge(pte); pte = arch_make_huge_pte(pte, shift, vma->vm_flags); if (folio_test_anon(folio)) hugepage_add_anon_rmap(new, vma, pvmw.address, From patchwork Fri Oct 21 16:36:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6822 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp794941wrr; Fri, 21 Oct 2022 09:38:01 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5r/YkxJdp/0APqzO9OiSPCFO0ffT1q6D6gK/znHf1834rPbG9rV1O45WXtPyBef5E9hBHt X-Received: by 2002:a05:6a00:18a2:b0:56b:6823:322c with SMTP id x34-20020a056a0018a200b0056b6823322cmr1416596pfh.18.1666370280758; Fri, 21 Oct 2022 09:38:00 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370280; cv=none; d=google.com; s=arc-20160816; b=T2qKbVxyypoumhRNIr2lgsOF8ZdjoYh+DoGZFoKA1GnU3J0Ug0FRQdssSYypKXa/JL NGi1b+1Lbn5O/lyIWQzRj7UfIPedmQ0xF7Zig7TRZOEh8Pcc3UF1JnMqBeQdkgwE9bFd 15ADg+yXksGJJAtWhHXXNG5MAaY2UxjhPr1XhX6O8ykmBKzaFD56ZuG0ZcG6NdKdf3j5 DDTeIFhKxwHBnLqfjHn9Kt+nWTPzlh5sTbDwc2sKv0m+6umClskP4aLw7nK8A2dGdA1B jW0GGYicUie26RoSjH0VjGnf26d7c5brYMfFN/A3DLqmw4mlkBl+pGcRPmXEObDUj2ZT Hv8g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=3+WerD0u1o5rdjYYXB/kVnQa0VLhkM9eeGW4etvGwB4=; b=rajB37bUQhMQUY5upZa/oAtGe0hiXT1FQCmljxGqfayr2Mivj4D8HA948e9jSBszxk 62Ie6r0v00gOhHBoB/g6cV7+j7v4LBubypvLMwHMITmlXdryAbDeDYHbNn2mFeM5qT56 Zscfo7dxKOk/3nJcGKG03cO+fUeVe57XaIHlN8wmK6eFKA5w+//8HRQwesiP4qe1IIhu PwL33GyWcMg+kuMgarE7MR7zoEH3UmpJLIYpCDozVvZMnvrN1DHMq7KyF1jhbudQSDVF qm19GtO2b0I84duOvLFdxJauIN6It9C/ObZaviC6azqVlSGHxWcCciPOfG4n9RkZtXKz OV7A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=tdKQZAo0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email 

From patchwork Fri Oct 21 16:36:20 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 6822
Date: Fri, 21 Oct 2022 16:36:20 +0000
Message-ID: <20221021163703.3218176-5-jthoughton@google.com>
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Subject: [RFC PATCH v2 04/47] hugetlb: only adjust address ranges when VMAs want PMD sharing
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert",
    "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    James Houghton

Currently the address-range adjustment is overly aggressive: for some
userfaultfd VMAs, PMD sharing is disabled, yet we still widen the
address range that is used for flushing TLBs and sending MMU notifiers.
This change is made now because HGM VMAs also have PMD sharing
disabled, yet they would still have their flush ranges adjusted.
Over-aggressively flushing TLBs and triggering MMU notifiers is
particularly harmful with lots of high-granularity operations.

Signed-off-by: James Houghton
Acked-by: Peter Xu
Reviewed-by: Mike Kravetz
---
 mm/hugetlb.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 20a111b532aa..52cec5b0789e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6835,22 +6835,31 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
 	return saddr;
 }
 
-bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
+static bool pmd_sharing_possible(struct vm_area_struct *vma)
 {
-	unsigned long start = addr & PUD_MASK;
-	unsigned long end = start + PUD_SIZE;
-
 #ifdef CONFIG_USERFAULTFD
 	if (uffd_disable_huge_pmd_share(vma))
 		return false;
 #endif
 	/*
-	 * check on proper vm_flags and page table alignment
+	 * Only shared VMAs can share PMDs.
 	 */
 	if (!(vma->vm_flags & VM_MAYSHARE))
 		return false;
 	if (!vma->vm_private_data)	/* vma lock required for sharing */
 		return false;
+	return true;
+}
+
+bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
+{
+	unsigned long start = addr & PUD_MASK;
+	unsigned long end = start + PUD_SIZE;
+	/*
+	 * check on proper vm_flags and page table alignment
+	 */
+	if (!pmd_sharing_possible(vma))
+		return false;
 	if (!range_in_vma(vma, start, end))
 		return false;
 	return true;
@@ -6871,7 +6880,7 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 	 * vma needs to span at least one aligned PUD size, and the range
 	 * must be at least partially within in.
 	 */
-	if (!(vma->vm_flags & VM_MAYSHARE) || !(v_end > v_start) ||
+	if (!pmd_sharing_possible(vma) || !(v_end > v_start) ||
 	    (*end <= v_start) || (*start >= v_end))
 		return;
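
For context, the userfaultfd condition referenced above already exists
in the base tree; roughly (quoted from memory of
include/linux/userfaultfd_k.h, so treat it as a sketch), PMD sharing is
disabled as soon as a VMA is registered in wr-protect or minor-fault
mode, which is exactly the kind of VMA this patch stops over-flushing:

	/* Sketch of the existing helper; not part of this patch.  Huge PMD
	 * sharing is incompatible with uffd-wp and uffd minor faults, so
	 * such VMAs never share PMDs.
	 */
	static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma)
	{
		return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR);
	}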

From patchwork Fri Oct 21 16:36:21 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 6823
Date: Fri, 21 Oct 2022 16:36:21 +0000
Message-ID: <20221021163703.3218176-6-jthoughton@google.com>
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Subject: [RFC PATCH v2 05/47] hugetlb: make hugetlb_vma_lock_alloc return its failure reason
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert",
    "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    James Houghton
Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315897607722023?= X-GMAIL-MSGID: =?utf-8?q?1747315897607722023?= Currently hugetlb_vma_lock_alloc doesn't return anything, as there is no need: if it fails, PMD sharing won't be enabled. However, HGM requires that the VMA lock exists, so we need to verify that hugetlb_vma_lock_alloc actually succeeded. If hugetlb_vma_lock_alloc fails, then we can pass that up to the caller that is attempting to enable HGM. Signed-off-by: James Houghton --- mm/hugetlb.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 52cec5b0789e..dc82256b89dd 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -92,7 +92,7 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp; /* Forward declaration */ static int hugetlb_acct_memory(struct hstate *h, long delta); static void hugetlb_vma_lock_free(struct vm_area_struct *vma); -static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma); +static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma); static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma); static inline bool subpool_is_free(struct hugepage_subpool *spool) @@ -7001,17 +7001,17 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma) } } -static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma) +static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma) { struct hugetlb_vma_lock *vma_lock; /* Only establish in (flags) sharable vmas */ if (!vma || !(vma->vm_flags & VM_MAYSHARE)) - return; + return -EINVAL; - /* Should never get here with non-NULL vm_private_data */ + /* We've already allocated the lock. */ if (vma->vm_private_data) - return; + return 0; vma_lock = kmalloc(sizeof(*vma_lock), GFP_KERNEL); if (!vma_lock) { @@ -7026,13 +7026,14 @@ static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma) * allocation failure. 
 		 */
 		pr_warn_once("HugeTLB: unable to allocate vma specific lock\n");
-		return;
+		return -ENOMEM;
 	}
 
 	kref_init(&vma_lock->refs);
 	init_rwsem(&vma_lock->rw_sema);
 	vma_lock->vma = vma;
 	vma->vm_private_data = vma_lock;
+	return 0;
 }
 
 /*
@@ -7160,8 +7161,9 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
 {
 }
 
-static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
+static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
 {
+	return 0;
 }
 
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
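
To show why the return value matters, here is a hypothetical sketch of
the kind of caller later patches add (the function name and body below
are assumptions for illustration, not code from this patch): enabling
HGM has to fail when the VMA lock cannot be allocated.

	/* Hypothetical sketch only: HGM needs the VMA lock, so enablement
	 * propagates hugetlb_vma_lock_alloc()'s error instead of ignoring it.
	 */
	static int hugetlb_enable_hgm_vma(struct vm_area_struct *vma)
	{
		int ret;

		if (!(vma->vm_flags & VM_MAYSHARE))
			return -EINVAL;

		ret = hugetlb_vma_lock_alloc(vma);
		if (ret)
			return ret;	/* e.g. -ENOMEM: cannot enable HGM */

		/* ... mark the VMA as HGM-enabled (later in this series) ... */
		return 0;
	}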

From patchwork Fri Oct 21 16:36:22 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 6825
Date: Fri, 21 Oct 2022 16:36:22 +0000
Message-ID: <20221021163703.3218176-7-jthoughton@google.com>
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Subject: [RFC PATCH v2 06/47] hugetlb: extend vma lock for shared vmas
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert",
    "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    James Houghton
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315907528071849?= X-GMAIL-MSGID: =?utf-8?q?1747315907528071849?= This allows us to add more data into the shared structure, which we will use to store whether or not HGM is enabled for this VMA or not, as HGM is only available for shared mappings. It may be better to include HGM as a VMA flag instead of extending the VMA lock structure. Signed-off-by: James Houghton --- include/linux/hugetlb.h | 4 +++ mm/hugetlb.c | 65 +++++++++++++++++++++-------------------- 2 files changed, 37 insertions(+), 32 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index a899bc76d677..534958499ac4 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -121,6 +121,10 @@ struct hugetlb_vma_lock { struct vm_area_struct *vma; }; +struct hugetlb_shared_vma_data { + struct hugetlb_vma_lock vma_lock; +}; + extern struct resv_map *resv_map_alloc(void); void resv_map_release(struct kref *ref); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index dc82256b89dd..5ae8bc8c928e 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -91,8 +91,8 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp; /* Forward declaration */ static int hugetlb_acct_memory(struct hstate *h, long delta); -static void hugetlb_vma_lock_free(struct vm_area_struct *vma); -static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma); +static void hugetlb_vma_data_free(struct vm_area_struct *vma); +static int hugetlb_vma_data_alloc(struct vm_area_struct *vma); static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma); static inline bool subpool_is_free(struct hugepage_subpool *spool) @@ -4643,11 +4643,11 @@ static void hugetlb_vm_op_open(struct vm_area_struct *vma) if (vma_lock) { if (vma_lock->vma != vma) { vma->vm_private_data = NULL; - hugetlb_vma_lock_alloc(vma); + hugetlb_vma_data_alloc(vma); } else pr_warn("HugeTLB: vma_lock already exists in %s.\n", __func__); } else - hugetlb_vma_lock_alloc(vma); + hugetlb_vma_data_alloc(vma); } } @@ -4659,7 +4659,7 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma) unsigned long reserve, start, end; long gbl_reserve; - hugetlb_vma_lock_free(vma); + hugetlb_vma_data_free(vma); resv = vma_resv_map(vma); if (!resv || !is_vma_resv_set(vma, HPAGE_RESV_OWNER)) @@ -6629,7 +6629,7 @@ bool hugetlb_reserve_pages(struct inode *inode, /* * vma specific semaphore used for pmd sharing synchronization */ - hugetlb_vma_lock_alloc(vma); + hugetlb_vma_data_alloc(vma); /* * Only apply hugepage reservation if asked. At fault time, an @@ -6753,7 +6753,7 @@ bool hugetlb_reserve_pages(struct inode *inode, hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h), chg * pages_per_huge_page(h), h_cg); out_err: - hugetlb_vma_lock_free(vma); + hugetlb_vma_data_free(vma); if (!vma || vma->vm_flags & VM_MAYSHARE) /* Only call region_abort if the region_chg succeeded but the * region_add failed or didn't run. 
@@ -6901,55 +6901,55 @@ static bool __vma_shareable_flags_pmd(struct vm_area_struct *vma)
 void hugetlb_vma_lock_read(struct vm_area_struct *vma)
 {
 	if (__vma_shareable_flags_pmd(vma)) {
-		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+		struct hugetlb_shared_vma_data *data = vma->vm_private_data;
 
-		down_read(&vma_lock->rw_sema);
+		down_read(&data->vma_lock.rw_sema);
 	}
 }
 
 void hugetlb_vma_unlock_read(struct vm_area_struct *vma)
 {
 	if (__vma_shareable_flags_pmd(vma)) {
-		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+		struct hugetlb_shared_vma_data *data = vma->vm_private_data;
 
-		up_read(&vma_lock->rw_sema);
+		up_read(&data->vma_lock.rw_sema);
 	}
 }
 
 void hugetlb_vma_lock_write(struct vm_area_struct *vma)
 {
 	if (__vma_shareable_flags_pmd(vma)) {
-		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+		struct hugetlb_shared_vma_data *data = vma->vm_private_data;
 
-		down_write(&vma_lock->rw_sema);
+		down_write(&data->vma_lock.rw_sema);
 	}
 }
 
 void hugetlb_vma_unlock_write(struct vm_area_struct *vma)
 {
 	if (__vma_shareable_flags_pmd(vma)) {
-		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+		struct hugetlb_shared_vma_data *data = vma->vm_private_data;
 
-		up_write(&vma_lock->rw_sema);
+		up_write(&data->vma_lock.rw_sema);
 	}
 }
 
 int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
 {
-	struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+	struct hugetlb_shared_vma_data *data = vma->vm_private_data;
 
 	if (!__vma_shareable_flags_pmd(vma))
 		return 1;
 
-	return down_write_trylock(&vma_lock->rw_sema);
+	return down_write_trylock(&data->vma_lock.rw_sema);
 }
 
 void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
 {
 	if (__vma_shareable_flags_pmd(vma)) {
-		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+		struct hugetlb_shared_vma_data *data = vma->vm_private_data;
 
-		lockdep_assert_held(&vma_lock->rw_sema);
+		lockdep_assert_held(&data->vma_lock.rw_sema);
 	}
 }
 
@@ -6985,7 +6985,7 @@ static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma)
 	}
 }
 
-static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
+static void hugetlb_vma_data_free(struct vm_area_struct *vma)
 {
 	/*
 	 * Only present in sharable vmas.
@@ -6994,16 +6994,17 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
 		return;
 
 	if (vma->vm_private_data) {
-		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+		struct hugetlb_shared_vma_data *data = vma->vm_private_data;
+		struct hugetlb_vma_lock *vma_lock = &data->vma_lock;
 
 		down_write(&vma_lock->rw_sema);
 		__hugetlb_vma_unlock_write_put(vma_lock);
 	}
 }
 
-static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
+static int hugetlb_vma_data_alloc(struct vm_area_struct *vma)
 {
-	struct hugetlb_vma_lock *vma_lock;
+	struct hugetlb_shared_vma_data *data;
 
 	/* Only establish in (flags) sharable vmas */
 	if (!vma || !(vma->vm_flags & VM_MAYSHARE))
@@ -7013,8 +7014,8 @@ static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
 	if (vma->vm_private_data)
 		return 0;
 
-	vma_lock = kmalloc(sizeof(*vma_lock), GFP_KERNEL);
-	if (!vma_lock) {
+	data = kmalloc(sizeof(*data), GFP_KERNEL);
+	if (!data) {
 		/*
 		 * If we can not allocate structure, then vma can not
 		 * participate in pmd sharing. This is only a possible
@@ -7025,14 +7026,14 @@ static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
 		 * until the file is removed. Warn in the unlikely case of
 		 * allocation failure.
 		 */
-		pr_warn_once("HugeTLB: unable to allocate vma specific lock\n");
+		pr_warn_once("HugeTLB: unable to allocate vma shared data\n");
 		return -ENOMEM;
 	}
 
-	kref_init(&vma_lock->refs);
-	init_rwsem(&vma_lock->rw_sema);
-	vma_lock->vma = vma;
-	vma->vm_private_data = vma_lock;
+	kref_init(&data->vma_lock.refs);
+	init_rwsem(&data->vma_lock.rw_sema);
+	data->vma_lock.vma = vma;
+	vma->vm_private_data = data;
 	return 0;
 }
 
@@ -7157,11 +7158,11 @@ static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma)
 {
 }
 
-static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
+static void hugetlb_vma_data_free(struct vm_area_struct *vma)
 {
 }
 
-static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
+static int hugetlb_vma_data_alloc(struct vm_area_struct *vma)
 {
 	return 0;
 }
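
As a rough illustration of where this is headed (the hgm_enabled field
and helper below are assumptions based on the series description, not
code introduced by this patch), the shared structure gives later
patches a natural place to record per-VMA HGM state:

	/* Hypothetical follow-up sketch, not part of this patch. */
	struct hugetlb_shared_vma_data {
		struct hugetlb_vma_lock vma_lock;
		bool hgm_enabled;	/* assumed field used by later patches */
	};

	static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
	{
		struct hugetlb_shared_vma_data *data = vma->vm_private_data;

		/* Only shared mappings carry the shared data structure. */
		return data && data->hgm_enabled;
	}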

From patchwork Fri Oct 21 16:36:23 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 6826
Date: Fri, 21 Oct 2022 16:36:23 +0000
Message-ID: <20221021163703.3218176-8-jthoughton@google.com>
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Subject: [RFC PATCH v2 07/47] hugetlb: add CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert",
    "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    James Houghton
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315912432624788?= X-GMAIL-MSGID: =?utf-8?q?1747315912432624788?= This adds the Kconfig to enable or disable high-granularity mapping. Each architecture must explicitly opt-in to it (via ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING), but when opted in, HGM will be enabled by default if HUGETLB_PAGE is enabled. Signed-off-by: James Houghton --- fs/Kconfig | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fs/Kconfig b/fs/Kconfig index 2685a4d0d353..ce2567946016 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -267,6 +267,13 @@ config HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON enable HVO by default. It can be disabled via hugetlb_free_vmemmap=off (boot command line) or hugetlb_optimize_vmemmap (sysctl). +config ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING + bool + +config HUGETLB_HIGH_GRANULARITY_MAPPING + def_bool HUGETLB_PAGE + depends on ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING + config MEMFD_CREATE def_bool TMPFS || HUGETLBFS From patchwork Fri Oct 21 16:36:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6827 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp795227wrr; Fri, 21 Oct 2022 09:38:32 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6C7dyPcDB81OCB+AzVVZmtF3gyxAi8KVcAuUj4ZCvlUYQmQI8lRMjmnrP2xTJ0Pds/YfKk X-Received: by 2002:a17:903:245:b0:178:e0ba:e507 with SMTP id j5-20020a170903024500b00178e0bae507mr20585521plh.115.1666370312274; Fri, 21 Oct 2022 09:38:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370312; cv=none; d=google.com; s=arc-20160816; b=rDqUYHc4JtVjZnvGB9GwYxN9ORfcqrsRsStgJSf7iaO+DMd/ehF5EH5zFVmhymRRm/ Hf7n7/L6Fd4FnRIgJRyW6bCsYE/fCLswsXVvQuVln3bdHAJHIFcEJGTYAMwLilwaAEiN 2cPNLGX0UHK7DQe/8LHe4B08X4tiZkJCrnM6xDMCwzWKnvMts+0yzSO7sEukzOLj2BiN 32zy/0kPyI1uIsyS3Vnhc8yzw9zxR2skanHd4tDYp28aG77ARo7ZvHeK3J9JQX07UBLg 8F3AaP3ZS/06vKRdhHHar/2DkC8hGQLT979hzFBKUtVA0LNSjroDspNcrsgs735OcprF k7/Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=2b1VLSKb1qIoBhHv68uc7sEp7uA+Ju1UfoLt8HAduUY=; b=IUtuK3jmbWwmgaaiwSr84b721Kbb736XME6oFXdLdNZTU7bKob4pWqO/0gdlFQzR5P btXzGX25C8BEwkmaBnKe3rQBY01/93vogKe3lxPTOqndR5REcSL4ojgPrOGVM6mJEzX1 wcPLzWwnnUc7uW4LIM0DbITg+XKVu2Ya9iTtftCs0ky9gkP2y5SHq8H9rx0aoHjicWhr AkIL9mtZXwjy4a4DDyASrpgG4fK0mxHawFxG35OOTR1smHMJwDRhlFZeEzgiYdV1Xunp Ol5UOFys3ywW+jxTs14P2Laj6+gPKNxmIVu9gbIzjtYtLMGqpWg+csVnLxChKhC/Qx5l 1BXQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=fFKq9A+3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT 
Date: Fri, 21 Oct 2022 16:36:24 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Message-ID: <20221021163703.3218176-9-jthoughton@google.com> Subject: [RFC PATCH v2 08/47] hugetlb: add HGM enablement functions From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel
Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315916989629171?= X-GMAIL-MSGID: =?utf-8?q?1747315916989629171?= Currently it is possible for all shared VMAs to use HGM, but it must be enabled first. This is because with HGM, we lose PMD sharing, and page table walks require additional synchronization (we need to take the VMA lock). Signed-off-by: James Houghton --- include/linux/hugetlb.h | 22 +++++++++++++ mm/hugetlb.c | 69 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 91 insertions(+) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 534958499ac4..6e0c36b08a0c 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -123,6 +123,9 @@ struct hugetlb_vma_lock { struct hugetlb_shared_vma_data { struct hugetlb_vma_lock vma_lock; +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING + bool hgm_enabled; +#endif }; extern struct resv_map *resv_map_alloc(void); @@ -1179,6 +1182,25 @@ static inline void hugetlb_unregister_node(struct node *node) } #endif /* CONFIG_HUGETLB_PAGE */ +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING +bool hugetlb_hgm_enabled(struct vm_area_struct *vma); +bool hugetlb_hgm_eligible(struct vm_area_struct *vma); +int enable_hugetlb_hgm(struct vm_area_struct *vma); +#else +static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma) +{ + return false; +} +static inline bool hugetlb_hgm_eligible(struct vm_area_struct *vma) +{ + return false; +} +static inline int enable_hugetlb_hgm(struct vm_area_struct *vma) +{ + return -EINVAL; +} +#endif + static inline spinlock_t *huge_pte_lock(struct hstate *h, struct mm_struct *mm, pte_t *pte) { diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 5ae8bc8c928e..a18143add956 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6840,6 +6840,10 @@ static bool pmd_sharing_possible(struct vm_area_struct *vma) #ifdef CONFIG_USERFAULTFD if (uffd_disable_huge_pmd_share(vma)) return false; +#endif +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING + if (hugetlb_hgm_enabled(vma)) + return false; #endif /* * Only shared VMAs can share PMDs. @@ -7033,6 +7037,9 @@ static int hugetlb_vma_data_alloc(struct vm_area_struct *vma) kref_init(&data->vma_lock.refs); init_rwsem(&data->vma_lock.rw_sema); data->vma_lock.vma = vma; +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING + data->hgm_enabled = false; +#endif vma->vm_private_data = data; return 0; } @@ -7290,6 +7297,68 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h) #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */ +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING +bool hugetlb_hgm_eligible(struct vm_area_struct *vma) +{ + /* + * All shared VMAs may have HGM. + * + * HGM requires using the VMA lock, which only exists for shared VMAs. 
+ * To make HGM work for private VMAs, we would need to use another + * scheme to prevent collapsing/splitting from invalidating other + * threads' page table walks. + */ + return vma && (vma->vm_flags & VM_MAYSHARE); +} +bool hugetlb_hgm_enabled(struct vm_area_struct *vma) +{ + struct hugetlb_shared_vma_data *data = vma->vm_private_data; + + if (!vma || !(vma->vm_flags & VM_MAYSHARE)) + return false; + + return data && data->hgm_enabled; +} + +/* + * Enable high-granularity mapping (HGM) for this VMA. Once enabled, HGM + * cannot be turned off. + * + * PMDs cannot be shared in HGM VMAs. + */ +int enable_hugetlb_hgm(struct vm_area_struct *vma) +{ + int ret; + struct hugetlb_shared_vma_data *data; + + if (!hugetlb_hgm_eligible(vma)) + return -EINVAL; + + if (hugetlb_hgm_enabled(vma)) + return 0; + + /* + * We must hold the mmap lock for writing so that callers can rely on + * hugetlb_hgm_enabled returning a consistent result while holding + * the mmap lock for reading. + */ + mmap_assert_write_locked(vma->vm_mm); + + /* HugeTLB HGM requires the VMA lock to synchronize collapsing. */ + ret = hugetlb_vma_data_alloc(vma); + if (ret) + return ret; + + data = vma->vm_private_data; + BUG_ON(!data); + data->hgm_enabled = true; + + /* We don't support PMD sharing with HGM. */ + hugetlb_unshare_all_pmds(vma); + return 0; +} +#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ + /* * These functions are overwritable if your architecture needs its own * behavior. From patchwork Fri Oct 21 16:36:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6828 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp795281wrr; Fri, 21 Oct 2022 09:38:37 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5+Maf2YvHWrBDeg1Pid9lQUzhKVR0OmVp3oXRDpVkZYDGJzmQVOBjUkwePUeSjNlE0qaCG X-Received: by 2002:a17:90b:314b:b0:20d:a462:b996 with SMTP id ip11-20020a17090b314b00b0020da462b996mr23079952pjb.39.1666370317268; Fri, 21 Oct 2022 09:38:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370317; cv=none; d=google.com; s=arc-20160816; b=r5DXfi6dttQQ480I5r+guJId/wEWMvczCnsBAKU/x2ilVlifS087Ob56t46y7KnmR5 uxtcP7JFLDpZkKDPBQ8UNnvb05n8Umw40rH9Jff87GI8hOib39bWUvaxVZTZpnSjpq4y XJwxguRk57A/ticCq/Yg470V/W9T/vza/DNebmNI3lhPgKGZLvFJfWGaadgugEjArele O51BALWmoDsN7nsMQpGPpqOSAZr+zxZcL28MB3STVWRqmvvZXULPug3X5+Ci3KCPKkS5 HymnoUSZanndXvEvSTjPPKz58ErJ+hCOkNQik8VYDyx+inbS4rpTevEBsPtfjBrfol4x u9AA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=uDuCPiOW4B8iVzI/6AmJPz8Wx+abw4sbfMgiEm8c4Xc=; b=fJy+MNw8wZP32v8NMsbeKzOOYdoEDuz8aAs8hYuAO28G4ZMCo7Ls3DMhwAQpPr/8An XvyN/50+oqtVSBAfB+fNAqV2RtJmPtVx8p7TpVZ1l4fYgawuyYvUkJrj6tvaclxZicTo FNoaDBMMP/Hjogh+Zzb+PrqIEx3qTSn7s2TG6wqH/aw5IOMlTMeiBHuxVfrs2T3PjN0b YSLVIP2RtxR+MkRZoi9JgU8TYv19KxWMZlFY+HUhwD6qaneLZD1lRM2rvjqbqhNLz4Es 6fqJBZqThvYzEUpdavvspixHcW7qYIec+QShxEH5S8WKW9fEbPmqutECaMiPaFUxOY7I kY7A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=qq0EliFW; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
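A minimal caller sketch for the enablement API above (the caller below is hypothetical and only illustrates the contract; the functions and locking rules are the ones introduced in this patch):

/*
 * Hypothetical in-kernel user. The mmap lock must be held for writing so
 * that other threads holding it for reading see a stable result from
 * hugetlb_hgm_enabled().
 */
static int example_enable_hgm(struct vm_area_struct *vma)
{
	mmap_assert_write_locked(vma->vm_mm);

	/* Only shared (VM_MAYSHARE) hugetlb VMAs are eligible. */
	if (!hugetlb_hgm_eligible(vma))
		return -EINVAL;

	/* Idempotent: returns 0 if HGM was already enabled for this VMA. */
	return enable_hugetlb_hgm(vma);
}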
Date: Fri, 21 Oct 2022 16:36:25 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Message-ID: <20221021163703.3218176-10-jthoughton@google.com> Subject: [RFC PATCH v2 09/47] hugetlb: make huge_pte_lockptr take an explicit shift argument.
From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315921522248392?= X-GMAIL-MSGID: =?utf-8?q?1747315921522248392?= This is needed to handle PTL locking with high-granularity mapping. We won't always be using the PMD-level PTL even if we're using the 2M hugepage hstate. It's possible that we're dealing with 4K PTEs, in which case, we need to lock the PTL for the 4K PTE. Signed-off-by: James Houghton Reviewed-by: Mina Almasry Acked-by: Mike Kravetz --- arch/powerpc/mm/pgtable.c | 3 ++- include/linux/hugetlb.h | 9 ++++----- mm/hugetlb.c | 7 ++++--- mm/migrate.c | 3 ++- 4 files changed, 12 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index cb2dcdb18f8e..035a0df47af0 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -261,7 +261,8 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma, psize = hstate_get_psize(h); #ifdef CONFIG_DEBUG_VM - assert_spin_locked(huge_pte_lockptr(h, vma->vm_mm, ptep)); + assert_spin_locked(huge_pte_lockptr(huge_page_shift(h), + vma->vm_mm, ptep)); #endif #else diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 6e0c36b08a0c..db3ed6095b1c 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -934,12 +934,11 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask) return modified_mask; } -static inline spinlock_t *huge_pte_lockptr(struct hstate *h, +static inline spinlock_t *huge_pte_lockptr(unsigned int shift, struct mm_struct *mm, pte_t *pte) { - if (huge_page_size(h) == PMD_SIZE) + if (shift == PMD_SHIFT) return pmd_lockptr(mm, (pmd_t *) pte); - VM_BUG_ON(huge_page_size(h) == PAGE_SIZE); return &mm->page_table_lock; } @@ -1144,7 +1143,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask) return 0; } -static inline spinlock_t *huge_pte_lockptr(struct hstate *h, +static inline spinlock_t *huge_pte_lockptr(unsigned int shift, struct mm_struct *mm, pte_t *pte) { return &mm->page_table_lock; @@ -1206,7 +1205,7 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h, { spinlock_t *ptl; - ptl = huge_pte_lockptr(h, mm, pte); + ptl = huge_pte_lockptr(huge_page_shift(h), mm, pte); spin_lock(ptl); return ptl; } diff --git a/mm/hugetlb.c b/mm/hugetlb.c index a18143add956..ef7662bd0068 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4847,7 +4847,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, } dst_ptl = huge_pte_lock(h, dst, dst_pte); - src_ptl = huge_pte_lockptr(h, src, src_pte); + src_ptl = huge_pte_lockptr(huge_page_shift(h), src, src_pte); spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); entry = huge_ptep_get(src_pte); again: @@ -4925,7 +4925,8 @@ int copy_hugetlb_page_range(struct 
mm_struct *dst, struct mm_struct *src, /* Install the new huge page if src pte stable */ dst_ptl = huge_pte_lock(h, dst, dst_pte); - src_ptl = huge_pte_lockptr(h, src, src_pte); + src_ptl = huge_pte_lockptr(huge_page_shift(h), + src, src_pte); spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); entry = huge_ptep_get(src_pte); if (!pte_same(src_pte_old, entry)) { @@ -4979,7 +4980,7 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr, pte_t pte; dst_ptl = huge_pte_lock(h, mm, dst_pte); - src_ptl = huge_pte_lockptr(h, mm, src_pte); + src_ptl = huge_pte_lockptr(huge_page_shift(h), mm, src_pte); /* * We don't have to worry about the ordering of src and dst ptlocks diff --git a/mm/migrate.c b/mm/migrate.c index 1457cdbb7828..a0105fa6e3b2 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -334,7 +334,8 @@ void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl) void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte) { - spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma->vm_mm, pte); + spinlock_t *ptl = huge_pte_lockptr(huge_page_shift(hstate_vma(vma)), + vma->vm_mm, pte); __migration_entry_wait_huge(pte, ptl); } From patchwork Fri Oct 21 16:36:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6830 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp795473wrr; Fri, 21 Oct 2022 09:38:55 -0700 (PDT) X-Google-Smtp-Source: AMsMyM58a3CetAXuhzEWhsdfTh6G9PikcmGEfu7RPiysRfIvlyddxvOnqpOhktXLfpZg1XMm82QH X-Received: by 2002:a17:902:e849:b0:17f:f3c4:a2c4 with SMTP id t9-20020a170902e84900b0017ff3c4a2c4mr19921189plg.125.1666370324765; Fri, 21 Oct 2022 09:38:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370324; cv=none; d=google.com; s=arc-20160816; b=F9sxInQ3aZ6by4nRYJmowAtlECtDE5YADyOp+9OsDepE9f2dxymkCirlR2mTUa/1uI QNhcYEtMXFlFCoArxpQtZ1dX2yIR911agaOTihcr3FK2MmNCeK4i0gfy7Snqy8BO6lbd +8qsKr+FGGbiU5KdcIOXRoldkt6t41fIB4YG/cKKpVmlLvPIsPLnR8/kyE5AXGWxIBCk 1dNwEq18aet6Oay43xPi2IojoLp9IqU+PnWjB+yHsI0N8Bf1S+/fMRRvthMqrvoMHZoq reyK9bHOI04FPklExGR5l1fdGT7M0yfNXQA/8W+TZ8DMwLjJ5lc05K98KsY1QzPk0rqo 8kSQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=fki+um4ICIptosTnJrWPKrPDlP+wU0UwwTN9RPzv9+o=; b=qGfSfzz0CNZz8hY/8HHYz33HKFOSmTAjHQScBS09LUhO6Jf2lEHiBQCWDDGZwjW6qV Z+g6cvhWZ+67miBAC+Ko48E4XWwCcaOT/0c8D6YAlIG+TgJ7tiE/DzP1KtfYCc8TFUdR 0SMyLF6IVASJmijX0Rd55OhpDvzWkRIDA0+K/cSWQp8Djiw0TWtBUW0kd10CPP/CUIqJ T+WTWmQPfc0KbbFGiOOaX+rKIWnhvCt39TKl4diRSrWbp3LiNuScD9IkSu3cVFT065Bq U43XpK3bZFwcDm+gb+1EGwE4FbrPhm2TAQ5PjUqC79m1bAh8QKchZevKOdQ/My5yqPtm fnDQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=ND0Wnc7o; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
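To make the new calling convention concrete, here is a small illustrative fragment (not from the patch; mm and ptep are assumed to be in scope):

	spinlock_t *ptl;

	/* A PMD-sized mapping uses the split PMD lock, exactly as before. */
	ptl = huge_pte_lockptr(PMD_SHIFT, mm, ptep);

	/*
	 * A PAGE_SIZE-level entry, as HGM will create inside a huge page,
	 * falls back to mm->page_table_lock. The old huge_pte_lockptr(h, ...)
	 * form could not express this case, since no hstate has a 4K size.
	 */
	ptl = huge_pte_lockptr(PAGE_SHIFT, mm, ptep);
	spin_lock(ptl);
	/* ... read or update the entry ... */
	spin_unlock(ptl);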
Date: Fri, 21 Oct 2022 16:36:26 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Message-ID: <20221021163703.3218176-11-jthoughton@google.com> Subject: [RFC PATCH v2 10/47] hugetlb: add hugetlb_pte to track HugeTLB page table entries From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya
Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315929689596371?= X-GMAIL-MSGID: =?utf-8?q?1747315929689596371?= After high-granularity mapping, page table entries for HugeTLB pages can be of any size/type. (For example, we can have a 1G page mapped with a mix of PMDs and PTEs.) This struct is to help keep track of a HugeTLB PTE after we have done a page table walk. Without this, we'd have to pass around the "size" of the PTE everywhere. We effectively did this before; it could be fetched from the hstate, which we pass around pretty much everywhere. hugetlb_pte_present_leaf is included here as a helper function that will be used frequently later on. Signed-off-by: James Houghton --- include/linux/hugetlb.h | 88 +++++++++++++++++++++++++++++++++++++++++ mm/hugetlb.c | 29 ++++++++++++++ 2 files changed, 117 insertions(+) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index db3ed6095b1c..d30322108b34 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -50,6 +50,75 @@ enum { __NR_USED_SUBPAGE, }; +enum hugetlb_level { + HUGETLB_LEVEL_PTE = 1, + /* + * We always include PMD, PUD, and P4D in this enum definition so that, + * when logged as an integer, we can easily tell which level it is. 
+ */ + HUGETLB_LEVEL_PMD, + HUGETLB_LEVEL_PUD, + HUGETLB_LEVEL_P4D, + HUGETLB_LEVEL_PGD, +}; + +struct hugetlb_pte { + pte_t *ptep; + unsigned int shift; + enum hugetlb_level level; + spinlock_t *ptl; +}; + +static inline +void hugetlb_pte_populate(struct hugetlb_pte *hpte, pte_t *ptep, + unsigned int shift, enum hugetlb_level level) +{ + WARN_ON_ONCE(!ptep); + hpte->ptep = ptep; + hpte->shift = shift; + hpte->level = level; + hpte->ptl = NULL; +} + +static inline +unsigned long hugetlb_pte_size(const struct hugetlb_pte *hpte) +{ + WARN_ON_ONCE(!hpte->ptep); + return 1UL << hpte->shift; +} + +static inline +unsigned long hugetlb_pte_mask(const struct hugetlb_pte *hpte) +{ + WARN_ON_ONCE(!hpte->ptep); + return ~(hugetlb_pte_size(hpte) - 1); +} + +static inline +unsigned int hugetlb_pte_shift(const struct hugetlb_pte *hpte) +{ + WARN_ON_ONCE(!hpte->ptep); + return hpte->shift; +} + +static inline +enum hugetlb_level hugetlb_pte_level(const struct hugetlb_pte *hpte) +{ + WARN_ON_ONCE(!hpte->ptep); + return hpte->level; +} + +static inline +void hugetlb_pte_copy(struct hugetlb_pte *dest, const struct hugetlb_pte *src) +{ + dest->ptep = src->ptep; + dest->shift = src->shift; + dest->level = src->level; + dest->ptl = src->ptl; +} + +bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte, pte_t pte); + struct hugepage_subpool { spinlock_t lock; long count; @@ -1210,6 +1279,25 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h, return ptl; } +static inline +spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte) +{ + + BUG_ON(!hpte->ptep); + if (hpte->ptl) + return hpte->ptl; + return huge_pte_lockptr(hugetlb_pte_shift(hpte), mm, hpte->ptep); +} + +static inline +spinlock_t *hugetlb_pte_lock(struct mm_struct *mm, struct hugetlb_pte *hpte) +{ + spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte); + + spin_lock(ptl); + return ptl; +} + #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA) extern void __init hugetlb_cma_reserve(int order); #else diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ef7662bd0068..a0e46d35dabc 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1127,6 +1127,35 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg) return false; } +bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte, pte_t pte) +{ + pgd_t pgd; + p4d_t p4d; + pud_t pud; + pmd_t pmd; + + WARN_ON_ONCE(!hpte->ptep); + switch (hugetlb_pte_level(hpte)) { + case HUGETLB_LEVEL_PGD: + pgd = __pgd(pte_val(pte)); + return pgd_present(pgd) && pgd_leaf(pgd); + case HUGETLB_LEVEL_P4D: + p4d = __p4d(pte_val(pte)); + return p4d_present(p4d) && p4d_leaf(p4d); + case HUGETLB_LEVEL_PUD: + pud = __pud(pte_val(pte)); + return pud_present(pud) && pud_leaf(pud); + case HUGETLB_LEVEL_PMD: + pmd = __pmd(pte_val(pte)); + return pmd_present(pmd) && pmd_leaf(pmd); + case HUGETLB_LEVEL_PTE: + return pte_present(pte); + default: + WARN_ON_ONCE(1); + return false; + } +} + static void enqueue_huge_page(struct hstate *h, struct page *page) { int nid = page_to_nid(page); From patchwork Fri Oct 21 16:36:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6831 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp795483wrr; Fri, 21 Oct 2022 09:38:56 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7L3iwCKhMGot6tKxMxpE05zZWlOnZYb3le+Aw32GwHwbM50krUr5eVDWX+Ton7p1QQ/Z44 X-Received: by 2002:a17:902:bd8e:b0:178:25ab:56cc with SMTP id 
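A short fragment (not part of the patch; mm and pmdp are assumed to be in scope) showing how a walker might fill in and consume a struct hugetlb_pte for a PMD-level entry:

	struct hugetlb_pte hpte;
	spinlock_t *ptl;
	pte_t entry;

	/* Record where the walk currently is: a PMD-level entry here. */
	hugetlb_pte_populate(&hpte, (pte_t *)pmdp, PMD_SHIFT, HUGETLB_LEVEL_PMD);

	/* Size and mask are derived from the shift, not from the hstate. */
	WARN_ON(hugetlb_pte_size(&hpte) != PMD_SIZE);

	ptl = hugetlb_pte_lock(mm, &hpte);
	entry = huge_ptep_get(hpte.ptep);
	if (hugetlb_pte_present_leaf(&hpte, entry)) {
		/* A present leaf here maps hugetlb_pte_size(&hpte) bytes. */
	}
	spin_unlock(ptl);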
Date: Fri, 21 Oct 2022 16:36:27 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Message-ID: <20221021163703.3218176-12-jthoughton@google.com> Subject: [RFC PATCH v2 11/47] hugetlb: add hugetlb_pmd_alloc and hugetlb_pte_alloc From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

These functions are used to allocate new PTEs below the hstate PTE. This will be used by hugetlb_walk_step, which implements stepping forwards in a HugeTLB high-granularity page table walk. The reasons that we don't use the standard pmd_alloc/pte_alloc* functions are:

1) This prevents us from accidentally overwriting swap entries or attempting to use swap entries as present non-leaf PTEs (see pmd_alloc(); we assume that !pte_none means pte_present and non-leaf).

2) Locking hugetlb PTEs can be different from locking regular PTEs. (Although, as implemented right now, the locking is the same.)

3) We can maintain compatibility with CONFIG_HIGHPTE. That is, HugeTLB HGM won't use HIGHPTE, but the kernel can still be built with it, and other mm code will use it.

When GENERAL_HUGETLB supports P4D-based hugepages, we will need to implement hugetlb_pud_alloc to implement hugetlb_walk_step.
Signed-off-by: James Houghton --- include/linux/hugetlb.h | 5 +++ mm/hugetlb.c | 94 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 99 insertions(+) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index d30322108b34..003255b0e40f 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -119,6 +119,11 @@ void hugetlb_pte_copy(struct hugetlb_pte *dest, const struct hugetlb_pte *src) bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte, pte_t pte); +pmd_t *hugetlb_pmd_alloc(struct mm_struct *mm, struct hugetlb_pte *hpte, + unsigned long addr); +pte_t *hugetlb_pte_alloc(struct mm_struct *mm, struct hugetlb_pte *hpte, + unsigned long addr); + struct hugepage_subpool { spinlock_t lock; long count; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index a0e46d35dabc..e3733388adee 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -341,6 +341,100 @@ static bool has_same_uncharge_info(struct file_region *rg, #endif } +pmd_t *hugetlb_pmd_alloc(struct mm_struct *mm, struct hugetlb_pte *hpte, + unsigned long addr) +{ + spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte); + pmd_t *new; + pud_t *pudp; + pud_t pud; + + if (hpte->level != HUGETLB_LEVEL_PUD) + return ERR_PTR(-EINVAL); + + pudp = (pud_t *)hpte->ptep; +retry: + pud = *pudp; + if (likely(pud_present(pud))) + return unlikely(pud_leaf(pud)) + ? ERR_PTR(-EEXIST) + : pmd_offset(pudp, addr); + else if (!huge_pte_none(huge_ptep_get(hpte->ptep))) + /* + * Not present and not none means that a swap entry lives here, + * and we can't get rid of it. + */ + return ERR_PTR(-EEXIST); + + new = pmd_alloc_one(mm, addr); + if (!new) + return ERR_PTR(-ENOMEM); + + spin_lock(ptl); + if (!pud_same(pud, *pudp)) { + spin_unlock(ptl); + pmd_free(mm, new); + goto retry; + } + + mm_inc_nr_pmds(mm); + smp_wmb(); /* See comment in pmd_install() */ + pud_populate(mm, pudp, new); + spin_unlock(ptl); + return pmd_offset(pudp, addr); +} + +pte_t *hugetlb_pte_alloc(struct mm_struct *mm, struct hugetlb_pte *hpte, + unsigned long addr) +{ + spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte); + pgtable_t new; + pmd_t *pmdp; + pmd_t pmd; + + if (hpte->level != HUGETLB_LEVEL_PMD) + return ERR_PTR(-EINVAL); + + pmdp = (pmd_t *)hpte->ptep; +retry: + pmd = *pmdp; + if (likely(pmd_present(pmd))) + return unlikely(pmd_leaf(pmd)) + ? ERR_PTR(-EEXIST) + : pte_offset_kernel(pmdp, addr); + else if (!huge_pte_none(huge_ptep_get(hpte->ptep))) + /* + * Not present and not none means that a swap entry lives here, + * and we can't get rid of it. + */ + return ERR_PTR(-EEXIST); + + /* + * With CONFIG_HIGHPTE, calling `pte_alloc_one` directly may result + * in page tables being allocated in high memory, needing a kmap to + * access. Instead, we call __pte_alloc_one directly with + * GFP_PGTABLE_USER to prevent these PTEs being allocated in high + * memory. 
+ */ + new = __pte_alloc_one(mm, GFP_PGTABLE_USER); + if (!new) + return ERR_PTR(-ENOMEM); + + spin_lock(ptl); + if (!pmd_same(pmd, *pmdp)) { + spin_unlock(ptl); + pgtable_pte_page_dtor(new); + __free_page(new); + goto retry; + } + + mm_inc_nr_ptes(mm); + smp_wmb(); /* See comment in pmd_install() */ + pmd_populate(mm, pmdp, new); + spin_unlock(ptl); + return pte_offset_kernel(pmdp, addr); +} + static void coalesce_file_region(struct resv_map *resv, struct file_region *rg) { struct file_region *nrg, *prg; From patchwork Fri Oct 21 16:36:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6829 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp795392wrr; Fri, 21 Oct 2022 09:38:48 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4jPObyjubaYxxdA3nqcmFBH7q0VMJ3z6XQfELywjGfByuy1vNkEgZtE4RHEHVRKa20AtS+ X-Received: by 2002:a63:4f15:0:b0:455:ede1:d8c9 with SMTP id d21-20020a634f15000000b00455ede1d8c9mr17180597pgb.452.1666370328611; Fri, 21 Oct 2022 09:38:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370328; cv=none; d=google.com; s=arc-20160816; b=ZJ330o/saal4BJ0j4dB3MxEQ6F9MEY6c6NYAJUYMlx9UPUTTyEZ61CANiKZ02KYqbd xaDxtkhnqfZgWDXE1x8oSeji1KMNcsA/unJJiyB4OhSupgnLn9IMHmFqnVueWpFXjzbv u6IbvauqFSdDhwNliv8O4VGwGaBpOvM6v0QIFy++Sn4RWZENrIp7g6O586lCYHUqTKkE rO03vRMcI5/fLc1P/neYXejUqcw9HvEKmlvBnsaC8mnA1fbqi8/R1lsn7Td8WjpBzrrd mcau8A+qb2wEpaa3JzVEhYq+/0DEN6KhCU3TQe84TileVAhrxJC6Jw1dqY20mcZHNCQN hZtA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=tiouMOIIjJOYM6j/82rHxYpjcD4eEhikQkMBGqa8BjI=; b=0LvOequDFpKA64XcufCpDQcx/JymI8kQO7hZPFbbK/+VskRHef3/ZD8cwT5WO4TXkF 8XSDJcP7rALWZch8qVJjvPxisUpNgKdUbcH05sKh6W64Z95i6ZG3guzLEYvBgFke4l1k t9amVvaIeMFduhBA3g90Hog8HTE59Jx7PtASUqu3hARrdpjSe50B6yJMUvHNbXvFEqvM yEqgDkc4F47rDHmrEBZYqlgal/NJAfddk9/gCWIRlzth8xjfCBLjYauIG8IYEluMso7D HsqFhE+6OWSHK1QVfNBqFpPbk83y/PsfncxJsM7aUZX/6xOzmKsvawudX8phHRVLgQuS iuTw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=CKDVMh2S; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
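An illustrative fragment (not from the patch) showing the calling convention of hugetlb_pmd_alloc() above; hugetlb_walk_step() in the next patch is the intended caller, but the error handling is the same for any user:

	pmd_t *pmdp;

	/* @hpte must currently point at a PUD-level entry. */
	pmdp = hugetlb_pmd_alloc(mm, hpte, addr);
	if (IS_ERR(pmdp)) {
		/*
		 * -EINVAL: @hpte was not at the PUD level.
		 * -EEXIST: a present leaf or a swap entry already lives here,
		 *          so this helper cannot install a lower table.
		 * -ENOMEM: allocating the new PMD page failed.
		 */
		return PTR_ERR(pmdp);
	}
	/* On success, pmdp points at the PMD entry covering @addr. */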
Date: Fri, 21 Oct 2022 16:36:28 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Message-ID: <20221021163703.3218176-13-jthoughton@google.com> Subject: [RFC PATCH v2 12/47] hugetlb: add hugetlb_hgm_walk and hugetlb_walk_step From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr .
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315933843644707?= X-GMAIL-MSGID: =?utf-8?q?1747315933843644707?= hugetlb_hgm_walk implements high-granularity page table walks for HugeTLB. It is safe to call on non-HGM enabled VMAs; it will return immediately. hugetlb_walk_step implements how we step forwards in the walk. For architectures that don't use GENERAL_HUGETLB, they will need to provide their own implementation. Signed-off-by: James Houghton --- include/linux/hugetlb.h | 13 +++++ mm/hugetlb.c | 125 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 138 insertions(+) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 003255b0e40f..4b1548adecde 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -276,6 +276,10 @@ u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx); pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, pud_t *pud); +int hugetlb_hgm_walk(struct mm_struct *mm, struct vm_area_struct *vma, + struct hugetlb_pte *hpte, unsigned long addr, + unsigned long sz, bool stop_at_none); + struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage); extern int sysctl_hugetlb_shm_group; @@ -288,6 +292,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, unsigned long sz); unsigned long hugetlb_mask_last_page(struct hstate *h); +int hugetlb_walk_step(struct mm_struct *mm, struct hugetlb_pte *hpte, + unsigned long addr, unsigned long sz); int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, pte_t *ptep); void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma, @@ -1066,6 +1072,8 @@ void hugetlb_register_node(struct node *node); void hugetlb_unregister_node(struct node *node); #endif +enum hugetlb_level hpage_size_to_level(unsigned long sz); + #else /* CONFIG_HUGETLB_PAGE */ struct hstate {}; @@ -1253,6 +1261,11 @@ static inline void hugetlb_register_node(struct node *node) static inline void hugetlb_unregister_node(struct node *node) { } + +static inline enum hugetlb_level hpage_size_to_level(unsigned long sz) +{ + return HUGETLB_LEVEL_PTE; +} #endif /* CONFIG_HUGETLB_PAGE */ #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e3733388adee..90db59632559 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -95,6 +95,29 @@ static void hugetlb_vma_data_free(struct vm_area_struct *vma); static int hugetlb_vma_data_alloc(struct vm_area_struct *vma); static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma); +/* + * hpage_size_to_level() - convert @sz to the corresponding page table level + * + * @sz must be less than or equal to a valid hugepage size. 
+ */ +enum hugetlb_level hpage_size_to_level(unsigned long sz) +{ + /* + * We order the conditionals from smallest to largest to pick the + * smallest level when multiple levels have the same size (i.e., + * when levels are folded). + */ + if (sz < PMD_SIZE) + return HUGETLB_LEVEL_PTE; + if (sz < PUD_SIZE) + return HUGETLB_LEVEL_PMD; + if (sz < P4D_SIZE) + return HUGETLB_LEVEL_PUD; + if (sz < PGDIR_SIZE) + return HUGETLB_LEVEL_P4D; + return HUGETLB_LEVEL_PGD; +} + static inline bool subpool_is_free(struct hugepage_subpool *spool) { if (spool->count) @@ -7321,6 +7344,70 @@ bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr) } #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */ +/* hugetlb_hgm_walk - walks a high-granularity HugeTLB page table to resolve + * the page table entry for @addr. + * + * @hpte must always be pointing at an hstate-level PTE (or deeper). + * + * This function will never walk further if it encounters a PTE of a size + * less than or equal to @sz. + * + * @stop_at_none determines what we do when we encounter an empty PTE. If true, + * we return that PTE. If false and @sz is less than the current PTE's size, + * we make that PTE point to the next level down, going until @sz is the same + * as our current PTE. + * + * If @stop_at_none is true and @sz is PAGE_SIZE, this function will always + * succeed, but that does not guarantee that hugetlb_pte_size(hpte) is @sz. + * + * Return: + * -ENOMEM if we couldn't allocate new PTEs. + * -EEXIST if the caller wanted to walk further than a migration PTE, + * poison PTE, or a PTE marker. The caller needs to manually deal + * with this scenario. + * -EINVAL if called with invalid arguments (@sz invalid, @hpte not + * initialized). + * 0 otherwise. + * + * Even if this function fails, @hpte is guaranteed to always remain + * valid. + */ +int hugetlb_hgm_walk(struct mm_struct *mm, struct vm_area_struct *vma, + struct hugetlb_pte *hpte, unsigned long addr, + unsigned long sz, bool stop_at_none) +{ + int ret = 0; + pte_t pte; + + if (WARN_ON_ONCE(sz < PAGE_SIZE)) + return -EINVAL; + + if (!hugetlb_hgm_enabled(vma)) { + if (stop_at_none) + return 0; + return sz == huge_page_size(hstate_vma(vma)) ? 0 : -EINVAL; + } + + hugetlb_vma_assert_locked(vma); + + if (WARN_ON_ONCE(!hpte->ptep)) + return -EINVAL; + + while (hugetlb_pte_size(hpte) > sz && !ret) { + pte = huge_ptep_get(hpte->ptep); + if (!pte_present(pte)) { + if (stop_at_none) + return 0; + if (unlikely(!huge_pte_none(pte))) + return -EEXIST; + } else if (hugetlb_pte_present_leaf(hpte, pte)) + return 0; + ret = hugetlb_walk_step(mm, hpte, addr, sz); + } + + return ret; +} + #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long sz) @@ -7388,6 +7475,44 @@ pte_t *huge_pte_offset(struct mm_struct *mm, return (pte_t *)pmd; } +/* + * hugetlb_walk_step() - Walk the page table one step to resolve the page + * (hugepage or subpage) entry at address @addr. + * + * @sz always points at the final target PTE size (e.g. PAGE_SIZE for the + * lowest level PTE). + * + * @hpte will always remain valid, even if this function fails. 
+ */ +int hugetlb_walk_step(struct mm_struct *mm, struct hugetlb_pte *hpte, + unsigned long addr, unsigned long sz) +{ + pte_t *ptep; + spinlock_t *ptl; + + switch (hpte->level) { + case HUGETLB_LEVEL_PUD: + ptep = (pte_t *)hugetlb_pmd_alloc(mm, hpte, addr); + if (IS_ERR(ptep)) + return PTR_ERR(ptep); + hugetlb_pte_populate(hpte, ptep, PMD_SHIFT, HUGETLB_LEVEL_PMD); + break; + case HUGETLB_LEVEL_PMD: + ptep = hugetlb_pte_alloc(mm, hpte, addr); + if (IS_ERR(ptep)) + return PTR_ERR(ptep); + ptl = pte_lockptr(mm, (pmd_t *)hpte->ptep); + hugetlb_pte_populate(hpte, ptep, PAGE_SHIFT, HUGETLB_LEVEL_PTE); + hpte->ptl = ptl; + break; + default: + WARN_ONCE(1, "%s: got invalid level: %d (shift: %d)\n", + __func__, hpte->level, hpte->shift); + return -EINVAL; + } + return 0; +} + /* * Return a mask that can be used to update an address to the last huge * page in a page table page mapping size. Used to skip non-present From patchwork Fri Oct 21 16:36:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6832 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp795500wrr; Fri, 21 Oct 2022 09:38:58 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4VGU1o49F1sPeEzUo4RJ3fPXVdUstDeMpgStggQItYkg2o5hUJA1i5YwHDGjWp4fUzN8Fm X-Received: by 2002:a17:90b:4f84:b0:212:c372:1c9 with SMTP id qe4-20020a17090b4f8400b00212c37201c9mr8030416pjb.236.1666370337695; Fri, 21 Oct 2022 09:38:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370337; cv=none; d=google.com; s=arc-20160816; b=mhbBeuXCfjAezyk6IbmFAw+ObEa43HnyKAZFcfF8XX2weCugynqwkytYq1hx9JyKE6 1sAkG71gtEwAyFHh8x1RqyLIEujis6U5xpGs3K4JkJlmFZNEMn+NkK7z7+rGiouOjBYv kmU1ESYviYYEStv/kijthHnp9Uia8ExqRL3hewO2G2fGONNIZMjQUz33MT+YNVGQgGUO anTvpmUEmA2SKLEMTP/+eH7EfHeq7w5gGuiE5GsoUzs5ipUPev9nQyI9OGm5dbuwRzO8 MNuAOf/6jB3jcGpAdF13kKVk4uoqz3jVfS0Sb+vHfTqtY1UW2CnRD9ROZyhjEVJO62oT goFw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=8/NM5zqKrp3T0mJhjHHyOLUnz62gupmqJK/DDFGexZI=; b=JWlYkc7onarEFqDnQit4xlnMxEAKM+gBPgaTN/D7Sok5tphkdtmZr2ywiRlBgyf9Jz V+FMPxbA9axzzzVlXDXkdlpWg+X7bmxyI6Y/zUOYQgaCgJjdMvtKp3vM0mzLylO5IGbG M23sKKz2z7ncg5nstoG5fW2tZ1z1KMSWhQjUTRG5rkpvpRRdxmwct7eZZIOAo6gnwAFs k2QjJbQovF/d7wje7iSjz9i+tMwy3XhH2BzY+7LEQAgBEzZabXmAnFn70o7bnbxASMOw 4Qe9Gu9Gd6SI70KyXD0/Vp7GCwyf8rt/NgOko3txRsTPFZK2Z06fbytY0CY4XAYcKeyp zBBw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=kNo3BI9L; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
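Tying the two functions above together, a hedged sketch (not part of the patch; h, mm, vma, addr and ptep are assumed to be in scope, with ptep being the hstate-level entry and the hugetlb VMA lock already held, as hugetlb_hgm_walk() asserts):

	struct hugetlb_pte hpte;
	int ret;

	/* Start from the hstate-level entry, e.g. from huge_pte_offset(). */
	hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h),
			     hpage_size_to_level(huge_page_size(h)));

	ret = hugetlb_hgm_walk(mm, vma, &hpte, addr, PAGE_SIZE,
			       /*stop_at_none=*/false);
	/*
	 * ret == 0: hpte now points at a present leaf of some size, or at a
	 *           PAGE_SIZE-level entry (intermediate tables were allocated
	 *           as needed because stop_at_none is false).
	 * -EEXIST:  a migration/poison entry or a PTE marker blocked the walk.
	 * -ENOMEM:  allocating a lower-level page table failed.
	 */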
From patchwork Fri Oct 21 16:36:29 2022 X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6832
Date: Fri, 21 Oct 2022 16:36:29 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> References: <20221021163703.3218176-1-jthoughton@google.com> Message-ID: <20221021163703.3218176-14-jthoughton@google.com>
Subject: [RFC PATCH v2 13/47] hugetlb: add make_huge_pte_with_shift
From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr .
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315943252168895?= X-GMAIL-MSGID: =?utf-8?q?1747315943252168895?= This allows us to make huge PTEs at shifts other than the hstate shift, which will be necessary for high-granularity mappings. Signed-off-by: James Houghton Acked-by: Mike Kravetz --- mm/hugetlb.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 90db59632559..74a4afda1a7e 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4867,11 +4867,11 @@ const struct vm_operations_struct hugetlb_vm_ops = { .pagesize = hugetlb_vm_op_pagesize, }; -static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page, - int writable) +static pte_t make_huge_pte_with_shift(struct vm_area_struct *vma, + struct page *page, int writable, + int shift) { pte_t entry; - unsigned int shift = huge_page_shift(hstate_vma(vma)); if (writable) { entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_pte(page, @@ -4885,6 +4885,14 @@ static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page, return entry; } +static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page, + int writable) +{ + unsigned int shift = huge_page_shift(hstate_vma(vma)); + + return make_huge_pte_with_shift(vma, page, writable, shift); +} + static void set_huge_ptep_writable(struct vm_area_struct *vma, unsigned long address, pte_t *ptep) { From patchwork Fri Oct 21 16:36:30 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6834 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp795699wrr; Fri, 21 Oct 2022 09:39:19 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7+g8FgBGSJwFOAueQHTFXZeHnpiOQosBsIoRYbEcB7w7YG6aqF+KhUq7LmPRu5u5ml+xgH X-Received: by 2002:a05:6a00:cc4:b0:566:87c:53de with SMTP id b4-20020a056a000cc400b00566087c53demr19733657pfv.19.1666370359486; Fri, 21 Oct 2022 09:39:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370359; cv=none; d=google.com; s=arc-20160816; b=AGVvdK4iYOodvuhV3rYJChroxMXDg05VC9WkBKx+JDnJN+KUlcAXQqIYC9op2jHzR8 YEgdwyCz1L5oedDUxB5mYGjdSIf6gjpBsb2cZX/H9Hc+TAx27CIv/cgh6sRUS6Oewb0v muv8xLnG32QMPBlJ0E9r4vAmCx287z37Jej4Y+hxvy9/FfEjiAvEqNMH8mt4IVYg2NW5 8SskYuAB+Zt/vTBEpw2RSZ0r1lTxrnSuTKluqOv9x58iqPbrimCqkSDWJpbv+gzohnxv la3IwtMN3PWModrFshRZccepY+pas+MIvU+YdEMNcuO3eAqhbVmOJQbHi1nprKNcETew kUdA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=vfmRA2hXHJo+o02NNvf+xyYOK4fmmGYkyLRnRI2e0ew=; b=mWOx4WuWx+rhbHeb8vkNOE3vfBnN0ijbkdXIDliuFUReYzmUNwKCMfxNyurI6Jim86 dKYGRptTYHKBmPdXQvxq6qUXYp3clupqoFUAVrLbvkWdikzwq3RePwTlWpBN/vnjEf+e CLhVxy4N1QzTUMDXyKdzd7W0ijc263mwi93eHyj4jbQPIf/eecyjUI2hjhQRrlrY8E1+ 
Date: Fri, 21 Oct 2022
16:36:30 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-15-jthoughton@google.com> Subject: [RFC PATCH v2 14/47] hugetlb: make default arch_make_huge_pte understand small mappings From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315965992519081?= X-GMAIL-MSGID: =?utf-8?q?1747315965992519081?= This is a simple change: don't create a "huge" PTE if we are making a regular, PAGE_SIZE PTE. All architectures that want to implement HGM likely need to be changed in a similar way if they implement their own version of arch_make_huge_pte. Signed-off-by: James Houghton --- include/linux/hugetlb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 4b1548adecde..d305742e9d44 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -907,7 +907,7 @@ static inline void arch_clear_hugepage_flags(struct page *page) { } static inline pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags) { - return pte_mkhuge(entry); + return shift > PAGE_SHIFT ? 
pte_mkhuge(entry) : entry; } #endif
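Taken together, the two patches above mean a hugetlb PTE is only marked huge when its shift is larger than PAGE_SHIFT, so an HGM mapping that has been split all the way down to PAGE_SIZE produces an ordinary PTE. Below is a small userspace sketch of that decision; the flag and helper are invented for illustration and are not the kernel's pte API.

#include <stdio.h>

#define TOY_PAGE_SHIFT 12
#define TOY_FLAG_HUGE  (1UL << 0)

/* Mirrors: return shift > PAGE_SHIFT ? pte_mkhuge(entry) : entry; */
static unsigned long toy_make_huge_pte(unsigned long entry, unsigned int shift)
{
	return shift > TOY_PAGE_SHIFT ? (entry | TOY_FLAG_HUGE) : entry;
}

int main(void)
{
	unsigned long entry = 0x1000;	/* pretend pfn/protection bits */

	printf("shift 21 -> huge bit = %lu\n", toy_make_huge_pte(entry, 21) & TOY_FLAG_HUGE);
	printf("shift 12 -> huge bit = %lu\n", toy_make_huge_pte(entry, 12) & TOY_FLAG_HUGE);
	return 0;
}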
From patchwork Fri Oct 21 16:36:31 2022 X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6833
Date: Fri, 21 Oct 2022 16:36:31 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> References: <20221021163703.3218176-1-jthoughton@google.com> Message-ID: <20221021163703.3218176-16-jthoughton@google.com>
Subject: [RFC PATCH v2 15/47] hugetlbfs: for unmapping, treat HGM-mapped pages as potentially mapped
From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra ,
Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315962477727499?= X-GMAIL-MSGID: =?utf-8?q?1747315962477727499?= hugetlb_vma_maps_page was mostly used as an optimization: if the VMA isn't mapping a page, then we don't have to attempt to unmap it again. We are still able to call the unmap routine if we need to. For high-granularity mapped pages, we can't easily do a full walk to see if the page is actually mapped or not, so simply return that it might be. Signed-off-by: James Houghton --- fs/hugetlbfs/inode.c | 27 +++++++++++++++++++++------ 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 7f836f8f9db1..a7ab62e39b8c 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -383,21 +383,34 @@ static void hugetlb_delete_from_page_cache(struct folio *folio) * mutex for the page in the mapping. So, we can not race with page being * faulted into the vma. */ -static bool hugetlb_vma_maps_page(struct vm_area_struct *vma, - unsigned long addr, struct page *page) +static bool hugetlb_vma_maybe_maps_page(struct vm_area_struct *vma, + unsigned long addr, struct page *page) { pte_t *ptep, pte; + struct hugetlb_pte hpte; + struct hstate *h = hstate_vma(vma); - ptep = huge_pte_offset(vma->vm_mm, addr, - huge_page_size(hstate_vma(vma))); + ptep = huge_pte_offset(vma->vm_mm, addr, huge_page_size(h)); if (!ptep) return false; + hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h), + hpage_size_to_level(huge_page_size(h))); + pte = huge_ptep_get(ptep); if (huge_pte_none(pte) || !pte_present(pte)) return false; + if (!hugetlb_pte_present_leaf(&hpte, pte)) + /* + * The top-level PTE is not a leaf, so it's possible that a PTE + * under us is mapping the page. We aren't holding the VMA + * lock, so it is unsafe to continue the walk further. Instead, + * return true to indicate that we might be mapping the page. 
+ */ + return true; + if (pte_page(pte) == page) return true; @@ -457,7 +470,8 @@ static void hugetlb_unmap_file_folio(struct hstate *h, v_start = vma_offset_start(vma, start); v_end = vma_offset_end(vma, end); - if (!hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page)) + if (!hugetlb_vma_maybe_maps_page(vma, vma->vm_start + v_start, + page)) continue; if (!hugetlb_vma_trylock_write(vma)) { @@ -507,7 +521,8 @@ static void hugetlb_unmap_file_folio(struct hstate *h, */ v_start = vma_offset_start(vma, start); v_end = vma_offset_end(vma, end); - if (hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page)) + if (hugetlb_vma_maybe_maps_page(vma, vma->vm_start + v_start, + page)) unmap_hugepage_range(vma, vma->vm_start + v_start, v_end, NULL, ZAP_FLAG_DROP_MARKER);
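The renamed helper above answers conservatively: when the hstate-level entry is not a leaf it cannot cheaply prove the page is unmapped, so it reports that the page might be mapped and lets unmap_hugepage_range() do the real work. Here is a minimal sketch of that "maybe" pattern; the struct and field names are invented for illustration.

#include <stdbool.h>
#include <stdio.h>

struct toy_entry {
	bool present;
	bool leaf;		/* false: points at a lower-level page table */
	unsigned long page;	/* page mapped when present && leaf */
};

static bool toy_maybe_maps_page(const struct toy_entry *e, unsigned long page)
{
	if (!e->present)
		return false;		/* provably not mapped at this entry */
	if (!e->leaf)
		return true;		/* cannot walk further cheaply: say "maybe" */
	return e->page == page;		/* exact answer is available */
}

int main(void)
{
	struct toy_entry split = { .present = true, .leaf = false, .page = 0 };
	struct toy_entry exact = { .present = true, .leaf = true,  .page = 42 };

	printf("non-leaf entry   -> %d\n", toy_maybe_maps_page(&split, 42)); /* 1: conservative */
	printf("leaf, other page -> %d\n", toy_maybe_maps_page(&exact, 7));  /* 0: proven */
	return 0;
}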
From patchwork Fri Oct 21 16:36:32 2022 X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6835
Date: Fri, 21 Oct 2022 16:36:32 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> References: <20221021163703.3218176-1-jthoughton@google.com> Message-ID: <20221021163703.3218176-17-jthoughton@google.com>
Subject: [RFC PATCH v2 16/47] hugetlb: make unmapping compatible with high-granularity mappings
From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya
Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315969935808815?= X-GMAIL-MSGID: =?utf-8?q?1747315969935808815?= Enlighten __unmap_hugepage_range to deal with high-granularity mappings. This doesn't change its API; it still must be called with hugepage alignment, but it will correctly unmap hugepages that have been mapped at high granularity. The rules for mapcount and refcount here are: 1. Refcount and mapcount are tracked on the head page. 2. Each page table mapping into some of an hpage will increase that hpage's mapcount and refcount by 1. Eventually, functionality here can be expanded to allow users to call MADV_DONTNEED on PAGE_SIZE-aligned sections of a hugepage, but that is not done here. Signed-off-by: James Houghton --- include/asm-generic/tlb.h | 6 ++-- mm/hugetlb.c | 76 +++++++++++++++++++++++++-------------- 2 files changed, 52 insertions(+), 30 deletions(-) diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h index 492dce43236e..c378a44915a9 100644 --- a/include/asm-generic/tlb.h +++ b/include/asm-generic/tlb.h @@ -566,9 +566,9 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb, __tlb_remove_tlb_entry(tlb, ptep, address); \ } while (0) -#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address) \ +#define tlb_remove_huge_tlb_entry(tlb, hpte, address) \ do { \ - unsigned long _sz = huge_page_size(h); \ + unsigned long _sz = hugetlb_pte_size(&hpte); \ if (_sz >= P4D_SIZE) \ tlb_flush_p4d_range(tlb, address, _sz); \ else if (_sz >= PUD_SIZE) \ @@ -577,7 +577,7 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb, tlb_flush_pmd_range(tlb, address, _sz); \ else \ tlb_flush_pte_range(tlb, address, _sz); \ - __tlb_remove_tlb_entry(tlb, ptep, address); \ + __tlb_remove_tlb_entry(tlb, hpte.ptep, address);\ } while (0) /** diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 74a4afda1a7e..227150c25763 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5221,10 +5221,10 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct { struct mm_struct *mm = vma->vm_mm; unsigned long address; - pte_t *ptep; + struct hugetlb_pte hpte; pte_t pte; spinlock_t *ptl; - struct page *page; + struct page *hpage, *subpage; struct hstate *h = hstate_vma(vma); unsigned long sz = huge_page_size(h); struct mmu_notifier_range range; @@ -5235,11 +5235,6 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct BUG_ON(start & ~huge_page_mask(h)); BUG_ON(end & ~huge_page_mask(h)); - /* - * This is a hugetlb vma, all the pte entries should point - * to huge page. 
- */ - tlb_change_page_size(tlb, sz); tlb_start_vma(tlb, vma); /* @@ -5251,26 +5246,35 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct mmu_notifier_invalidate_range_start(&range); last_addr_mask = hugetlb_mask_last_page(h); address = start; - for (; address < end; address += sz) { - ptep = huge_pte_offset(mm, address, sz); + + while (address < end) { + pte_t *ptep = huge_pte_offset(mm, address, sz); + if (!ptep) { address |= last_addr_mask; + address += sz; continue; } + hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h), + hpage_size_to_level(huge_page_size(h))); + hugetlb_hgm_walk(mm, vma, &hpte, address, + PAGE_SIZE, /*stop_at_none=*/true); - ptl = huge_pte_lock(h, mm, ptep); - if (huge_pmd_unshare(mm, vma, address, ptep)) { + ptl = hugetlb_pte_lock(mm, &hpte); + if (huge_pmd_unshare(mm, vma, address, hpte.ptep)) { spin_unlock(ptl); tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE); force_flush = true; address |= last_addr_mask; + address += sz; continue; } - pte = huge_ptep_get(ptep); + pte = huge_ptep_get(hpte.ptep); + if (huge_pte_none(pte)) { spin_unlock(ptl); - continue; + goto next_hpte; } /* @@ -5287,25 +5291,36 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct */ if (pte_swp_uffd_wp_any(pte) && !(zap_flags & ZAP_FLAG_DROP_MARKER)) - set_huge_pte_at(mm, address, ptep, + set_huge_pte_at(mm, address, hpte.ptep, make_pte_marker(PTE_MARKER_UFFD_WP)); else #endif - huge_pte_clear(mm, address, ptep, sz); + huge_pte_clear(mm, address, hpte.ptep, + hugetlb_pte_size(&hpte)); + spin_unlock(ptl); + goto next_hpte; + } + + if (unlikely(!hugetlb_pte_present_leaf(&hpte, pte))) { + /* + * We raced with someone splitting out from under us. + * Retry the walk. + */ spin_unlock(ptl); continue; } - page = pte_page(pte); + subpage = pte_page(pte); + hpage = compound_head(subpage); /* * If a reference page is supplied, it is because a specific * page is being unmapped, not a range. Ensure the page we * are about to unmap is the actual page of interest. */ if (ref_page) { - if (page != ref_page) { + if (hpage != ref_page) { spin_unlock(ptl); - continue; + goto next_hpte; } /* * Mark the VMA as having unmapped its page so that @@ -5315,27 +5330,34 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED); } - pte = huge_ptep_get_and_clear(mm, address, ptep); - tlb_remove_huge_tlb_entry(h, tlb, ptep, address); + pte = huge_ptep_get_and_clear(mm, address, hpte.ptep); + tlb_change_page_size(tlb, hugetlb_pte_size(&hpte)); + tlb_remove_huge_tlb_entry(tlb, hpte, address); if (huge_pte_dirty(pte)) - set_page_dirty(page); + set_page_dirty(hpage); #ifdef CONFIG_PTE_MARKER_UFFD_WP /* Leave a uffd-wp pte marker if needed */ if (huge_pte_uffd_wp(pte) && !(zap_flags & ZAP_FLAG_DROP_MARKER)) - set_huge_pte_at(mm, address, ptep, + set_huge_pte_at(mm, address, hpte.ptep, make_pte_marker(PTE_MARKER_UFFD_WP)); #endif - hugetlb_count_sub(pages_per_huge_page(h), mm); - page_remove_rmap(page, vma, true); + hugetlb_count_sub(hugetlb_pte_size(&hpte)/PAGE_SIZE, mm); + page_remove_rmap(hpage, vma, true); spin_unlock(ptl); - tlb_remove_page_size(tlb, page, huge_page_size(h)); /* - * Bail out after unmapping reference page if supplied + * Lower the reference count on the head page. + */ + tlb_remove_page_size(tlb, hpage, sz); + /* + * Bail out after unmapping reference page if supplied, + * and there's only one PTE mapping this page. 
*/ - if (ref_page) + if (ref_page && hugetlb_pte_size(&hpte) == sz) break; +next_hpte: + address += hugetlb_pte_size(&hpte); } mmu_notifier_invalidate_range_end(&range); tlb_end_vma(tlb, vma);
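The reworked unmap loop above no longer advances by the hstate size: it advances by whatever size the current hugetlb_pte turned out to be, and it accounts in base (PAGE_SIZE) pages. The userspace sketch below shows only that loop shape; the per-step sizes are hard-coded for illustration and do not come from a real page table.

#include <stdio.h>

#define TOY_PAGE_SIZE (1UL << 12)
#define TOY_PMD_SIZE  (1UL << 21)

/* Illustrative only: pretend the first 2 MiB is still a PMD-level leaf and
 * the second 2 MiB has been split down to 4 KiB PTEs. */
static unsigned long toy_hpte_size(unsigned long addr)
{
	return addr < TOY_PMD_SIZE ? TOY_PMD_SIZE : TOY_PAGE_SIZE;
}

int main(void)
{
	unsigned long addr = 0, end = 2 * TOY_PMD_SIZE;
	unsigned long base_pages = 0, steps = 0;

	while (addr < end) {
		unsigned long sz = toy_hpte_size(addr);

		base_pages += sz / TOY_PAGE_SIZE;	/* like hugetlb_count_sub() */
		steps++;
		addr += sz;	/* like "address += hugetlb_pte_size(&hpte)" */
	}
	printf("%lu iterations, %lu base pages unmapped\n", steps, base_pages);
	return 0;
}

For this made-up layout the loop runs 513 times (one 2 MiB step plus 512 page-sized steps) and removes 1024 base pages, which is why the accounting has to use hugetlb_pte_size() rather than pages_per_huge_page().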
From patchwork Fri Oct 21 16:36:33 2022 X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6837
Date: Fri, 21 Oct 2022 16:36:33 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> References: <20221021163703.3218176-1-jthoughton@google.com> Message-ID: <20221021163703.3218176-18-jthoughton@google.com>
Subject: [RFC PATCH v2 17/47] hugetlb: make hugetlb_change_protection compatible with HGM
From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya
Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315977176411092?= X-GMAIL-MSGID: =?utf-8?q?1747315977176411092?= The main change here is to do a high-granularity walk and pulling the shift from the walk (not from the hstate). Signed-off-by: James Houghton --- mm/hugetlb.c | 65 ++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 45 insertions(+), 20 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 227150c25763..2d096cef53cd 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6654,15 +6654,15 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma, { struct mm_struct *mm = vma->vm_mm; unsigned long start = address; - pte_t *ptep; pte_t pte; struct hstate *h = hstate_vma(vma); - unsigned long pages = 0, psize = huge_page_size(h); + unsigned long base_pages = 0, psize = huge_page_size(h); bool shared_pmd = false; struct mmu_notifier_range range; unsigned long last_addr_mask; bool uffd_wp = cp_flags & MM_CP_UFFD_WP; bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE; + struct hugetlb_pte hpte; /* * In the case of shared PMDs, the area to flush could be beyond @@ -6680,31 +6680,38 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma, hugetlb_vma_lock_write(vma); i_mmap_lock_write(vma->vm_file->f_mapping); last_addr_mask = hugetlb_mask_last_page(h); - for (; address < end; address += psize) { + while (address < end) { spinlock_t *ptl; - ptep = huge_pte_offset(mm, address, psize); + pte_t *ptep = huge_pte_offset(mm, address, psize); + if (!ptep) { address |= last_addr_mask; + address += huge_page_size(h); continue; } - ptl = huge_pte_lock(h, mm, ptep); - if (huge_pmd_unshare(mm, vma, address, ptep)) { + hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h), + hpage_size_to_level(psize)); + hugetlb_hgm_walk(mm, vma, &hpte, address, PAGE_SIZE, + /*stop_at_none=*/true); + + ptl = hugetlb_pte_lock(mm, &hpte); + if (huge_pmd_unshare(mm, vma, address, hpte.ptep)) { /* * When uffd-wp is enabled on the vma, unshare * shouldn't happen at all. Warn about it if it * happened due to some reason. 
*/ WARN_ON_ONCE(uffd_wp || uffd_wp_resolve); - pages++; + base_pages += hugetlb_pte_size(&hpte) / PAGE_SIZE; spin_unlock(ptl); shared_pmd = true; address |= last_addr_mask; - continue; + goto next_hpte; } - pte = huge_ptep_get(ptep); + pte = huge_ptep_get(hpte.ptep); if (unlikely(is_hugetlb_entry_hwpoisoned(pte))) { spin_unlock(ptl); - continue; + goto next_hpte; } if (unlikely(is_hugetlb_entry_migration(pte))) { swp_entry_t entry = pte_to_swp_entry(pte); @@ -6724,11 +6731,11 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma, newpte = pte_swp_mkuffd_wp(newpte); else if (uffd_wp_resolve) newpte = pte_swp_clear_uffd_wp(newpte); - set_huge_pte_at(mm, address, ptep, newpte); - pages++; + set_huge_pte_at(mm, address, hpte.ptep, newpte); + base_pages += hugetlb_pte_size(&hpte) / PAGE_SIZE; } spin_unlock(ptl); - continue; + goto next_hpte; } if (unlikely(pte_marker_uffd_wp(pte))) { /* @@ -6736,21 +6743,37 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma, * no need for huge_ptep_modify_prot_start/commit(). */ if (uffd_wp_resolve) - huge_pte_clear(mm, address, ptep, psize); + huge_pte_clear(mm, address, hpte.ptep, + hugetlb_pte_size(&hpte)); } if (!huge_pte_none(pte)) { pte_t old_pte; - unsigned int shift = huge_page_shift(hstate_vma(vma)); + unsigned int shift = hpte.shift; - old_pte = huge_ptep_modify_prot_start(vma, address, ptep); + /* + * Because we are holding the VMA lock for writing, pte + * will always be a leaf. WARN if it is not. + */ + if (unlikely(!hugetlb_pte_present_leaf(&hpte, pte))) { + spin_unlock(ptl); + WARN_ONCE(1, "Unexpected non-leaf PTE: ptep:%p, address:0x%lx\n", + hpte.ptep, address); + continue; + } + + old_pte = huge_ptep_modify_prot_start( + vma, address, hpte.ptep); pte = huge_pte_modify(old_pte, newprot); - pte = arch_make_huge_pte(pte, shift, vma->vm_flags); + pte = arch_make_huge_pte( + pte, shift, vma->vm_flags); if (uffd_wp) pte = huge_pte_mkuffd_wp(huge_pte_wrprotect(pte)); else if (uffd_wp_resolve) pte = huge_pte_clear_uffd_wp(pte); - huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte); - pages++; + huge_ptep_modify_prot_commit( + vma, address, hpte.ptep, + old_pte, pte); + base_pages += hugetlb_pte_size(&hpte) / PAGE_SIZE; } else { /* None pte */ if (unlikely(uffd_wp)) @@ -6759,6 +6782,8 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma, make_pte_marker(PTE_MARKER_UFFD_WP)); } spin_unlock(ptl); +next_hpte: + address += hugetlb_pte_size(&hpte); } /* * Must flush TLB before releasing i_mmap_rwsem: x86's huge_pmd_unshare @@ -6781,7 +6806,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma, hugetlb_vma_unlock_write(vma); mmu_notifier_invalidate_range_end(&range); - return pages << h->order; + return base_pages; } /* Return true if reservation was successful, false otherwise. 
*/
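One subtle part of the patch above is the return value: the old code counted touched hstate-sized entries and shifted by the hstate order, while the new code accumulates hugetlb_pte_size()/PAGE_SIZE per entry. The two agree when every entry is hstate-sized, but only the new form stays accurate when some entries are smaller HGM leaves. A sketch of the difference, with purely made-up numbers:

#include <stdio.h>

#define TOY_PMD_ORDER 9	/* 2 MiB hstate == 512 base pages */

int main(void)
{
	/* Three entries had their protection changed: one full 2 MiB leaf and
	 * two 4 KiB HGM leaves (sizes given in base pages, illustrative only). */
	unsigned long entry_pages[] = { 512, 1, 1 };
	unsigned long old_count = 0, base_pages = 0;

	for (int i = 0; i < 3; i++) {
		old_count++;			/* old: one "page" per entry, shifted later */
		base_pages += entry_pages[i];	/* new: base pages per entry */
	}
	printf("old scheme: %lu << %d = %lu base pages (overcounts)\n",
	       old_count, TOY_PMD_ORDER, old_count << TOY_PMD_ORDER);
	printf("new scheme: %lu base pages\n", base_pages);
	return 0;
}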
From patchwork Fri Oct 21 16:36:34 2022 X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6836
Date: Fri, 21 Oct 2022 16:36:34 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> References: <20221021163703.3218176-1-jthoughton@google.com> Message-ID: <20221021163703.3218176-19-jthoughton@google.com>
Subject: [RFC PATCH v2 18/47] hugetlb: enlighten follow_hugetlb_page to support HGM
From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr .
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315977140412303?= X-GMAIL-MSGID: =?utf-8?q?1747315977140412303?= This enables high-granularity mapping support in GUP. One important change here is that, before, we never needed to grab the VMA lock, but now, to prevent someone from collapsing the page tables out from under us, we grab it for reading when doing high-granularity PT walks. In case it is confusing, pfn_offset is the offset (in PAGE_SIZE units) that vaddr points to within the subpage that hpte points to. Signed-off-by: James Houghton --- mm/hugetlb.c | 76 ++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 53 insertions(+), 23 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 2d096cef53cd..d76ab32fb6d3 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6382,11 +6382,9 @@ static void record_subpages_vmas(struct page *page, struct vm_area_struct *vma, } } -static inline bool __follow_hugetlb_must_fault(unsigned int flags, pte_t *pte, +static inline bool __follow_hugetlb_must_fault(unsigned int flags, pte_t pteval, bool *unshare) { - pte_t pteval = huge_ptep_get(pte); - *unshare = false; if (is_swap_pte(pteval)) return true; @@ -6478,12 +6476,20 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, struct hstate *h = hstate_vma(vma); int err = -EFAULT, refs; + /* + * Grab the VMA lock for reading now so no one can collapse the page + * table from under us. + */ + hugetlb_vma_lock_read(vma); + while (vaddr < vma->vm_end && remainder) { - pte_t *pte; + pte_t *ptep, pte; spinlock_t *ptl = NULL; bool unshare = false; int absent; - struct page *page; + unsigned long pages_per_hpte; + struct page *page, *subpage; + struct hugetlb_pte hpte; /* * If we have a pending SIGKILL, don't keep faulting pages and @@ -6499,13 +6505,22 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, * each hugepage. We have to make sure we get the * first, for the page indexing below to work. * - * Note that page table lock is not held when pte is null. + * Note that page table lock is not held when ptep is null. 
*/ - pte = huge_pte_offset(mm, vaddr & huge_page_mask(h), - huge_page_size(h)); - if (pte) - ptl = huge_pte_lock(h, mm, pte); - absent = !pte || huge_pte_none(huge_ptep_get(pte)); + ptep = huge_pte_offset(mm, vaddr & huge_page_mask(h), + huge_page_size(h)); + if (ptep) { + hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h), + hpage_size_to_level(huge_page_size(h))); + hugetlb_hgm_walk(mm, vma, &hpte, vaddr, + PAGE_SIZE, + /*stop_at_none=*/true); + ptl = hugetlb_pte_lock(mm, &hpte); + ptep = hpte.ptep; + pte = huge_ptep_get(ptep); + } + + absent = !ptep || huge_pte_none(pte); /* * When coredumping, it suits get_dump_page if we just return @@ -6516,12 +6531,19 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, */ if (absent && (flags & FOLL_DUMP) && !hugetlbfs_pagecache_present(h, vma, vaddr)) { - if (pte) + if (ptep) spin_unlock(ptl); remainder = 0; break; } + if (!absent && pte_present(pte) && + !hugetlb_pte_present_leaf(&hpte, pte)) { + /* We raced with someone splitting the PTE, so retry. */ + spin_unlock(ptl); + continue; + } + /* * We need call hugetlb_fault for both hugepages under migration * (in which case hugetlb_fault waits for the migration,) and @@ -6537,7 +6559,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, vm_fault_t ret; unsigned int fault_flags = 0; - if (pte) + /* Drop the lock before entering hugetlb_fault. */ + hugetlb_vma_unlock_read(vma); + + if (ptep) spin_unlock(ptl); if (flags & FOLL_WRITE) fault_flags |= FAULT_FLAG_WRITE; @@ -6560,7 +6585,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, if (ret & VM_FAULT_ERROR) { err = vm_fault_to_errno(ret, flags); remainder = 0; - break; + goto out; } if (ret & VM_FAULT_RETRY) { if (locked && @@ -6578,11 +6603,14 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, */ return i; } + hugetlb_vma_lock_read(vma); continue; } - pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT; - page = pte_page(huge_ptep_get(pte)); + pfn_offset = (vaddr & ~hugetlb_pte_mask(&hpte)) >> PAGE_SHIFT; + subpage = pte_page(pte); + pages_per_hpte = hugetlb_pte_size(&hpte) / PAGE_SIZE; + page = compound_head(subpage); VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) && !PageAnonExclusive(page), page); @@ -6592,21 +6620,21 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, * and skip the same_page loop below. */ if (!pages && !vmas && !pfn_offset && - (vaddr + huge_page_size(h) < vma->vm_end) && - (remainder >= pages_per_huge_page(h))) { - vaddr += huge_page_size(h); - remainder -= pages_per_huge_page(h); - i += pages_per_huge_page(h); + (vaddr + pages_per_hpte < vma->vm_end) && + (remainder >= pages_per_hpte)) { + vaddr += pages_per_hpte; + remainder -= pages_per_hpte; + i += pages_per_hpte; spin_unlock(ptl); continue; } /* vaddr may not be aligned to PAGE_SIZE */ - refs = min3(pages_per_huge_page(h) - pfn_offset, remainder, + refs = min3(pages_per_hpte - pfn_offset, remainder, (vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT); if (pages || vmas) - record_subpages_vmas(nth_page(page, pfn_offset), + record_subpages_vmas(nth_page(subpage, pfn_offset), vma, refs, likely(pages) ? pages + i : NULL, vmas ? 
vmas + i : NULL); @@ -6637,6 +6665,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, spin_unlock(ptl); } + hugetlb_vma_unlock_read(vma); +out: *nr_pages = remainder; /* * setting position is actually required only if remainder is
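As the commit message above notes, pfn_offset is now the offset, in PAGE_SIZE units, of vaddr within the region covered by the hugetlb_pte that the walk stopped at. A minimal sketch of that computation with illustrative sizes; the helper below is not the kernel's hugetlb_pte_mask(), just the same arithmetic spelled out.

#include <stdio.h>

#define TOY_PAGE_SHIFT 12

/* Mirrors: (vaddr & ~hugetlb_pte_mask(&hpte)) >> PAGE_SHIFT */
static unsigned long toy_pfn_offset(unsigned long vaddr, unsigned long hpte_size)
{
	unsigned long hpte_mask = ~(hpte_size - 1);

	return (vaddr & ~hpte_mask) >> TOY_PAGE_SHIFT;
}

int main(void)
{
	unsigned long vaddr = (1UL << 21) + (5UL << 12);	/* 5 pages into a 2 MiB region */

	printf("2 MiB entry: pfn_offset = %lu\n", toy_pfn_offset(vaddr, 1UL << 21));
	printf("4 KiB entry: pfn_offset = %lu\n", toy_pfn_offset(vaddr, 1UL << 12));
	return 0;
}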
[2620:137:e000::1:20]) by mx.google.com with ESMTP id i22-20020a63e916000000b003fea0415b5asi27312656pgh.834.2022.10.21.09.39.24; Fri, 21 Oct 2022 09:39:36 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=ZQtGwmeX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230460AbiJUQjF (ORCPT + 99 others); Fri, 21 Oct 2022 12:39:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53972 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231160AbiJUQiJ (ORCPT ); Fri, 21 Oct 2022 12:38:09 -0400 Received: from mail-vs1-xe4a.google.com (mail-vs1-xe4a.google.com [IPv6:2607:f8b0:4864:20::e4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 886E7285996 for ; Fri, 21 Oct 2022 09:37:35 -0700 (PDT) Received: by mail-vs1-xe4a.google.com with SMTP id i4-20020a05610220c400b003a7718929ecso1043612vsr.10 for ; Fri, 21 Oct 2022 09:37:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=CRJ6pkRKArtLmZe7XMBJjGLJyFbSk0rbbeye9q84C9k=; b=ZQtGwmeXsLEasfxlAVd+Ky+KcZ4u0ied8Nvno8abVpS+XKmDnGeaG7sbDC8DX8DjvM 74vP8xl/C/D6frwkZjH0NjGEaALQrxYOl4qyIHM6NFQ6TqBYFbfRh99+goV7aMS+wqHZ EWOers7WW52iKsOYpfLe1iauG6eIptFtQYmFfmKiotPdEPO4sKbu65xEzDbR/5RG82C/ y77vYpTF7YV+Whm4oAUX1QUUOJtL0b1RKIBN4O1mH8xMYkKU5aZVNAgTr4TGtpelYVZI e9gU8PeJNzs1B5TCPPy55btaAR0P5+I/VVVQLdPgY9hG0sq1HT32TKdmh24otsMY096+ x/yw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=CRJ6pkRKArtLmZe7XMBJjGLJyFbSk0rbbeye9q84C9k=; b=QXrSZID3DZb2tjqMQ3iz4ddmhYtkKm5Xk55sArygLEkLaKi0F/Fq/5gb2h45ehMUjS uGgBBXVjE9U2ZgrsMea35WJ02Cx7sw6eMBbVBDVbwRLcEI2GxHMR8n2DGR01tfFKAY7e EHI0ZHfm18/P6FHwbyDEi76BcGl7NolszPrH3ounF2eQSnjnv9dZAalw0FveFJe+M8Yh pRaYfKVq9nrbDI2UtnKpDUxFG+wK/qc63sCxGD4ElzLizgk72mOCCl8Mkvm3fknHl/4/ zW7yZaS+/BBRxMvcbaWfMZMxDmEXyOOhOIU//ny1R0vm+YHuPBrx2f2u1OUKNSBg3aQn kCrw== X-Gm-Message-State: ACrzQf2dgtVGS4ZhR1wwh4oAmVjMbmKfrQsYhv4rUW+NprYLHBJneyqX Lj9BS3RDel0Lr22Sbo/yPeYTIefKQREX3Pxo X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a05:6102:3678:b0:3a7:6056:9e7a with SMTP id bg24-20020a056102367800b003a760569e7amr12639659vsb.62.1666370254111; Fri, 21 Oct 2022 09:37:34 -0700 (PDT) Date: Fri, 21 Oct 2022 16:36:35 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-20-jthoughton@google.com> Subject: [RFC PATCH v2 19/47] hugetlb: make hugetlb_follow_page_mask HGM-enabled From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , 
"Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315983754687652?= X-GMAIL-MSGID: =?utf-8?q?1747315983754687652?= The change here is very simple: do a high-granularity walk. Signed-off-by: James Houghton --- mm/hugetlb.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d76ab32fb6d3..5783a8307a77 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6408,6 +6408,7 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma, struct page *page = NULL; spinlock_t *ptl; pte_t *pte, entry; + struct hugetlb_pte hpte; /* * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via @@ -6429,9 +6430,22 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma, return NULL; } - ptl = huge_pte_lock(h, mm, pte); +retry_walk: + hugetlb_pte_populate(&hpte, pte, huge_page_shift(h), + hpage_size_to_level(huge_page_size(h))); + hugetlb_hgm_walk(mm, vma, &hpte, address, + PAGE_SIZE, + /*stop_at_none=*/true); + + ptl = hugetlb_pte_lock(mm, &hpte); entry = huge_ptep_get(pte); if (pte_present(entry)) { + if (unlikely(!hugetlb_pte_present_leaf(&hpte, entry))) { + /* We raced with someone splitting from under us. 
*/ + spin_unlock(ptl); + goto retry_walk; + } + page = pte_page(entry) + ((address & ~huge_page_mask(h)) >> PAGE_SHIFT); /* From patchwork Fri Oct 21 16:36:36 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6841 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp796021wrr; Fri, 21 Oct 2022 09:39:50 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7QN7jQ6RgFJ/UdW9mpRnTlNI+pzH4ibdXc3lvFv6sWRo6NAfSu9nY5j5dPneCOikxREJHN X-Received: by 2002:a65:6e0e:0:b0:434:59e0:27d3 with SMTP id bd14-20020a656e0e000000b0043459e027d3mr16446472pgb.185.1666370390528; Fri, 21 Oct 2022 09:39:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370390; cv=none; d=google.com; s=arc-20160816; b=NbFHNGeh1KPGrhl7uBcUgOVqPqycCg0P/ZUnr2kej1yKAQ4dA4XrFpGSVgFayiUA8l eI9vqhfdC+/EbcjQ0I6bZrVR+kG/ccMeW3oA9vefEpeW+2AnYIEJWvUWdtjYAEg2G/yF ZzReMZkH5KlDOKXvc9i2FnHJIWPWsBUBGIfe6uKXJHnz7Gqst6fpgd6Tpu9yBMezAzWA ThnerH82ApOAlsXpPd3sH4KLnKhDKD5x5NTSdQzL2I6Izu6TZ/RV6GEXQ7+O1aX7i+Yw iw6vSjFw4yjs9tLx3y+gbKg9rTmf0J9rnzlqx6X+Wk5iicwG2ZfFNd6ZP/YkZMzVxve6 s8gQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=jeZGXnq9ppZX6Bn3Ab6ADEhW+p+OYv5o03S1VkTLnuk=; b=pwixELOb6RniGD1pzKmjYRxBUNSqjc10SOVjeZoA7lDPjOGlVizfkDKQB1YE4oU+mD sfNE4ku9BOV2cztxsVrAHwEFnDGqgdZRda7rdRrJGOQFT/norDjyJPZy0F3Cp/PFhE6b DpKbyfjZ4ZA9eFj1bmRhkuVi2n53bcIL3pWLlxLqSCoBhXet8Fwn07a8Kes00Ws2Kk7k 1mQTLiQ7QvRAWKR4FIea7Z3aulCXeSE4SwIcKzhtc34Y4msC8wrXQpAcdauVNUhWXS/E YFsJ6bKgldmB3GmXmsXztnTkO6pXsr5fBCUGIj+bScZfZClW+IvHLyJEloUvfzXKUTnz EH5A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=tbNvguBz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
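Both follow_hugetlb_page() and hugetlb_follow_page_mask() above repeat the same idiom: refine the hstate-level PTE down to the smallest mapping entry, take that entry's lock, and re-check that the entry is still a leaf, retrying if another thread split it in the meantime. A condensed sketch of that idiom follows; the helper itself is hypothetical, while the hugetlb_pte_* calls mirror the ones in the hunks above.

/*
 * Condensed sketch of the lookup idiom above (illustrative only): walk
 * down to the smallest mapping entry covering @addr, lock it, and
 * re-check that it is still a leaf.  Returns 0 with *ptlp held, -ENOENT
 * if there is no page table at @addr, or -EAGAIN if the entry was split
 * under us and the caller should retry the walk.
 */
static int hgm_walk_and_lock(struct mm_struct *mm, struct vm_area_struct *vma,
			     struct hstate *h, unsigned long addr,
			     struct hugetlb_pte *hpte, spinlock_t **ptlp)
{
	pte_t *ptep = huge_pte_offset(mm, addr & huge_page_mask(h),
				      huge_page_size(h));
	pte_t pte;

	if (!ptep)
		return -ENOENT;

	/* Start at the hstate level and refine as far as the tables allow. */
	hugetlb_pte_populate(hpte, ptep, huge_page_shift(h),
			     hpage_size_to_level(huge_page_size(h)));
	hugetlb_hgm_walk(mm, vma, hpte, addr, PAGE_SIZE,
			 /*stop_at_none=*/true);

	*ptlp = hugetlb_pte_lock(mm, hpte);
	pte = huge_ptep_get(hpte->ptep);
	if (pte_present(pte) && !hugetlb_pte_present_leaf(hpte, pte)) {
		/* Raced with someone splitting this entry. */
		spin_unlock(*ptlp);
		return -EAGAIN;
	}
	return 0;
}

A caller would loop on -EAGAIN, exactly as the retry_walk label above and the continue in the GUP loop do.
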
[2620:137:e000::1:20]) by mx.google.com with ESMTP id n18-20020a170902e55200b0017f8a070d5asi29747782plf.380.2022.10.21.09.39.38; Fri, 21 Oct 2022 09:39:50 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=tbNvguBz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231338AbiJUQjV (ORCPT + 99 others); Fri, 21 Oct 2022 12:39:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52652 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231173AbiJUQiP (ORCPT ); Fri, 21 Oct 2022 12:38:15 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7727C285286 for ; Fri, 21 Oct 2022 09:37:36 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id p66-20020a257445000000b006ca0ba7608fso3732378ybc.7 for ; Fri, 21 Oct 2022 09:37:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=jeZGXnq9ppZX6Bn3Ab6ADEhW+p+OYv5o03S1VkTLnuk=; b=tbNvguBzB1ZQ4lsC5+9mg7RWgx9DLFVbmF5v6BFznCk26QFFQdPaNiWrYza+kxDylE tEa7a8XUOhkliOlVXdX6cyfm8TE9X0VEvk/pj5HJUcnfc7FgSAESb7fi8f7w8gA576cO LYwJodJrSLZqWi2/eMrO03DghELV+IjVJYCQF4i3lW2FOVdNkbcytuaue/e5g9HIMMWF uNu1bQ+0pIqtKnUyor0k37TH4YOlWSMHgWWcV4CAVeufrCm6C4lzViN3BceDaEdp7Rlw gCQWhWp3Lo+C7PMV0hXo81mJTBpOeDjRf2UNTijPurz0+pu6m8vVHSiJIopwA6humXkP 4dgA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=jeZGXnq9ppZX6Bn3Ab6ADEhW+p+OYv5o03S1VkTLnuk=; b=CZWHe0HDhmcaqt8rwVsj4xwi9Gc1fWjXf0s7NucKl83zryrovKG1SjYOaJ1miAfiaC 2Nj4LPJm892h9FO2/vGM/1VhXUtMauEZEt6+uVY6GMF5EqCFC6Vf4SWfNq/Do8ljlqP+ KC2fnLeyLimm4JwvdqMQnZrYN9MbC5CMTmKWhSEP0GFIZC0DwODyvix15PAJtUYyqB3j SYPof2vN04EVMbU+hgMwDtW+ByQGdgnoVAXai+Xoe01ElmUocl1mCv1GjhnK8aKib6Y4 FrfcYnFstP3+p0UlR5jIUiURhJJHn1ttTwIQ/im4k+SB/kkWgFXZT7mxFkVxGrasb+WX toVg== X-Gm-Message-State: ACrzQf0ny2lYAM2j2cLp5a+5rOE2v7FrJQzRUKA/aaXIqNRluxFiI4iG SXkAmQVsfJwMRubppVjpu8+K0uLyD4HOZr0V X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a81:df08:0:b0:352:f2f2:580c with SMTP id c8-20020a81df08000000b00352f2f2580cmr17379566ywn.40.1666370254979; Fri, 21 Oct 2022 09:37:34 -0700 (PDT) Date: Fri, 21 Oct 2022 16:36:36 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-21-jthoughton@google.com> Subject: [RFC PATCH v2 20/47] hugetlb: use struct hugetlb_pte for walk_hugetlb_range From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr 
. David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315998580575472?= X-GMAIL-MSGID: =?utf-8?q?1747315998580575472?= The main change in this commit is to walk_hugetlb_range to support walking HGM mappings, but all walk_hugetlb_range callers must be updated to use the new API and take the correct action. Listing all the changes to the callers: For s390 changes, we simply ignore HGM PTEs (we don't support s390 yet). For smaps, shared_hugetlb (and private_hugetlb, although private mappings don't support HGM) may now not be divisible by the hugepage size. The appropriate changes have been made to support analyzing HGM PTEs. For pagemap, we ignore non-leaf PTEs by treating that as if they were none PTEs. We can only end up with non-leaf PTEs if they had just been updated from a none PTE. For show_numa_map, the challenge is that, if any of a hugepage is mapped, we have to count that entire page exactly once, as the results are given in units of hugepages. To support HGM mappings, we keep track of the last page that we looked it. If the hugepage we are currently looking at is the same as the last one, then we must be looking at an HGM-mapped page that has been mapped at high-granularity, and we've already accounted for it. For DAMON, we treat non-leaf PTEs as if they were blank, for the same reason as pagemap. For hwpoison, we proactively update the logic to support the case when hpte is pointing to a subpage within the poisoned hugepage. For queue_pages_hugetlb/migration, we ignore all HGM-enabled VMAs for now. For mincore, we ignore non-leaf PTEs for the same reason as pagemap. For mprotect/prot_none_hugetlb_entry, we retry the walk when we get a non-leaf PTE. Signed-off-by: James Houghton --- arch/s390/mm/gmap.c | 20 ++++++++-- fs/proc/task_mmu.c | 83 +++++++++++++++++++++++++++++----------- include/linux/pagewalk.h | 11 ++++-- mm/damon/vaddr.c | 57 +++++++++++++++++---------- mm/hmm.c | 21 ++++++---- mm/memory-failure.c | 17 ++++---- mm/mempolicy.c | 12 ++++-- mm/mincore.c | 17 ++++++-- mm/mprotect.c | 18 ++++++--- mm/pagewalk.c | 32 +++++++++++++--- 10 files changed, 203 insertions(+), 85 deletions(-) diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c index 02d15c8dc92e..d65c15b5dccb 100644 --- a/arch/s390/mm/gmap.c +++ b/arch/s390/mm/gmap.c @@ -2622,13 +2622,25 @@ static int __s390_enable_skey_pmd(pmd_t *pmd, unsigned long addr, return 0; } -static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr, - unsigned long hmask, unsigned long next, +static int __s390_enable_skey_hugetlb(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk) { - pmd_t *pmd = (pmd_t *)pte; + struct hstate *h = hstate_vma(walk->vma); + pmd_t *pmd; unsigned long start, end; - struct page *page = pmd_page(*pmd); + struct page *page; + + if (huge_page_size(h) != hugetlb_pte_size(hpte)) + /* Ignore high-granularity PTEs. 
*/ + return 0; + + if (!pte_present(huge_ptep_get(hpte->ptep))) + /* Ignore non-present PTEs. */ + return 0; + + pmd = (pmd_t *)pte; + page = pmd_page(*pmd); /* * The write check makes sure we do not set a key on shared diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 8a74cdcc9af0..be78cdb7677e 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -720,18 +720,28 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) } #ifdef CONFIG_HUGETLB_PAGE -static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, - struct mm_walk *walk) +static int smaps_hugetlb_range(struct hugetlb_pte *hpte, + unsigned long addr, + struct mm_walk *walk) { struct mem_size_stats *mss = walk->private; struct vm_area_struct *vma = walk->vma; struct page *page = NULL; + pte_t pte = huge_ptep_get(hpte->ptep); - if (pte_present(*pte)) { - page = vm_normal_page(vma, addr, *pte); - } else if (is_swap_pte(*pte)) { - swp_entry_t swpent = pte_to_swp_entry(*pte); + if (pte_present(pte)) { + /* We only care about leaf-level PTEs. */ + if (!hugetlb_pte_present_leaf(hpte, pte)) + /* + * The only case where hpte is not a leaf is that + * it was originally none, but it was split from + * under us. It was originally none, so exclude it. + */ + return 0; + + page = vm_normal_page(vma, addr, pte); + } else if (is_swap_pte(pte)) { + swp_entry_t swpent = pte_to_swp_entry(pte); if (is_pfn_swap_entry(swpent)) page = pfn_swap_entry_to_page(swpent); @@ -740,9 +750,9 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask, int mapcount = page_mapcount(page); if (mapcount >= 2) - mss->shared_hugetlb += huge_page_size(hstate_vma(vma)); + mss->shared_hugetlb += hugetlb_pte_size(hpte); else - mss->private_hugetlb += huge_page_size(hstate_vma(vma)); + mss->private_hugetlb += hugetlb_pte_size(hpte); } return 0; } @@ -1561,22 +1571,31 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, #ifdef CONFIG_HUGETLB_PAGE /* This function walks within one hugetlb entry in the single call */ -static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask, - unsigned long addr, unsigned long end, +static int pagemap_hugetlb_range(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk) { struct pagemapread *pm = walk->private; struct vm_area_struct *vma = walk->vma; u64 flags = 0, frame = 0; int err = 0; - pte_t pte; + unsigned long hmask = hugetlb_pte_mask(hpte); + unsigned long end = addr + hugetlb_pte_size(hpte); + pte_t pte = huge_ptep_get(hpte->ptep); + struct page *page; if (vma->vm_flags & VM_SOFTDIRTY) flags |= PM_SOFT_DIRTY; - pte = huge_ptep_get(ptep); if (pte_present(pte)) { - struct page *page = pte_page(pte); + /* + * We raced with this PTE being split, which can only happen if + * it was blank before. Treat it is as if it were blank. 
+ */ + if (!hugetlb_pte_present_leaf(hpte, pte)) + return 0; + + page = pte_page(pte); if (!PageAnon(page)) flags |= PM_FILE; @@ -1857,10 +1876,16 @@ static struct page *can_gather_numa_stats_pmd(pmd_t pmd, } #endif +struct show_numa_map_private { + struct numa_maps *md; + struct page *last_page; +}; + static int gather_pte_stats(pmd_t *pmd, unsigned long addr, unsigned long end, struct mm_walk *walk) { - struct numa_maps *md = walk->private; + struct show_numa_map_private *priv = walk->private; + struct numa_maps *md = priv->md; struct vm_area_struct *vma = walk->vma; spinlock_t *ptl; pte_t *orig_pte; @@ -1872,6 +1897,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr, struct page *page; page = can_gather_numa_stats_pmd(*pmd, vma, addr); + priv->last_page = page; if (page) gather_stats(page, md, pmd_dirty(*pmd), HPAGE_PMD_SIZE/PAGE_SIZE); @@ -1885,6 +1911,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr, orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl); do { struct page *page = can_gather_numa_stats(*pte, vma, addr); + priv->last_page = page; if (!page) continue; gather_stats(page, md, pte_dirty(*pte), 1); @@ -1895,19 +1922,25 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr, return 0; } #ifdef CONFIG_HUGETLB_PAGE -static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, struct mm_walk *walk) +static int gather_hugetlb_stats(struct hugetlb_pte *hpte, unsigned long addr, + struct mm_walk *walk) { - pte_t huge_pte = huge_ptep_get(pte); + struct show_numa_map_private *priv = walk->private; + pte_t huge_pte = huge_ptep_get(hpte->ptep); struct numa_maps *md; struct page *page; - if (!pte_present(huge_pte)) + if (!hugetlb_pte_present_leaf(hpte, huge_pte)) + return 0; + + page = compound_head(pte_page(huge_pte)); + if (priv->last_page == page) + /* we've already accounted for this page */ return 0; - page = pte_page(huge_pte); + priv->last_page = page; - md = walk->private; + md = priv->md; gather_stats(page, md, pte_dirty(huge_pte), 1); return 0; } @@ -1937,9 +1970,15 @@ static int show_numa_map(struct seq_file *m, void *v) struct file *file = vma->vm_file; struct mm_struct *mm = vma->vm_mm; struct mempolicy *pol; + char buffer[64]; int nid; + struct show_numa_map_private numa_map_private; + + numa_map_private.md = md; + numa_map_private.last_page = NULL; + if (!mm) return 0; @@ -1969,7 +2008,7 @@ static int show_numa_map(struct seq_file *m, void *v) seq_puts(m, " huge"); /* mmap_lock is held by m_start */ - walk_page_vma(vma, &show_numa_ops, md); + walk_page_vma(vma, &show_numa_ops, &numa_map_private); if (!md->pages) goto out; diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h index 2f8f6cc980b4..7ed065ea5dba 100644 --- a/include/linux/pagewalk.h +++ b/include/linux/pagewalk.h @@ -3,6 +3,7 @@ #define _LINUX_PAGEWALK_H #include +#include struct mm_walk; @@ -21,7 +22,10 @@ struct mm_walk; * depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD. * Any folded depths (where PTRS_PER_P?D is equal to 1) * are skipped. - * @hugetlb_entry: if set, called for each hugetlb entry + * @hugetlb_entry: if set, called for each hugetlb entry. In the presence + * of high-granularity hugetlb entries, @hugetlb_entry is + * called only for leaf-level entries (i.e., hstate-level + * page table entries are ignored if they are not leaves). * @test_walk: caller specific callback function to determine whether * we walk over the current vma or not. 
Returning 0 means * "do page table walk over the current vma", returning @@ -47,9 +51,8 @@ struct mm_walk_ops { unsigned long next, struct mm_walk *walk); int (*pte_hole)(unsigned long addr, unsigned long next, int depth, struct mm_walk *walk); - int (*hugetlb_entry)(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long next, - struct mm_walk *walk); + int (*hugetlb_entry)(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk); int (*test_walk)(unsigned long addr, unsigned long next, struct mm_walk *walk); int (*pre_vma)(unsigned long start, unsigned long end, diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c index 15f03df66db6..42845e1b560d 100644 --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -330,48 +330,55 @@ static int damon_mkold_pmd_entry(pmd_t *pmd, unsigned long addr, } #ifdef CONFIG_HUGETLB_PAGE -static void damon_hugetlb_mkold(pte_t *pte, struct mm_struct *mm, +static void damon_hugetlb_mkold(struct hugetlb_pte *hpte, pte_t entry, + struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr) { bool referenced = false; - pte_t entry = huge_ptep_get(pte); struct page *page = pte_page(entry); + struct page *hpage = compound_head(page); - get_page(page); + get_page(hpage); if (pte_young(entry)) { referenced = true; entry = pte_mkold(entry); - set_huge_pte_at(mm, addr, pte, entry); + set_huge_pte_at(mm, addr, hpte->ptep, entry); } #ifdef CONFIG_MMU_NOTIFIER if (mmu_notifier_clear_young(mm, addr, - addr + huge_page_size(hstate_vma(vma)))) + addr + hugetlb_pte_size(hpte))) referenced = true; #endif /* CONFIG_MMU_NOTIFIER */ if (referenced) - set_page_young(page); + set_page_young(hpage); - set_page_idle(page); - put_page(page); + set_page_idle(hpage); + put_page(hpage); } -static int damon_mkold_hugetlb_entry(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, +static int damon_mkold_hugetlb_entry(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk) { - struct hstate *h = hstate_vma(walk->vma); spinlock_t *ptl; pte_t entry; - ptl = huge_pte_lock(h, walk->mm, pte); - entry = huge_ptep_get(pte); + ptl = hugetlb_pte_lock(walk->mm, hpte); + entry = huge_ptep_get(hpte->ptep); if (!pte_present(entry)) goto out; - damon_hugetlb_mkold(pte, walk->mm, walk->vma, addr); + if (!hugetlb_pte_present_leaf(hpte, entry)) + /* + * We raced with someone splitting a blank PTE. Treat this PTE + * as if it were blank. + */ + goto out; + + damon_hugetlb_mkold(hpte, entry, walk->mm, walk->vma, addr); out: spin_unlock(ptl); @@ -484,31 +491,39 @@ static int damon_young_pmd_entry(pmd_t *pmd, unsigned long addr, } #ifdef CONFIG_HUGETLB_PAGE -static int damon_young_hugetlb_entry(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, +static int damon_young_hugetlb_entry(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk) { struct damon_young_walk_private *priv = walk->private; struct hstate *h = hstate_vma(walk->vma); - struct page *page; + struct page *page, *hpage; spinlock_t *ptl; pte_t entry; - ptl = huge_pte_lock(h, walk->mm, pte); + ptl = hugetlb_pte_lock(walk->mm, hpte); entry = huge_ptep_get(pte); if (!pte_present(entry)) goto out; + if (!hugetlb_pte_present_leaf(hpte, entry)) + /* + * We raced with someone splitting a blank PTE. Treat this PTE + * as if it were blank. 
+ */ + goto out; + page = pte_page(entry); - get_page(page); + hpage = compound_head(page); + get_page(hpage); - if (pte_young(entry) || !page_is_idle(page) || + if (pte_young(entry) || !page_is_idle(hpage) || mmu_notifier_test_young(walk->mm, addr)) { *priv->page_sz = huge_page_size(h); priv->young = true; } - put_page(page); + put_page(hpage); out: spin_unlock(ptl); diff --git a/mm/hmm.c b/mm/hmm.c index 3850fb625dda..76679b46ad5e 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -469,27 +469,34 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end, #endif #ifdef CONFIG_HUGETLB_PAGE -static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask, - unsigned long start, unsigned long end, +static int hmm_vma_walk_hugetlb_entry(struct hugetlb_pte *hpte, + unsigned long start, struct mm_walk *walk) { unsigned long addr = start, i, pfn; struct hmm_vma_walk *hmm_vma_walk = walk->private; struct hmm_range *range = hmm_vma_walk->range; - struct vm_area_struct *vma = walk->vma; unsigned int required_fault; unsigned long pfn_req_flags; unsigned long cpu_flags; + unsigned long hmask = hugetlb_pte_mask(hpte); + unsigned int order = hugetlb_pte_shift(hpte) - PAGE_SHIFT; + unsigned long end = start + hugetlb_pte_size(hpte); spinlock_t *ptl; pte_t entry; - ptl = huge_pte_lock(hstate_vma(vma), walk->mm, pte); - entry = huge_ptep_get(pte); + ptl = hugetlb_pte_lock(walk->mm, hpte); + entry = huge_ptep_get(hpte->ptep); + + if (!hugetlb_pte_present_leaf(hpte, entry)) { + spin_unlock(ptl); + return -EAGAIN; + } i = (start - range->start) >> PAGE_SHIFT; pfn_req_flags = range->hmm_pfns[i]; cpu_flags = pte_to_hmm_pfn_flags(range, entry) | - hmm_pfn_flags_order(huge_page_order(hstate_vma(vma))); + hmm_pfn_flags_order(order); required_fault = hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags); if (required_fault) { @@ -593,7 +600,7 @@ int hmm_range_fault(struct hmm_range *range) * in pfns. All entries < last in the pfn array are set to their * output, and all >= are still at their input values. 
*/ - } while (ret == -EBUSY); + } while (ret == -EBUSY || ret == -EAGAIN); return ret; } EXPORT_SYMBOL(hmm_range_fault); diff --git a/mm/memory-failure.c b/mm/memory-failure.c index bead6bccc7f2..505efba59d29 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -628,6 +628,7 @@ static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift, unsigned long poisoned_pfn, struct to_kill *tk) { unsigned long pfn = 0; + unsigned long base_pages_poisoned = (1UL << shift) / PAGE_SIZE; if (pte_present(pte)) { pfn = pte_pfn(pte); @@ -638,7 +639,8 @@ static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift, pfn = swp_offset_pfn(swp); } - if (!pfn || pfn != poisoned_pfn) + if (!pfn || pfn < poisoned_pfn || + pfn >= poisoned_pfn + base_pages_poisoned) return 0; set_to_kill(tk, addr, shift); @@ -704,16 +706,15 @@ static int hwpoison_pte_range(pmd_t *pmdp, unsigned long addr, } #ifdef CONFIG_HUGETLB_PAGE -static int hwpoison_hugetlb_range(pte_t *ptep, unsigned long hmask, - unsigned long addr, unsigned long end, - struct mm_walk *walk) +static int hwpoison_hugetlb_range(struct hugetlb_pte *hpte, + unsigned long addr, + struct mm_walk *walk) { struct hwp_walk *hwp = walk->private; - pte_t pte = huge_ptep_get(ptep); - struct hstate *h = hstate_vma(walk->vma); + pte_t pte = huge_ptep_get(hpte->ptep); - return check_hwpoisoned_entry(pte, addr, huge_page_shift(h), - hwp->pfn, &hwp->tk); + return check_hwpoisoned_entry(pte, addr & hugetlb_pte_mask(hpte), + hpte->shift, hwp->pfn, &hwp->tk); } #else #define hwpoison_hugetlb_range NULL diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 61aa9aedb728..275bc549590e 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -558,8 +558,8 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr, return addr != end ? -EIO : 0; } -static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, +static int queue_pages_hugetlb(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk) { int ret = 0; @@ -570,8 +570,12 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask, spinlock_t *ptl; pte_t entry; - ptl = huge_pte_lock(hstate_vma(walk->vma), walk->mm, pte); - entry = huge_ptep_get(pte); + /* We don't migrate high-granularity HugeTLB mappings for now. */ + if (hugetlb_hgm_enabled(walk->vma)) + return -EINVAL; + + ptl = hugetlb_pte_lock(walk->mm, hpte); + entry = huge_ptep_get(hpte->ptep); if (!pte_present(entry)) goto unlock; page = pte_page(entry); diff --git a/mm/mincore.c b/mm/mincore.c index a085a2aeabd8..0894965b3944 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -22,18 +22,29 @@ #include #include "swap.h" -static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr, - unsigned long end, struct mm_walk *walk) +static int mincore_hugetlb(struct hugetlb_pte *hpte, unsigned long addr, + struct mm_walk *walk) { #ifdef CONFIG_HUGETLB_PAGE unsigned char present; + unsigned long end = addr + hugetlb_pte_size(hpte); unsigned char *vec = walk->private; + pte_t pte = huge_ptep_get(hpte->ptep); /* * Hugepages under user process are always in RAM and never * swapped out, but theoretically it needs to be checked. */ - present = pte && !huge_pte_none(huge_ptep_get(pte)); + present = !huge_pte_none(pte); + + /* + * If the pte is present but not a leaf, we raced with someone + * splitting it. For someone to have split it, it must have been + * huge_pte_none before, so treat it as such. 
+ */ + if (pte_present(pte) && !hugetlb_pte_present_leaf(hpte, pte)) + present = false; + for (; addr != end; vec++, addr += PAGE_SIZE) *vec = present; walk->private = vec; diff --git a/mm/mprotect.c b/mm/mprotect.c index 99762403cc8f..9975b86035e0 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -524,12 +524,16 @@ static int prot_none_pte_entry(pte_t *pte, unsigned long addr, 0 : -EACCES; } -static int prot_none_hugetlb_entry(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long next, +static int prot_none_hugetlb_entry(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk) { - return pfn_modify_allowed(pte_pfn(*pte), *(pgprot_t *)(walk->private)) ? - 0 : -EACCES; + pte_t pte = huge_ptep_get(hpte->ptep); + + if (!hugetlb_pte_present_leaf(hpte, pte)) + return -EAGAIN; + return pfn_modify_allowed(pte_pfn(pte), + *(pgprot_t *)(walk->private)) ? 0 : -EACCES; } static int prot_none_test(unsigned long addr, unsigned long next, @@ -572,8 +576,10 @@ mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma, (newflags & VM_ACCESS_FLAGS) == 0) { pgprot_t new_pgprot = vm_get_page_prot(newflags); - error = walk_page_range(current->mm, start, end, - &prot_none_walk_ops, &new_pgprot); + do { + error = walk_page_range(current->mm, start, end, + &prot_none_walk_ops, &new_pgprot); + } while (error == -EAGAIN); if (error) return error; } diff --git a/mm/pagewalk.c b/mm/pagewalk.c index bb33c1e8c017..2318aae98f1e 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -3,6 +3,7 @@ #include #include #include +#include /* * We want to know the real level where a entry is located ignoring any @@ -301,20 +302,39 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end, pte_t *pte; const struct mm_walk_ops *ops = walk->ops; int err = 0; + struct hugetlb_pte hpte; + + if (hugetlb_hgm_enabled(vma)) + /* + * We could potentially do high-granularity walks. Grab the + * VMA lock to prevent PTEs from becoming invalid. 
+ */ + hugetlb_vma_lock_read(vma); do { - next = hugetlb_entry_end(h, addr, end); pte = huge_pte_offset(walk->mm, addr & hmask, sz); - - if (pte) - err = ops->hugetlb_entry(pte, hmask, addr, next, walk); - else if (ops->pte_hole) - err = ops->pte_hole(addr, next, -1, walk); + if (!pte) { + next = hugetlb_entry_end(h, addr, end); + if (ops->pte_hole) + err = ops->pte_hole(addr, next, -1, walk); + } else { + hugetlb_pte_populate(&hpte, pte, huge_page_shift(h), + hpage_size_to_level(sz)); + hugetlb_hgm_walk(walk->mm, vma, &hpte, addr, + PAGE_SIZE, + /*stop_at_none=*/true); + err = ops->hugetlb_entry( + &hpte, addr, walk); + next = min(addr + hugetlb_pte_size(&hpte), end); + } if (err) break; } while (addr = next, addr != end); + if (hugetlb_hgm_enabled(vma)) + hugetlb_vma_unlock_read(vma); + return err; } From patchwork Fri Oct 21 16:36:37 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6839 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp795948wrr; Fri, 21 Oct 2022 09:39:43 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7jBAtlGp6UTZqCRrzLq5gxNRDDD7FxwwMZDtQuNBK8RKlRkPS9UndbtT/LPFVWM+fBTl+3 X-Received: by 2002:a17:90b:19ca:b0:212:d2c4:83ac with SMTP id nm10-20020a17090b19ca00b00212d2c483acmr3631744pjb.166.1666370383069; Fri, 21 Oct 2022 09:39:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370383; cv=none; d=google.com; s=arc-20160816; b=JUrjUGUwy++tsjMbywjBRSLo4VMpnmgUElCOUUije1hLtbIImGvx+FSKhB+aeEdQ1/ XUqIr8KU6zAeV3llSTEk9hstGRe8BnSv24G5S/Nt1jKizYfTOSTQLCy8aUulDXhQDQAi qRaZI56dZ8G2/mLVf5+Sx51P+PFiSTefelHnEw5gvGNI9iKVDQDe32p48IQYEgOoMIzT YA9pyAUvJHfG1HeHpd11Bxl9D8PTqvOf/OvP9HRvsIGRoe/NleuADKj3nlafw1LG3g8W xXrJL5SJj3jyq4X70khVB9nCbqPyO6JDYDGLS3KVkEl19mDqGoFVpiKL5R63AthcMMqI YyPg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=b10z9K2d7yL2NkYZoSYI1eqym2z6cE2oNjVSRM+YxM4=; b=zXVoYbpER2yNbnzVneUNldxFNI8TX7RyRvFqt87BFmKINeK5Ru7StIh4VJ5zu8A1ye pa44Z8CnqdhKSAXgNTeL82tRJ0JLPZvM1uc/hvvZVrqu5OMTVs6MY1PQJ26bfWlK7RxC f35ds6vNQM8RzhtXFkHklMod+ePL1xjA3WkqhbZAIcw+rA8lzE4trnOw5WWdlGe2SdvM mCPgk3l6M5AUn6lWhz3oATSBBxQ+SwBfxBD2h0HATvF3K1nA/942Mms1rMNB6pqhaDIL mqZYgZYYqtxrvKE1A02cLUX4Ur8qFgxeuRQwLA5Tsslsg5NndowPE8cSxUwANHmdYQlZ UVjA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=RdBFlvTW; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
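To make the new walker contract concrete, here is a hypothetical hugetlb_entry callback (not one of the callers converted above) written against the updated mm_walk_ops signature. It shows the rule the commit message spells out for pagemap, mincore and DAMON: the callback may be handed an entry smaller than the hstate size, and a present entry that is not a leaf can only have just been split from a none entry, so it is treated as unmapped.

/*
 * Hypothetical example callback: count mapped base pages in a VMA.
 * The walker may pass a hugetlb_pte at any level, down to PAGE_SIZE.
 */
static int count_mapped_hugetlb_entry(struct hugetlb_pte *hpte,
				      unsigned long addr,
				      struct mm_walk *walk)
{
	unsigned long *nr_mapped = walk->private;
	spinlock_t *ptl = hugetlb_pte_lock(walk->mm, hpte);
	pte_t pte = huge_ptep_get(hpte->ptep);

	/*
	 * Count only leaves; a present non-leaf entry was just split from
	 * a none entry, so treat it as unmapped.
	 */
	if (pte_present(pte) && hugetlb_pte_present_leaf(hpte, pte))
		*nr_mapped += hugetlb_pte_size(hpte) >> PAGE_SHIFT;

	spin_unlock(ptl);
	return 0;
}

Such a callback would be registered like the existing callers: a struct mm_walk_ops with .hugetlb_entry = count_mapped_hugetlb_entry handed to walk_page_range(), with walk_hugetlb_range() above taking the VMA lock around the walk when HGM is enabled.
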
[2620:137:e000::1:20]) by mx.google.com with ESMTP id h23-20020a17090ac39700b002033116cb72si5572879pjt.156.2022.10.21.09.39.30; Fri, 21 Oct 2022 09:39:43 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=RdBFlvTW; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231194AbiJUQjK (ORCPT + 99 others); Fri, 21 Oct 2022 12:39:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52568 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231165AbiJUQiP (ORCPT ); Fri, 21 Oct 2022 12:38:15 -0400 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D59192859B4 for ; Fri, 21 Oct 2022 09:37:36 -0700 (PDT) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-36772c0c795so33791817b3.23 for ; Fri, 21 Oct 2022 09:37:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=b10z9K2d7yL2NkYZoSYI1eqym2z6cE2oNjVSRM+YxM4=; b=RdBFlvTWkP0mOj/+10aeT3pEp8gO1xe67MKNQAL9eJH8ygvwfl65HnLVJ/TXb07S8J Ekg4eIzJiS8mFJVRjcHyYUexaJ2Wpz0KD24tdJaK9vQvVpzHw4Swadb/CS9sPPIaClpx a4pSknIHqn7yKEu2xGQn6BM9ClrfpieKDQTxfc134vaHcOGtZtUzcwwZ/+hyo2Y9SDbU EqxOcGGs64poU/YYQXMdII6wUC0HnwMTfwDML9Jw5n3ISsd1571avOMoRlykby8HbVOv iidSXRx0txEQjxJng6lLm/tQXQGCk4PTQsXMvO/tZsN0O2ZrP+47TYzob91093CTHxXV vE5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=b10z9K2d7yL2NkYZoSYI1eqym2z6cE2oNjVSRM+YxM4=; b=ugRGnq3Otogx+T2tmhZUiCeEjyUgUxsc+Z8ijVoLEn/k0gE+yx4FZ7SyqxRBiBvgfr h/OmZ2+cog51m7tdIDGtBcjhOKzQOrRolgYq4gNn5D2xtiM2qd1ZtUVmfujKKaTEH8LH ZYmSTuw2djMJEx2aUEQAVE9UqOjSuV+g4vFt1MSfSyaOpuLNk3xaG9dAlTs1r9T084Ga +/3N+qJmbolSNWIlxkjf6RXy8dcABYzif7yVHF7UzjqVuT55SM75OcnEP5spsKDIyVdk g+KlujAqvXr343olNXJiH+piKuT6a7ZY5ZTIbZWOp/Vur7JKPE1V9MAsBSGPJ6ET8It6 MwOw== X-Gm-Message-State: ACrzQf2DlbhLWtgrhGwe1GSgS12MPEOO5TADZN6TyKONKVWBbfJR961k 9LQybsqY6L9hJOLjDp4rK8bPp1dIL9kAf6Wt X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a81:8d14:0:b0:361:4e59:a90e with SMTP id d20-20020a818d14000000b003614e59a90emr17171702ywg.288.1666370255675; Fri, 21 Oct 2022 09:37:35 -0700 (PDT) Date: Fri, 21 Oct 2022 16:36:37 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-22-jthoughton@google.com> Subject: [RFC PATCH v2 21/47] mm: rmap: provide pte_order in page_vma_mapped_walk From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . 
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315990899618492?= X-GMAIL-MSGID: =?utf-8?q?1747315990899618492?= page_vma_mapped_walk callers will need this information to know how HugeTLB pages are mapped. pte_order only applies if pte is not NULL. Signed-off-by: James Houghton --- include/linux/rmap.h | 1 + mm/page_vma_mapped.c | 3 +++ 2 files changed, 4 insertions(+) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index bd3504d11b15..e0557ede2951 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -378,6 +378,7 @@ struct page_vma_mapped_walk { pmd_t *pmd; pte_t *pte; spinlock_t *ptl; + unsigned int pte_order; unsigned int flags; }; diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 93e13fc17d3c..395ca4e21c56 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -16,6 +16,7 @@ static inline bool not_found(struct page_vma_mapped_walk *pvmw) static bool map_pte(struct page_vma_mapped_walk *pvmw) { pvmw->pte = pte_offset_map(pvmw->pmd, pvmw->address); + pvmw->pte_order = 0; if (!(pvmw->flags & PVMW_SYNC)) { if (pvmw->flags & PVMW_MIGRATION) { if (!is_swap_pte(*pvmw->pte)) @@ -174,6 +175,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (!pvmw->pte) return false; + pvmw->pte_order = huge_page_order(hstate); pvmw->ptl = huge_pte_lock(hstate, mm, pvmw->pte); if (!check_pte(pvmw)) return not_found(pvmw); @@ -269,6 +271,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) } pte_unmap(pvmw->pte); pvmw->pte = NULL; + pvmw->pte_order = 0; goto restart; } pvmw->pte++; From patchwork Fri Oct 21 16:36:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6840 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp795979wrr; Fri, 21 Oct 2022 09:39:46 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4z0ZvP7uxXgKlOTWVJnChyutRO9Hye+elXHsTkKWgM0VuQ4JcJHAH/KRq8B+SHXhCvJSg0 X-Received: by 2002:a63:91c7:0:b0:460:924:a34e with SMTP id l190-20020a6391c7000000b004600924a34emr16669533pge.492.1666370386302; Fri, 21 Oct 2022 09:39:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370386; cv=none; d=google.com; s=arc-20160816; b=AyTPTjHYpgwl1CTIRMUScREUuNzkUCIfkevlmt8BXx6UQdmktjcl5YM/x6wkN/XU13 r6RtRq1Yf1ywSBs9l4maUKbt6ln9C/6IAl6usY3x1JOdngdEGFXktS7wy0spDLQv1oID ysasiAFYy9+/3F1e3inU+WamDgX4pJlrlTsA+T49GeyiGI9ZCW/mTjW4FgJNxBkO5+hn WATP1eY8Ze4N2CpFyVB2ZBbmSuOiVbypPwSZIr/QASl9QrDbQ3x+ObjDZAYC+ts/uBAA ys7lt3KQczfWfSJNgmFGeH0Cymx6UsmwzI397Qp8VT/AIa1yb2zD9095ZEROjjYc4PMo SbfA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=8wv1ndA1KF8oaMnIrotaSvG0oK1F475ZCcDJjASpTlQ=; b=eugVEQllnIm6mmqHVAn39JUnzhy6OmA6O/7Sys4YWhpbl3uqNhTfUKbYXku+uOWDlw 
qbFj0hcySgmofj58mW3PYNUqsNIHVNQ8fURoUeR40eoo0Uvg9okFPkOEUHI8/k+1iIA6 /x2ym9NTH8SlxosPpYC/W9KKws1jqHiV4WcstiplFbg/+/bD1OQDJ4ICunj6t/yKEHid MVV+hbVryEEFVbPKTbx7HCZmzDQfOzCzBV+LDElUGXk694yQkbLi1cvZdPKPIP44d3JN Jiw4TG1Gct5pUmIXdTEzXalfZZrkpbPzeSGdDpvpVxSnPQgrUKTdS+/VbJ3k1cgfVKnd eRtA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=hA8J326O; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id l65-20020a639144000000b0046125fee7dbsi23627668pge.382.2022.10.21.09.39.33; Fri, 21 Oct 2022 09:39:46 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=hA8J326O; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231209AbiJUQjQ (ORCPT + 99 others); Fri, 21 Oct 2022 12:39:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52790 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230417AbiJUQiP (ORCPT ); Fri, 21 Oct 2022 12:38:15 -0400 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CBA24285B77 for ; Fri, 21 Oct 2022 09:37:38 -0700 (PDT) Received: by mail-yb1-xb4a.google.com with SMTP id f9-20020a25b089000000b006be298e2a8dso3755480ybj.20 for ; Fri, 21 Oct 2022 09:37:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=8wv1ndA1KF8oaMnIrotaSvG0oK1F475ZCcDJjASpTlQ=; b=hA8J326O7a0EqIeFifBsH7a/imYAoEv3Fjm/FsWS/2l67p7YL4UTp4delGk3izcl+P yUMlvya8wwiX0QoCutq/CYssZPSkjElnj1/ElmaX1Yv0en/RWLhHh7JnxK7ihEFGucid sPn7Jj6tEBM2NrRJYPvUv4K5hnyaC9+Ddpv1Ik/07iw1wk6iQd8BFDrNfG9dJysMepUC Rk6k301F/i5SC3oHp9J/MopKdD70L4fAd+UF5uQtHniRZAMT/o0Qu3QtZI+LtO+vEwiz 1K/dd5p1zQhugJO2OzYe1+HctzunUY0PXuEreoURlPTPkCfYe98jsMbHS423j8l/98PY A6zg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=8wv1ndA1KF8oaMnIrotaSvG0oK1F475ZCcDJjASpTlQ=; b=YIF891OxMVAA8HgRpv4ltLYgkf8dJFqR3hvWlFf7s1sxVUKYomhPnCs3WuIIdLH9X4 7yAtpkDEoS3HUiDDHKeeraR/X2wv4YiQg0u93DsBgqYS2XM+/CUKrnX3lEWhWsgMQb72 BD55CegNH5RLRij3/LnIBf67nKLGHq/rFxBQvNOvuf/ifhyvkGf1FgZzjiCLDLJrfl8/ YmTQBlS2okonYX1e4yMZkKfufU79LKMspAu7IvaiYNj7wy3VxTH3wBlQGM+gO9gFuvx3 MgnShFlWXWmRwOYp/gGX1o2hlllq4wI5eReGB7KtpnZb0UJC6ETnfqX88a9SGMbONWOd ggZw== X-Gm-Message-State: ACrzQf1MaMPNul0YCxMccZqX/iDlsSTr8IHkkNmYdruB05oC8ZAXhTyn sCnfnSBYl28qTYdUcbAGPdy0LqhJjMoaQ+qM X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a05:6902:110e:b0:66d:e6dc:5f31 
with SMTP id o14-20020a056902110e00b0066de6dc5f31mr17202827ybu.628.1666370256631; Fri, 21 Oct 2022 09:37:36 -0700 (PDT) Date: Fri, 21 Oct 2022 16:36:38 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-23-jthoughton@google.com> Subject: [RFC PATCH v2 22/47] mm: rmap: make page_vma_mapped_walk callers use pte_order From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747315993888667893?= X-GMAIL-MSGID: =?utf-8?q?1747315993888667893?= This also updates the callers' hugetlb mapcounting code to handle mapcount properly for subpage-mapped hugetlb pages. Signed-off-by: James Houghton --- mm/migrate.c | 2 +- mm/rmap.c | 17 +++++++++++++---- 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/mm/migrate.c b/mm/migrate.c index a0105fa6e3b2..8712b694c5a7 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -235,7 +235,7 @@ static bool remove_migration_pte(struct folio *folio, #ifdef CONFIG_HUGETLB_PAGE if (folio_test_hugetlb(folio)) { - unsigned int shift = huge_page_shift(hstate_vma(vma)); + unsigned int shift = pvmw.pte_order + PAGE_SHIFT; pte = arch_make_huge_pte(pte, shift, vma->vm_flags); if (folio_test_anon(folio)) diff --git a/mm/rmap.c b/mm/rmap.c index 9bba65b30e4d..19850d955aea 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1626,7 +1626,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) { pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); if (folio_test_hugetlb(folio)) { - hugetlb_count_sub(folio_nr_pages(folio), mm); + hugetlb_count_sub(1UL << pvmw.pte_order, mm); set_huge_pte_at(mm, address, pvmw.pte, pteval); } else { dec_mm_counter(mm, mm_counter(&folio->page)); @@ -1785,7 +1785,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, * * See Documentation/mm/mmu_notifier.rst */ - page_remove_rmap(subpage, vma, folio_test_hugetlb(folio)); + if (folio_test_hugetlb(folio)) + page_remove_rmap(&folio->page, vma, true); + else + page_remove_rmap(subpage, vma, false); + if (vma->vm_flags & VM_LOCKED) mlock_page_drain_local(); folio_put(folio); @@ -2034,7 +2038,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, } else if (PageHWPoison(subpage)) { pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); if (folio_test_hugetlb(folio)) { - hugetlb_count_sub(folio_nr_pages(folio), mm); + hugetlb_count_sub(1L << pvmw.pte_order, mm); set_huge_pte_at(mm, address, pvmw.pte, pteval); } else { dec_mm_counter(mm, mm_counter(&folio->page)); @@ -2126,7 +2130,10 @@ static bool 
try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, * * See Documentation/mm/mmu_notifier.rst */ - page_remove_rmap(subpage, vma, folio_test_hugetlb(folio)); + if (folio_test_hugetlb(folio)) + page_remove_rmap(&folio->page, vma, true); + else + page_remove_rmap(subpage, vma, false); if (vma->vm_flags & VM_LOCKED) mlock_page_drain_local(); folio_put(folio); @@ -2210,6 +2217,8 @@ static bool page_make_device_exclusive_one(struct folio *folio, args->owner); mmu_notifier_invalidate_range_start(&range); + VM_BUG_ON_FOLIO(folio_test_hugetlb(folio), folio); + while (page_vma_mapped_walk(&pvmw)) { /* Unexpected PMD-mapped THP? */ VM_BUG_ON_FOLIO(!pvmw.pte, folio); From patchwork Fri Oct 21 16:36:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6842 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp796083wrr; Fri, 21 Oct 2022 09:39:58 -0700 (PDT) X-Google-Smtp-Source: AMsMyM60hESqHt8Gvsxl0IsAkwOXX6PJIP+KfBjlZLaCpvcOLrLMKOmAA+ZsKdDImFa4Sm2Ol3Ha X-Received: by 2002:a63:2221:0:b0:43b:f4a3:80cc with SMTP id i33-20020a632221000000b0043bf4a380ccmr16732001pgi.367.1666370398165; Fri, 21 Oct 2022 09:39:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370398; cv=none; d=google.com; s=arc-20160816; b=wpDH0aTdhO+MAXXb9cJ9QayJ05TBwlmRT/fKNwbL5uIAnqPHShjK45FsVUE8Szkhs3 cgRi2MjJvVndIPYKKW2+xnUni/tmNVd03cPC0ZcGUkq6uVWvxaTIY2yjLQ1PBstS0+XR rh+62Uojpauz8IxqcRFgUtG3vrxjAHQGH+U4Jig7snZYfWNq/MJ6UWHtqbi/EkyZJrl0 a8f9UkQLta7M5MZ9W7xpQj3ayghj6/toXqRhZzHg8IOKO4mL1sMhUTiqFncb17ZlDbQg V0biJk0n9Riyj4F2cypR3X047qOCgKt9dqehYLvTxWeo+h2Czx0TECraarvDGmMRJ1M2 r6mg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=A8A4WdJUmtZ3pZn0W1kJeWoXW7oHHPnmlapm5XaH58o=; b=CHJd9GOt6xServQ4zRAjhvBuCISU6RXXOn4DX+6tKHl/grE7Y7+AftU15IFJ2qHUHB riy0eJyNwFSCkTiLXOJGiMJU0+LrB9EPjoCuud6MskDb2hjbOXuionswEDheixGd+Zqw vu8hdbGhiQ+ijXZg/S8uI3731XMv6jJWabZSz7BB+KsCjIYRHZ/QJaUvDPm6GM9QqNWw j4sodgnmNOiQ/bcs39UtK/I8UQHDhmKquJjRc/0LzDTt9GRJZqzaTnilojXRe/8hyU6N v51iqsttvEYa3y7kBkKng2NMzP/bxj0qGuQHc2JNJQYgMRGC7DeHe00gws3LtNmwCAg0 vEQA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=QyZNlwrS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
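The two rmap patches above work together: patch 21 records the order of the PTE that page_vma_mapped_walk() found, and patch 22 makes the callers size their accounting from it instead of assuming a full hstate-sized mapping. The helpers below are purely illustrative (they do not exist in the series) and just name the two derived quantities the hunks above compute inline.

/*
 * Illustrative helpers: quantities derived from pvmw->pte_order.
 * pte_order is the log2 of the number of base pages mapped by the PTE
 * that the walk found (0 for an ordinary PTE).
 */
static inline unsigned long pvmw_nr_pages(const struct page_vma_mapped_walk *pvmw)
{
	return 1UL << pvmw->pte_order;		/* base pages mapped by this PTE */
}

static inline unsigned long pvmw_mapping_size(const struct page_vma_mapped_walk *pvmw)
{
	return PAGE_SIZE << pvmw->pte_order;	/* bytes mapped by this PTE */
}

For instance, the hugetlb_count_sub(1UL << pvmw.pte_order, mm) calls in try_to_unmap_one() and try_to_migrate_one() above are exactly pvmw_nr_pages(), and the shift passed to arch_make_huge_pte() in remove_migration_pte() is pvmw.pte_order + PAGE_SHIFT.
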
[2620:137:e000::1:20]) by mx.google.com with ESMTP id hg3-20020a17090b300300b0020dba9319fcsi6362967pjb.111.2022.10.21.09.39.45; Fri, 21 Oct 2022 09:39:58 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=QyZNlwrS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231216AbiJUQjY (ORCPT + 99 others); Fri, 21 Oct 2022 12:39:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54174 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231176AbiJUQiP (ORCPT ); Fri, 21 Oct 2022 12:38:15 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DC1782892C7 for ; Fri, 21 Oct 2022 09:37:38 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id i16-20020a056902069000b006c3ef07d22eso3768440ybt.13 for ; Fri, 21 Oct 2022 09:37:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=A8A4WdJUmtZ3pZn0W1kJeWoXW7oHHPnmlapm5XaH58o=; b=QyZNlwrSq5mo4QWYzR3oK0EIR216wpexI9Df6hEZNyICuoeMeDmrMlBrvBhG5G9oMH eDwoSKuwVOkw2F+eA1UKjuSgQ/4G0YaIvKDgPK5/l7J3Q8CkdNBISRGJo0HwWGWEUNrf Hsg26jVqMIfXyFuiBuWIPRk3e/w2Ynh00CNLcq8RUWHyxHLWPosuj8ILf6dtKr/srDKf VgnkAvfVFAk1eD2bIu3KwtmPY4idqkPwVNcFidZU4Qvqi2ndAMirEsPVmZ++CLhMi7or QE1FxE7zy+/P5OQx3hmNlYrC8AT26Ti6pHfMan5PLEwpAZLEhxJr4VhMjlaxssiD5v+m jbDw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=A8A4WdJUmtZ3pZn0W1kJeWoXW7oHHPnmlapm5XaH58o=; b=j62+WE5X4X+PrlleKBIqlythGDRbdSizXhECm2SBNcDhP89+iucsL5fluS9hL5St7i gwTGXlW0wEeDY1lGhvpJhkqNbgAEGmCFNYwoxHfAveR3fFBvVj7z+nvpWovnFBKCl1oL V78pfoBW6NwtFFo777zAG+Rnayxs+UGjlGZ6hkuPHYnPWHh12s7au2XARi2/O4WB4YV9 yjwBcPjPYdipBa7KJTJIU+exDZysSeCLm0Sy+FSzakMi0261QkBCh+0fnu2evtKNzU/X DXvyPPdZKgWJKHQfnuLxH3405c/7BiHQwR3rjgEZkRU6SqaG15roOFDsGEG3zb16Pq6I /AGw== X-Gm-Message-State: ACrzQf0dNSZeiSqweeEq2AeCD2YqL/MCBtXLeLWe+UjVPA0TuWtRmnSM yw+8RQ1iHrwlEfvjBFarXc7ycpcLOZoH+GKQ X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a25:cd45:0:b0:6c2:2d8a:e3f4 with SMTP id d66-20020a25cd45000000b006c22d8ae3f4mr17406990ybf.395.1666370257496; Fri, 21 Oct 2022 09:37:37 -0700 (PDT) Date: Fri, 21 Oct 2022 16:36:39 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-24-jthoughton@google.com> Subject: [RFC PATCH v2 23/47] rmap: update hugetlb lock comment for HGM From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . 
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316006357220520?= X-GMAIL-MSGID: =?utf-8?q?1747316006357220520?= The VMA lock is used to prevent high-granularity HugeTLB mappings from being collapsed while other threads are doing high-granularity page table walks. Signed-off-by: James Houghton --- mm/rmap.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/rmap.c b/mm/rmap.c index 19850d955aea..527463c1e936 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -47,7 +47,8 @@ * * hugetlbfs PageHuge() take locks in this order: * hugetlb_fault_mutex (hugetlbfs specific page fault mutex) - * vma_lock (hugetlb specific lock for pmd_sharing) + * vma_lock (hugetlb specific lock for pmd_sharing and high-granularity + * mapping) * mapping->i_mmap_rwsem (also used for hugetlb pmd sharing) * page->flags PG_locked (lock_page) */ From patchwork Fri Oct 21 16:36:40 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6843 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp796120wrr; Fri, 21 Oct 2022 09:40:02 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4egpC+NPI6+UdVJu1oJy1IfNo37tLEnvXtjBt8f/+j4u90vabdHMIzpHCTFG2QRG7X+Y1K X-Received: by 2002:a17:90a:cc7:b0:200:3b3e:4e00 with SMTP id 7-20020a17090a0cc700b002003b3e4e00mr58352422pjt.201.1666370402269; Fri, 21 Oct 2022 09:40:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370402; cv=none; d=google.com; s=arc-20160816; b=dom4ruKEKCxsT+lkiiUdCzPROxMZBpUoCKVn3keLg16Tb3HOFB5ZgKDmEcGQCr4/Ba T/Zytt7CvJbpxjaOQ1ub0Eh8IqZcChG3bJsP8nuWhJRYZC+eEojpYWfeozLYCdJGc8Nw fUbkG3CPKoG0VqH3pgWNG3/GqXSAldlIHvFnNaiuNIPrXxkE4AtKa3f1b48+jgqupGNH eaK+EhRw67fub2BPLPCf96KxIaHu56z5ftfnYeOWxJlyiJ0EcbUtmFJuBDzUfZbfAbXt jkACypK7v9KxfYhzUwPVpe/2zlKyqKLNOPn9KpUxuODDapB8kyCSG6lZyRTToLnpb31h Kj5g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=kLa4FDL78/heGirMrV4t2jI2YXb52QMfSMVBDglb1pI=; b=DGP+2h3ocDy/BAURvLBUHGX2rvdIDsqkOBTyOJ/zg9oLpmAEhTCCkfX7AyjHlWLFyH g909+zdAFQchfPeQ/dqcBioqMNmRNEgwcR0e6PA4EHwyxt/1qZDGqxtQTr27OTl6zHH9 QXj4kTqm7DoQPx3rVnCIhUS4YqbiLFKfoCp5CF7JFKHH/iK1IJzvXFf5yGkk0ghA/FiD wPfU0177DPf1iP6OUsbvI68Ruato/f1wIH+xqrOfd1fzq06OM3fjPVBXiJAh6q5KAmXD raKbxl96pVvcjQNW42Sm817S+hCcwRbgAC0Rrdt0g6cAcxoj7yFVMOz0jfXmyn2b/yQk j2qw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=Q5PTuLo7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
From patchwork Fri Oct 21 16:36:40 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 6843
Date: Fri, 21 Oct 2022 16:36:40 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-25-jthoughton@google.com>
Subject: [RFC PATCH v2 24/47] hugetlb: update page_vma_mapped to do high-granularity walks
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
"Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316010787122971?= X-GMAIL-MSGID: =?utf-8?q?1747316010787122971?= This updates the HugeTLB logic to look a lot more like the PTE-mapped THP logic. When a user calls us in a loop, we will update pvmw->address to walk to each page table entry that could possibly map the hugepage containing pvmw->pfn. This makes use of the new pte_order so callers know what size PTE they're getting. Signed-off-by: James Houghton --- include/linux/rmap.h | 4 +++ mm/page_vma_mapped.c | 59 ++++++++++++++++++++++++++++++++++++-------- mm/rmap.c | 48 +++++++++++++++++++++-------------- 3 files changed, 83 insertions(+), 28 deletions(-) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index e0557ede2951..d7d2d9f65a01 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -13,6 +13,7 @@ #include #include #include +#include /* * The anon_vma heads a list of private "related" vmas, to scan if @@ -409,6 +410,9 @@ static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw) pte_unmap(pvmw->pte); if (pvmw->ptl) spin_unlock(pvmw->ptl); + if (pvmw->pte && is_vm_hugetlb_page(pvmw->vma) && + hugetlb_hgm_enabled(pvmw->vma)) + hugetlb_vma_unlock_read(pvmw->vma); } bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw); diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 395ca4e21c56..1994b3f9a4c2 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -133,7 +133,8 @@ static void step_forward(struct page_vma_mapped_walk *pvmw, unsigned long size) * * Returns true if the page is mapped in the vma. @pvmw->pmd and @pvmw->pte point * to relevant page table entries. @pvmw->ptl is locked. @pvmw->address is - * adjusted if needed (for PTE-mapped THPs). + * adjusted if needed (for PTE-mapped THPs and high-granularity--mapped HugeTLB + * pages). * * If @pvmw->pmd is set but @pvmw->pte is not, you have found PMD-mapped page * (usually THP). For PTE-mapped THP, you should run page_vma_mapped_walk() in @@ -166,19 +167,57 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) if (unlikely(is_vm_hugetlb_page(vma))) { struct hstate *hstate = hstate_vma(vma); unsigned long size = huge_page_size(hstate); - /* The only possible mapping was handled on last iteration */ - if (pvmw->pte) - return not_found(pvmw); + struct hugetlb_pte hpte; + pte_t *pte; + pte_t pteval; + + end = (pvmw->address & huge_page_mask(hstate)) + + huge_page_size(hstate); /* when pud is not present, pte will be NULL */ - pvmw->pte = huge_pte_offset(mm, pvmw->address, size); - if (!pvmw->pte) + pte = huge_pte_offset(mm, pvmw->address, size); + if (!pte) return false; - pvmw->pte_order = huge_page_order(hstate); - pvmw->ptl = huge_pte_lock(hstate, mm, pvmw->pte); - if (!check_pte(pvmw)) - return not_found(pvmw); + do { + hugetlb_pte_populate(&hpte, pte, huge_page_shift(hstate), + hpage_size_to_level(size)); + + /* + * Do a high granularity page table walk. 
The vma lock + * is grabbed to prevent the page table from being + * collapsed mid-walk. It is dropped in + * page_vma_mapped_walk_done(). + */ + if (pvmw->pte) { + if (pvmw->ptl) + spin_unlock(pvmw->ptl); + pvmw->ptl = NULL; + pvmw->address += PAGE_SIZE << pvmw->pte_order; + if (pvmw->address >= end) + return not_found(pvmw); + } else if (hugetlb_hgm_enabled(vma)) + /* Only grab the lock once. */ + hugetlb_vma_lock_read(vma); + +retry_walk: + hugetlb_hgm_walk(mm, vma, &hpte, pvmw->address, + PAGE_SIZE, /*stop_at_none=*/true); + + pvmw->pte = hpte.ptep; + pvmw->pte_order = hpte.shift - PAGE_SHIFT; + pvmw->ptl = hugetlb_pte_lock(mm, &hpte); + pteval = huge_ptep_get(hpte.ptep); + if (pte_present(pteval) && !hugetlb_pte_present_leaf( + &hpte, pteval)) { + /* + * Someone split from under us, so keep + * walking. + */ + spin_unlock(pvmw->ptl); + goto retry_walk; + } + } while (!check_pte(pvmw)); return true; } diff --git a/mm/rmap.c b/mm/rmap.c index 527463c1e936..a8359584467e 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1552,17 +1552,23 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, flush_cache_range(vma, range.start, range.end); /* - * To call huge_pmd_unshare, i_mmap_rwsem must be - * held in write mode. Caller needs to explicitly - * do this outside rmap routines. - * - * We also must hold hugetlb vma_lock in write mode. - * Lock order dictates acquiring vma_lock BEFORE - * i_mmap_rwsem. We can only try lock here and fail - * if unsuccessful. + * If HGM is enabled, we have already grabbed the VMA + * lock for reading, and we cannot safely release it. + * Because HGM-enabled VMAs have already unshared all + * PMDs, we can safely ignore PMD unsharing here. */ - if (!anon) { + if (!anon && !hugetlb_hgm_enabled(vma)) { VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); + /* + * To call huge_pmd_unshare, i_mmap_rwsem must + * be held in write mode. Caller needs to + * explicitly do this outside rmap routines. + * + * We also must hold hugetlb vma_lock in write + * mode. Lock order dictates acquiring vma_lock + * BEFORE i_mmap_rwsem. We can only try lock + * here and fail if unsuccessful. + */ if (!hugetlb_vma_trylock_write(vma)) { page_vma_mapped_walk_done(&pvmw); ret = false; @@ -1946,17 +1952,23 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, flush_cache_range(vma, range.start, range.end); /* - * To call huge_pmd_unshare, i_mmap_rwsem must be - * held in write mode. Caller needs to explicitly - * do this outside rmap routines. - * - * We also must hold hugetlb vma_lock in write mode. - * Lock order dictates acquiring vma_lock BEFORE - * i_mmap_rwsem. We can only try lock here and - * fail if unsuccessful. + * If HGM is enabled, we have already grabbed the VMA + * lock for reading, and we cannot safely release it. + * Because HGM-enabled VMAs have already unshared all + * PMDs, we can safely ignore PMD unsharing here. */ - if (!anon) { + if (!anon && !hugetlb_hgm_enabled(vma)) { VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); + /* + * To call huge_pmd_unshare, i_mmap_rwsem must + * be held in write mode. Caller needs to + * explicitly do this outside rmap routines. + * + * We also must hold hugetlb vma_lock in write + * mode. Lock order dictates acquiring vma_lock + * BEFORE i_mmap_rwsem. We can only try lock + * here and fail if unsuccessful. 
+ */ if (!hugetlb_vma_trylock_write(vma)) { page_vma_mapped_walk_done(&pvmw); ret = false;
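For illustration only (not part of the patch): how an rmap walker consumes
the new behavior. DEFINE_FOLIO_VMA_WALK() and page_vma_mapped_walk() are
existing interfaces, and pte_order is the field this series relies on; the
loop below is a sketch rather than the code of any particular caller.

	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);

	while (page_vma_mapped_walk(&pvmw)) {
		/*
		 * With HGM, pvmw.pte may map only part of the hugepage;
		 * pvmw.pte_order says how large that piece is, and
		 * pvmw.address has been advanced to the start of it.
		 */
		unsigned long mapped_size = PAGE_SIZE << pvmw.pte_order;

		/* ... operate on [pvmw.address, pvmw.address + mapped_size) ... */
	}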
From patchwork Fri Oct 21 16:36:41 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 6844
Date: Fri, 21 Oct 2022 16:36:41 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-26-jthoughton@google.com>
Subject: [RFC PATCH v2 25/47] hugetlb: add HGM support for copy_hugetlb_page_range
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
"Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316016068472907?= X-GMAIL-MSGID: =?utf-8?q?1747316016068472907?= This allows fork() to work with high-granularity mappings. The page table structure is copied such that partially mapped regions will remain partially mapped in the same way for the new process. A page's reference count is incremented for *each* portion of it that is mapped in the page table. For example, if you have a PMD-mapped 1G page, the reference count and mapcount will be incremented by 512. Signed-off-by: James Houghton --- mm/hugetlb.c | 81 +++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 65 insertions(+), 16 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 5783a8307a77..7d692907cbf3 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4946,7 +4946,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, struct vm_area_struct *src_vma) { pte_t *src_pte, *dst_pte, entry; - struct page *ptepage; + struct hugetlb_pte src_hpte, dst_hpte; + struct page *ptepage, *hpage; unsigned long addr; bool cow = is_cow_mapping(src_vma->vm_flags); struct hstate *h = hstate_vma(src_vma); @@ -4956,6 +4957,16 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, unsigned long last_addr_mask; int ret = 0; + if (hugetlb_hgm_enabled(src_vma)) { + /* + * src_vma might have high-granularity PTEs, and dst_vma will + * need to copy those. + */ + ret = enable_hugetlb_hgm(dst_vma); + if (ret) + return ret; + } + if (cow) { mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, src_vma, src, src_vma->vm_start, @@ -4967,18 +4978,22 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, /* * For shared mappings the vma lock must be held before * calling huge_pte_offset in the src vma. Otherwise, the - * returned ptep could go away if part of a shared pmd and - * another thread calls huge_pmd_unshare. + * returned ptep could go away if + * - part of a shared pmd and another thread calls + * huge_pmd_unshare, or + * - another thread collapses a high-granularity mapping. 
*/ hugetlb_vma_lock_read(src_vma); } last_addr_mask = hugetlb_mask_last_page(h); - for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) { + addr = src_vma->vm_start; + while (addr < src_vma->vm_end) { spinlock_t *src_ptl, *dst_ptl; + unsigned long hpte_sz; src_pte = huge_pte_offset(src, addr, sz); if (!src_pte) { - addr |= last_addr_mask; + addr = (addr | last_addr_mask) + sz; continue; } dst_pte = huge_pte_alloc(dst, dst_vma, addr, sz); @@ -4987,6 +5002,26 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, break; } + hugetlb_pte_populate(&src_hpte, src_pte, huge_page_shift(h), + hpage_size_to_level(huge_page_size(h))); + hugetlb_pte_populate(&dst_hpte, dst_pte, huge_page_shift(h), + hpage_size_to_level(huge_page_size(h))); + + if (hugetlb_hgm_enabled(src_vma)) { + hugetlb_hgm_walk(src, src_vma, &src_hpte, addr, + PAGE_SIZE, /*stop_at_none=*/true); + ret = hugetlb_hgm_walk(dst, dst_vma, &dst_hpte, addr, + hugetlb_pte_size(&src_hpte), + /*stop_at_none=*/false); + if (ret) + break; + + src_pte = src_hpte.ptep; + dst_pte = dst_hpte.ptep; + } + + hpte_sz = hugetlb_pte_size(&src_hpte); + /* * If the pagetables are shared don't copy or take references. * @@ -4996,12 +5031,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, * to reliably determine whether pte is shared. */ if (page_count(virt_to_page(dst_pte)) > 1) { - addr |= last_addr_mask; + addr = (addr | last_addr_mask) + sz; continue; } - dst_ptl = huge_pte_lock(h, dst, dst_pte); - src_ptl = huge_pte_lockptr(huge_page_shift(h), src, src_pte); + dst_ptl = hugetlb_pte_lock(dst, &dst_hpte); + src_ptl = hugetlb_pte_lockptr(src, &src_hpte); spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); entry = huge_ptep_get(src_pte); again: @@ -5042,10 +5077,15 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, */ if (userfaultfd_wp(dst_vma)) set_huge_pte_at(dst, addr, dst_pte, entry); + } else if (!hugetlb_pte_present_leaf(&src_hpte, entry)) { + /* Retry the walk. */ + spin_unlock(src_ptl); + spin_unlock(dst_ptl); + continue; } else { - entry = huge_ptep_get(src_pte); ptepage = pte_page(entry); - get_page(ptepage); + hpage = compound_head(ptepage); + get_page(hpage); /* * Failing to duplicate the anon rmap is a rare case @@ -5058,24 +5098,29 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, * sleep during the process. 
*/ if (!PageAnon(ptepage)) { - page_dup_file_rmap(ptepage, true); - } else if (page_try_dup_anon_rmap(ptepage, true, + page_dup_file_rmap(hpage, true); + } else if (page_try_dup_anon_rmap(hpage, true, src_vma)) { pte_t src_pte_old = entry; struct page *new; + if (hugetlb_hgm_enabled(src_vma)) { + ret = -EINVAL; + break; + } + spin_unlock(src_ptl); spin_unlock(dst_ptl); /* Do not use reserve as it's private owned */ new = alloc_huge_page(dst_vma, addr, 1); if (IS_ERR(new)) { - put_page(ptepage); + put_page(hpage); ret = PTR_ERR(new); break; } - copy_user_huge_page(new, ptepage, addr, dst_vma, + copy_user_huge_page(new, hpage, addr, dst_vma, npages); - put_page(ptepage); + put_page(hpage); /* Install the new huge page if src pte stable */ dst_ptl = huge_pte_lock(h, dst, dst_pte);
@@ -5093,6 +5138,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, hugetlb_install_page(dst_vma, dst_pte, addr, new); spin_unlock(src_ptl); spin_unlock(dst_ptl); + addr += hugetlb_pte_size(&src_hpte); continue; }
@@ -5109,10 +5155,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, } set_huge_pte_at(dst, addr, dst_pte, entry); - hugetlb_count_add(npages, dst); + hugetlb_count_add( + hugetlb_pte_size(&dst_hpte) / PAGE_SIZE, + dst); } spin_unlock(src_ptl); spin_unlock(dst_ptl); + addr += hugetlb_pte_size(&src_hpte); } if (cow) {
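To make the reference-count accounting described in the commit message
concrete, a worked example (illustrative only; x86-64 page sizes assumed):

	/*
	 * Copying a 1G hugepage that the parent maps with 2M PMDs installs
	 * 512 entries in the child, and each entry takes its own reference:
	 *
	 *	entries   = 1G / 2M = 512
	 *	refcount += 512		one get_page(compound_head(...)) per entry
	 *	mapcount += 512		one rmap duplication per entry
	 *
	 * A single 1G PUD mapping of the same page would add only 1 to each.
	 */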
From patchwork Fri Oct 21 16:36:42 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 6845
Date: Fri, 21 Oct 2022 16:36:42 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-27-jthoughton@google.com>
Subject: [RFC PATCH v2 26/47] hugetlb: make move_hugetlb_page_tables compatible with HGM
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
"Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316019743838517?= X-GMAIL-MSGID: =?utf-8?q?1747316019743838517?= This is very similar to the support that was added to copy_hugetlb_page_range. We simply do a high-granularity walk now, and most of the rest of the code stays the same. Signed-off-by: James Houghton --- mm/hugetlb.c | 47 ++++++++++++++++++++++++++++++++--------------- 1 file changed, 32 insertions(+), 15 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 7d692907cbf3..16b0d192445c 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5174,16 +5174,16 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, return ret; } -static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr, - unsigned long new_addr, pte_t *src_pte, pte_t *dst_pte) +static void move_hugetlb_pte(struct vm_area_struct *vma, unsigned long old_addr, + unsigned long new_addr, struct hugetlb_pte *src_hpte, + struct hugetlb_pte *dst_hpte) { - struct hstate *h = hstate_vma(vma); struct mm_struct *mm = vma->vm_mm; spinlock_t *src_ptl, *dst_ptl; pte_t pte; - dst_ptl = huge_pte_lock(h, mm, dst_pte); - src_ptl = huge_pte_lockptr(huge_page_shift(h), mm, src_pte); + dst_ptl = hugetlb_pte_lock(mm, dst_hpte); + src_ptl = hugetlb_pte_lockptr(mm, src_hpte); /* * We don't have to worry about the ordering of src and dst ptlocks @@ -5192,8 +5192,8 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr, if (src_ptl != dst_ptl) spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); - pte = huge_ptep_get_and_clear(mm, old_addr, src_pte); - set_huge_pte_at(mm, new_addr, dst_pte, pte); + pte = huge_ptep_get_and_clear(mm, old_addr, src_hpte->ptep); + set_huge_pte_at(mm, new_addr, dst_hpte->ptep, pte); if (src_ptl != dst_ptl) spin_unlock(src_ptl); @@ -5214,6 +5214,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma, pte_t *src_pte, *dst_pte; struct mmu_notifier_range range; bool shared_pmd = false; + struct hugetlb_pte src_hpte, dst_hpte; mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, old_addr, old_end); @@ -5229,20 +5230,28 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma, /* Prevent race with file truncation */ hugetlb_vma_lock_write(vma); i_mmap_lock_write(mapping); - for (; old_addr < old_end; old_addr += sz, new_addr += sz) { + while (old_addr < old_end) { src_pte = huge_pte_offset(mm, old_addr, sz); if (!src_pte) { - old_addr |= last_addr_mask; - new_addr |= last_addr_mask; + old_addr = (old_addr | last_addr_mask) + sz; + new_addr = (new_addr | last_addr_mask) + sz; continue; } - if (huge_pte_none(huge_ptep_get(src_pte))) + + hugetlb_pte_populate(&src_hpte, src_pte, huge_page_shift(h), + hpage_size_to_level(sz)); + hugetlb_hgm_walk(mm, vma, &src_hpte, old_addr, + PAGE_SIZE, /*stop_at_none=*/true); + if (huge_pte_none(huge_ptep_get(src_hpte.ptep))) { + old_addr += hugetlb_pte_size(&src_hpte); + new_addr += 
hugetlb_pte_size(&src_hpte); continue; + } - if (huge_pmd_unshare(mm, vma, old_addr, src_pte)) { + if (huge_pmd_unshare(mm, vma, old_addr, src_hpte.ptep)) { shared_pmd = true; - old_addr |= last_addr_mask; - new_addr |= last_addr_mask; + old_addr = (old_addr | last_addr_mask) + sz; + new_addr = (new_addr | last_addr_mask) + sz; continue; }
@@ -5250,7 +5259,15 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma, if (!dst_pte) break; - move_huge_pte(vma, old_addr, new_addr, src_pte, dst_pte); + hugetlb_pte_populate(&dst_hpte, dst_pte, huge_page_shift(h), + hpage_size_to_level(sz)); + if (hugetlb_hgm_walk(mm, vma, &dst_hpte, new_addr, + hugetlb_pte_size(&src_hpte), + /*stop_at_none=*/false)) + break; + move_hugetlb_pte(vma, old_addr, new_addr, &src_hpte, &dst_hpte); + old_addr += hugetlb_pte_size(&src_hpte); + new_addr += hugetlb_pte_size(&src_hpte); } if (shared_pmd)
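For illustration only (not part of the patch): the per-iteration step the
new loop uses, assuming hugetlb_pte_size() and move_hugetlb_pte() from this
series. The step is the size of the mapping level found by the source walk,
so partially mapped regions are moved piecewise.

	unsigned long step = hugetlb_pte_size(&src_hpte);	/* 4K, 2M or 1G */

	move_hugetlb_pte(vma, old_addr, new_addr, &src_hpte, &dst_hpte);
	old_addr += step;
	new_addr += step;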
From patchwork Fri Oct 21 16:36:43 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 6850
Date: Fri, 21 Oct 2022 16:36:43 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-28-jthoughton@google.com>
Subject: [RFC PATCH v2 27/47] hugetlb: add HGM support for hugetlb_fault and hugetlb_no_page
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert",
 "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin,
 Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 James Houghton
Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316058072714258?= X-GMAIL-MSGID: =?utf-8?q?1747316058072714258?= Update the page fault handler to support high-granularity page faults. While handling a page fault on a partially-mapped HugeTLB page, if the PTE we find with hugetlb_pte_walk is none, then we will replace it with a leaf-level PTE to map the page. To give some examples: 1. For a completely unmapped 1G page, it will be mapped with a 1G PUD. 2. For a 1G page that has its first 512M mapped, any faults on the unmapped sections will result in 2M PMDs mapping each unmapped 2M section. 3. For a 1G page that has only its first 4K mapped, a page fault on its second 4K section will get a 4K PTE to map it. Unless high-granularity mappings are created via UFFDIO_CONTINUE, it is impossible for hugetlb_fault to create high-granularity mappings. This commit does not handle hugetlb_wp right now, and it doesn't handle HugeTLB page migration and swap entries. Signed-off-by: James Houghton --- mm/hugetlb.c | 90 +++++++++++++++++++++++++++++++++++++--------------- 1 file changed, 64 insertions(+), 26 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 16b0d192445c..2ee2c48ee79c 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -118,6 +118,18 @@ enum hugetlb_level hpage_size_to_level(unsigned long sz) return HUGETLB_LEVEL_PGD; } +/* + * Find the subpage that corresponds to `addr` in `hpage`. + */ +static struct page *hugetlb_find_subpage(struct hstate *h, struct page *hpage, + unsigned long addr) +{ + size_t idx = (addr & ~huge_page_mask(h))/PAGE_SIZE; + + BUG_ON(idx >= pages_per_huge_page(h)); + return &hpage[idx]; +} + static inline bool subpool_is_free(struct hugepage_subpool *spool) { if (spool->count) @@ -5810,13 +5822,13 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma, * false if pte changed or is changing. 
*/ static bool hugetlb_pte_stable(struct hstate *h, struct mm_struct *mm, - pte_t *ptep, pte_t old_pte) + struct hugetlb_pte *hpte, pte_t old_pte) { spinlock_t *ptl; bool same; - ptl = huge_pte_lock(h, mm, ptep); - same = pte_same(huge_ptep_get(ptep), old_pte); + ptl = hugetlb_pte_lock(mm, hpte); + same = pte_same(huge_ptep_get(hpte->ptep), old_pte); spin_unlock(ptl); return same; @@ -5825,17 +5837,18 @@ static bool hugetlb_pte_stable(struct hstate *h, struct mm_struct *mm, static vm_fault_t hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma, struct address_space *mapping, pgoff_t idx, - unsigned long address, pte_t *ptep, + unsigned long address, struct hugetlb_pte *hpte, pte_t old_pte, unsigned int flags) { struct hstate *h = hstate_vma(vma); vm_fault_t ret = VM_FAULT_SIGBUS; int anon_rmap = 0; unsigned long size; - struct page *page; + struct page *page, *subpage; pte_t new_pte; spinlock_t *ptl; unsigned long haddr = address & huge_page_mask(h); + unsigned long haddr_hgm = address & hugetlb_pte_mask(hpte); bool new_page, new_pagecache_page = false; u32 hash = hugetlb_fault_mutex_hash(mapping, idx); @@ -5880,7 +5893,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, * never happen on the page after UFFDIO_COPY has * correctly installed the page and returned. */ - if (!hugetlb_pte_stable(h, mm, ptep, old_pte)) { + if (!hugetlb_pte_stable(h, mm, hpte, old_pte)) { ret = 0; goto out; } @@ -5904,7 +5917,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, * here. Before returning error, get ptl and make * sure there really is no pte entry. */ - if (hugetlb_pte_stable(h, mm, ptep, old_pte)) + if (hugetlb_pte_stable(h, mm, hpte, old_pte)) ret = vmf_error(PTR_ERR(page)); else ret = 0; @@ -5954,7 +5967,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, unlock_page(page); put_page(page); /* See comment in userfaultfd_missing() block above */ - if (!hugetlb_pte_stable(h, mm, ptep, old_pte)) { + if (!hugetlb_pte_stable(h, mm, hpte, old_pte)) { ret = 0; goto out; } @@ -5979,10 +5992,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, vma_end_reservation(h, vma, haddr); } - ptl = huge_pte_lock(h, mm, ptep); + ptl = hugetlb_pte_lock(mm, hpte); ret = 0; /* If pte changed from under us, retry */ - if (!pte_same(huge_ptep_get(ptep), old_pte)) + if (!pte_same(huge_ptep_get(hpte->ptep), old_pte)) goto backout; if (anon_rmap) { @@ -5990,20 +6003,25 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, hugepage_add_new_anon_rmap(page, vma, haddr); } else page_dup_file_rmap(page, true); - new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE) - && (vma->vm_flags & VM_SHARED))); + + subpage = hugetlb_find_subpage(h, page, haddr_hgm); + new_pte = make_huge_pte_with_shift(vma, subpage, + ((vma->vm_flags & VM_WRITE) + && (vma->vm_flags & VM_SHARED)), + hpte->shift); /* * If this pte was previously wr-protected, keep it wr-protected even * if populated. 
*/ if (unlikely(pte_marker_uffd_wp(old_pte))) new_pte = huge_pte_wrprotect(huge_pte_mkuffd_wp(new_pte)); - set_huge_pte_at(mm, haddr, ptep, new_pte); + set_huge_pte_at(mm, haddr_hgm, hpte->ptep, new_pte); - hugetlb_count_add(pages_per_huge_page(h), mm); + hugetlb_count_add(hugetlb_pte_size(hpte) / PAGE_SIZE, mm); if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) { + BUG_ON(hugetlb_pte_size(hpte) != huge_page_size(h)); /* Optimization, do the COW without a second fault */ - ret = hugetlb_wp(mm, vma, address, ptep, flags, page, ptl); + ret = hugetlb_wp(mm, vma, address, hpte->ptep, flags, page, ptl); } spin_unlock(ptl); @@ -6066,11 +6084,14 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, u32 hash; pgoff_t idx; struct page *page = NULL; + struct page *subpage = NULL; struct page *pagecache_page = NULL; struct hstate *h = hstate_vma(vma); struct address_space *mapping; int need_wait_lock = 0; unsigned long haddr = address & huge_page_mask(h); + unsigned long haddr_hgm; + struct hugetlb_pte hpte; ptep = huge_pte_offset(mm, haddr, huge_page_size(h)); if (ptep) { @@ -6115,15 +6136,22 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, return VM_FAULT_OOM; } - entry = huge_ptep_get(ptep); + hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h), + hpage_size_to_level(huge_page_size(h))); + /* Do a high-granularity page table walk. */ + hugetlb_hgm_walk(mm, vma, &hpte, address, PAGE_SIZE, + /*stop_at_none=*/true); + + entry = huge_ptep_get(hpte.ptep); /* PTE markers should be handled the same way as none pte */ - if (huge_pte_none_mostly(entry)) + if (huge_pte_none_mostly(entry)) { /* * hugetlb_no_page will drop vma lock and hugetlb fault * mutex internally, which make us return immediately. */ - return hugetlb_no_page(mm, vma, mapping, idx, address, ptep, + return hugetlb_no_page(mm, vma, mapping, idx, address, &hpte, entry, flags); + } ret = 0; @@ -6137,6 +6165,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, if (!pte_present(entry)) goto out_mutex; + if (!hugetlb_pte_present_leaf(&hpte, entry)) + /* We raced with someone splitting the entry. */ + goto out_mutex; + /* * If we are going to COW/unshare the mapping later, we examine the * pending reservations for this page now. This will ensure that any @@ -6156,14 +6188,17 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, pagecache_page = find_lock_page(mapping, idx); } - ptl = huge_pte_lock(h, mm, ptep); + ptl = hugetlb_pte_lock(mm, &hpte); /* Check for a racing update before calling hugetlb_wp() */ - if (unlikely(!pte_same(entry, huge_ptep_get(ptep)))) + if (unlikely(!pte_same(entry, huge_ptep_get(hpte.ptep)))) goto out_ptl; + /* haddr_hgm is the base address of the region that hpte maps. */ + haddr_hgm = address & hugetlb_pte_mask(&hpte); + /* Handle userfault-wp first, before trying to lock more pages */ - if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) && + if (userfaultfd_wp(vma) && huge_pte_uffd_wp(entry) && (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) { struct vm_fault vmf = { .vma = vma, @@ -6187,7 +6222,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, * pagecache_page, so here we need take the former one * when page != pagecache_page or !pagecache_page. 
*/ - page = pte_page(entry); + subpage = pte_page(entry); + page = compound_head(subpage); if (page != pagecache_page) if (!trylock_page(page)) { need_wait_lock = 1;
@@ -6198,7 +6234,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) { if (!huge_pte_write(entry)) { - ret = hugetlb_wp(mm, vma, address, ptep, flags, + BUG_ON(hugetlb_pte_size(&hpte) != huge_page_size(h)); + ret = hugetlb_wp(mm, vma, address, hpte.ptep, flags, pagecache_page, ptl); goto out_put_page; } else if (likely(flags & FAULT_FLAG_WRITE)) {
@@ -6206,9 +6243,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, } } entry = pte_mkyoung(entry); - if (huge_ptep_set_access_flags(vma, haddr, ptep, entry, + if (huge_ptep_set_access_flags(vma, haddr_hgm, hpte.ptep, entry, flags & FAULT_FLAG_WRITE)) - update_mmu_cache(vma, haddr, ptep); + update_mmu_cache(vma, haddr_hgm, hpte.ptep); out_put_page: if (page != pagecache_page) unlock_page(page);
@@ -7598,7 +7635,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, pte = (pte_t *)pmd_alloc(mm, pud, addr); } } - BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte)); + BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte) && + !hugetlb_hgm_enabled(vma)); return pte; }
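For illustration only (not part of the patch): how the fault path described
above picks the mapping size, assuming the hugetlb_pte_populate(),
hugetlb_hgm_walk() and hugetlb_pte_size() helpers from this series.

	hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h),
			     hpage_size_to_level(huge_page_size(h)));
	hugetlb_hgm_walk(mm, vma, &hpte, address, PAGE_SIZE,
			 /*stop_at_none=*/true);

	/*
	 * hugetlb_pte_size(&hpte) is now the size that hugetlb_no_page()
	 * will map, matching the examples in the commit message:
	 *   - 1G page, nothing mapped yet:                 1G PUD
	 *   - 1G page, first 512M already mapped,
	 *     fault in the unmapped half:                   2M PMD
	 *   - 1G page, only the first 4K mapped,
	 *     fault on the second 4K:                       4K PTE
	 */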
From patchwork Fri Oct 21 16:36:44 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 6846
Date: Fri, 21 Oct 2022 16:36:44 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-29-jthoughton@google.com>
Subject: [RFC PATCH v2 28/47] rmap: in try_to_{migrate,unmap}_one, check head page for page flags
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya
Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316027904807140?= X-GMAIL-MSGID: =?utf-8?q?1747316027904807140?= The main complication here is that HugeTLB pages have their poison status stored in the head page as the HWPoison page flag. Because HugeTLB high-granularity mapping can create PTEs that point to subpages instead of always the head of a hugepage, we need to check the compound_head for page flags. Signed-off-by: James Houghton --- mm/rmap.c | 34 ++++++++++++++++++++++++++-------- 1 file changed, 26 insertions(+), 8 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index a8359584467e..d5e1eb6b8ce5 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1474,10 +1474,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, struct mm_struct *mm = vma->vm_mm; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); pte_t pteval; - struct page *subpage; + struct page *subpage, *page_flags_page; bool anon_exclusive, ret = true; struct mmu_notifier_range range; enum ttu_flags flags = (enum ttu_flags)(long)arg; + bool page_poisoned; /* * When racing against e.g. zap_pte_range() on another cpu, @@ -1530,9 +1531,17 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, subpage = folio_page(folio, pte_pfn(*pvmw.pte) - folio_pfn(folio)); + /* + * We check the page flags of HugeTLB pages by checking the + * head page. + */ + page_flags_page = folio_test_hugetlb(folio) + ? &folio->page + : subpage; + page_poisoned = PageHWPoison(page_flags_page); address = pvmw.address; anon_exclusive = folio_test_anon(folio) && - PageAnonExclusive(subpage); + PageAnonExclusive(page_flags_page); if (folio_test_hugetlb(folio)) { bool anon = folio_test_anon(folio); @@ -1541,7 +1550,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, * The try_to_unmap() is only passed a hugetlb page * in the case where the hugetlb page is poisoned. */ - VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage); + VM_BUG_ON_FOLIO(!page_poisoned, folio); /* * huge_pmd_unshare may unmap an entire PMD page. * There is no way of knowing exactly which PMDs may @@ -1630,7 +1639,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, /* Update high watermark before we lower rss */ update_hiwater_rss(mm); - if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) { + if (page_poisoned && !(flags & TTU_IGNORE_HWPOISON)) { pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); if (folio_test_hugetlb(folio)) { hugetlb_count_sub(1UL << pvmw.pte_order, mm); @@ -1656,7 +1665,9 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE); } else if (folio_test_anon(folio)) { - swp_entry_t entry = { .val = page_private(subpage) }; + swp_entry_t entry = { + .val = page_private(page_flags_page) + }; pte_t swp_pte; /* * Store the swap location in the pte. 
@@ -1855,7 +1866,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, struct mm_struct *mm = vma->vm_mm; DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0); pte_t pteval; - struct page *subpage; + struct page *subpage, *page_flags_page; bool anon_exclusive, ret = true; struct mmu_notifier_range range; enum ttu_flags flags = (enum ttu_flags)(long)arg; @@ -1935,9 +1946,16 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, subpage = folio_page(folio, pte_pfn(*pvmw.pte) - folio_pfn(folio)); } + /* + * We check the page flags of HugeTLB pages by checking the + * head page. + */ + page_flags_page = folio_test_hugetlb(folio) + ? &folio->page + : subpage; address = pvmw.address; anon_exclusive = folio_test_anon(folio) && - PageAnonExclusive(subpage); + PageAnonExclusive(page_flags_page); if (folio_test_hugetlb(folio)) { bool anon = folio_test_anon(folio); @@ -2048,7 +2066,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, * No need to invalidate here it will synchronize on * against the special swap migration pte. */ - } else if (PageHWPoison(subpage)) { + } else if (PageHWPoison(page_flags_page)) { pteval = swp_entry_to_pte(make_hwpoison_entry(subpage)); if (folio_test_hugetlb(folio)) { hugetlb_count_sub(1L << pvmw.pte_order, mm); From patchwork Fri Oct 21 16:36:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6852 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp796663wrr; Fri, 21 Oct 2022 09:41:00 -0700 (PDT) X-Google-Smtp-Source: AMsMyM57DcqZiKZ/TRja4VFaDZIvEWsv9anJKVLWq5OOqmvB8SgSpP/tj9Kc+AN48AMV46J3yW1z X-Received: by 2002:a62:32c2:0:b0:56b:2cce:d46a with SMTP id y185-20020a6232c2000000b0056b2cced46amr2235742pfy.36.1666370460499; Fri, 21 Oct 2022 09:41:00 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370460; cv=none; d=google.com; s=arc-20160816; b=l7PxpFRLEPbJ1X4nJMxu3StAlDdeldz5vzy8PxbGBYk3fbDw3b/mqEiDiZsCWg2Yf1 UiNc/mlTqzqXKw3teFbnGlUBbC6lQVXTyRaUKOVgl4wZ6JGlHL/nRf7gbBEu52JoKwKb yMnBPLTgQkzzcmKI0tYCUOaUATyiO9qKRTrsJGhng4u218zgs6ngbDL4pF1zCgYegGQr iNcEU0GUrjK8fkpaswLQN4RTiyUsDd9ZA439YFm1IS2XW+tsxJ0btrWD/v+/1F5ZCTj8 q9u/yhQzi6LMjTje5UWLPu2nHQ4BM6tWJ54VxnY51NccCtM4THEQhmVBGQK7Jr1eN9KI PRJw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=mfAGmPTzD6IaWh4r35NYR4uvUMh+9qxawpYMgnabsqo=; b=AvbW7uCQN2dynD5HEw+h9HwymQB28FE6J2wIk2tq/2iVZHmuHXJ5DYO56ovy2IdyE9 eK1St0DsAYo0YJbeOqRVaPtlQoEN20TI1vUn4iEdubh7eC9+Pp/gcSlJo+vxiIf6/G2x YTe1S36+8Mc1DAxxHKJ7DxWDjTt45fJ7tLtTPzFiF0VBQlWv4Y7KRyzXBrVFFK0b0r3m Z7MHkHlfBskpDqTj+6IZ8IZe0UzXuXDGpnzAZ3myQnAIjt9ZrzFqJkumx9DXaEkExfKW NOkBWiF8PIbGB0iPqIY/Do4q0RjLOE4oozXkGPEJdAVdiMUAxdbpw1SBX3E6VJkfLeMo Asiw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=ladT4IDF; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
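The rmap change above boils down to one lookup rule: when the folio is a HugeTLB folio, per-page flags such as HWPoison and PageAnonExclusive live on the head page, so a PTE that maps only a subpage must be translated back to the head page before the flags are tested. The helper below is an illustrative sketch of that rule, not code from this series; it only assumes the existing folio and compound-page helpers.

/*
 * Illustrative sketch (not from this series): pick the page whose flags
 * should be consulted when a PTE may map only a subpage of a HugeTLB
 * folio. HugeTLB tracks HWPoison/AnonExclusive on the head page; THP and
 * order-0 pages track them on the subpage itself.
 */
#include <linux/mm.h>
#include <linux/page-flags.h>

static struct page *page_for_flag_checks(struct folio *folio,
					 struct page *subpage)
{
	if (folio_test_hugetlb(folio))
		return &folio->page;	/* head page of the hugepage */
	return subpage;
}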
Date: Fri, 21 Oct 2022 16:36:45 +0000
Message-ID: <20221021163703.3218176-30-jthoughton@google.com>
Subject: [RFC PATCH v2 29/47] hugetlb: add high-granularity migration support
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr .
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316072029255514?= X-GMAIL-MSGID: =?utf-8?q?1747316072029255514?= To prevent queueing a hugepage for migration multiple times, we use last_page to keep track of the last page we saw in queue_pages_hugetlb, and if the page we're looking at is last_page, then we skip it. For the non-hugetlb cases, last_page, although unused, is still updated so that it has a consistent meaning with the hugetlb case. This commit adds a check in hugetlb_fault for high-granularity migration PTEs. Signed-off-by: James Houghton --- include/linux/swapops.h | 8 ++++++-- mm/hugetlb.c | 15 ++++++++++++++- mm/mempolicy.c | 24 +++++++++++++++++++----- mm/migrate.c | 18 +++++++++++------- 4 files changed, 50 insertions(+), 15 deletions(-) diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 86b95ccb81bb..2939323d0fd2 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -66,6 +66,8 @@ static inline bool is_pfn_swap_entry(swp_entry_t entry); +struct hugetlb_pte; + /* Clear all flags but only keep swp_entry_t related information */ static inline pte_t pte_swp_clear_flags(pte_t pte) { @@ -346,7 +348,8 @@ extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address); #ifdef CONFIG_HUGETLB_PAGE extern void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl); -extern void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte); +extern void migration_entry_wait_huge(struct vm_area_struct *vma, + struct hugetlb_pte *hpte); #endif /* CONFIG_HUGETLB_PAGE */ #else /* CONFIG_MIGRATION */ static inline swp_entry_t make_readable_migration_entry(pgoff_t offset) @@ -375,7 +378,8 @@ static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address) { } #ifdef CONFIG_HUGETLB_PAGE static inline void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl) { } -static inline void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte) { } +static inline void migration_entry_wait_huge(struct vm_area_struct *vma, + struct hugetlb_pte *hpte) { } #endif /* CONFIG_HUGETLB_PAGE */ static inline int is_writable_migration_entry(swp_entry_t entry) { diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 2ee2c48ee79c..8dba8d59ebe5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6100,9 +6100,11 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, * OK as we are only making decisions based on content and * not actually modifying content here. 
*/ + hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h), + hpage_size_to_level(huge_page_size(h))); entry = huge_ptep_get(ptep); if (unlikely(is_hugetlb_entry_migration(entry))) { - migration_entry_wait_huge(vma, ptep); + migration_entry_wait_huge(vma, &hpte); return 0; } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) return VM_FAULT_HWPOISON_LARGE | @@ -6142,7 +6144,18 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, hugetlb_hgm_walk(mm, vma, &hpte, address, PAGE_SIZE, /*stop_at_none=*/true); + /* + * Now that we have done a high-granularity walk, check again if we are + * looking at a migration entry. + */ entry = huge_ptep_get(hpte.ptep); + if (unlikely(is_hugetlb_entry_migration(entry))) { + hugetlb_vma_unlock_read(vma); + mutex_unlock(&hugetlb_fault_mutex_table[hash]); + migration_entry_wait_huge(vma, &hpte); + return 0; + } + /* PTE markers should be handled the same way as none pte */ if (huge_pte_none_mostly(entry)) { /* diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 275bc549590e..47bf9b16a9c0 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -424,6 +424,7 @@ struct queue_pages { unsigned long start; unsigned long end; struct vm_area_struct *first; + struct page *last_page; }; /* @@ -475,6 +476,7 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr, flags = qp->flags; /* go to thp migration */ if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) { + qp->last_page = page; if (!vma_migratable(walk->vma) || migrate_page_add(page, qp->pagelist, flags)) { ret = 1; @@ -532,6 +534,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr, continue; if (!queue_pages_required(page, qp)) continue; + if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) { /* MPOL_MF_STRICT must be specified if we get here */ if (!vma_migratable(vma)) { @@ -539,6 +542,8 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr, break; } + qp->last_page = page; + /* * Do not abort immediately since there may be * temporary off LRU pages in the range. Still @@ -570,15 +575,22 @@ static int queue_pages_hugetlb(struct hugetlb_pte *hpte, spinlock_t *ptl; pte_t entry; - /* We don't migrate high-granularity HugeTLB mappings for now. */ - if (hugetlb_hgm_enabled(walk->vma)) - return -EINVAL; - ptl = hugetlb_pte_lock(walk->mm, hpte); entry = huge_ptep_get(hpte->ptep); if (!pte_present(entry)) goto unlock; - page = pte_page(entry); + + if (!hugetlb_pte_present_leaf(hpte, entry)) { + ret = -EAGAIN; + goto unlock; + } + + page = compound_head(pte_page(entry)); + + /* We already queued this page with another high-granularity PTE. */ + if (page == qp->last_page) + goto unlock; + if (!queue_pages_required(page, qp)) goto unlock; @@ -605,6 +617,7 @@ static int queue_pages_hugetlb(struct hugetlb_pte *hpte, /* With MPOL_MF_MOVE, we migrate only unshared hugepage. 
*/ if (flags & (MPOL_MF_MOVE_ALL) || (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) { + qp->last_page = page; if (isolate_hugetlb(page, qp->pagelist) && (flags & MPOL_MF_STRICT)) /* @@ -740,6 +753,7 @@ queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end, .start = start, .end = end, .first = NULL, + .last_page = NULL, }; err = walk_page_range(mm, start, end, &queue_pages_walk_ops, &qp); diff --git a/mm/migrate.c b/mm/migrate.c index 8712b694c5a7..197662dd1dc0 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -186,6 +186,9 @@ static bool remove_migration_pte(struct folio *folio, /* pgoff is invalid for ksm pages, but they are never large */ if (folio_test_large(folio) && !folio_test_hugetlb(folio)) idx = linear_page_index(vma, pvmw.address) - pvmw.pgoff; + else if (folio_test_hugetlb(folio)) + idx = (pvmw.address & ~huge_page_mask(hstate_vma(vma)))/ + PAGE_SIZE; new = folio_page(folio, idx); #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION @@ -235,14 +238,15 @@ static bool remove_migration_pte(struct folio *folio, #ifdef CONFIG_HUGETLB_PAGE if (folio_test_hugetlb(folio)) { + struct page *hpage = folio_page(folio, 0); unsigned int shift = pvmw.pte_order + PAGE_SHIFT; pte = arch_make_huge_pte(pte, shift, vma->vm_flags); if (folio_test_anon(folio)) - hugepage_add_anon_rmap(new, vma, pvmw.address, + hugepage_add_anon_rmap(hpage, vma, pvmw.address, rmap_flags); else - page_dup_file_rmap(new, true); + page_dup_file_rmap(hpage, true); set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte); } else #endif @@ -258,7 +262,7 @@ static bool remove_migration_pte(struct folio *folio, mlock_page_drain_local(); trace_remove_migration_pte(pvmw.address, pte_val(pte), - compound_order(new)); + pvmw.pte_order); /* No need to invalidate - it was non-present before */ update_mmu_cache(vma, pvmw.address, pvmw.pte); @@ -332,12 +336,12 @@ void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl) migration_entry_wait_on_locked(pte_to_swp_entry(pte), NULL, ptl); } -void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte) +void migration_entry_wait_huge(struct vm_area_struct *vma, + struct hugetlb_pte *hpte) { - spinlock_t *ptl = huge_pte_lockptr(huge_page_shift(hstate_vma(vma)), - vma->vm_mm, pte); + spinlock_t *ptl = hugetlb_pte_lockptr(vma->vm_mm, hpte); - __migration_entry_wait_huge(pte, ptl); + __migration_entry_wait_huge(hpte->ptep, ptl); } #endif From patchwork Fri Oct 21 16:36:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6847 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp796287wrr; Fri, 21 Oct 2022 09:40:20 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7UYQT+MVDlgyfnYGeiqi66zebKIUyCDLtLLKAWsf1DPU4NJrwu3KSadb6zpS1MOoN2eHQe X-Received: by 2002:a17:902:9049:b0:180:7922:ce36 with SMTP id w9-20020a170902904900b001807922ce36mr20815721plz.30.1666370420062; Fri, 21 Oct 2022 09:40:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370420; cv=none; d=google.com; s=arc-20160816; b=wwIH5KOCvQ5l1+B3bhB2zH6oCU4ZunHWKgidMPHRAEWXwXKOT7a8YLmuu6Axg/AsUR 7GnVrdzlLo69m1r1VP2tBDGOxKkIpJSqMhbTT6OthZsl4EUvU2R/JuxTnlmzBFw0wVNF 1X4MPaMVZYG902XRxEOSfl+k+PPC0vSId35KAmZQIknNCwZhG1shHALhwq9uNF1Bt7eS UobusiCrootg8cClAVoSJJb5EctKSg6ooWO545+ACgwIaI82LN/oGiHd3Qgc2ST8+4w6 4AUvkhPpcnFGPxjDnRhR5Z1pAvIDEpo/pKnsUiZgQWKv4KUIyY+wIfOlLJqw/h/EuWS4 sGzw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; 
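One consequence of high-granularity mapping for the mempolicy walk above: several leaf PTEs can now map subpages of the same hugepage, so queue_pages_hugetlb() would otherwise try to isolate the same compound page once per PTE. Remembering the head page queued last and skipping repeats avoids that. Below is a simplified sketch of the dedup check with made-up names (queue_state, should_queue_hugetlb); it is not the mm/mempolicy.c code, and a real caller would still call isolate_hugetlb() only when it returns true.

/*
 * Simplified sketch with assumed names: skip a hugepage that was already
 * queued for migration via another high-granularity PTE.
 */
struct queue_state {
	struct page *last_page;	/* head page most recently queued */
};

static bool should_queue_hugetlb(struct queue_state *qs, pte_t entry)
{
	struct page *head = compound_head(pte_page(entry));

	if (head == qs->last_page)
		return false;	/* same hugepage as last time; already queued */
	qs->last_page = head;
	return true;
}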
Date: Fri, 21 Oct 2022 16:36:46 +0000
Message-ID: <20221021163703.3218176-31-jthoughton@google.com>
Subject: [RFC PATCH v2 30/47] hugetlb: add high-granularity check for hwpoison in fault path
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

Because hwpoison swap entries may be placed beneath the hstate-level PTE, we need to check for it separately (on top of the hstate-level PTE check that remains).
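Concretely, after the high-granularity walk has selected the leaf PTE, the fault path repeats the poison test on that PTE. The function below is only a sketch of the shape of the added check, not the hugetlb_fault() hunk itself; it assumes the series' struct hugetlb_pte and the helpers already used inside mm/hugetlb.c.

/* Sketch of the re-check done after the high-granularity walk. */
static vm_fault_t check_hgm_hwpoison(struct hugetlb_pte *hpte, struct hstate *h)
{
	pte_t entry = huge_ptep_get(hpte->ptep);

	/* The hwpoison marker may sit below the hstate-level PTE. */
	if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
		return VM_FAULT_HWPOISON_LARGE |
		       VM_FAULT_SET_HINDEX(hstate_index(h));
	return 0;
}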
Signed-off-by: James Houghton --- mm/hugetlb.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8dba8d59ebe5..bb0005d57cab 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6154,6 +6154,11 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, mutex_unlock(&hugetlb_fault_mutex_table[hash]); migration_entry_wait_huge(vma, &hpte); return 0; + } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) { + hugetlb_vma_unlock_read(vma); + mutex_unlock(&hugetlb_fault_mutex_table[hash]); + return VM_FAULT_HWPOISON_LARGE | + VM_FAULT_SET_HINDEX(hstate_index(h)); } /* PTE markers should be handled the same way as none pte */ From patchwork Fri Oct 21 16:36:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6848 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp796444wrr; Fri, 21 Oct 2022 09:40:38 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4bqiJTWAxNDW7lqoXTXIAQ1xkEyEUC2/F0aaLI4UQ/Z0RZcJETzeO4VcbFW6ialifzxKLq X-Received: by 2002:a17:903:248b:b0:17d:ea45:d76a with SMTP id p11-20020a170903248b00b0017dea45d76amr20454801plw.97.1666370438420; Fri, 21 Oct 2022 09:40:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370438; cv=none; d=google.com; s=arc-20160816; b=QZ1kpNaQnPZbvv/FX7XoA00/OawquMnqQg5AWUjtexObZ468MKidXGAbLbHFiFFTrF SbwLx8FmxVjOYSwmEnw83NQV2BhX31Bd8rSRwZBegUxuDaSvyQouTFklCHHuisR6FZuJ V2c2hHfUKMOHAZv83R3o1sTFJiKf6ExX9d0e5xPJ+t9uVH3o5jEfSKZAIj1wUKMXiz2V EXWEoKmloMXWCExIJe14mICpjK9SsCkb+6XRJo2vmnh1LEV4k0Hmpooch+o28jvfMJrk XT1mRYB1vq9qZegNBp0aqK+5w0TKLGAp8llyGF55tz1PLVDSDJqy6SI+C6YMFHw7hDjk TziQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=scj2/ZcN7CmnhbVTPEMSvCXttKvb/J7A59aTeFA+Dhs=; b=GjoTl5sxc+4Dl4O9OrYLrjUd6VAswHu0uGRVhc5w5+JpLA/sxkhiFWNBpGMcRSzGb2 ZFvFZPnj/0ZRdLAYbWeZebD6Fv3tzms2vpcsCwTirm+LNtiTFkMR3ZVz7ZVIyNDu3IAX qoT7q5l0+lo8An5N+1GRpIkOaGR3YzzoYQF+tK4PYR7AXVKiNqWv6PTMBi0AD+yWIrrI W/6l9pYmTiwBmzHZCYJlvDvNMSGN+CZ4V3HV9ooWLUvWGiSPDj8KV0+Ng89QBXroGPKh +xAU+ldlLmDlbonrxckTH9iCyIv3BlxPjfC5YqwKp9Qh2985KWMn3xbt2aGRoh56mCvQ P4rA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=S0gGSTHV; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
Date: Fri, 21 Oct 2022 16:36:47 +0000
Message-ID: <20221021163703.3218176-32-jthoughton@google.com>
Subject: [RFC PATCH v2 31/47] hugetlb: sort hstates in hugetlb_init_hstates
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr .
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316048969391994?= X-GMAIL-MSGID: =?utf-8?q?1747316048969391994?= When using HugeTLB high-granularity mapping, we need to go through the supported hugepage sizes in decreasing order so that we pick the largest size that works. Consider the case where we're faulting in a 1G hugepage for the first time: we want hugetlb_fault/hugetlb_no_page to map it with a PUD. By going through the sizes in decreasing order, we will find that PUD_SIZE works before finding out that PMD_SIZE or PAGE_SIZE work too. This commit also changes bootmem hugepages from storing hstate pointers directly to storing the hstate sizes. The hstate pointers used for boot-time-allocated hugepages become invalid after we sort the hstates. `gather_bootmem_prealloc`, called after the hstates have been sorted, now converts the size to the correct hstate. Signed-off-by: James Houghton --- include/linux/hugetlb.h | 2 +- mm/hugetlb.c | 49 ++++++++++++++++++++++++++++++++--------- 2 files changed, 40 insertions(+), 11 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index d305742e9d44..e25f97cdd086 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -772,7 +772,7 @@ struct hstate { struct huge_bootmem_page { struct list_head list; - struct hstate *hstate; + unsigned long hstate_sz; }; int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index bb0005d57cab..d6f07968156c 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -34,6 +34,7 @@ #include #include #include +#include #include #include @@ -49,6 +50,10 @@ int hugetlb_max_hstate __read_mostly; unsigned int default_hstate_idx; +/* + * After hugetlb_init_hstates is called, hstates will be sorted from largest + * to smallest. 
+ */ struct hstate hstates[HUGE_MAX_HSTATE]; #ifdef CONFIG_CMA @@ -3189,7 +3194,7 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid) /* Put them into a private list first because mem_map is not up yet */ INIT_LIST_HEAD(&m->list); list_add(&m->list, &huge_boot_pages); - m->hstate = h; + m->hstate_sz = huge_page_size(h); return 1; } @@ -3203,7 +3208,7 @@ static void __init gather_bootmem_prealloc(void) list_for_each_entry(m, &huge_boot_pages, list) { struct page *page = virt_to_page(m); - struct hstate *h = m->hstate; + struct hstate *h = size_to_hstate(m->hstate_sz); VM_BUG_ON(!hstate_is_gigantic(h)); WARN_ON(page_count(page) != 1); @@ -3319,9 +3324,38 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h) kfree(node_alloc_noretry); } +static int compare_hstates_decreasing(const void *a, const void *b) +{ + unsigned long sz_a = huge_page_size((const struct hstate *)a); + unsigned long sz_b = huge_page_size((const struct hstate *)b); + + if (sz_a < sz_b) + return 1; + if (sz_a > sz_b) + return -1; + return 0; +} + +static void sort_hstates(void) +{ + unsigned long default_hstate_sz = huge_page_size(&default_hstate); + + /* Sort from largest to smallest. */ + sort(hstates, hugetlb_max_hstate, sizeof(*hstates), + compare_hstates_decreasing, NULL); + + /* + * We may have changed the location of the default hstate, so we need to + * update it. + */ + default_hstate_idx = hstate_index(size_to_hstate(default_hstate_sz)); +} + static void __init hugetlb_init_hstates(void) { - struct hstate *h, *h2; + struct hstate *h; + + sort_hstates(); for_each_hstate(h) { /* oversize hugepages were init'ed in early boot */ @@ -3340,13 +3374,8 @@ static void __init hugetlb_init_hstates(void) continue; if (hugetlb_cma_size && h->order <= HUGETLB_PAGE_ORDER) continue; - for_each_hstate(h2) { - if (h2 == h) - continue; - if (h2->order < h->order && - h2->order > h->demote_order) - h->demote_order = h2->order; - } + if (h - 1 >= &hstates[0]) + h->demote_order = huge_page_order(h - 1); } } From patchwork Fri Oct 21 16:36:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6862 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp797128wrr; Fri, 21 Oct 2022 09:41:53 -0700 (PDT) X-Google-Smtp-Source: AMsMyM668Sb6MMtcjW7NqQEjRPpDtTuSwANii9EQj99K08usVNQbL9NwTUjqfeHN4cjCW4HZu0lR X-Received: by 2002:a05:6a00:301c:b0:567:6e2c:4c10 with SMTP id ay28-20020a056a00301c00b005676e2c4c10mr16362494pfb.83.1666370513208; Fri, 21 Oct 2022 09:41:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370513; cv=none; d=google.com; s=arc-20160816; b=DIgbM/ciSzjPeUGU0iXnlWZ9KDwDyzVYpgf27PAahA1ewDW6UyVtBwAzM4BbsnhK+p 67N9e4x5mNSvb6jDMSYV8PbP0Cd3cUbVQglLdNP4TeIYMa/RcaR5GxpBsuoRQFZCI84Q Lzs/U4n16gomq3aQn7M3ms6akJVfEmkzJMCgwQJxIIvZYByTKoY8bsprFVd6Fwqs7Jkb ODWZorpQkV67ph/z/tkw0P739k/bjhAqTXOPgbeuTv8aejXLeWTuXAS27SmE6RPAh0ss xwwFGLh23nPGGktQpNQQ/D0bBVDvbHKlcQp/UYIGPkVa97w4q7CpIgXV4ipmF8LBiNbO zktQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=WxKGqQ6TZ4IrlRpo46PGVbYr6hPIEsEGe3hsSCnb6qg=; b=og6SqWUkec4a1KUJgqsHDbqEU1iTNsvdc1Bd+D4zlEABIcGZOc2AIfOKy1rDSO5KCL LnlKzKHGWXKzgqwMLecBj1pizQ12e381IBO1PQggJxBvTK26HibLUM2T6JpnKTA+1qSi X+wYu5dW0NHss5D0qnW1qqvLnjE6NvnPeKb+wxv3vSxiEHORZkmjm/aJmXxoSNsGCqY8 
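The reason the sort order in the patch above matters: once hstates[] runs from largest to smallest, a caller can take the first size whose alignment and length fit and stop, knowing it is also the largest usable size. The function below is an illustration of that pattern only (the name and the exact fit test are invented); it assumes the standard for_each_hstate()/huge_page_size() helpers.

/*
 * Illustration only: with hstates sorted largest-first, the first
 * hugepage size that fits a page-aligned range is also the largest.
 */
static unsigned long largest_fitting_size(unsigned long addr, unsigned long end)
{
	struct hstate *h;

	for_each_hstate(h) {
		unsigned long sz = huge_page_size(h);

		if (IS_ALIGNED(addr, sz) && addr + sz <= end)
			return sz;
	}
	return PAGE_SIZE;	/* nothing larger than a base page fits */
}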
Date: Fri, 21 Oct 2022
16:36:48 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-33-jthoughton@google.com> Subject: [RFC PATCH v2 32/47] hugetlb: add for_each_hgm_shift From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316127038611092?= X-GMAIL-MSGID: =?utf-8?q?1747316127038611092?= This is a helper macro to loop through all the usable page sizes for a high-granularity-enabled HugeTLB VMA. Given the VMA's hstate, it will loop, in descending order, through the page sizes that HugeTLB supports for this architecture. It always includes PAGE_SIZE. This is done by looping through the hstates; however, there is no hstate for PAGE_SIZE. To handle this case, the loop intentionally goes out of bounds, and the out-of-bounds pointer is mapped to PAGE_SIZE. Signed-off-by: James Houghton --- mm/hugetlb.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d6f07968156c..6eaec40d66ad 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -7856,6 +7856,25 @@ int enable_hugetlb_hgm(struct vm_area_struct *vma) hugetlb_unshare_all_pmds(vma); return 0; } + +/* Should only be used by the for_each_hgm_shift macro. */ +static unsigned int __shift_for_hstate(struct hstate *h) +{ + /* If h is out of bounds, we have reached the end, so give PAGE_SIZE */ + if (h >= &hstates[hugetlb_max_hstate]) + return PAGE_SHIFT; + return huge_page_shift(h); +} + +/* + * Intentionally go out of bounds. An out-of-bounds hstate will be converted to + * PAGE_SIZE. 
+ */ +#define for_each_hgm_shift(hstate, tmp_h, shift) \ + for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h), \ + (tmp_h) <= &hstates[hugetlb_max_hstate]; \ + (tmp_h)++) + #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ /* From patchwork Fri Oct 21 16:36:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6856 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp796887wrr; Fri, 21 Oct 2022 09:41:24 -0700 (PDT) X-Google-Smtp-Source: AMsMyM58SCRVwPAhy4Tx3sIvPRaWLx1gtAWI+g1lgB0s1LR44bkuWOziWQ7UrqlrpnZaimtnNNP6 X-Received: by 2002:a17:90b:2243:b0:20b:42a:4c0d with SMTP id hk3-20020a17090b224300b0020b042a4c0dmr56988155pjb.123.1666370484595; Fri, 21 Oct 2022 09:41:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370484; cv=none; d=google.com; s=arc-20160816; b=eNmdfWo3/uyaDcCdlT5Xy++09cdL48nEnFMIha5Uq5t30ulriDwjEDA8RtrX0n0h12 LBxPx0dbgCp1s+J4KJBosFODO0b2Wgebsu/iM7OCJkcKhugdIjT4fV59GCZSazGfK9HL flE+S1Uv/x6fRcEg1Vi6lPkyVwhebPcl11X3oKkUJhU5Bk8Vuo2AHdiRPtjQrZOSAoQN HMuIeRr6R1UxNPsK1A4tw5FnKU0mC5jmGx/yWVYeJlr4CiM5GFQIfoqQNEHBb3leN/E3 VpCOA0GWAifsmjOPY8YgcHHfE3Qw0UGTvW6nqh7OuXD30leY6QnuybO+0khACkbxJNAr GCPQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=+C7/ATuUriMIrUw34NjzRIATqKexU0rnzCYq9OSaiU4=; b=lMrBVYaDG6WOthlJyxxYNWNHTf6x0sI6xiO46XQEwuJf6NMO5dbbsUatRGZqhZ+W8l 4G2AGplsfqHrvg/KcAaXv/gQA45Yun3+3hSll7h84ccba9ipp2P/TvaEkqwkRNzGEx0I +QvF6yeDnjfoNI35Bip7GbBHTvWUi8pwDQ1ic9cLxWhug1M59dTSnDPlp7Tkw+U5DOKm ulWJf3YubvPFUxJSv+l3nfxJTp4DI7ktRcEb0DyngHnAKW+Oa/44U0qW/ZOMLulNUp/G yZ6O7nWy++wn4uOevwgWd0CfHhjLGYNGXLkJGa/K8uxHLHf87TO/Xm79PpnWoxQ6jcTL 18Kw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=jJGO9Z7Z; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
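As a usage sketch for the macro above (a hypothetical caller, not part of the series): walk the shifts from the VMA's hstate down to PAGE_SHIFT and stop at the first one whose size is aligned and fully contained in the range; because PAGE_SHIFT is always the final candidate, a page-aligned range is guaranteed to get an answer.

/*
 * Hypothetical caller of for_each_hgm_shift(): pick the largest mapping
 * shift, PAGE_SHIFT included, that is aligned and fits in [start, end).
 */
static unsigned int pick_hgm_shift(struct hstate *h,
				   unsigned long start, unsigned long end)
{
	struct hstate *tmp_h;
	unsigned int shift;

	for_each_hgm_shift(h, tmp_h, shift) {
		unsigned long sz = 1UL << shift;

		if (IS_ALIGNED(start, sz) && start + sz <= end)
			break;
	}
	/* With page-aligned start/end, the PAGE_SHIFT iteration matches. */
	return shift;
}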
Date: Fri, 21 Oct 2022 16:36:49 +0000
Message-ID: <20221021163703.3218176-34-jthoughton@google.com>
Subject: [RFC PATCH v2 33/47] userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr .
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316096887403462?= X-GMAIL-MSGID: =?utf-8?q?1747316096887403462?= Userspace must provide this new feature when it calls UFFDIO_API to enable HGM. Userspace can check if the feature exists in uffdio_api.features, and if it does not exist, the kernel does not support and therefore did not enable HGM. Signed-off-by: James Houghton --- fs/userfaultfd.c | 12 +++++++++++- include/linux/userfaultfd_k.h | 7 +++++++ include/uapi/linux/userfaultfd.h | 2 ++ 3 files changed, 20 insertions(+), 1 deletion(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 07c81ab3fd4d..3a3e9ef74dab 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -226,6 +226,11 @@ static inline struct uffd_msg userfault_msg(unsigned long address, return msg; } +bool uffd_ctx_has_hgm(struct vm_userfaultfd_ctx *ctx) +{ + return ctx->ctx->features & UFFD_FEATURE_MINOR_HUGETLBFS_HGM; +} + #ifdef CONFIG_HUGETLB_PAGE /* * Same functionality as userfaultfd_must_wait below with modifications for @@ -1954,10 +1959,15 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx, goto err_out; /* report all available features and ioctls to userland */ uffdio_api.features = UFFD_API_FEATURES; + #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR uffdio_api.features &= ~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM); -#endif +#ifndef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING + uffdio_api.features &= ~UFFD_FEATURE_MINOR_HUGETLBFS_HGM; +#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ +#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ + #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP; #endif diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index f07e6998bb68..d8fa37f308f7 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -162,6 +162,8 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma, vma_is_shmem(vma); } +extern bool uffd_ctx_has_hgm(struct vm_userfaultfd_ctx *); + extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *); extern void dup_userfaultfd_complete(struct list_head *); @@ -228,6 +230,11 @@ static inline bool userfaultfd_armed(struct vm_area_struct *vma) return false; } +static inline bool uffd_ctx_has_hgm(struct vm_userfaultfd_ctx *ctx) +{ + return false; +} + static inline int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *l) { diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h index 005e5e306266..ae8080003560 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -36,6 +36,7 @@ UFFD_FEATURE_SIGBUS | \ UFFD_FEATURE_THREAD_ID | \ UFFD_FEATURE_MINOR_HUGETLBFS | \ + UFFD_FEATURE_MINOR_HUGETLBFS_HGM | \ UFFD_FEATURE_MINOR_SHMEM | \ UFFD_FEATURE_EXACT_ADDRESS | \ UFFD_FEATURE_WP_HUGETLBFS_SHMEM) @@ -217,6 +218,7 @@ struct uffdio_api { #define UFFD_FEATURE_MINOR_SHMEM (1<<10) 
#define UFFD_FEATURE_EXACT_ADDRESS (1<<11) #define UFFD_FEATURE_WP_HUGETLBFS_SHMEM (1<<12) +#define UFFD_FEATURE_MINOR_HUGETLBFS_HGM (1<<13) __u64 features; __u64 ioctls; From patchwork Fri Oct 21 16:36:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6851 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp796600wrr; Fri, 21 Oct 2022 09:40:53 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4BjU4BEDGGD1uBiP1bikhAihrANqrEjN43zeTJ+CIfGmfUZt558mxNw4aMdFG26Hoc7koD X-Received: by 2002:aa7:8a0d:0:b0:562:a5bf:95e9 with SMTP id m13-20020aa78a0d000000b00562a5bf95e9mr20001062pfa.24.1666370453110; Fri, 21 Oct 2022 09:40:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370453; cv=none; d=google.com; s=arc-20160816; b=KOi7o7ay+wBQwwenDVvk3kDKmegvLV+hpxv92C5y3uJxEyb3fVTDdr57eR3HbhdJA/ Uzkl04z5OK1A9pKI0jSUM++2BCPCuvvj57HFnkKjry64IESOq3QIVwBmNsG9wnyCHiLb +anaMDhUT+IHBHYyHNm3awaBNoQTMjW8eBKlseW/gci3L1tQ2EUjk7+nD0jHgVBESZll qIgxm8qY+KzdhLQFijsKEndv+C5+sTTRc/8IIJoR0G7Z20IXYe2I3340U72hskqJgDqj pc6EnYph4WETgcd+n6dTKFKGZwVWUgH8VMps4xshXKW92r5c6rjrAoyuml4tq/D2EXCo wEbw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=UqxKOM3MWYAXAyGtNHot8Ydwfg1psxhFwpjBslLv29s=; b=Hx+gcDSR7tggh0Rt9VUy1ZAFaN5keK2MZ/D3KllN9CIll4Mw1FEm6CVgRboAtugHmN QG48Kv60GptuROg2VlCIpTtra0HhP7Ury7MtwCw+sZTAXDyJ+kRyHD7Y5JDos4hSYZ3J JmhjAoHr6WGcTyPzHNGJfh06mkNirz4N9JH7PDmPsQmnwGr13bs7kL8HcY/1Nm9uXII7 1X8sFKcpgAJyM6Onet7klmc9CrawX31glLiTM1kXHKxskiJ+tqmGBZpc4rlC3FIO+RWp Xp6KDmBbNrRpQF2iMmNAw2qaz1CDGg05nDIWOz77UnYCzwVNCbmGoYtJo+SDJt2gFwNO iDfw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=UytSbHX2; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
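From the userspace side, the handshake described above looks roughly like the sketch below: request the feature in the UFFDIO_API call, then confirm the kernel reported it back in uffdio_api.features. This is an illustrative example rather than code from the series, and it assumes uapi headers that already define UFFD_FEATURE_MINOR_HUGETLBFS_HGM.

#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Illustrative userspace handshake; error handling kept minimal. */
static int open_uffd_with_hgm(void)
{
	struct uffdio_api api = {
		.api = UFFD_API,
		.features = UFFD_FEATURE_MINOR_HUGETLBFS |
			    UFFD_FEATURE_MINOR_HUGETLBFS_HGM,
	};
	int fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

	if (fd < 0)
		return -1;
	if (ioctl(fd, UFFDIO_API, &api) ||
	    !(api.features & UFFD_FEATURE_MINOR_HUGETLBFS_HGM)) {
		/* The kernel does not support (or did not enable) HGM. */
		close(fd);
		return -1;
	}
	return fd;
}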
Date: Fri, 21 Oct 2022 16:36:50 +0000
Message-ID: <20221021163703.3218176-35-jthoughton@google.com>
Subject: [RFC PATCH v2 34/47] hugetlb: userfaultfd: add support for high-granularity UFFDIO_CONTINUE
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya
Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316064286129607?= X-GMAIL-MSGID: =?utf-8?q?1747316064286129607?= Changes here are similar to the changes made for hugetlb_no_page. Pass vmf->real_address to userfaultfd_huge_must_wait because vmf->address is rounded down to the hugepage size, and a high-granularity page table walk would look up the wrong PTE. Also change the call to userfaultfd_must_wait in the same way for consistency. This commit introduces hugetlb_alloc_largest_pte which is used to find the appropriate PTE size to map pages with UFFDIO_CONTINUE. Signed-off-by: James Houghton --- fs/userfaultfd.c | 33 +++++++++++++++--- include/linux/hugetlb.h | 14 +++++++- mm/hugetlb.c | 76 +++++++++++++++++++++++++++++++++-------- mm/userfaultfd.c | 46 +++++++++++++++++-------- 4 files changed, 135 insertions(+), 34 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 3a3e9ef74dab..0204108e3882 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -245,14 +245,22 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx, struct mm_struct *mm = ctx->mm; pte_t *ptep, pte; bool ret = true; + struct hugetlb_pte hpte; + unsigned long sz = vma_mmu_pagesize(vma); + unsigned int shift = huge_page_shift(hstate_vma(vma)); mmap_assert_locked(mm); - ptep = huge_pte_offset(mm, address, vma_mmu_pagesize(vma)); + ptep = huge_pte_offset(mm, address, sz); if (!ptep) goto out; + hugetlb_pte_populate(&hpte, ptep, shift, hpage_size_to_level(sz)); + hugetlb_hgm_walk(mm, vma, &hpte, address, PAGE_SIZE, + /*stop_at_none=*/true); + ptep = hpte.ptep; + ret = false; pte = huge_ptep_get(ptep); @@ -498,6 +506,14 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) blocking_state = userfaultfd_get_blocking_state(vmf->flags); + if (is_vm_hugetlb_page(vmf->vma) && hugetlb_hgm_enabled(vmf->vma)) + /* + * Lock the VMA lock so we can do a high-granularity walk in + * userfaultfd_huge_must_wait. We have to grab this lock before + * we set our state to blocking. 
+ */ + hugetlb_vma_lock_read(vmf->vma); + spin_lock_irq(&ctx->fault_pending_wqh.lock); /* * After the __add_wait_queue the uwq is visible to userland @@ -513,12 +529,15 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) spin_unlock_irq(&ctx->fault_pending_wqh.lock); if (!is_vm_hugetlb_page(vmf->vma)) - must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags, - reason); + must_wait = userfaultfd_must_wait(ctx, vmf->real_address, + vmf->flags, reason); else must_wait = userfaultfd_huge_must_wait(ctx, vmf->vma, - vmf->address, + vmf->real_address, vmf->flags, reason); + + if (is_vm_hugetlb_page(vmf->vma) && hugetlb_hgm_enabled(vmf->vma)) + hugetlb_vma_unlock_read(vmf->vma); mmap_read_unlock(mm); if (likely(must_wait && !READ_ONCE(ctx->released))) { @@ -1463,6 +1482,12 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, mas_pause(&mas); } next: + if (is_vm_hugetlb_page(vma) && (ctx->features & + UFFD_FEATURE_MINOR_HUGETLBFS_HGM)) { + ret = enable_hugetlb_hgm(vma); + if (ret) + break; + } /* * In the vma_merge() successful mprotect-like case 8: * the next vma was merged into the current one and diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index e25f97cdd086..00c22a84a1c6 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -250,7 +250,8 @@ unsigned long hugetlb_total_pages(void); vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, unsigned int flags); #ifdef CONFIG_USERFAULTFD -int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte, +int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, + struct hugetlb_pte *dst_hpte, struct vm_area_struct *dst_vma, unsigned long dst_addr, unsigned long src_addr, @@ -1272,6 +1273,9 @@ static inline enum hugetlb_level hpage_size_to_level(unsigned long sz) bool hugetlb_hgm_enabled(struct vm_area_struct *vma); bool hugetlb_hgm_eligible(struct vm_area_struct *vma); int enable_hugetlb_hgm(struct vm_area_struct *vma); +int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm, + struct vm_area_struct *vma, unsigned long start, + unsigned long end); #else static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma) { @@ -1285,6 +1289,14 @@ static inline int enable_hugetlb_hgm(struct vm_area_struct *vma) { return -EINVAL; } + +static inline +int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm, + struct vm_area_struct *vma, unsigned long start, + unsigned long end) +{ + return -EINVAL; +} #endif static inline spinlock_t *huge_pte_lock(struct hstate *h, diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 6eaec40d66ad..c25d3cd73ac9 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6325,7 +6325,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, * modifications for huge pages. 
*/ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, - pte_t *dst_pte, + struct hugetlb_pte *dst_hpte, struct vm_area_struct *dst_vma, unsigned long dst_addr, unsigned long src_addr, @@ -6336,13 +6336,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE); struct hstate *h = hstate_vma(dst_vma); struct address_space *mapping = dst_vma->vm_file->f_mapping; - pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr); + unsigned long haddr = dst_addr & huge_page_mask(h); + pgoff_t idx = vma_hugecache_offset(h, dst_vma, haddr); unsigned long size; int vm_shared = dst_vma->vm_flags & VM_SHARED; pte_t _dst_pte; spinlock_t *ptl; int ret = -ENOMEM; - struct page *page; + struct page *page, *subpage; int writable; bool page_in_pagecache = false; @@ -6357,12 +6358,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, * a non-missing case. Return -EEXIST. */ if (vm_shared && - hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) { + hugetlbfs_pagecache_present(h, dst_vma, haddr)) { ret = -EEXIST; goto out; } - page = alloc_huge_page(dst_vma, dst_addr, 0); + page = alloc_huge_page(dst_vma, haddr, 0); if (IS_ERR(page)) { ret = -ENOMEM; goto out; @@ -6378,13 +6379,13 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, /* Free the allocated page which may have * consumed a reservation. */ - restore_reserve_on_error(h, dst_vma, dst_addr, page); + restore_reserve_on_error(h, dst_vma, haddr, page); put_page(page); /* Allocate a temporary page to hold the copied * contents. */ - page = alloc_huge_page_vma(h, dst_vma, dst_addr); + page = alloc_huge_page_vma(h, dst_vma, haddr); if (!page) { ret = -ENOMEM; goto out; @@ -6398,14 +6399,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, } } else { if (vm_shared && - hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) { + hugetlbfs_pagecache_present(h, dst_vma, haddr)) { put_page(*pagep); ret = -EEXIST; *pagep = NULL; goto out; } - page = alloc_huge_page(dst_vma, dst_addr, 0); + page = alloc_huge_page(dst_vma, haddr, 0); if (IS_ERR(page)) { put_page(*pagep); ret = -ENOMEM; @@ -6447,7 +6448,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, page_in_pagecache = true; } - ptl = huge_pte_lock(h, dst_mm, dst_pte); + ptl = hugetlb_pte_lock(dst_mm, dst_hpte); ret = -EIO; if (PageHWPoison(page)) @@ -6459,7 +6460,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, * page backing it, then access the page. 
*/ ret = -EEXIST; - if (!huge_pte_none_mostly(huge_ptep_get(dst_pte))) + if (!huge_pte_none_mostly(huge_ptep_get(dst_hpte->ptep))) goto out_release_unlock; if (page_in_pagecache) { @@ -6478,7 +6479,11 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, else writable = dst_vma->vm_flags & VM_WRITE; - _dst_pte = make_huge_pte(dst_vma, page, writable); + subpage = hugetlb_find_subpage(h, page, dst_addr); + WARN_ON_ONCE(subpage != page && !hugetlb_hgm_enabled(dst_vma)); + + _dst_pte = make_huge_pte_with_shift(dst_vma, subpage, writable, + dst_hpte->shift); /* * Always mark UFFDIO_COPY page dirty; note that this may not be * extremely important for hugetlbfs for now since swapping is not @@ -6491,12 +6496,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, if (wp_copy) _dst_pte = huge_pte_mkuffd_wp(_dst_pte); - set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte); + set_huge_pte_at(dst_mm, dst_addr, dst_hpte->ptep, _dst_pte); - hugetlb_count_add(pages_per_huge_page(h), dst_mm); + hugetlb_count_add(hugetlb_pte_size(dst_hpte) / PAGE_SIZE, dst_mm); /* No need to invalidate - it was non-present before */ - update_mmu_cache(dst_vma, dst_addr, dst_pte); + update_mmu_cache(dst_vma, dst_addr, dst_hpte->ptep); spin_unlock(ptl); if (!is_continue) @@ -7875,6 +7880,47 @@ static unsigned int __shift_for_hstate(struct hstate *h) (tmp_h) <= &hstates[hugetlb_max_hstate]; \ (tmp_h)++) +/* + * Allocate a HugeTLB PTE that maps as much of [start, end) as possible with a + * single page table entry. The allocated HugeTLB PTE is returned in @hpte. + */ +int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm, + struct vm_area_struct *vma, unsigned long start, + unsigned long end) +{ + struct hstate *h = hstate_vma(vma), *tmp_h; + unsigned int shift; + unsigned long sz; + int ret; + pte_t *ptep; + + for_each_hgm_shift(h, tmp_h, shift) { + sz = 1UL << shift; + + if (!IS_ALIGNED(start, sz) || start + sz > end) + continue; + goto found; + } + return -EINVAL; +found: + ptep = huge_pte_alloc(mm, vma, start, huge_page_size(h)); + if (!ptep) + return -ENOMEM; + + hugetlb_pte_populate(hpte, ptep, huge_page_shift(h), + hpage_size_to_level(huge_page_size(h))); + + ret = hugetlb_hgm_walk(mm, vma, hpte, start, 1L << shift, + /*stop_at_none=*/false); + if (ret) + return ret; + + if (hpte->shift > shift) + return -EEXIST; + + return 0; +} + #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ /* diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index e24e8a47ce8a..c4a8e6666ea6 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -315,14 +315,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, { int vm_shared = dst_vma->vm_flags & VM_SHARED; ssize_t err; - pte_t *dst_pte; unsigned long src_addr, dst_addr; long copied; struct page *page; - unsigned long vma_hpagesize; + unsigned long vma_hpagesize, target_pagesize; pgoff_t idx; u32 hash; struct address_space *mapping; + bool use_hgm = uffd_ctx_has_hgm(&dst_vma->vm_userfaultfd_ctx) && + mode == MCOPY_ATOMIC_CONTINUE; + struct hstate *h = hstate_vma(dst_vma); /* * There is no default zero huge page for all huge page sizes as @@ -340,12 +342,13 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, copied = 0; page = NULL; vma_hpagesize = vma_kernel_pagesize(dst_vma); + target_pagesize = use_hgm ? PAGE_SIZE : vma_hpagesize; /* - * Validate alignment based on huge page size + * Validate alignment based on the targeted page size. 
*/ err = -EINVAL; - if (dst_start & (vma_hpagesize - 1) || len & (vma_hpagesize - 1)) + if (dst_start & (target_pagesize - 1) || len & (target_pagesize - 1)) goto out_unlock; retry: @@ -362,6 +365,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, err = -EINVAL; if (vma_hpagesize != vma_kernel_pagesize(dst_vma)) goto out_unlock; + if (use_hgm && !hugetlb_hgm_enabled(dst_vma)) + goto out_unlock; vm_shared = dst_vma->vm_flags & VM_SHARED; } @@ -376,13 +381,15 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, } while (src_addr < src_start + len) { + struct hugetlb_pte hpte; + pte_t *dst_pte; BUG_ON(dst_addr >= dst_start + len); /* * Serialize via vma_lock and hugetlb_fault_mutex. - * vma_lock ensures the dst_pte remains valid even - * in the case of shared pmds. fault mutex prevents - * races with other faulting threads. + * vma_lock ensures the hpte.ptep remains valid even + * in the case of shared pmds and page table collapsing. + * fault mutex prevents races with other faulting threads. */ idx = linear_page_index(dst_vma, dst_addr); mapping = dst_vma->vm_file->f_mapping; @@ -390,23 +397,33 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, mutex_lock(&hugetlb_fault_mutex_table[hash]); hugetlb_vma_lock_read(dst_vma); - err = -ENOMEM; + err = 0; dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize); - if (!dst_pte) { + if (!dst_pte) + err = -ENOMEM; + else { + hugetlb_pte_populate(&hpte, dst_pte, huge_page_shift(h), + hpage_size_to_level(huge_page_size(h))); + if (use_hgm) + err = hugetlb_alloc_largest_pte(&hpte, + dst_mm, dst_vma, dst_addr, + dst_start + len); + } + if (err) { hugetlb_vma_unlock_read(dst_vma); mutex_unlock(&hugetlb_fault_mutex_table[hash]); goto out_unlock; } if (mode != MCOPY_ATOMIC_CONTINUE && - !huge_pte_none_mostly(huge_ptep_get(dst_pte))) { + !huge_pte_none_mostly(huge_ptep_get(hpte.ptep))) { err = -EEXIST; hugetlb_vma_unlock_read(dst_vma); mutex_unlock(&hugetlb_fault_mutex_table[hash]); goto out_unlock; } - err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, + err = hugetlb_mcopy_atomic_pte(dst_mm, &hpte, dst_vma, dst_addr, src_addr, mode, &page, wp_copy); @@ -418,6 +435,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, if (unlikely(err == -ENOENT)) { mmap_read_unlock(dst_mm); BUG_ON(!page); + BUG_ON(hpte.shift != huge_page_shift(h)); err = copy_huge_page_from_user(page, (const void __user *)src_addr, @@ -435,9 +453,9 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, BUG_ON(page); if (!err) { - dst_addr += vma_hpagesize; - src_addr += vma_hpagesize; - copied += vma_hpagesize; + dst_addr += hugetlb_pte_size(&hpte); + src_addr += hugetlb_pte_size(&hpte); + copied += hugetlb_pte_size(&hpte); if (fatal_signal_pending(current)) err = -EINTR; From patchwork Fri Oct 21 16:36:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6853 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp796704wrr; Fri, 21 Oct 2022 09:41:06 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7k00U+5pUrJTSIXbVhmLQsAAd21YQViIzKQXdfADjlP/GzczbM+ggZSnCxDByawEyDOi3Y X-Received: by 2002:a17:90a:930b:b0:20b:a5d:35d6 with SMTP id p11-20020a17090a930b00b0020b0a5d35d6mr58332879pjo.146.1666370466360; Fri, 21 Oct 2022 09:41:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370466; cv=none; 
Date: Fri, 21 Oct 2022 16:36:51 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-36-jthoughton@google.com>
Subject: [RFC PATCH v2 35/47] userfaultfd: require UFFD_FEATURE_EXACT_ADDRESS when using HugeTLB HGM
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

To avoid bugs in userspace, we require that userspace provide UFFD_FEATURE_EXACT_ADDRESS when using UFFD_FEATURE_MINOR_HUGETLBFS_HGM; otherwise UFFDIO_API will fail with EINVAL.

The potential confusion is this: without EXACT_ADDRESS, the address given in the userfaultfd message is rounded down to the hugepage size. Userspace may think that, because it is using HGM, it can simply UFFDIO_CONTINUE the interval [address, address+PAGE_SIZE); but for faults that did not occur in the first base page of the hugepage, this won't resolve the fault. The only choice it has in that scenario is to UFFDIO_CONTINUE the interval [address, address+hugepage_size), which negates the purpose of using HGM in the first place.

By requiring userspace to provide UFFD_FEATURE_EXACT_ADDRESS, there is no rounding, and userspace has the information it needs to resolve the fault appropriately.

Another potential solution is to change the behavior when UFFD_FEATURE_EXACT_ADDRESS is not provided: when HGM is enabled, round to PAGE_SIZE instead of to the hugepage size. I think requiring UFFD_FEATURE_EXACT_ADDRESS is cleaner.
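Concretely, the negotiation this patch enforces might look like the sketch below. It is illustrative only: UFFD_FEATURE_MINOR_HUGETLBFS_HGM is a flag introduced earlier in this series (not an upstream UAPI constant), the kernel is assumed to be built with CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING, and the helper name uffd_open_with_hgm is invented for the example.

/* Hypothetical helper: open a userfaultfd and request HGM minor faults. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

static int uffd_open_with_hgm(void)
{
	int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	if (uffd < 0)
		return -1;

	struct uffdio_api api = {
		.api = UFFD_API,
		/*
		 * HGM is only useful when the fault address is not rounded
		 * down to the hugepage boundary, so both flags are requested.
		 */
		.features = UFFD_FEATURE_EXACT_ADDRESS |
			    UFFD_FEATURE_MINOR_HUGETLBFS_HGM,
	};

	/* With this patch, omitting EXACT_ADDRESS here fails with EINVAL. */
	if (ioctl(uffd, UFFDIO_API, &api) < 0) {
		perror("UFFDIO_API");
		close(uffd);
		return -1;
	}
	return uffd;
}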
Signed-off-by: James Houghton --- fs/userfaultfd.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 0204108e3882..c8f21f53e37d 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -1990,6 +1990,17 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx, ~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM); #ifndef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING uffdio_api.features &= ~UFFD_FEATURE_MINOR_HUGETLBFS_HGM; +#else + + ret = -EINVAL; + if ((uffdio_api.features & UFFD_FEATURE_MINOR_HUGETLBFS_HGM) && + !(uffdio_api.features & UFFD_FEATURE_EXACT_ADDRESS)) + /* + * UFFD_FEATURE_MINOR_HUGETLBFS_HGM is mostly + * useless without UFFD_FEATURE_EXACT_ADDRESS, + * so require userspace to provide both. + */ + goto err_out; #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ From patchwork Fri Oct 21 16:36:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6854 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp796778wrr; Fri, 21 Oct 2022 09:41:13 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4XTxp6trbGgiNE+pLgqKBL+rEj/ONLD0nYYPY3G2pckRzfRuRGMlulfNS/Cab2xinzqPxA X-Received: by 2002:a05:6a00:cc4:b0:566:87c:53de with SMTP id b4-20020a056a000cc400b00566087c53demr19741242pfv.19.1666370473180; Fri, 21 Oct 2022 09:41:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370473; cv=none; d=google.com; s=arc-20160816; b=uMGze6NkPjkylJ8Q2kxqAaTV+sFPjgpw0poutnyOCvoNi2Esp01KCxp0B6Er3PEx1R r9JMY5YoSmBJwEIM6R7UNgm15b01ha5hJKTmMM2Khbco8gUPT1Atz1ZnTGAOiwPs1d5+ e09ievsDxkLd5E9uQ0mJFR03C1ZNc9Tj8cmO/STBrdjUOqmIg7hl3QIYuHvOh9NWY4T8 T5px3awM673yP0VHPLRh2jc6KEW4p5ZQxVMNu9m2kA1GFbvH0sJ45fMo1FzaBAeC8q1G TtkrrNZoPzXc4H1hk/fBz61DMeU4u6o16g1F5HEntelFHit9uuOM8k3HlbHRL3BGvz+B AOyA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=9XBceLY8DUUPXaMS6s0AnMTF5dFEtD/aKN5y5ep44a8=; b=K2jYgdCCDDyufRQvdLoAVOHE91Gmyyg5iBwls+TpkpDq57PxalnBY8H7/EjNNDIjEV BMPdbGDrMd2QlXkcDVlcWQdGkgRdJFS17NREmHBmm+c8lj0Lz7RSwxf8OASB7bX6FEcm PDh6pys5BqW/OJ1sJM0/FAajIw7HzCdlYeOwedKsbEli/JNMqrxMaZsQjcsicWjJRZg/ t/WCFx9QJlAAQ3+lAUk4AAT6PD80kvmN8969BjFDgQRDGYvpWUq6G2NLz1RQPslzIvDm xG2kxMJimGoHpXCZa3dW5QIHXaUSKgJNypBVSqh09ot6zIBQkFGXF+tf1nMvyRtiGlqB 3JxQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=OgH1Pzym; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
Date: Fri, 21 Oct 2022 16:36:52 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-37-jthoughton@google.com>
Subject: [RFC PATCH v2 36/47] hugetlb: add MADV_COLLAPSE for hugetlb
From: James Houghton
To: Mike Kravetz , Muchun Song , Peter Xu
Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr .
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316085109401204?= X-GMAIL-MSGID: =?utf-8?q?1747316085109401204?= This is a necessary extension to the UFFDIO_CONTINUE changes. When userspace finishes mapping an entire hugepage with UFFDIO_CONTINUE, the kernel has no mechanism to automatically collapse the page table to map the whole hugepage normally. We require userspace to inform us that they would like the mapping to be collapsed; they do this with MADV_COLLAPSE. If userspace has not mapped all of a hugepage with UFFDIO_CONTINUE, but only some, hugetlb_collapse will cause the requested range to be mapped as if it were UFFDIO_CONTINUE'd already. The effects of any UFFDIO_WRITEPROTECT calls may be undone by a call to MADV_COLLAPSE for intersecting address ranges. This commit is co-opting the same madvise mode that has been introduced to synchronously collapse THPs. The function that does THP collapsing has been renamed to madvise_collapse_thp. As with the rest of the high-granularity mapping support, MADV_COLLAPSE is only supported for shared VMAs right now. Signed-off-by: James Houghton --- include/linux/huge_mm.h | 12 ++-- include/linux/hugetlb.h | 8 +++ mm/hugetlb.c | 142 ++++++++++++++++++++++++++++++++++++++++ mm/khugepaged.c | 4 +- mm/madvise.c | 24 ++++++- 5 files changed, 181 insertions(+), 9 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 5d861905df46..fc2813db5e2e 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -226,9 +226,9 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice); -int madvise_collapse(struct vm_area_struct *vma, - struct vm_area_struct **prev, - unsigned long start, unsigned long end); +int madvise_collapse_thp(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end); void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start, unsigned long end, long adjust_next); spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma); @@ -373,9 +373,9 @@ static inline int hugepage_madvise(struct vm_area_struct *vma, return -EINVAL; } -static inline int madvise_collapse(struct vm_area_struct *vma, - struct vm_area_struct **prev, - unsigned long start, unsigned long end) +static inline int madvise_collapse_thp(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end) { return -EINVAL; } diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 00c22a84a1c6..5378b98cc7b8 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -1276,6 +1276,8 @@ int enable_hugetlb_hgm(struct vm_area_struct *vma); int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm, struct vm_area_struct *vma, unsigned long start, unsigned long end); +int hugetlb_collapse(struct mm_struct *mm, 
struct vm_area_struct *vma, + unsigned long start, unsigned long end); #else static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma) { @@ -1297,6 +1299,12 @@ int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm, { return -EINVAL; } +static inline +int hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long start, unsigned long end) +{ + return -EINVAL; +} #endif static inline spinlock_t *huge_pte_lock(struct hstate *h, diff --git a/mm/hugetlb.c b/mm/hugetlb.c index c25d3cd73ac9..d80db81a1fa5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -7921,6 +7921,148 @@ int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm, return 0; } +/* + * Collapse the address range from @start to @end to be mapped optimally. + * + * This is only valid for shared mappings. The main use case for this function + * is following UFFDIO_CONTINUE. If a user UFFDIO_CONTINUEs an entire hugepage + * by calling UFFDIO_CONTINUE once for each 4K region, the kernel doesn't know + * to collapse the mapping after the final UFFDIO_CONTINUE. Instead, we leave + * it up to userspace to tell us to do so, via MADV_COLLAPSE. + * + * Any holes in the mapping will be filled. If there is no page in the + * pagecache for a region we're collapsing, the PTEs will be cleared. + * + * If high-granularity PTEs are uffd-wp markers, those markers will be dropped. + */ +int hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long start, unsigned long end) +{ + struct hstate *h = hstate_vma(vma); + struct address_space *mapping = vma->vm_file->f_mapping; + struct mmu_notifier_range range; + struct mmu_gather tlb; + unsigned long curr = start; + int ret = 0; + struct page *hpage, *subpage; + pgoff_t idx; + bool writable = vma->vm_flags & VM_WRITE; + bool shared = vma->vm_flags & VM_SHARED; + struct hugetlb_pte hpte; + pte_t entry; + + /* + * This is only supported for shared VMAs, because we need to look up + * the page to use for any PTEs we end up creating. + */ + if (!shared) + return -EINVAL; + + if (!hugetlb_hgm_enabled(vma)) + return 0; + + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, + start, end); + mmu_notifier_invalidate_range_start(&range); + tlb_gather_mmu(&tlb, mm); + + /* + * Grab the lock VMA lock for writing. This will prevent concurrent + * high-granularity page table walks, so that we can safely collapse + * and free page tables. + */ + hugetlb_vma_lock_write(vma); + + while (curr < end) { + ret = hugetlb_alloc_largest_pte(&hpte, mm, vma, curr, end); + if (ret) + goto out; + + entry = huge_ptep_get(hpte.ptep); + + /* + * There is no work to do if the PTE doesn't point to page + * tables. + */ + if (!pte_present(entry)) + goto next_hpte; + if (hugetlb_pte_present_leaf(&hpte, entry)) + goto next_hpte; + + idx = vma_hugecache_offset(h, vma, curr); + hpage = find_get_page(mapping, idx); + + if (hpage && !HPageMigratable(hpage)) { + /* + * Don't collapse a mapping to a page that is pending + * a migration. Migration swap entries may have placed + * in the page table. + */ + ret = -EBUSY; + put_page(hpage); + goto out; + } + + if (hpage && PageHWPoison(hpage)) { + /* + * Don't collapse a mapping to a page that is + * hwpoisoned. + */ + ret = -EHWPOISON; + put_page(hpage); + /* + * By setting ret to -EHWPOISON, if nothing else + * happens, we will tell userspace that we couldn't + * fully collapse everything due to poison. + * + * Skip this page, and continue to collapse the rest + * of the mapping. 
+ */ + curr = (curr & huge_page_mask(h)) + huge_page_size(h); + continue; + } + + /* + * Clear all the PTEs, and drop ref/mapcounts + * (on tlb_finish_mmu). + */ + __unmap_hugepage_range(&tlb, vma, curr, + curr + hugetlb_pte_size(&hpte), + NULL, + ZAP_FLAG_DROP_MARKER); + /* Free the PTEs. */ + hugetlb_free_pgd_range(&tlb, + curr, curr + hugetlb_pte_size(&hpte), + curr, curr + hugetlb_pte_size(&hpte)); + if (!hpage) { + huge_pte_clear(mm, curr, hpte.ptep, + hugetlb_pte_size(&hpte)); + goto next_hpte; + } + + page_dup_file_rmap(hpage, true); + + subpage = hugetlb_find_subpage(h, hpage, curr); + entry = make_huge_pte_with_shift(vma, subpage, + writable, hpte.shift); + set_huge_pte_at(mm, curr, hpte.ptep, entry); +next_hpte: + curr += hugetlb_pte_size(&hpte); + + if (curr < end) { + /* Don't hold the VMA lock for too long. */ + hugetlb_vma_unlock_write(vma); + cond_resched(); + hugetlb_vma_lock_write(vma); + } + } +out: + hugetlb_vma_unlock_write(vma); + tlb_finish_mmu(&tlb); + mmu_notifier_invalidate_range_end(&range); + return ret; +} + #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ /* diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 4734315f7940..70796824e9d2 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -2555,8 +2555,8 @@ static int madvise_collapse_errno(enum scan_result r) } } -int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev, - unsigned long start, unsigned long end) +int madvise_collapse_thp(struct vm_area_struct *vma, struct vm_area_struct **prev, + unsigned long start, unsigned long end) { struct collapse_control *cc; struct mm_struct *mm = vma->vm_mm; diff --git a/mm/madvise.c b/mm/madvise.c index 2baa93ca2310..6aed9bd68476 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -986,6 +986,24 @@ static long madvise_remove(struct vm_area_struct *vma, return error; } +static int madvise_collapse(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end) +{ + /* Only allow collapsing for HGM-enabled, shared mappings. */ + if (is_vm_hugetlb_page(vma)) { + *prev = vma; + if (!hugetlb_hgm_eligible(vma)) + return -EINVAL; + if (!hugetlb_hgm_enabled(vma)) + return 0; + return hugetlb_collapse(vma->vm_mm, vma, start, end); + } + + return madvise_collapse_thp(vma, prev, start, end); + +} + /* * Apply an madvise behavior to a region of a vma. madvise_update_vma * will handle splitting a vm area into separate areas, each area with its own @@ -1157,6 +1175,9 @@ madvise_behavior_valid(int behavior) #ifdef CONFIG_TRANSPARENT_HUGEPAGE case MADV_HUGEPAGE: case MADV_NOHUGEPAGE: +#endif +#if defined(CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING) || \ + defined(CONFIG_TRANSPARENT_HUGEPAGE) case MADV_COLLAPSE: #endif case MADV_DONTDUMP: @@ -1347,7 +1368,8 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, * MADV_NOHUGEPAGE - mark the given range as not worth being backed by * transparent huge pages so the existing pages will not be * coalesced into THP and new pages will not be allocated as THP. - * MADV_COLLAPSE - synchronously coalesce pages into new THP. + * MADV_COLLAPSE - synchronously coalesce pages into new THP, or, for HugeTLB + * pages, collapse the mapping. * MADV_DONTDUMP - the application wants to prevent pages in the given range * from being included in its core dump. * MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump. 
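To make the intended userspace flow for this patch concrete: after resolving a hugepage's minor faults one base page at a time with UFFDIO_CONTINUE, userspace asks for the collapse explicitly. The sketch below is only an illustration of that flow; the function name and parameters are invented, it assumes addr is the start of a MAP_SHARED hugetlbfs hugepage registered in minor-fault mode on a userfaultfd opened with the HGM feature, and it assumes the installed headers define MADV_COLLAPSE.

#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/userfaultfd.h>

/* Illustrative only: map one hugepage at base-page granularity, then collapse it. */
static int continue_then_collapse(int uffd, char *addr, size_t hpage_size,
				  size_t base_page_size)
{
	size_t off;

	/* Resolve the minor fault one base page at a time. */
	for (off = 0; off < hpage_size; off += base_page_size) {
		struct uffdio_continue cont = {
			.range = {
				.start = (unsigned long)(addr + off),
				.len = base_page_size,
			},
		};

		if (ioctl(uffd, UFFDIO_CONTINUE, &cont) < 0)
			return -1;
	}

	/*
	 * The kernel does not collapse the high-granularity page tables on
	 * its own; userspace requests it once the whole hugepage is mapped.
	 */
	return madvise(addr, hpage_size, MADV_COLLAPSE);
}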
From patchwork Fri Oct 21 16:36:53 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 6849
Date: Fri, 21 Oct 2022 16:36:53 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-38-jthoughton@google.com>
Subject: [RFC PATCH v2 37/47] hugetlb: remove huge_pte_lock and huge_pte_lockptr
From: James Houghton
To: Mike Kravetz , Muchun Song , Peter Xu
Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr .
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316051298735514?= X-GMAIL-MSGID: =?utf-8?q?1747316051298735514?= They are replaced with hugetlb_pte_lock{,ptr}. All callers that haven't already been replaced don't get called when using HGM, so we handle them by populating hugetlb_ptes with the standard, hstate-sized huge PTEs. Signed-off-by: James Houghton --- include/linux/hugetlb.h | 28 +++------------------------- mm/hugetlb.c | 15 ++++++++++----- 2 files changed, 13 insertions(+), 30 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 5378b98cc7b8..e6dc25b15403 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -1015,14 +1015,6 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask) return modified_mask; } -static inline spinlock_t *huge_pte_lockptr(unsigned int shift, - struct mm_struct *mm, pte_t *pte) -{ - if (shift == PMD_SHIFT) - return pmd_lockptr(mm, (pmd_t *) pte); - return &mm->page_table_lock; -} - #ifndef hugepages_supported /* * Some platform decide whether they support huge pages at boot @@ -1226,12 +1218,6 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask) return 0; } -static inline spinlock_t *huge_pte_lockptr(unsigned int shift, - struct mm_struct *mm, pte_t *pte) -{ - return &mm->page_table_lock; -} - static inline void hugetlb_count_init(struct mm_struct *mm) { } @@ -1307,16 +1293,6 @@ int hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma, } #endif -static inline spinlock_t *huge_pte_lock(struct hstate *h, - struct mm_struct *mm, pte_t *pte) -{ - spinlock_t *ptl; - - ptl = huge_pte_lockptr(huge_page_shift(h), mm, pte); - spin_lock(ptl); - return ptl; -} - static inline spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte) { @@ -1324,7 +1300,9 @@ spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte) BUG_ON(!hpte->ptep); if (hpte->ptl) return hpte->ptl; - return huge_pte_lockptr(hugetlb_pte_shift(hpte), mm, hpte->ptep); + if (hugetlb_pte_level(hpte) == HUGETLB_LEVEL_PMD) + return pmd_lockptr(mm, (pmd_t *) hpte->ptep); + return &mm->page_table_lock; } static inline diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d80db81a1fa5..9d4e41c41f78 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5164,9 +5164,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, put_page(hpage); /* Install the new huge page if src pte stable */ - dst_ptl = huge_pte_lock(h, dst, dst_pte); - src_ptl = huge_pte_lockptr(huge_page_shift(h), - src, src_pte); + dst_ptl = hugetlb_pte_lock(dst, &dst_hpte); + src_ptl = hugetlb_pte_lockptr(src, &src_hpte); spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); entry = huge_ptep_get(src_pte); if (!pte_same(src_pte_old, entry)) { @@ -7465,6 +7464,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, pte_t *spte = NULL; pte_t *pte; spinlock_t *ptl; + struct 
hugetlb_pte hpte; i_mmap_lock_read(mapping); vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) { @@ -7485,7 +7485,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, if (!spte) goto out; - ptl = huge_pte_lock(hstate_vma(vma), mm, spte); + hugetlb_pte_populate(&hpte, (pte_t *)pud, PUD_SHIFT, HUGETLB_LEVEL_PUD); + ptl = hugetlb_pte_lock(mm, &hpte); if (pud_none(*pud)) { pud_populate(mm, pud, (pmd_t *)((unsigned long)spte & PAGE_MASK)); @@ -8179,6 +8180,7 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) unsigned long address, start, end; spinlock_t *ptl; pte_t *ptep; + struct hugetlb_pte hpte; if (!(vma->vm_flags & VM_MAYSHARE)) return; @@ -8203,7 +8205,10 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) ptep = huge_pte_offset(mm, address, sz); if (!ptep) continue; - ptl = huge_pte_lock(h, mm, ptep); + + hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h), + hpage_size_to_level(sz)); + ptl = hugetlb_pte_lock(mm, &hpte); huge_pmd_unshare(mm, vma, address, ptep); spin_unlock(ptl); } From patchwork Fri Oct 21 16:36:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6855 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp796826wrr; Fri, 21 Oct 2022 09:41:18 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6YC1efpK5lpJlZcZfoPd9xI7eORie0zfCywgeCMYnDubcY2sIkHXk2i8rMi3veGBa54Kc1 X-Received: by 2002:a05:6a00:1707:b0:562:e790:dfc3 with SMTP id h7-20020a056a00170700b00562e790dfc3mr20161899pfc.59.1666370478582; Fri, 21 Oct 2022 09:41:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370478; cv=none; d=google.com; s=arc-20160816; b=mDUdSYYtRKdKRySXh3X7D9b70F0zNKffK8f7zcxPMUI9o7ZL+2Luqs+zLqhiZt0rwz Jx9eyBFLe8uJ1PITRYWLNWFu3NeUfLq/4S+hpxeS2CnVy8g0ez5XsSD5qk8mp/ADidmM 2fgw27ndqv+bEYtnZmQNOBmJowMzFk255Br5tLaAwkw727Py/BzTRF4s4wcOFGd9kcIE y3nc3u3Y2hrrm+5nZT1KnAqeUu5qk8gJT4QcDoXqIVXq1xwFK1+fAIPT11tc+Z3zksfZ hCCDCcewxJ+meLUwJ7DP6GnQYUN/PwmQZkhfwJqCvEHUVHv/Kzl0hpS5xWvTuRDtqM0p kVdA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=8LccpPhXfmW4rVnqf/7+o+Hx7HGdpTBJy+WhJxfYYjY=; b=g0SBY3tjV66MAG8Y6iggaSLQyydtZTlHZGFyC9bcO+7F1Ct8ifSKzNcDvnYUqueEDo 3nRK8EL3Ztb36+TZL7OPGIc87lx5CN+RMWzeZN1QAM3G3lSvrhQX0ASWEPYFKbG1N72n pkvZ7NpOgwgN3V8StBzbToBWHCu33nXbw1DaNs9vlkACBQh1BbKnJHuPx6iGtf02zBB0 vYMioi4fYwR/gS3oEUKWQG4zRXsoMUZEd7wcf7VU26bGjvT+/YP7rHEJ9ISBg2dkiw7s bbIFbeyWR0/9AnIs1If9p9nym81ra6l+Ve8NAzX0gF0QNZkRWvfio54tl4p8MeGaYe8e tk5A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=VtedBtwv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
Date: Fri, 21 Oct 2022 16:36:54 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-39-jthoughton@google.com>
Subject: [RFC PATCH v2 38/47] hugetlb: replace make_huge_pte with make_huge_pte_with_shift
From: James Houghton
To: Mike Kravetz , Muchun Song , Peter Xu
Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr .
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316090796669161?= X-GMAIL-MSGID: =?utf-8?q?1747316090796669161?= This removes the old definition of make_huge_pte, where now we always require the shift to be explicitly given. All callsites are cleaned up. Signed-off-by: James Houghton --- mm/hugetlb.c | 31 ++++++++++++------------------- 1 file changed, 12 insertions(+), 19 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 9d4e41c41f78..b26142bec4fe 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4908,9 +4908,9 @@ const struct vm_operations_struct hugetlb_vm_ops = { .pagesize = hugetlb_vm_op_pagesize, }; -static pte_t make_huge_pte_with_shift(struct vm_area_struct *vma, - struct page *page, int writable, - int shift) +static pte_t make_huge_pte(struct vm_area_struct *vma, + struct page *page, int writable, + int shift) { pte_t entry; @@ -4926,14 +4926,6 @@ static pte_t make_huge_pte_with_shift(struct vm_area_struct *vma, return entry; } -static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page, - int writable) -{ - unsigned int shift = huge_page_shift(hstate_vma(vma)); - - return make_huge_pte_with_shift(vma, page, writable, shift); -} - static void set_huge_ptep_writable(struct vm_area_struct *vma, unsigned long address, pte_t *ptep) { @@ -4974,10 +4966,12 @@ static void hugetlb_install_page(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr, struct page *new_page) { + struct hstate *h = hstate_vma(vma); __SetPageUptodate(new_page); hugepage_add_new_anon_rmap(new_page, vma, addr); - set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, new_page, 1)); - hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm); + set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, new_page, 1, + huge_page_shift(h))); + hugetlb_count_add(pages_per_huge_page(h), vma->vm_mm); ClearHPageRestoreReserve(new_page); SetHPageMigratable(new_page); } @@ -5737,7 +5731,8 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, page_remove_rmap(old_page, vma, true); hugepage_add_new_anon_rmap(new_page, vma, haddr); set_huge_pte_at(mm, haddr, ptep, - make_huge_pte(vma, new_page, !unshare)); + make_huge_pte(vma, new_page, !unshare, + huge_page_shift(h))); SetHPageMigratable(new_page); /* Make the old page be freed below */ new_page = old_page; @@ -6033,7 +6028,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, page_dup_file_rmap(page, true); subpage = hugetlb_find_subpage(h, page, haddr_hgm); - new_pte = make_huge_pte_with_shift(vma, subpage, + new_pte = make_huge_pte(vma, subpage, ((vma->vm_flags & VM_WRITE) && (vma->vm_flags & VM_SHARED)), hpte->shift); @@ -6481,8 +6476,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, subpage = hugetlb_find_subpage(h, page, dst_addr); WARN_ON_ONCE(subpage != page && !hugetlb_hgm_enabled(dst_vma)); - _dst_pte = make_huge_pte_with_shift(dst_vma, subpage, writable, - dst_hpte->shift); + 
_dst_pte = make_huge_pte(dst_vma, subpage, writable, dst_hpte->shift); /* * Always mark UFFDIO_COPY page dirty; note that this may not be * extremely important for hugetlbfs for now since swapping is not @@ -8044,8 +8038,7 @@ int hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma, page_dup_file_rmap(hpage, true); subpage = hugetlb_find_subpage(h, hpage, curr); - entry = make_huge_pte_with_shift(vma, subpage, - writable, hpte.shift); + entry = make_huge_pte(vma, subpage, writable, hpte.shift); set_huge_pte_at(mm, curr, hpte.ptep, entry); next_hpte: curr += hugetlb_pte_size(&hpte); From patchwork Fri Oct 21 16:36:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6857 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp796963wrr; Fri, 21 Oct 2022 09:41:33 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6QEmNO99HcbvLQwZOti7aUhLgCXh1795sNkRk81vByYAVaQcHNVTCSVT0ZjZ7OqTCdoA3N X-Received: by 2002:a17:90b:78e:b0:205:c9ae:21f9 with SMTP id l14-20020a17090b078e00b00205c9ae21f9mr23490370pjz.112.1666370492897; Fri, 21 Oct 2022 09:41:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370492; cv=none; d=google.com; s=arc-20160816; b=OtVDEH3vtRMcu82AUUyv1RiPfjSwRM8fNJw2NdovnF565xQgOCTkV2jkJ00UV+s6Fz tXYE+msEsFzbkxgCQ5TL4lmruYHsbt0d6l1YP0RS548KOMQb4yP9cEaXlax961iJn/+h ziN7+9tAR+CRgp2u7KDav4gH3F6mr6/mLBzZPaFx/HGIFaqIJxf2xfMwYT+1LJFoRYAJ YdpjA89t/HkwNjUrwZ5U1VbLgDZIhz/aHSOY4pCZUXkkLIWu02GVmK9xbUKLPRBjOa8m k8HSTYsNUi088eEJ3s33uNH5kEwZrux6VlM1FM0oxd33uznFWcbL0CYkUg7+jOpvEwz8 C4YA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=l0T+gx8ZiJIGzUiHsY9RRIxEbYNQB6e8ZQhqkah5o0o=; b=KP2flkjTkWZMQVf6BV/J1cNL0yrVkgMH72Gwap52qUbStU1V3Kugt0KolBiTHZBSCN osk9q5UhxdeZJPsOblwyJmjnIYluVxmXI67eDMYRbmUaT6DZ9zSkQGLpAzM11pcbp4KI SGJW7oR9zpuFPYaIfD+5/LboihlSwoeEiWqpIAMaGW2ckeuTGixxFqiddgzFcmWVnXs9 rXT0ISt7D3jswNtLqeVRJ9mDvD2Mnwhm9uQEjz7vsllHTag+/gyyO5w9kzz1ciWp7orH JOP6m3wr+P7sY8uqa3gTN5LL42SmHKQ+IB9UAT9Wh9U9chIKMSSbT7wKycRARIYlXG+a RcJA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=hcmOSRNp; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id l27-20020a63ba5b000000b0046301a9c718si28448547pgu.21.2022.10.21.09.41.17; Fri, 21 Oct 2022 09:41:32 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=hcmOSRNp; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231447AbiJUQkw (ORCPT + 99 others); Fri, 21 Oct 2022 12:40:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52604 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231190AbiJUQjI (ORCPT ); Fri, 21 Oct 2022 12:39:08 -0400 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0FEF82892EC for ; Fri, 21 Oct 2022 09:37:53 -0700 (PDT) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-367dc159c2fso33850737b3.19 for ; Fri, 21 Oct 2022 09:37:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=l0T+gx8ZiJIGzUiHsY9RRIxEbYNQB6e8ZQhqkah5o0o=; b=hcmOSRNpRjq1NSM7DS6HOkNcODOyUhsGlgn2j//u74i5fPh2pgjEybQjits1Mwk5lZ lKPtFsy9JwRZ/QpCoIUrz+BOUQpmPn+x03wKvOaw4dkIE0YqoVSiYMWvcQlk22duLcOv Ih4SJ+4n2QJdtBkeEPXivg1YLJg/nUSmvxiKAqqtuM0iOZOwSeuYV7oa76jrjZexxv2d g2n70JA8Q7a6iYcl4XXTqj8YTsteJE6rxVLegrKhAN80Er0f8yLjwpslDHiqVDzr/NBy kxGlC87klGgk8zgpyI9hmrUjjd9CbOO8P2QhjI2W0lQZebZNHlUuk/9YNIni/j2WIMky 1A6A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=l0T+gx8ZiJIGzUiHsY9RRIxEbYNQB6e8ZQhqkah5o0o=; b=GNWCZCcJNr+I3WzFPBtrqKMrn0XqxMbnh1cWkmQaTkl1ol0CYuI9xc5+9sLVndWJVq WsIZZW7P+0XNJko8mP0TwSUtZ2Ew1OfWAhmZmWh7WpjEYISoocUOEydZLIWiFL9bpW5w LcYcpsZhxoGS1fSIDoUF7HjjFrZ7ruzGePyo+wOQRnvBPBgWTTfmYnIGAFS3tFC7wRtz UfOyk94EWk3ejF4N7CwPPmOG6ONSau728aZrkz+JH50M3196hOUdUYA9humduQKdEGHe HUUq/N2K4oOJf9OB3ad0efC83PHkTVzIFLLrmURhG59OMv6h2wijHIhff7eWQClWNtEb CVOg== X-Gm-Message-State: ACrzQf1+L6S1007Bt3iW9WatFfTMWHBt1liCozudrGEdliuMzV09B9/2 YgyzUODhor5wEMeSidteg+8j/VMV+va+rO2g X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a81:5789:0:b0:35d:f12:4c0e with SMTP id l131-20020a815789000000b0035d0f124c0emr17814525ywb.26.1666370272899; Fri, 21 Oct 2022 09:37:52 -0700 (PDT) Date: Fri, 21 Oct 2022 16:36:55 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-40-jthoughton@google.com> Subject: [RFC PATCH v2 39/47] mm: smaps: add stats for HugeTLB mapping size From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . 
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316105851205512?= X-GMAIL-MSGID: =?utf-8?q?1747316105851205512?= When the kernel is compiled with HUGETLB_HIGH_GRANULARITY_MAPPING, smaps may provide HugetlbPudMapped, HugetlbPmdMapped, and HugetlbPteMapped. Levels that are folded will not be outputted. Signed-off-by: James Houghton --- fs/proc/task_mmu.c | 101 +++++++++++++++++++++++++++++++++------------ 1 file changed, 75 insertions(+), 26 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index be78cdb7677e..16288d6dbf1d 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -405,6 +405,15 @@ struct mem_size_stats { unsigned long swap; unsigned long shared_hugetlb; unsigned long private_hugetlb; +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING +#ifndef __PAGETABLE_PUD_FOLDED + unsigned long hugetlb_pud_mapped; +#endif +#ifndef __PAGETABLE_PMD_FOLDED + unsigned long hugetlb_pmd_mapped; +#endif + unsigned long hugetlb_pte_mapped; +#endif u64 pss; u64 pss_anon; u64 pss_file; @@ -720,6 +729,35 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) } #ifdef CONFIG_HUGETLB_PAGE + +static void smaps_hugetlb_hgm_account(struct mem_size_stats *mss, + struct hugetlb_pte *hpte) +{ +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING + unsigned long size = hugetlb_pte_size(hpte); + + switch (hpte->level) { +#ifndef __PAGETABLE_PUD_FOLDED + case HUGETLB_LEVEL_PUD: + mss->hugetlb_pud_mapped += size; + break; +#endif +#ifndef __PAGETABLE_PMD_FOLDED + case HUGETLB_LEVEL_PMD: + mss->hugetlb_pmd_mapped += size; + break; +#endif + case HUGETLB_LEVEL_PTE: + mss->hugetlb_pte_mapped += size; + break; + default: + break; + } +#else + return; +#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ +} + static int smaps_hugetlb_range(struct hugetlb_pte *hpte, unsigned long addr, struct mm_walk *walk) @@ -753,6 +791,8 @@ static int smaps_hugetlb_range(struct hugetlb_pte *hpte, mss->shared_hugetlb += hugetlb_pte_size(hpte); else mss->private_hugetlb += hugetlb_pte_size(hpte); + + smaps_hugetlb_hgm_account(mss, hpte); } return 0; } @@ -822,38 +862,47 @@ static void smap_gather_stats(struct vm_area_struct *vma, static void __show_smap(struct seq_file *m, const struct mem_size_stats *mss, bool rollup_mode) { - SEQ_PUT_DEC("Rss: ", mss->resident); - SEQ_PUT_DEC(" kB\nPss: ", mss->pss >> PSS_SHIFT); - SEQ_PUT_DEC(" kB\nPss_Dirty: ", mss->pss_dirty >> PSS_SHIFT); + SEQ_PUT_DEC("Rss: ", mss->resident); + SEQ_PUT_DEC(" kB\nPss: ", mss->pss >> PSS_SHIFT); + SEQ_PUT_DEC(" kB\nPss_Dirty: ", mss->pss_dirty >> PSS_SHIFT); if (rollup_mode) { /* * These are meaningful only for smaps_rollup, otherwise two of * them are zero, and the other one is the same as Pss. 
*/ - SEQ_PUT_DEC(" kB\nPss_Anon: ", + SEQ_PUT_DEC(" kB\nPss_Anon: ", mss->pss_anon >> PSS_SHIFT); - SEQ_PUT_DEC(" kB\nPss_File: ", + SEQ_PUT_DEC(" kB\nPss_File: ", mss->pss_file >> PSS_SHIFT); - SEQ_PUT_DEC(" kB\nPss_Shmem: ", + SEQ_PUT_DEC(" kB\nPss_Shmem: ", mss->pss_shmem >> PSS_SHIFT); } - SEQ_PUT_DEC(" kB\nShared_Clean: ", mss->shared_clean); - SEQ_PUT_DEC(" kB\nShared_Dirty: ", mss->shared_dirty); - SEQ_PUT_DEC(" kB\nPrivate_Clean: ", mss->private_clean); - SEQ_PUT_DEC(" kB\nPrivate_Dirty: ", mss->private_dirty); - SEQ_PUT_DEC(" kB\nReferenced: ", mss->referenced); - SEQ_PUT_DEC(" kB\nAnonymous: ", mss->anonymous); - SEQ_PUT_DEC(" kB\nLazyFree: ", mss->lazyfree); - SEQ_PUT_DEC(" kB\nAnonHugePages: ", mss->anonymous_thp); - SEQ_PUT_DEC(" kB\nShmemPmdMapped: ", mss->shmem_thp); - SEQ_PUT_DEC(" kB\nFilePmdMapped: ", mss->file_thp); - SEQ_PUT_DEC(" kB\nShared_Hugetlb: ", mss->shared_hugetlb); - seq_put_decimal_ull_width(m, " kB\nPrivate_Hugetlb: ", + SEQ_PUT_DEC(" kB\nShared_Clean: ", mss->shared_clean); + SEQ_PUT_DEC(" kB\nShared_Dirty: ", mss->shared_dirty); + SEQ_PUT_DEC(" kB\nPrivate_Clean: ", mss->private_clean); + SEQ_PUT_DEC(" kB\nPrivate_Dirty: ", mss->private_dirty); + SEQ_PUT_DEC(" kB\nReferenced: ", mss->referenced); + SEQ_PUT_DEC(" kB\nAnonymous: ", mss->anonymous); + SEQ_PUT_DEC(" kB\nLazyFree: ", mss->lazyfree); + SEQ_PUT_DEC(" kB\nAnonHugePages: ", mss->anonymous_thp); + SEQ_PUT_DEC(" kB\nShmemPmdMapped: ", mss->shmem_thp); + SEQ_PUT_DEC(" kB\nFilePmdMapped: ", mss->file_thp); + SEQ_PUT_DEC(" kB\nShared_Hugetlb: ", mss->shared_hugetlb); + seq_put_decimal_ull_width(m, " kB\nPrivate_Hugetlb: ", mss->private_hugetlb >> 10, 7); - SEQ_PUT_DEC(" kB\nSwap: ", mss->swap); - SEQ_PUT_DEC(" kB\nSwapPss: ", +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING +#ifndef __PAGETABLE_PUD_FOLDED + SEQ_PUT_DEC(" kB\nHugetlbPudMapped: ", mss->hugetlb_pud_mapped); +#endif +#ifndef __PAGETABLE_PMD_FOLDED + SEQ_PUT_DEC(" kB\nHugetlbPmdMapped: ", mss->hugetlb_pmd_mapped); +#endif + SEQ_PUT_DEC(" kB\nHugetlbPteMapped: ", mss->hugetlb_pte_mapped); +#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ + SEQ_PUT_DEC(" kB\nSwap: ", mss->swap); + SEQ_PUT_DEC(" kB\nSwapPss: ", mss->swap_pss >> PSS_SHIFT); - SEQ_PUT_DEC(" kB\nLocked: ", + SEQ_PUT_DEC(" kB\nLocked: ", mss->pss_locked >> PSS_SHIFT); seq_puts(m, " kB\n"); } @@ -869,18 +918,18 @@ static int show_smap(struct seq_file *m, void *v) show_map_vma(m, vma); - SEQ_PUT_DEC("Size: ", vma->vm_end - vma->vm_start); - SEQ_PUT_DEC(" kB\nKernelPageSize: ", vma_kernel_pagesize(vma)); - SEQ_PUT_DEC(" kB\nMMUPageSize: ", vma_mmu_pagesize(vma)); + SEQ_PUT_DEC("Size: ", vma->vm_end - vma->vm_start); + SEQ_PUT_DEC(" kB\nKernelPageSize: ", vma_kernel_pagesize(vma)); + SEQ_PUT_DEC(" kB\nMMUPageSize: ", vma_mmu_pagesize(vma)); seq_puts(m, " kB\n"); __show_smap(m, &mss, false); - seq_printf(m, "THPeligible: %d\n", + seq_printf(m, "THPeligible: %d\n", hugepage_vma_check(vma, vma->vm_flags, true, false, true)); if (arch_pkeys_enabled()) - seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma)); + seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma)); show_smap_vma_flags(m, vma); return 0; From patchwork Fri Oct 21 16:36:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6858 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp796971wrr; Fri, 21 Oct 2022 09:41:33 -0700 (PDT) X-Google-Smtp-Source: 
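As a companion to the smaps change above, the following small userspace program sums the three new fields for the current process. It is a hedged example: the field names are the ones added by this patch, everything else is ordinary /proc/PID/smaps parsing, and it assumes a kernel built with CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING.

#include <stdio.h>

int main(void)
{
	char line[256];
	unsigned long kb, pud = 0, pmd = 0, pte = 0;
	FILE *f = fopen("/proc/self/smaps", "r");

	if (!f)
		return 1;
	while (fgets(line, sizeof(line), f)) {
		/* Field names as introduced by this patch. */
		if (sscanf(line, "HugetlbPudMapped: %lu kB", &kb) == 1)
			pud += kb;
		else if (sscanf(line, "HugetlbPmdMapped: %lu kB", &kb) == 1)
			pmd += kb;
		else if (sscanf(line, "HugetlbPteMapped: %lu kB", &kb) == 1)
			pte += kb;
	}
	fclose(f);
	printf("PUD-mapped: %lu kB, PMD-mapped: %lu kB, PTE-mapped: %lu kB\n",
	       pud, pmd, pte);
	return 0;
}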
AMsMyM5FnOlYbmC+2rTY6ndpxnwJ42++eN/AGo9+qa+0Uy5QT+uytZiObkVbDPyzpP2KgVa454RP X-Received: by 2002:a17:90b:4c86:b0:20d:402d:6155 with SMTP id my6-20020a17090b4c8600b0020d402d6155mr57172760pjb.229.1666370493296; Fri, 21 Oct 2022 09:41:33 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370493; cv=none; d=google.com; s=arc-20160816; b=zY2YYdyXQRVReqzYd42/I+L/sQzzic92SdNQlhbi3zGgrv0IO3axeroa1qUNRyyFui Hr7ML9/2QHwAVkT58C8StFnQWjGcQHblXUdzRNKzlbJGk+Iv48mJSYNtkg6P+YnFrRXh wGB0RwjtikySKCyWeOqsYyV8JeJNlIlNnJt4P0Z4oSonja8i4Oq6rWp8voIMC3If9oht /9xvfHdn6KVzDeaAN64V2IkTsSoACYFF+dz2DpiI0FF9w58lwjh0twNXLw/RwGpu6zJI cOB9sjX27ajtBmYy3G2XamzGbh4GWf8/JOhvGLwftXoTq+hXsLcHCiQ1kWJWqUAt01UD lgCQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=aZtBH3hoHoyLKHZRtw1whotmEhRsDJhNWbtzAg1Amnw=; b=AbAL5pzaRAtfHNfsFP5YQngfB1cbHXiWTIG3+WG9EiFCdiKi7YkDk/1XmpR/mgpXTj P4Pftx4kKi5BkpZFj4p24NTmJ1cK0iNIfS+HDxHKiDO1QTgvDLE9dI1E8IdZm4ML/LXQ qIMKsAQqKcMLsJQB/ufXP71QLBuyPB2EWnxl7eE+PIDdrg/xsePdFLHZW849JS4a7o7I s0vm/Fouuc5zDoX8KKCX3IJrQlXGM3d3cQ0Bpx2QQvjsEiqQlPYoB5kJVtJ4Vaf3bGlv GiWObJHaK1uXfZHCRyXfxg+qTqW3fWGr4Xuz5X6iYOfWPI0r+GpQURHMl6Zv063UCI6z LpBQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=hgVq1BGI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 23-20020a17090a1a1700b0020c060f42c9si6250192pjk.164.2022.10.21.09.41.20; Fri, 21 Oct 2022 09:41:33 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=hgVq1BGI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231459AbiJUQlA (ORCPT + 99 others); Fri, 21 Oct 2022 12:41:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54668 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231334AbiJUQjN (ORCPT ); Fri, 21 Oct 2022 12:39:13 -0400 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5B71328B1B2 for ; Fri, 21 Oct 2022 09:37:54 -0700 (PDT) Received: by mail-yb1-xb4a.google.com with SMTP id f9-20020a25b089000000b006be298e2a8dso3756312ybj.20 for ; Fri, 21 Oct 2022 09:37:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=aZtBH3hoHoyLKHZRtw1whotmEhRsDJhNWbtzAg1Amnw=; b=hgVq1BGI9uAPQNPd5svcoZ0uICUDeObQg8sjHrHyUINwI5zeVQNqGOHjoo/I6apJKi k/JHaeZ4Yl0qrBsh6RDFwRouNrgEUvtGuCUfpRfpfN4sO+Xf2T8PiR1oaMkO0yS/UNTz 23yBS9W01SZ1r6pO17zULLRSmQAG2HfnAL6Oj1CQ5zD5aGiHW6MLxM2mYGGk74xNkJVY 
PqQQmAGOeK1qbySxR4UJaQgs9hCNXsrXirmumCJnGX6I+NXqom+pwXFznBlBrvL3S+zD jT93upMwLSjpb8myFKnMEGe3Y5MDjPbD5ykOxmhldKIp5uvY/jR4K4NRBGpRWxnp01tl jMzg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=aZtBH3hoHoyLKHZRtw1whotmEhRsDJhNWbtzAg1Amnw=; b=0tAWEkzxssSMde6OIaUykJVpz1t9uMOAg0ko8Ek2ZJC+MbQFrQVv0LkR8Dyd61WANH R5XEs6+XfUgtI+G9sNaBA4rps00+HiQN0H0UlUeGApVyuF45BB2+RHetlPamUS/HSId2 GozY6Kvon5LWTAIQrmY9rf6VJTGRlxE5Af47M8RSmXCGGBRBG/5ieKnimCrJRoE/NhKO X8ufjwbAxZNm0p8gQSheKWKTrSoqeHSEOO52ckioy5WpER+AFcSbFRw2i/uWeLueaLHZ aoDyarFRvnOYWO80CDQLjLiTq6EfThB/OwbzBqNB1okzD3WdJc1LStLh0+e6HIhrMb4W vpwA== X-Gm-Message-State: ACrzQf0QSXuZYp0CkXO6A9MW3UsEpDkkZo+ruqUOo23xJ83qGX3JKedu dfoP0+/tT1vbzk8Qo75IE60xvREDpAoHznwC X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a05:6902:1083:b0:6c0:7c4f:f093 with SMTP id v3-20020a056902108300b006c07c4ff093mr17523772ybu.25.1666370273909; Fri, 21 Oct 2022 09:37:53 -0700 (PDT) Date: Fri, 21 Oct 2022 16:36:56 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-41-jthoughton@google.com> Subject: [RFC PATCH v2 40/47] hugetlb: x86: enable high-granularity mapping From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316106118782382?= X-GMAIL-MSGID: =?utf-8?q?1747316106118782382?= Now that HGM is fully supported for GENERAL_HUGETLB, x86 can enable it. The x86 KVM MMU already properly handles HugeTLB HGM pages (it does a page table walk to determine which size to use in the second-stage page table instead of, for example, checking vma_mmu_pagesize, like arm64 does). We could also enable HugeTLB HGM for arm (32-bit) at this point, as it also uses GENERAL_HUGETLB and I don't see anything else that is needed for it. However, I haven't tested on arm at all, so I won't enable it. 
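For context, the single select below hooks into the generic Kconfig option introduced earlier in the series. The fragment that follows is a rough sketch of that wiring; the dependency list is assumed rather than quoted from the actual patch that defines it.

# Sketch of the assumed generic side (see the earlier patches in this
# series for the real definition and its exact dependencies).
config ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING
	bool

config HUGETLB_HIGH_GRANULARITY_MAPPING
	bool "HugeTLB high-granularity mapping support"
	depends on HUGETLB_PAGE && ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING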
Signed-off-by: James Houghton --- arch/x86/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 6d1879ef933a..6d7103266e61 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -124,6 +124,7 @@ config X86 select ARCH_WANT_GENERAL_HUGETLB select ARCH_WANT_HUGE_PMD_SHARE select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP if X86_64 + select ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING select ARCH_WANT_LD_ORPHAN_WARN select ARCH_WANTS_THP_SWAP if X86_64 select ARCH_HAS_PARANOID_L1D_FLUSH From patchwork Fri Oct 21 16:36:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6867 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp802401wrr; Fri, 21 Oct 2022 09:55:32 -0700 (PDT) X-Google-Smtp-Source: AMsMyM666UhGVVEbjkowDd24R8WOKkIry4eEnK/DU+cp3RIBii6w4apoxdw4tGRqwnOjd+BH3ITT X-Received: by 2002:a17:907:2ceb:b0:78d:b765:c50d with SMTP id hz11-20020a1709072ceb00b0078db765c50dmr16271425ejc.73.1666371332301; Fri, 21 Oct 2022 09:55:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666371332; cv=none; d=google.com; s=arc-20160816; b=Uh5IeARj6+yt4JXEnIRkJybn82QJrrgQvuY6ikgGC3ER85m+YflzAduFIn8hrHvCra zRgMn//gYyKr4piDR4Tq28G/03slMBsuCFlaC4dRThQSQmPtFEVOFDLbmG5dQnhvsPvB LEJntoPGy6CNf9GyBi3uR4I4r4rnAvmRlKYr7ze3m5xQbn/EK8rTtYGGSM5tbHLPfK05 44EA3zEyN3f9NWAoohSGIwdur/Y3UOHQ0qFl4fQtZMvYo1HHTb6Lk+2neAVzHMrFJ/1D /HFY24Hofpc0S73h647DBmrxWvYEBvPj0gNV/+fY7H/KvMJVqW1Oq2aP37jtv3wWm/2e XIlA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=u2L8dpQwPLEEtV1+N27Ru5/DaXt/OYV1HEOgWR9nKqQ=; b=cVq3Wl35BFv90w12CZjkA/k/pzzjhJT+6vtC5pGeWWLa9iIjDLHLObtVeooVoypEOG e3SHwuTJN07AP4Cu9a0DtEXMQq/pA1aYxH980EgLJsjR063mX5ZAmAGdSOKh7lnWhK2Z SheNNnycmiZwQcVQrL9v6YqdLSPET8YuFEySDAVUkUZRpk72KH31i7Z77YXNH1zzk1EA Y4da0YV3XXdiEjhyYVnTdJo9/hztfpjzU1FeCrbCNCjR/boKZf/ULPjd9BacuNMhUgLI x4Fbf4Na5jXxIp+8KIDU6bvGhLSbHNPwcKlbXmU/mXPrka3+9Od2DjYeLlsL5EW6VJ7i 0LnQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=hQEOt5qb; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id s19-20020a056402521300b0045782fcb80asi21167941edd.225.2022.10.21.09.54.56; Fri, 21 Oct 2022 09:55:32 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=hQEOt5qb; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231196AbiJUQlj (ORCPT + 99 others); Fri, 21 Oct 2022 12:41:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53970 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231280AbiJUQkC (ORCPT ); Fri, 21 Oct 2022 12:40:02 -0400 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4E00A28B1B7 for ; Fri, 21 Oct 2022 09:37:56 -0700 (PDT) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-367f94b9b16so33998887b3.11 for ; Fri, 21 Oct 2022 09:37:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=u2L8dpQwPLEEtV1+N27Ru5/DaXt/OYV1HEOgWR9nKqQ=; b=hQEOt5qbhtIciEkifW3aJtfdGkwkMFYThz3+zH4m+l0DZeOwkOHljylI87o4133YIY +9qlJoQYYs5X58L9fVYGOUcAW3htdR4wJqVgJegiRWhoDZN6Hw6rLU1Q3Xarnd4Z6B2S 0Qm/YGHX8i6NhqgYKz1MbFHfMwocZZ13peTo4v5Lo7uK+rU6z8w5PasvDpHbgItoLjy/ kO/NxxIWM2aQQDZLQNut8OALQcGGq9dp7sd0njpT2gzPLMMoFH/CqGvCCb5bZ8st1noJ mGHFYBCd4aJ2Q+Rh1Ul7ceCnhLsZNJJXbbirMkaOkrcUY5VvqDaCqxSFUg1RZviE76DG R0lw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=u2L8dpQwPLEEtV1+N27Ru5/DaXt/OYV1HEOgWR9nKqQ=; b=Koq9v1RXpDRXoxTioJYQPsZJqc7qTzzXDA3HBNReZQwM8+C6nuMMGfVrEwzkP0FUlW koHwTqzIeVc8WWFpEawh4/Nb4Wa98XtCb6inBJbwn3ktuu+xGj+z5dz0yohZfxkp1nSg 6EZezsW9advB+lNMq+juWIB1zkYt3IW/iaFn/O7Qq3kxUXQRyVWSWNwufCiI8hLjD2XD GK77llSmxpBGJ1S/8ypFSosuOD6eoKgOoWs7k6CLmlwYGQ13yLISvOXBKOSeSafFOH2m iPlXZRXhnO1I/LdGJJ0dOGi1whi6MLxfxiCkqqFfXmkjc7Ow2YAvLpSKWlfCT9VwAMt7 xYkw== X-Gm-Message-State: ACrzQf2No7WOT1REHwglItHNrTvSP6gLV3tJZjk1X8Z1liCaVW6w4IVa +8FZLXWEiAR+SmdaFeQusCK8vsRM6enu0oRP X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a05:6902:105:b0:6bc:fb54:f4da with SMTP id o5-20020a056902010500b006bcfb54f4damr17650934ybh.284.1666370275067; Fri, 21 Oct 2022 09:37:55 -0700 (PDT) Date: Fri, 21 Oct 2022 16:36:57 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-42-jthoughton@google.com> Subject: [RFC PATCH v2 41/47] docs: hugetlb: update hugetlb and userfaultfd admin-guides with HGM info From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya 
Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316986357581772?= X-GMAIL-MSGID: =?utf-8?q?1747316986357581772?= This includes information about how UFFD_FEATURE_MINOR_HUGETLBFS_HGM should be used and when MADV_COLLAPSE should be used with it. Signed-off-by: James Houghton --- Documentation/admin-guide/mm/hugetlbpage.rst | 4 ++++ Documentation/admin-guide/mm/userfaultfd.rst | 16 +++++++++++++++- 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst index 19f27c0d92e0..ca7db15ae768 100644 --- a/Documentation/admin-guide/mm/hugetlbpage.rst +++ b/Documentation/admin-guide/mm/hugetlbpage.rst @@ -454,6 +454,10 @@ errno set to EINVAL or exclude hugetlb pages that extend beyond the length if not hugepage aligned. For example, munmap(2) will fail if memory is backed by a hugetlb page and the length is smaller than the hugepage size. +It is possible for users to map HugeTLB pages at a higher granularity than +normal using HugeTLB high-granularity mapping (HGM). For example, when using 1G +pages on x86, a user could map that page with 4K PTEs, 2M PMDs, a combination of +the two. See Documentation/admin-guide/mm/userfaultfd.rst. Examples ======== diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst index 83f31919ebb3..19877aaad61b 100644 --- a/Documentation/admin-guide/mm/userfaultfd.rst +++ b/Documentation/admin-guide/mm/userfaultfd.rst @@ -115,6 +115,14 @@ events, except page fault notifications, may be generated: areas. ``UFFD_FEATURE_MINOR_SHMEM`` is the analogous feature indicating support for shmem virtual memory areas. +- ``UFFD_FEATURE_MINOR_HUGETLBFS_HGM`` indicates that the kernel supports + small-page-aligned regions for ``UFFDIO_CONTINUE`` in HugeTLB-backed + virtual memory areas. ``UFFD_FEATURE_MINOR_HUGETLBFS_HGM`` and + ``UFFD_FEATURE_EXACT_ADDRESS`` must both be specified explicitly to enable + this behavior. If ``UFFD_FEATURE_MINOR_HUGETLBFS_HGM`` is specified but + ``UFFD_FEATURE_EXACT_ADDRESS`` is not, then ``UFFDIO_API`` will fail with + ``EINVAL``. + The userland application should set the feature flags it intends to use when invoking the ``UFFDIO_API`` ioctl, to request that those features be enabled if supported. @@ -169,7 +177,13 @@ like to do to resolve it: the page cache). Userspace has the option of modifying the page's contents before resolving the fault. Once the contents are correct (modified or not), userspace asks the kernel to map the page and let the - faulting thread continue with ``UFFDIO_CONTINUE``. + faulting thread continue with ``UFFDIO_CONTINUE``. 
If this is done at the + base-page size in a transparent-hugepage-eligible VMA or in a HugeTLB VMA + (requires ``UFFD_FEATURE_MINOR_HUGETLBFS_HGM``), then userspace may want to + use ``MADV_COLLAPSE`` when a hugepage is fully populated to inform the kernel + that it may be able to collapse the mapping. ``MADV_COLLAPSE`` will may undo + the effect of any ``UFFDIO_WRITEPROTECT`` calls on the collapsed address + range. Notes: From patchwork Fri Oct 21 16:36:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6860 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp797046wrr; Fri, 21 Oct 2022 09:41:44 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7oU1GGIdXzzLGVUbsFqq2YnZNIsi7vHhmHbGAVFLZ1mpltuzhd59mtuzf9tqFerhy0NiJR X-Received: by 2002:a17:902:f60a:b0:186:5d06:8da4 with SMTP id n10-20020a170902f60a00b001865d068da4mr13121764plg.106.1666370503803; Fri, 21 Oct 2022 09:41:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370503; cv=none; d=google.com; s=arc-20160816; b=QozfVf9doQAQ7F9XW50CjGybXBaFYT/uaaKpsjLQLStPS2SD8LWETTMEfs9gjGOekU iSw3WC4/0YXtHNGplaDZtgVNNOhSYarONI91/6GxCnTtiGB8J2D+A9Jr6/YydKzG96/y D054hycTh3xrtgr+aXjsFiv3rhbtI13207WgbrqOjlojMN5qVDX4dYTUEgJV2Lu9NhBR aV1cdm3eYU1kBXfnmIcg7JiE5CgE8xav0JItV80lwMNoaBP2942g8p4psl2o7Iwu0XqY rVXl7Z4hinsNT1ZTGff9o0Uodxj8b9eUMsMo3a4tc+BCwk2y8fDM+GyjcEhW8G5wlC7X 9w1w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=TmmajsS+yF0Y+smc9uDIgSrHWhiWWvbw0yNGeqoJ1B4=; b=TZKFt3/4SLDKbY5JUQc1KjpLnyHVbR6K9epM9pgjLpMN3TCSQx0GN4yWUbeexCf33U IHFIQdXWqb+3EiR8DOpwkq3rD+FzbdVMXX7XACfzNzDtDGQa5l41YxrYhNLo4hCUK6Xd gb/bv805ca4CDTjbaL0N0NOMCnYFiFEXi7XD+Rp4LGDBmzgrJ/2n2LouDv/pTZDdootj 02/41KK+QMCYEPxcdpetx5/beACLnY7IP2JAi587tkEr9h3mlvVo0Ey2jjxMr8Ocf9gs SAlaJzgxKtC/plW4l24+hoPxnqXUSd7kmXszIGkmAkteJua8wAbI2SuiqsoDPOQB+TRc X7aQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b="k/CWTQ2S"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
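To make the documented flow concrete, here is a hedged userspace sketch of the sequence described above: enable the feature bits at UFFDIO_API time, register a HugeTLB region for minor faults, resolve individual faults with base-page-sized UFFDIO_CONTINUE calls, and collapse the mapping once the hugepage is fully populated. UFFD_FEATURE_MINOR_HUGETLBFS_HGM and HugeTLB support for MADV_COLLAPSE come from this series and are not in mainline headers; error handling and the event-reading loop are omitted.

#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* as defined in <linux/mman.h> since 6.1 */
#endif

/* Sketch only: register a hugepage-sized HugeTLB region for minor faults
 * with high-granularity UFFDIO_CONTINUE enabled. */
static int hgm_uffd_setup(void *region, size_t huge_sz)
{
	int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	struct uffdio_api api = {
		.api = UFFD_API,
		/* Both features must be requested together; the HGM flag is
		 * introduced by this patch series. */
		.features = UFFD_FEATURE_MINOR_HUGETLBFS |
			    UFFD_FEATURE_MINOR_HUGETLBFS_HGM |
			    UFFD_FEATURE_EXACT_ADDRESS,
	};
	struct uffdio_register reg = {
		.range = { .start = (unsigned long)region, .len = huge_sz },
		.mode = UFFDIO_REGISTER_MODE_MINOR,
	};

	if (uffd < 0 || ioctl(uffd, UFFDIO_API, &api) < 0 ||
	    ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
		return -1;
	return uffd;
}

/* Resolve one minor fault at base-page granularity, then, once every small
 * page of the hugepage has been mapped, ask the kernel to collapse the
 * page table mapping back to a single huge entry. */
static void hgm_continue_and_collapse(int uffd, void *region, size_t huge_sz,
				      unsigned long fault_addr, long page_sz)
{
	struct uffdio_continue cont = {
		.range = {
			.start = fault_addr & ~(page_sz - 1),
			.len = page_sz,
		},
	};

	ioctl(uffd, UFFDIO_CONTINUE, &cont);
	/* ... repeat for the remaining small pages of the hugepage ... */
	madvise(region, huge_sz, MADV_COLLAPSE);
}

The userfaultfd selftest changes later in the series follow essentially this pattern, rounding the exact fault address down to the base page size before issuing UFFDIO_CONTINUE.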
[2620:137:e000::1:20]) by mx.google.com with ESMTP id a35-20020a631a23000000b004610dceef13si24080663pga.336.2022.10.21.09.41.30; Fri, 21 Oct 2022 09:41:43 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b="k/CWTQ2S"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231157AbiJUQlL (ORCPT + 99 others); Fri, 21 Oct 2022 12:41:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52606 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231350AbiJUQjY (ORCPT ); Fri, 21 Oct 2022 12:39:24 -0400 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 19F6828B1BD for ; Fri, 21 Oct 2022 09:37:57 -0700 (PDT) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-360b9418f64so34092477b3.7 for ; Fri, 21 Oct 2022 09:37:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=TmmajsS+yF0Y+smc9uDIgSrHWhiWWvbw0yNGeqoJ1B4=; b=k/CWTQ2SpQJfccU+Jmu5n5m3XKwNOcn09ydVormKK4vMumeUdIUvjCVCl5GIZ9SIyh C7Inn0LyzhxsNSOMjm/4020a5BNYgiNEhy78qAvhW1Twuq2iFbHu60cGARKIabVVIokR utHTdzZJsAzCrzV4+3TLa3tG7Ervq6FCIxHIVfY86Qt7eoS/8UiFf/E6EZGcJbnBuQHC zZp90iS8OmSktGOr/SeHXA0/BixmjPhSHK2BJ4SArbN3ZuBM6g0pEG45aPZ9TlAcsNeA Wt8tZXRb5+asexv88RaTMkaItUpev6DjCp9jt/Dnh6ebstfYDAyaIV6Q4tpqhyYhq9jK 6mNQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=TmmajsS+yF0Y+smc9uDIgSrHWhiWWvbw0yNGeqoJ1B4=; b=Z9r5svrNp7mcxN6GccCzeATb4v7s4PxpKarJDM2Yq8Ery8rfKlgGUo9PB++S0PIqp8 3db6gpzjzgr9fuUi6ewcfIaGKvIu+R9t2IgHefqs7bYdNhltJ/C39fHyEIOmN+knfEdV 4G589b+TDJz+vpQjmdrwmt5rvREhqWNzjeJv//mmYPSanJ1+nVcZJLjoGogcWdFLJY+k V881JSuM222Rr9lzdaCSN/tNMoAnCcib1ERmyEXTjkIps5VkWSgexeaNyrRjPwks630n IJ2Ye4aU91Ngj/CqR8d0ASOADuPGUlpoG72YlrAQgcX8M4nDELvwiMdoBuy3GVNMO8O0 /Djw== X-Gm-Message-State: ACrzQf0Pdg5mWB4L1e6ED5YtR1oFGZBYfO2/dK+1O3OJcH6+XT0j6ecM i1Cjx+DWAosrV/NcsknbwsQGxYjTMbe5VqKs X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a81:1aca:0:b0:35f:1d9e:fbc8 with SMTP id a193-20020a811aca000000b0035f1d9efbc8mr17306594ywa.261.1666370275936; Fri, 21 Oct 2022 09:37:55 -0700 (PDT) Date: Fri, 21 Oct 2022 16:36:58 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-43-jthoughton@google.com> Subject: [RFC PATCH v2 42/47] docs: proc: include information about HugeTLB HGM From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . 
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316117459044373?= X-GMAIL-MSGID: =?utf-8?q?1747316117459044373?= This includes the updates that have been made to smaps, specifically, the addition of Hugetlb[Pud,Pmd,Pte]Mapped. Signed-off-by: James Houghton --- Documentation/filesystems/proc.rst | 56 +++++++++++++++++------------- 1 file changed, 32 insertions(+), 24 deletions(-) diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index ec6cfdf1796a..807d6c0694c2 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -444,29 +444,32 @@ Memory Area, or VMA) there is a series of lines such as the following:: 08048000-080bc000 r-xp 00000000 03:02 13130 /bin/bash - Size: 1084 kB - KernelPageSize: 4 kB - MMUPageSize: 4 kB - Rss: 892 kB - Pss: 374 kB - Pss_Dirty: 0 kB - Shared_Clean: 892 kB - Shared_Dirty: 0 kB - Private_Clean: 0 kB - Private_Dirty: 0 kB - Referenced: 892 kB - Anonymous: 0 kB - LazyFree: 0 kB - AnonHugePages: 0 kB - ShmemPmdMapped: 0 kB - Shared_Hugetlb: 0 kB - Private_Hugetlb: 0 kB - Swap: 0 kB - SwapPss: 0 kB - KernelPageSize: 4 kB - MMUPageSize: 4 kB - Locked: 0 kB - THPeligible: 0 + Size: 1084 kB + KernelPageSize: 4 kB + MMUPageSize: 4 kB + Rss: 892 kB + Pss: 374 kB + Pss_Dirty: 0 kB + Shared_Clean: 892 kB + Shared_Dirty: 0 kB + Private_Clean: 0 kB + Private_Dirty: 0 kB + Referenced: 892 kB + Anonymous: 0 kB + LazyFree: 0 kB + AnonHugePages: 0 kB + ShmemPmdMapped: 0 kB + Shared_Hugetlb: 0 kB + Private_Hugetlb: 0 kB + HugetlbPudMapped: 0 kB + HugetlbPmdMapped: 0 kB + HugetlbPteMapped: 0 kB + Swap: 0 kB + SwapPss: 0 kB + KernelPageSize: 4 kB + MMUPageSize: 4 kB + Locked: 0 kB + THPeligible: 0 VmFlags: rd ex mr mw me dw The first of these lines shows the same information as is displayed for the @@ -507,10 +510,15 @@ implementation. If this is not desirable please file a bug report. "ShmemPmdMapped" shows the ammount of shared (shmem/tmpfs) memory backed by huge pages. -"Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by +"Shared_Hugetlb" and "Private_Hugetlb" show the amounts of memory backed by hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical reasons. And these are not included in {Shared,Private}_{Clean,Dirty} field. +If the kernel was compiled with ``CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING``, +"HugetlbPudMapped", "HugetlbPmdMapped", and "HugetlbPteMapped" will appear and +show the amount of HugeTLB memory mapped with PUDs, PMDs, and PTEs respectively. +See Documentation/admin-guide/mm/hugetlbpage.rst. + "Swap" shows how much would-be-anonymous memory is also used, but out on swap. 
For shmem mappings, "Swap" includes also the size of the mapped (and not From patchwork Fri Oct 21 16:36:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6859 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp797005wrr; Fri, 21 Oct 2022 09:41:37 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4uuhr4KmQNOM+0cIt7CkHrbg2xB7AN62kGo2kr+lvdMtbdKu4+wABcoZjBCs/OdnLdBGig X-Received: by 2002:a17:903:41cb:b0:183:1648:be0f with SMTP id u11-20020a17090341cb00b001831648be0fmr19995226ple.18.1666370497562; Fri, 21 Oct 2022 09:41:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370497; cv=none; d=google.com; s=arc-20160816; b=CWg48XjhtD2icQns606GRnc5W8Rwpy/cfUSowgpOaTSKEa/bzX7gu+gsHUvhXu95f6 nJT3Plmpm5nsHudnyJjcvAoe/XkT+1eq8TwFyPvDgHEXfQB7xcRHD7v1NgdWqCtY5Svk nAubx7kbEk4WIhAUFFY1casU69cKSfmzBDf2+VKyCpV3JOMSuUNWI7CK4mF+ktQ7+Zbu YBZyureT15Ez/cE9pQBj1SmTSk0sTuyvyvoz4EfxqjRb8tKsev63oM0uH3fSebldNpiw OiIx9X9aK5l0t6tFClircv+httHCI0zlOS3a6fmDVyATZVPKtTVTtxl32PDoY410Nz6l ct/g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=LdHnE7caiqm4qshIBX12p5Rv53BpHm3Kmd+OvWlvh64=; b=sGjkIdAxC6DBGKth9ZqyLYO0EmR4d+qT/3woviydIOO8F3bl7j2E3nGiTMc51kpPUs TrbfnE829hL9ZqnEPPdL6SDBLswvuqqUtdC+/nHmMhYXqgZ/xkAmN9nHP7J7Nkq/Kdqz iTHkfg70Fnc9WIxp9DeyhJQzQhuelklLwcOFurSJmdl5U2XSKay2VLN8SiOpKfmfp0g9 P7N7KEjXmOp/SVfuWhpisnz3FWVs3a6FYkaZAQg9C7Qe86F2R6cspa9WTxkmoWcXPOuz yLvm3dZB1v5IGmqGAPdHGIhNqUjQZvNjp6P6M8oZW2T3k0EYK3dgBMBY9XHf8I1KBYbw aanQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=BIwoTuKH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
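A quick way to see the fields documented above on a live system, assuming a kernel built with the config option and a process that has HGM-mapped HugeTLB memory (replace <pid> as appropriate):

	$ grep -E 'Hugetlb(Pud|Pmd|Pte)Mapped|_Hugetlb' /proc/<pid>/smaps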
[2620:137:e000::1:20]) by mx.google.com with ESMTP id d4-20020a056a0010c400b0056515a324adsi27855585pfu.90.2022.10.21.09.41.25; Fri, 21 Oct 2022 09:41:37 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=BIwoTuKH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231294AbiJUQlG (ORCPT + 99 others); Fri, 21 Oct 2022 12:41:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53400 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231207AbiJUQjP (ORCPT ); Fri, 21 Oct 2022 12:39:15 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1BCDC2892F2 for ; Fri, 21 Oct 2022 09:37:57 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id v17-20020a259d91000000b006b4c31c0640so3723404ybp.18 for ; Fri, 21 Oct 2022 09:37:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=LdHnE7caiqm4qshIBX12p5Rv53BpHm3Kmd+OvWlvh64=; b=BIwoTuKHg0KVxAgZkQCtc2lTNtp6Y3O1lU/37+Hr0O/jeYU6gp+nzdB+H8m1NuA1ZL VfIhzTnIwplVj+yHE00lZr+dFVwGgjXrUD+N5kHCoV/HhA1HAXwQNy9x27lvlVlgLIwf DtjoIt6kkMc1YxnjeRuFwMF1WhDCfWcSqQA8ZPBcfbEojzPyzzD3dfJjCbyiR5Oc3sTY SGe/+kTodt8Pm9AEqQzXBD2m1LtQnlBciskGh6NtkTnRzM+71sNNQHKJjcxjPPCLNpaD NGLrnmbOnuJ8iS/HFbyXfe7oQTovhAfVGgN7uv4nuezupV5Y/VexoSSWmkqo3fUcEME7 XpJQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=LdHnE7caiqm4qshIBX12p5Rv53BpHm3Kmd+OvWlvh64=; b=azDVOvfrkK2fNG02OQG5hoy9ZY34fQA4opA8NXen0NuyR1G7MrwMzSj86estGiRFJE etSjwdWBG8E0BQrvvs6sgWpP6MqUbNYo5ealYFu7h/8gktm1Q77hlLOF4Ov2yEuNXlhH LZ1d+S8yfD97Y59tWAIgbyN0JZPVoQvkCepti7j6V0uzAPelCpyGIUf/3nrwpgZ7os63 rx/GDseUzf9u1v3O0/kHxLjjNctheX1m3/aZpuWpPL2ca+4hyotBiI149osGSg1NTxBV rOotXr36HwOhT0Z4m7It5nt7u0i3TREafJUOZydGGO/BDFw7zEo8TnaGYcRbdDhOVDcw VajA== X-Gm-Message-State: ACrzQf1ryhCG6IwN3JPq/uJJ2qmogDH68aVegpp67sU3UpDHF0EIA5i9 b6um56cgf2jf/lk5lf952AhBqPBAQibu5q0B X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a0d:e402:0:b0:368:5f54:d94b with SMTP id n2-20020a0de402000000b003685f54d94bmr8815902ywe.519.1666370276754; Fri, 21 Oct 2022 09:37:56 -0700 (PDT) Date: Fri, 21 Oct 2022 16:36:59 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-44-jthoughton@google.com> Subject: [RFC PATCH v2 43/47] selftests/vm: add HugeTLB HGM to userfaultfd selftest From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . 
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316110975894421?= X-GMAIL-MSGID: =?utf-8?q?1747316110975894421?= This test case behaves similarly to the regular shared HugeTLB configuration, except that it uses 4K instead of hugepages, and that we ignore the UFFDIO_COPY tests, as UFFDIO_CONTINUE is the only ioctl that supports PAGE_SIZE-aligned regions. This doesn't test MADV_COLLAPSE. Other tests are added later to exercise MADV_COLLAPSE. Signed-off-by: James Houghton --- tools/testing/selftests/vm/userfaultfd.c | 90 +++++++++++++++++++----- 1 file changed, 74 insertions(+), 16 deletions(-) diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c index 7f22844ed704..c9cdfb20f292 100644 --- a/tools/testing/selftests/vm/userfaultfd.c +++ b/tools/testing/selftests/vm/userfaultfd.c @@ -73,9 +73,10 @@ static unsigned long nr_cpus, nr_pages, nr_pages_per_cpu, page_size, hpage_size; #define BOUNCE_POLL (1<<3) static int bounces; -#define TEST_ANON 1 -#define TEST_HUGETLB 2 -#define TEST_SHMEM 3 +#define TEST_ANON 1 +#define TEST_HUGETLB 2 +#define TEST_HUGETLB_HGM 3 +#define TEST_SHMEM 4 static int test_type; #define UFFD_FLAGS (O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY) @@ -93,6 +94,8 @@ static volatile bool test_uffdio_zeropage_eexist = true; static bool test_uffdio_wp = true; /* Whether to test uffd minor faults */ static bool test_uffdio_minor = false; +static bool test_uffdio_copy = true; + static bool map_shared; static int mem_fd; static unsigned long long *count_verify; @@ -151,7 +154,7 @@ static void usage(void) fprintf(stderr, "\nUsage: ./userfaultfd " "[hugetlbfs_file]\n\n"); fprintf(stderr, "Supported : anon, hugetlb, " - "hugetlb_shared, shmem\n\n"); + "hugetlb_shared, hugetlb_shared_hgm, shmem\n\n"); fprintf(stderr, "'Test mods' can be joined to the test type string with a ':'. " "Supported mods:\n"); fprintf(stderr, "\tsyscall - Use userfaultfd(2) (default)\n"); @@ -167,6 +170,11 @@ static void usage(void) exit(1); } +static bool test_is_hugetlb(void) +{ + return test_type == TEST_HUGETLB || test_type == TEST_HUGETLB_HGM; +} + #define _err(fmt, ...) \ do { \ int ret = errno; \ @@ -381,8 +389,12 @@ static struct uffd_test_ops *uffd_test_ops; static inline uint64_t uffd_minor_feature(void) { - if (test_type == TEST_HUGETLB && map_shared) - return UFFD_FEATURE_MINOR_HUGETLBFS; + if (test_is_hugetlb() && map_shared) + return UFFD_FEATURE_MINOR_HUGETLBFS | + (test_type == TEST_HUGETLB_HGM + ? 
(UFFD_FEATURE_MINOR_HUGETLBFS_HGM | + UFFD_FEATURE_EXACT_ADDRESS) + : 0); else if (test_type == TEST_SHMEM) return UFFD_FEATURE_MINOR_SHMEM; else @@ -393,7 +405,7 @@ static uint64_t get_expected_ioctls(uint64_t mode) { uint64_t ioctls = UFFD_API_RANGE_IOCTLS; - if (test_type == TEST_HUGETLB) + if (test_is_hugetlb()) ioctls &= ~(1 << _UFFDIO_ZEROPAGE); if (!((mode & UFFDIO_REGISTER_MODE_WP) && test_uffdio_wp)) @@ -500,13 +512,16 @@ static void uffd_test_ctx_clear(void) static void uffd_test_ctx_init(uint64_t features) { unsigned long nr, cpu; + uint64_t enabled_features = features; uffd_test_ctx_clear(); uffd_test_ops->allocate_area((void **)&area_src, true); uffd_test_ops->allocate_area((void **)&area_dst, false); - userfaultfd_open(&features); + userfaultfd_open(&enabled_features); + if ((enabled_features & features) != features) + err("couldn't enable all features"); count_verify = malloc(nr_pages * sizeof(unsigned long long)); if (!count_verify) @@ -726,13 +741,21 @@ static void uffd_handle_page_fault(struct uffd_msg *msg, struct uffd_stats *stats) { unsigned long offset; + unsigned long address; if (msg->event != UFFD_EVENT_PAGEFAULT) err("unexpected msg event %u", msg->event); + /* + * Round down address to nearest page_size. + * We do this manually because we specified UFFD_FEATURE_EXACT_ADDRESS + * to support UFFD_FEATURE_MINOR_HUGETLBFS_HGM. + */ + address = msg->arg.pagefault.address & ~(page_size - 1); + if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP) { /* Write protect page faults */ - wp_range(uffd, msg->arg.pagefault.address, page_size, false); + wp_range(uffd, address, page_size, false); stats->wp_faults++; } else if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_MINOR) { uint8_t *area; @@ -751,11 +774,10 @@ static void uffd_handle_page_fault(struct uffd_msg *msg, */ area = (uint8_t *)(area_dst + - ((char *)msg->arg.pagefault.address - - area_dst_alias)); + ((char *)address - area_dst_alias)); for (b = 0; b < page_size; ++b) area[b] = ~area[b]; - continue_range(uffd, msg->arg.pagefault.address, page_size); + continue_range(uffd, address, page_size); stats->minor_faults++; } else { /* @@ -782,7 +804,7 @@ static void uffd_handle_page_fault(struct uffd_msg *msg, if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE) err("unexpected write fault"); - offset = (char *)(unsigned long)msg->arg.pagefault.address - area_dst; + offset = (char *)address - area_dst; offset &= ~(page_size-1); if (copy_page(uffd, offset)) @@ -1192,6 +1214,12 @@ static int userfaultfd_events_test(void) char c; struct uffd_stats stats = { 0 }; + if (!test_uffdio_copy) { + printf("Skipping userfaultfd events test " + "(test_uffdio_copy=false)\n"); + return 0; + } + printf("testing events (fork, remap, remove): "); fflush(stdout); @@ -1245,6 +1273,12 @@ static int userfaultfd_sig_test(void) char c; struct uffd_stats stats = { 0 }; + if (!test_uffdio_copy) { + printf("Skipping userfaultfd signal test " + "(test_uffdio_copy=false)\n"); + return 0; + } + printf("testing signal delivery: "); fflush(stdout); @@ -1538,6 +1572,12 @@ static int userfaultfd_stress(void) pthread_attr_init(&attr); pthread_attr_setstacksize(&attr, 16*1024*1024); + if (!test_uffdio_copy) { + printf("Skipping userfaultfd stress test " + "(test_uffdio_copy=false)\n"); + bounces = 0; + } + while (bounces--) { printf("bounces: %d, mode:", bounces); if (bounces & BOUNCE_RANDOM) @@ -1696,6 +1736,16 @@ static void set_test_type(const char *type) uffd_test_ops = &hugetlb_uffd_test_ops; /* Minor faults require shared hugetlb; only enable 
here. */ test_uffdio_minor = true; + } else if (!strcmp(type, "hugetlb_shared_hgm")) { + map_shared = true; + test_type = TEST_HUGETLB_HGM; + uffd_test_ops = &hugetlb_uffd_test_ops; + /* + * HugeTLB HGM only changes UFFDIO_CONTINUE, so don't test + * UFFDIO_COPY. + */ + test_uffdio_minor = true; + test_uffdio_copy = false; } else if (!strcmp(type, "shmem")) { map_shared = true; test_type = TEST_SHMEM; @@ -1731,6 +1781,7 @@ static void parse_test_type_arg(const char *raw_type) err("Unsupported test: %s", raw_type); if (test_type == TEST_HUGETLB) + /* TEST_HUGETLB_HGM gets small pages. */ page_size = hpage_size; else page_size = sysconf(_SC_PAGE_SIZE); @@ -1813,22 +1864,29 @@ int main(int argc, char **argv) nr_cpus = x < y ? x : y; } nr_pages_per_cpu = bytes / page_size / nr_cpus; + if (test_type == TEST_HUGETLB_HGM) + /* + * `page_size` refers to the page_size we can use in + * UFFDIO_CONTINUE. We still need nr_pages to be appropriately + * aligned, so align it here. + */ + nr_pages_per_cpu -= nr_pages_per_cpu % (hpage_size / page_size); if (!nr_pages_per_cpu) { _err("invalid MiB"); usage(); } + nr_pages = nr_pages_per_cpu * nr_cpus; bounces = atoi(argv[3]); if (bounces <= 0) { _err("invalid bounces"); usage(); } - nr_pages = nr_pages_per_cpu * nr_cpus; - if (test_type == TEST_SHMEM || test_type == TEST_HUGETLB) { + if (test_type == TEST_SHMEM || test_is_hugetlb()) { unsigned int memfd_flags = 0; - if (test_type == TEST_HUGETLB) + if (test_is_hugetlb()) memfd_flags = MFD_HUGETLB; mem_fd = memfd_create(argv[0], memfd_flags); if (mem_fd < 0) From patchwork Fri Oct 21 16:37:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6866 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp801410wrr; Fri, 21 Oct 2022 09:53:34 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5yD01xyag9Lx/ueVnvk4E9hlDXPKGqk6tyv/NaAX4YDjB8P4klXRDW/XBlQ77tNAAHgxVz X-Received: by 2002:a17:906:58cc:b0:78d:ce9c:3787 with SMTP id e12-20020a17090658cc00b0078dce9c3787mr16077064ejs.715.1666371204163; Fri, 21 Oct 2022 09:53:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666371204; cv=none; d=google.com; s=arc-20160816; b=UUDRX3PhKvoe80H3wCjySBBqcc1POk6YCS9/DhrB/P1lWTCATeMbK8AcLJIZd5xXn7 Va0lQq8z0ybcLyDuZsyLsplEld2AzAt2Au+SGs8fzcrDRZ6j2tTquaplc7MM9EplzsrU FIn3Fc31sWQDiwq2HhwM6iiLKvchAX8cWcvmNO0sKJRH/kgJj36ziKCvBvuRif42gsV2 G9j7AT+wKPxBmNn+/4ZQQcdXHervjvEtPSTfF8l6W/GWvCrtNmh5y3/Ugsc7DAq1Xc0P BO1Pe6A4fyYcUlVWW5udppKV1vxwWcO+HDvD9tHH/xeNzdsuGLMEv+mFd8Ki1IBKC0uZ Jmkg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=9y9Kps0uMUxaUviUQmHpRsZg2wkh8g+3BdgpOUsdpD0=; b=cCrk+OyDdPaVXiryIoWM9jCiKHjcC8c8bFaYN2g3Z8CZ1A2CvKO8+0x2zQluqXHd4X 6WdoUGnANUdGboJ7TpEwZUyw8mcT90XFGZxidZYX4DGdZOdbiFOAkSn3mghct69N9VZ3 nnHH4H3C0eVi6MtoOVXEO/rpzFzyAXjfkcPqREOhTPfEWBnMGn4nP1foa4KCB5CEOFHf G0xDGuFj+Z2Oci7JBQ+dlm2/WcVRM7ERwli3jg+5u72FFtIC9nAFsoMMWzVMNuFkzGar xG0WjTXqmWaOAuuGi+XNEkGmh4ub2D2vgf7/MXmULBDh3XO11kR+hSZZAdjdzUWo75fl NzWQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b="l/7qHlTl"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) 
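For reference, with the new test-type string wired up above, a run of the high-granularity configuration would look something like the line below. The positional arguments (test type, size in MiB, bounce count) follow the existing usage; the numbers are arbitrary.

	# assumes CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING and enough hugepages
	# reserved in the default hstate
	./userfaultfd hugetlb_shared_hgm 256 32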
header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 4-20020a170906328400b0078d473448afsi18868466ejw.233.2022.10.21.09.52.51; Fri, 21 Oct 2022 09:53:24 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b="l/7qHlTl"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231490AbiJUQlY (ORCPT + 99 others); Fri, 21 Oct 2022 12:41:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52698 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231295AbiJUQje (ORCPT ); Fri, 21 Oct 2022 12:39:34 -0400 Received: from mail-vs1-xe4a.google.com (mail-vs1-xe4a.google.com [IPv6:2607:f8b0:4864:20::e4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5700A28C310 for ; Fri, 21 Oct 2022 09:37:59 -0700 (PDT) Received: by mail-vs1-xe4a.google.com with SMTP id 65-20020a670344000000b0039b3020da1bso1053592vsd.3 for ; Fri, 21 Oct 2022 09:37:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=9y9Kps0uMUxaUviUQmHpRsZg2wkh8g+3BdgpOUsdpD0=; b=l/7qHlTlqK2+odgNL+qqaWtZd95RSQmfjj2EnMNPwPzYUopC/nQW9oknGDwJAIzM/V toI7WNbfT5WjVMs9srJI2QLMSDhRHFgSpdvMACL3z+IizihCksdhhkBJ5mEioYTHCvo0 FmIcUxXZMiWS/ZIIxOqSMSBxAewQGqIH5uUcc6n9Dwa4sD9nRkufaMXxSFItUN7lDgfT 9USzcKP2HR68osgtg5cJtSMN3cXJ4Jhqhm/TxBcpRTG1sKA8CwhRmqo/CkU6R16k9yiZ DFjIvC7I9+PQFY+MIk1rCSd9HJjdGzJoKWKELko+HJE1P98O26BCJKJPZ6KXbv/ZPUig 8qww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=9y9Kps0uMUxaUviUQmHpRsZg2wkh8g+3BdgpOUsdpD0=; b=lQjolw3JnpHzgvISkzrbZUqzqq650Nm9t6w8GfSVWdGOD/5ta5dgUySPY0ouvS95P0 5Hbo+8fDJGinFH/Y4KwbYGnCCpbrTXn8MDWRazgKmsqeSAzPHzkM+o2F3QBngiFON+jL inTXpkF7sySUk+r9CkBU+/Wj86XafSdDfyPAHdnp9FAB0Rauy6lNxPTmD5nCBT26XsAw ZvnwLYYs5AKPT+2yytuTb2T2lsql58jpBm6HqvYOm7++A61r746m7cfIsUu5x1RQxdcW hWt8IJOqz3Qs6K8He1BhN9NMFvo3JCkn2Dv2DKRhpMdgl1G0uzzhcsQYcR8+yQDjJVnq GS3w== X-Gm-Message-State: ACrzQf3fO39wODjFI36JCOLh3qCWl/z77s+jQmnbr8cKw0HPxLwcZR5e Tnro57lxxMkfzbxbUV2yGgDv+hgqXmBHVLgQ X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a67:c80d:0:b0:3aa:895:9630 with SMTP id u13-20020a67c80d000000b003aa08959630mr2609998vsk.15.1666370277616; Fri, 21 Oct 2022 09:37:57 -0700 (PDT) Date: Fri, 21 Oct 2022 16:37:00 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-45-jthoughton@google.com> Subject: [RFC PATCH v2 44/47] selftests/kvm: add HugeTLB HGM to KVM demand paging selftest From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel 
Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316852192614596?= X-GMAIL-MSGID: =?utf-8?q?1747316852192614596?= This test exercises the GUP paths for HGM. MADV_COLLAPSE is not tested. Signed-off-by: James Houghton --- .../selftests/kvm/demand_paging_test.c | 20 ++++++++++++++++--- .../testing/selftests/kvm/include/test_util.h | 2 ++ tools/testing/selftests/kvm/lib/kvm_util.c | 2 +- tools/testing/selftests/kvm/lib/test_util.c | 14 +++++++++++++ 4 files changed, 34 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 779ae54f89c4..67ca8703c6b7 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -76,6 +76,12 @@ static int handle_uffd_page_request(int uffd_mode, int uffd, uint64_t addr) clock_gettime(CLOCK_MONOTONIC, &start); + /* + * We're using UFFD_FEATURE_EXACT_ADDRESS, so round down the address. + * This is needed to support HugeTLB high-granularity mapping. + */ + addr &= ~(demand_paging_size - 1); + if (uffd_mode == UFFDIO_REGISTER_MODE_MISSING) { struct uffdio_copy copy; @@ -214,7 +220,8 @@ static void setup_demand_paging(struct kvm_vm *vm, pthread_t *uffd_handler_thread, int pipefd, int uffd_mode, useconds_t uffd_delay, struct uffd_handler_args *uffd_args, - void *hva, void *alias, uint64_t len) + void *hva, void *alias, uint64_t len, + enum vm_mem_backing_src_type src_type) { bool is_minor = (uffd_mode == UFFDIO_REGISTER_MODE_MINOR); int uffd; @@ -244,9 +251,15 @@ static void setup_demand_paging(struct kvm_vm *vm, TEST_ASSERT(uffd >= 0, __KVM_SYSCALL_ERROR("userfaultfd()", uffd)); uffdio_api.api = UFFD_API; - uffdio_api.features = 0; + uffdio_api.features = is_minor + ? 
UFFD_FEATURE_EXACT_ADDRESS | UFFD_FEATURE_MINOR_HUGETLBFS_HGM + : 0; ret = ioctl(uffd, UFFDIO_API, &uffdio_api); TEST_ASSERT(ret != -1, __KVM_SYSCALL_ERROR("UFFDIO_API", ret)); + if (src_type == VM_MEM_SRC_SHARED_HUGETLB_HGM) + TEST_ASSERT(uffdio_api.features & + UFFD_FEATURE_MINOR_HUGETLBFS_HGM, + "UFFD_FEATURE_MINOR_HUGETLBFS_HGM not present"); uffdio_register.range.start = (uint64_t)hva; uffdio_register.range.len = len; @@ -329,7 +342,8 @@ static void run_test(enum vm_guest_mode mode, void *arg) pipefds[i * 2], p->uffd_mode, p->uffd_delay, &uffd_args[i], vcpu_hva, vcpu_alias, - vcpu_args->pages * perf_test_args.guest_page_size); + vcpu_args->pages * perf_test_args.guest_page_size, + p->src_type); } } diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h index befc754ce9b3..0410326dbc18 100644 --- a/tools/testing/selftests/kvm/include/test_util.h +++ b/tools/testing/selftests/kvm/include/test_util.h @@ -96,6 +96,7 @@ enum vm_mem_backing_src_type { VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB, VM_MEM_SRC_SHMEM, VM_MEM_SRC_SHARED_HUGETLB, + VM_MEM_SRC_SHARED_HUGETLB_HGM, NUM_SRC_TYPES, }; @@ -114,6 +115,7 @@ size_t get_def_hugetlb_pagesz(void); const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i); size_t get_backing_src_pagesz(uint32_t i); bool is_backing_src_hugetlb(uint32_t i); +bool is_backing_src_shared_hugetlb(enum vm_mem_backing_src_type src_type); void backing_src_help(const char *flag); enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name); long get_run_delay(void); diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index f1cb1627161f..7d769a117e14 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -896,7 +896,7 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm, region->fd = -1; if (backing_src_is_shared(src_type)) region->fd = kvm_memfd_alloc(region->mmap_size, - src_type == VM_MEM_SRC_SHARED_HUGETLB); + is_backing_src_shared_hugetlb(src_type)); region->mmap_start = mmap(NULL, region->mmap_size, PROT_READ | PROT_WRITE, diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c index 6d23878bbfe1..710dc42077fe 100644 --- a/tools/testing/selftests/kvm/lib/test_util.c +++ b/tools/testing/selftests/kvm/lib/test_util.c @@ -254,6 +254,13 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i) */ .flag = MAP_SHARED, }, + [VM_MEM_SRC_SHARED_HUGETLB_HGM] = { + /* + * Identical to shared_hugetlb except for the name. 
+ */ + .name = "shared_hugetlb_hgm", + .flag = MAP_SHARED, + }, }; _Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES, "Missing new backing src types?"); @@ -272,6 +279,7 @@ size_t get_backing_src_pagesz(uint32_t i) switch (i) { case VM_MEM_SRC_ANONYMOUS: case VM_MEM_SRC_SHMEM: + case VM_MEM_SRC_SHARED_HUGETLB_HGM: return getpagesize(); case VM_MEM_SRC_ANONYMOUS_THP: return get_trans_hugepagesz(); @@ -288,6 +296,12 @@ bool is_backing_src_hugetlb(uint32_t i) return !!(vm_mem_backing_src_alias(i)->flag & MAP_HUGETLB); } +bool is_backing_src_shared_hugetlb(enum vm_mem_backing_src_type src_type) +{ + return src_type == VM_MEM_SRC_SHARED_HUGETLB || + src_type == VM_MEM_SRC_SHARED_HUGETLB_HGM; +} + static void print_available_backing_src_types(const char *prefix) { int i; From patchwork Fri Oct 21 16:37:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6861 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp797072wrr; Fri, 21 Oct 2022 09:41:46 -0700 (PDT) X-Google-Smtp-Source: AMsMyM63bmBVHjC558cGmyxtzs9NTycpzi1cpFtXXwdhV6qk4HryEFRG1AaCCSn0UJQhbGvgOe/V X-Received: by 2002:a17:90b:4fc3:b0:20c:dbba:e614 with SMTP id qa3-20020a17090b4fc300b0020cdbbae614mr59556555pjb.163.1666370506263; Fri, 21 Oct 2022 09:41:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370506; cv=none; d=google.com; s=arc-20160816; b=qxypDZP11sg6iDpTrLmi4/f5MTbY7Zrhf2cafQohIM0+zP1aqo6TPVN2CoOmwnhiPv uZBAENAuY9iW0oCL3pUkP2FO15j2eH9sCc8IegpYB/ZWNvmww/i+U4dwWaG1JnJA126N e1YV4AV9UOf0WNfBwS+Oly8frzSp1/R1vQrhv+hrsMS3ij0oKK7x8/4zGxKdcYPdZ5Nl mYcJxKmhJSw9zMw20olbfSPU+EWOCocshY3YGGZsLuz2tU//AANWcrgFdkWAtR10LHdC WYyMzOtGtyZuUs/83+h5R0Isjih7FOUW69uWCiJA+39im1UTw2FQnri2wCrGgvj6j3D3 xbjg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=rgu86dTBraBMEbo7EYK3wnE2QExNEIbm0lf0S3p+RAM=; b=CxUw6v86nLiD5Iz0F1SqvJU2IuampBdVJ4tytEq1Jl1wNZzGwO2NgwDtfQmwBkMaLX DOLluWqDgL4Fjr/tv9HrQzVEk9bShctJuF3GKIyN//+MxYk++P617+8FbUq4opGfYidQ ROSPXGEXiDZItvKEBh53jtK//TkKdb/42wCj2aJCfgQbYOMXAYYeCgq8mM6bveJOap2X E5yE8C63EZb6JykPI1Bqp2GHiUFfF8W432tAwx0sQs5OaC4yH1WQMz4vxS9ZswV7sihn WsBp0sxDxnr/thsbzrG0NPBvYbe5HgZngEwr7JzSMzdBKJ/uRQ824X8zxrHCsJUJXCm8 a7Fw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=CDWV7vuG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id a19-20020a1709027d9300b00186748fe89asi3490596plm.191.2022.10.21.09.41.33; Fri, 21 Oct 2022 09:41:46 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=CDWV7vuG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231389AbiJUQlP (ORCPT + 99 others); Fri, 21 Oct 2022 12:41:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54690 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230503AbiJUQjP (ORCPT ); Fri, 21 Oct 2022 12:39:15 -0400 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DF22328C322 for ; Fri, 21 Oct 2022 09:38:02 -0700 (PDT) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-36885d835e9so33784767b3.17 for ; Fri, 21 Oct 2022 09:38:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=rgu86dTBraBMEbo7EYK3wnE2QExNEIbm0lf0S3p+RAM=; b=CDWV7vuGy8k4HSVoqHUUqHq4k/ljtoqD3xxO+wejMuxAHTGuNtdRUbXNgGj+V+78Zx oU99fdRPWJDzHYJJwVoaXcgYCaOBiPK/e2LKKQV+PqTKnIRDuYGoo28iGCk8HwJwicT7 +Q6T6t9e35ieUG8SKhA6GWk9Yngm0X5rtw33d4A+66eY3fHNfiCumY+W+3EeR+pd+k2g EJ/M2S1Kfe/MJZEOuP4oko60RqTpq+UKJsdxFeoYbx/3dIB+RmMCT6+Z8zWR1IyMbdHb V6aR+BivHNTlLmnlyRdniwOEWMZQbRMfTL1+xWOhSK0EfGWdjiwxTjY0tFQre39sDFt5 TSmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=rgu86dTBraBMEbo7EYK3wnE2QExNEIbm0lf0S3p+RAM=; b=FE8f+/DLNgeMxhOm4kywgDYCK59xqRXV7LIlzgLHQq+puxTpbFJCGLEfJuq3/hXFVD ZVc+yF60X1jf4QDJI9m0J7rfjs4o21ccGDPaAl5m7j5X9jMkH4D1Hhe0K5NZYPsPhsEc Tcu8+CE9ltxMH4B3p0eaGQN20pnPJcs2KHPVcC97ZPKL+2wTtDeD7So51bddGxv49ols JguKKI72oRWh7PFbchZQXgn07c1Jyy4Rgn6qtaVeoCz5dBiI7Ykdv03gGIVGG8t779aC jHy09gWbbGlPueUOtyAipm6o3iYyuuxXHGkKECGi6m9VCmpTAc3zUdN2f8s6WgbiF5Wk tMrQ== X-Gm-Message-State: ACrzQf38kB2MISoa3h2WraA0xTsJlKeYUe6NV6uUFAPW45cfRrw6Fcu/ f3r7ShzvNSsBJbXFN2YSL0eEDjDTEmGiNIxE X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a25:f448:0:b0:6ca:22e1:638c with SMTP id p8-20020a25f448000000b006ca22e1638cmr10364996ybe.252.1666370278696; Fri, 21 Oct 2022 09:37:58 -0700 (PDT) Date: Fri, 21 Oct 2022 16:37:01 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-46-jthoughton@google.com> Subject: [RFC PATCH v2 45/47] selftests/vm: add anon and shared hugetlb to migration test From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr 
. David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316120306126765?= X-GMAIL-MSGID: =?utf-8?q?1747316120306126765?= Shared HugeTLB mappings are migrated best-effort. Sometimes, due to being unable to grab the VMA lock for writing, migration may just randomly fail. To allow for that, we allow retries. Signed-off-by: James Houghton --- tools/testing/selftests/vm/migration.c | 83 ++++++++++++++++++++++++-- 1 file changed, 79 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/vm/migration.c b/tools/testing/selftests/vm/migration.c index 1cec8425e3ca..21577a84d7e4 100644 --- a/tools/testing/selftests/vm/migration.c +++ b/tools/testing/selftests/vm/migration.c @@ -13,6 +13,7 @@ #include #include #include +#include #define TWOMEG (2<<20) #define RUNTIME (60) @@ -59,11 +60,12 @@ FIXTURE_TEARDOWN(migration) free(self->pids); } -int migrate(uint64_t *ptr, int n1, int n2) +int migrate(uint64_t *ptr, int n1, int n2, int retries) { int ret, tmp; int status = 0; struct timespec ts1, ts2; + int failed = 0; if (clock_gettime(CLOCK_MONOTONIC, &ts1)) return -1; @@ -78,6 +80,9 @@ int migrate(uint64_t *ptr, int n1, int n2) ret = move_pages(0, 1, (void **) &ptr, &n2, &status, MPOL_MF_MOVE_ALL); if (ret) { + if (++failed < retries) + continue; + if (ret > 0) printf("Didn't migrate %d pages\n", ret); else @@ -88,6 +93,7 @@ int migrate(uint64_t *ptr, int n1, int n2) tmp = n2; n2 = n1; n1 = tmp; + failed = 0; } return 0; @@ -128,7 +134,7 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME) if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) perror("Couldn't create thread"); - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); + ASSERT_EQ(migrate(ptr, self->n1, self->n2, 1), 0); for (i = 0; i < self->nthreads - 1; i++) ASSERT_EQ(pthread_cancel(self->threads[i]), 0); } @@ -158,7 +164,7 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) self->pids[i] = pid; } - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); + ASSERT_EQ(migrate(ptr, self->n1, self->n2, 1), 0); for (i = 0; i < self->nthreads - 1; i++) ASSERT_EQ(kill(self->pids[i], SIGTERM), 0); } @@ -185,9 +191,78 @@ TEST_F_TIMEOUT(migration, private_anon_thp, 2*RUNTIME) if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) perror("Couldn't create thread"); - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); + ASSERT_EQ(migrate(ptr, self->n1, self->n2, 1), 0); + for (i = 0; i < self->nthreads - 1; i++) + ASSERT_EQ(pthread_cancel(self->threads[i]), 0); +} + +/* + * Tests the anon hugetlb migration entry paths. 
+ */ +TEST_F_TIMEOUT(migration, private_anon_hugetlb, 2*RUNTIME) +{ + uint64_t *ptr; + int i; + + if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) + SKIP(return, "Not enough threads or NUMA nodes available"); + + ptr = mmap(NULL, TWOMEG, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0); + if (ptr == MAP_FAILED) + SKIP(return, "Could not allocate hugetlb pages"); + + memset(ptr, 0xde, TWOMEG); + for (i = 0; i < self->nthreads - 1; i++) + if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) + perror("Couldn't create thread"); + + ASSERT_EQ(migrate(ptr, self->n1, self->n2, 1), 0); for (i = 0; i < self->nthreads - 1; i++) ASSERT_EQ(pthread_cancel(self->threads[i]), 0); } +/* + * Tests the shared hugetlb migration entry paths. + */ +TEST_F_TIMEOUT(migration, shared_hugetlb, 2*RUNTIME) +{ + uint64_t *ptr; + int i; + int fd; + unsigned long sz; + struct statfs filestat; + + if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) + SKIP(return, "Not enough threads or NUMA nodes available"); + + fd = memfd_create("tmp_hugetlb", MFD_HUGETLB); + if (fd < 0) + SKIP(return, "Couldn't create hugetlb memfd"); + + if (fstatfs(fd, &filestat) < 0) + SKIP(return, "Couldn't fstatfs hugetlb file"); + + sz = filestat.f_bsize; + + if (ftruncate(fd, sz)) + SKIP(return, "Couldn't allocate hugetlb pages"); + ptr = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + if (ptr == MAP_FAILED) + SKIP(return, "Could not map hugetlb pages"); + + memset(ptr, 0xde, sz); + for (i = 0; i < self->nthreads - 1; i++) + if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) + perror("Couldn't create thread"); + + ASSERT_EQ(migrate(ptr, self->n1, self->n2, 10), 0); + for (i = 0; i < self->nthreads - 1; i++) { + ASSERT_EQ(pthread_cancel(self->threads[i]), 0); + pthread_join(self->threads[i], NULL); + } + ftruncate(fd, 0); + close(fd); +} + TEST_HARNESS_MAIN From patchwork Fri Oct 21 16:37:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6864 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp799297wrr; Fri, 21 Oct 2022 09:47:16 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7UBNaTrFlk8yn3MLSrxmFKuXJ0hyzjeSh+qiHsXDuzWQdUpFV5WBGX/wRza3oXFAmGxJIp X-Received: by 2002:a17:90b:350d:b0:20d:5438:f594 with SMTP id ls13-20020a17090b350d00b0020d5438f594mr60036969pjb.216.1666370836604; Fri, 21 Oct 2022 09:47:16 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666370836; cv=none; d=google.com; s=arc-20160816; b=RHSRC1A4YAFmQvn+jx5MskoeflEf1p5l9SggsEHRuN3bzeDQrO9Zg5ovrEzxVpJWkQ jsSclU/2zLCHldk+GzxbYPCWZkfDjxsHaqN+rxox7mrTDwbYrT+jDA4pkIm6Ik5FFddZ XHbxENScglHqgkiC9GdqHxuFmDimmg3fy06k8pilb5etB28D1vCo5E9Fd/RU1HKCzhtC CUTB3KGRxeDM+GToV8NrTUM97ICIMBRJmPuhS2B5Kq0ZCkU8yVoPIi1y1wiguaCnqPxK Y+eTSCDmXE6+PTHk1Y/3CGMQLOmo+vJEXruNPATjAAXenTON5kO+0zE8rd8GYCq4b5pe 9K/A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=hIQEWHlSLo/XkVLbilcDDKMbq3Gd3wcTC7FKcNbq2ok=; b=s+3hGSGzdNJiR5H/LijAhQw3Tuj6edBtJtaKCrNqziY5uwTB6GtY70pHiXJxk9gdbO pYUsjzNbRszeDIezbHsXMMG9hxW9eIOUf4vsTnjVhOl/3Up0GmdUWMvrwJ1EGYlcqzxX C/yaIalP5BRXMwzalcIJqmIVq97xG9w0NYDYxOulKemxC7t0m20QGIpL5JSaO0r25kBI vd8Csjqvs9RO2GifFkUd/pNEKONCXL6aFSIbS6MML1Rr0lAKzsBQao2X9qIFLust2z9y 3wfpls9n+vZZKaWVU6VwiX+3z3tp0kd5wUy0jkfDVvLxWzm6Xfxix5c0/7cOAzCFDkxQ YsSw== 
ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=kwDsO1K2; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u1-20020a17090341c100b0017486813f81si29077099ple.528.2022.10.21.09.46.56; Fri, 21 Oct 2022 09:47:16 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=kwDsO1K2; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230398AbiJUQla (ORCPT + 99 others); Fri, 21 Oct 2022 12:41:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52730 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231362AbiJUQjt (ORCPT ); Fri, 21 Oct 2022 12:39:49 -0400 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DF07C28B19B for ; Fri, 21 Oct 2022 09:38:02 -0700 (PDT) Received: by mail-yb1-xb4a.google.com with SMTP id t6-20020a25b706000000b006b38040b6f7so3731874ybj.6 for ; Fri, 21 Oct 2022 09:38:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=hIQEWHlSLo/XkVLbilcDDKMbq3Gd3wcTC7FKcNbq2ok=; b=kwDsO1K2E/Ygmg/+9L0K1r1iIEM7XdI2MkqQcBsRmTHXB24iYmGgzQlcuEW2FKjX7m plhMEQJHzYM8g9ACG97HJE8ZCRQlE3FdOGdPSeXGWAUONWB42bOUJIthA2+V8zHKec/m fC6Mz0RKHc+cbwkbZjPlGqO5Uqz3GxoAe2WPWIrxxuon+8lLDyi8pd/A3tEgkPonOJa4 QGdAVkjCxEDMcQL2niAH/NOQhDCN8KVl0p+89L9GZ9Hl7RKE4T/WVn9sk/Y8tmIFdKCL iWCcUBvK1VCBpMt5bjp85RIt+A5+KMo4rALu1oLpUelOWsIAmrxOn9f5I2sbOnyNjqPF QggA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=hIQEWHlSLo/XkVLbilcDDKMbq3Gd3wcTC7FKcNbq2ok=; b=UgJOInjFOZhlj5mwpkjgakszZEepCMNrMY3wkrm8AkwEh8rTgweIjAA+gp+y0jn1aW fQVnv8Kv4p4jfGBcqeBw0p7q9HfVhPmh01cuX54ujNaoE6fwCcoby3M2MIcn66Ly4GZl Jr2qDT12eIdic3e2eypDqYV6B19rtQV2pusENW/vVOT7rqPxS11TuZekxHX7YhuGFU93 fCT1hfgCJVRU3HAUtugVzCcqN2hvyIGtsoekKJwFIDXNtVqhVOUUwDLLo3zvGpIjKQl0 xqa/qLPRIbV7+YnwwQ36e7GF1Q5zk/NhaL40Fk3BPhxkh4XsWa1zVLM3LMltJQz6t0mj pM1Q== X-Gm-Message-State: ACrzQf0Gyupdcr74pbqx16B5WMP7hEoLfz5CtVdo6SV8TkfrN7eFrrBi dWCEqQQerL4x0xbgE2EvT+gA8jPZPZPIJBJv X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a5b:f11:0:b0:6be:94c1:65e2 with SMTP id x17-20020a5b0f11000000b006be94c165e2mr17452348ybr.283.1666370279477; Fri, 21 Oct 2022 09:37:59 -0700 (PDT) Date: Fri, 21 Oct 2022 16:37:02 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: 
<20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-47-jthoughton@google.com> Subject: [RFC PATCH v2 46/47] selftests/vm: add hugetlb HGM test to migration selftest From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316466355045295?= X-GMAIL-MSGID: =?utf-8?q?1747316466355045295?= This is mostly the same as the shared HugeTLB case, but instead of mapping the page with a regular page fault, we map it with lots of UFFDIO_CONTINUE operations. We also verify that the contents haven't changed after the migration, as they would have if the post-migration PTEs pointed to the wrong page. Signed-off-by: James Houghton --- tools/testing/selftests/vm/migration.c | 139 +++++++++++++++++++++++ 1 file changed, 139 insertions(+) diff --git a/tools/testing/selftests/vm/migration.c b/tools/testing/selftests/vm/migration.c index 21577a84d7e4..89cb5934f139 100644 --- a/tools/testing/selftests/vm/migration.c +++ b/tools/testing/selftests/vm/migration.c @@ -14,6 +14,11 @@ #include #include #include +#include +#include +#include +#include +#include #define TWOMEG (2<<20) #define RUNTIME (60) @@ -265,4 +270,138 @@ TEST_F_TIMEOUT(migration, shared_hugetlb, 2*RUNTIME) close(fd); } +#ifdef __NR_userfaultfd +static int map_at_high_granularity(char *mem, size_t length) +{ + int i; + int ret; + int uffd = syscall(__NR_userfaultfd, 0); + struct uffdio_api api; + struct uffdio_register reg; + int pagesize = getpagesize(); + + if (uffd < 0) { + perror("couldn't create uffd"); + return uffd; + } + + api.api = UFFD_API; + api.features = UFFD_FEATURE_MISSING_HUGETLBFS + | UFFD_FEATURE_MINOR_HUGETLBFS + | UFFD_FEATURE_MINOR_HUGETLBFS_HGM; + + ret = ioctl(uffd, UFFDIO_API, &api); + if (ret || api.api != UFFD_API) { + perror("UFFDIO_API failed"); + goto out; + } + + reg.range.start = (unsigned long)mem; + reg.range.len = length; + + reg.mode = UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_MINOR; + + ret = ioctl(uffd, UFFDIO_REGISTER, &reg); + if (ret) { + perror("UFFDIO_REGISTER failed"); + goto out; + } + + /* UFFDIO_CONTINUE each 4K segment of the 2M page. 
*/ + for (i = 0; i < length/pagesize; ++i) { + struct uffdio_continue cont; + + cont.range.start = (unsigned long long)mem + i * pagesize; + cont.range.len = pagesize; + cont.mode = 0; + ret = ioctl(uffd, UFFDIO_CONTINUE, &cont); + if (ret) { + fprintf(stderr, "UFFDIO_CONTINUE failed " + "for %llx -> %llx: %d\n", + cont.range.start, + cont.range.start + cont.range.len, + errno); + goto out; + } + } + ret = 0; +out: + close(uffd); + return ret; +} +#else +static int map_at_high_granularity(char *mem, size_t length) +{ + fprintf(stderr, "Userfaultfd missing\n"); + return -1; +} +#endif /* __NR_userfaultfd */ + +/* + * Tests the high-granularity hugetlb migration entry paths. + */ +TEST_F_TIMEOUT(migration, shared_hugetlb_hgm, 2*RUNTIME) +{ + uint64_t *ptr; + int i; + int fd; + unsigned long sz; + struct statfs filestat; + + if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) + SKIP(return, "Not enough threads or NUMA nodes available"); + + fd = memfd_create("tmp_hugetlb", MFD_HUGETLB); + if (fd < 0) + SKIP(return, "Couldn't create hugetlb memfd"); + + if (fstatfs(fd, &filestat) < 0) + SKIP(return, "Couldn't fstatfs hugetlb file"); + + sz = filestat.f_bsize; + + if (ftruncate(fd, sz)) + SKIP(return, "Couldn't allocate hugetlb pages"); + + if (fallocate(fd, 0, 0, sz) < 0) { + perror("fallocate failed"); + SKIP(return, "fallocate failed"); + } + + ptr = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + if (ptr == MAP_FAILED) + SKIP(return, "Could not allocate hugetlb pages"); + + /* + * We have to map_at_high_granularity before we memset, otherwise + * memset will map everything at the hugepage size. + */ + if (map_at_high_granularity((char *)ptr, sz) < 0) + SKIP(return, "Could not map HugeTLB range at high granularity"); + + /* Populate the page we're migrating. */ + for (i = 0; i < sz/sizeof(*ptr); ++i) + ptr[i] = i; + + for (i = 0; i < self->nthreads - 1; i++) + if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) + perror("Couldn't create thread"); + + ASSERT_EQ(migrate(ptr, self->n1, self->n2, 10), 0); + for (i = 0; i < self->nthreads - 1; i++) { + ASSERT_EQ(pthread_cancel(self->threads[i]), 0); + pthread_join(self->threads[i], NULL); + } + + /* Check that the contents didn't change. 
*/ + for (i = 0; i < sz/sizeof(*ptr); ++i) { + ASSERT_EQ(ptr[i], i); + if (ptr[i] != i) + break; + } + + ftruncate(fd, 0); + close(fd); +} + TEST_HARNESS_MAIN From patchwork Fri Oct 21 16:37:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 6865 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4242:0:0:0:0:0 with SMTP id s2csp801288wrr; Fri, 21 Oct 2022 09:53:11 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6lJuHmzUXSIK7PqXpAjieOJyZ54hcdEruQf0xmwhBel6zxCj71yTaqnH/2qU+w94aQy/f4 X-Received: by 2002:a17:907:97c3:b0:79b:3f8d:a354 with SMTP id js3-20020a17090797c300b0079b3f8da354mr3150397ejc.461.1666371191622; Fri, 21 Oct 2022 09:53:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666371191; cv=none; d=google.com; s=arc-20160816; b=fq3W92on6q9H6P2TdXuI7zFg6odI3qREa4EvlJ552ojL6rqEtQEwaOF3vzC0gkt623 Kv8qr4vvsXQT3efODM6gWfq/J/FLt+VBmBK6ncEWLHajygWLPMoIsoPvDhBnAFbtz8rF NyTizNhMp1JDLhy30K9H+Li83uLo8cbXMQf5pvkD744lgxNvW0rmSLxA9ddAwnAq7oyF hGmMeb1xx78VJZpgh+iAfBFv8NzIjV+tYysY+HaxSsApwUOf54N1WyLLjEuiVzUwURBq LSzELSB5F6LHGtnXRsWBgFq9rTOuJrMD6S1YbF47PuvBd9ALvtWppVqF2aOJQSUsoRvp B0WA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=qP1ClP7q+g5S/3EyNUqzGRFXdONCD3+1un/KPsFlt6w=; b=qiP3cYQnupqNdPhNkgedbMbl1+SeLRG0aUERxVY4yjl0eyJlnmRXHLlps9n3nyx5uc utJl4ru1xrUQJmbXy7Bb9zey7UqJEb/FZ90Xwo7+ESU85ckeTwtnga9/phcB+vKoN+kY D68eGJdM5MzH8ZSSX0vrhCjYEozZGE5h67Paur7KNwbU1k48Z3QQEjGA+eQ4fnWV/tZD tiB8Fra7UmtO/sn88j5eyCpAyIpvIg9RFuPJ61P5o/Hh5n1lzYcuEIhhXnF9dANpA23h kSXksyxZBbpUQEamUmeNB2xjFIO3Lgjg2YN3qZlQ0tKjmABVqhkY0aSyF+GQrlY6GOSt hCXg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=QMIS+S3F; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id ho19-20020a1709070e9300b007811ace1701si24011134ejc.445.2022.10.21.09.52.41; Fri, 21 Oct 2022 09:53:11 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=QMIS+S3F; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231503AbiJUQld (ORCPT + 99 others); Fri, 21 Oct 2022 12:41:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53968 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231277AbiJUQj6 (ORCPT ); Fri, 21 Oct 2022 12:39:58 -0400 Received: from mail-vs1-xe49.google.com (mail-vs1-xe49.google.com [IPv6:2607:f8b0:4864:20::e49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 536DD238247 for ; Fri, 21 Oct 2022 09:38:04 -0700 (PDT) Received: by mail-vs1-xe49.google.com with SMTP id m186-20020a6726c3000000b0039b2e2e040dso1041533vsm.9 for ; Fri, 21 Oct 2022 09:38:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=qP1ClP7q+g5S/3EyNUqzGRFXdONCD3+1un/KPsFlt6w=; b=QMIS+S3FZjAOFL/rBkRiry87Th5Kd5EwT0c05LySCp4nkCaELh/zdUD5+Aa9N/+7D7 SttWCG9Hi3HpsVOYK516SluMw+1U5skL5XiUCiezO60dxluHkHXNJD2BPxzYy9ElIoIo CqVjIAYat6VGsTooAFXgZhyY31kEJ2Pb7dKKQXnSZh6pHHmXteCqEnslryEPoKlJ/1HL ZJ2O0i+mmtp0eV1DkFySFLMzu3xcpuYZRBOXaXNztWI6gXkUaHxqX1Lu3FuqcpPbUo/J YTGrOVJlkmeX1/WedJcVMPwDEck6qTmRFmeRuAAp5UxDdvPetKdD3Fa+pCCwGznSQq8b F+kg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=qP1ClP7q+g5S/3EyNUqzGRFXdONCD3+1un/KPsFlt6w=; b=ZURbbmxFqfyo1vrwHNokNDYctS0PheIjzcA+iU1rxyWOmZ6fPlA9qGhk5V4bqKQrpE Yws7FbDQSiBYcHYAJGQGwfOU5i75FLzFb3o413QEMtr0wp9kqQOcLZ+hdlgDczneu7oF A9y955SMJGnvDwYW5gNTaY7RA7ZVqRUlE/h+9bLnP+Kh5FcZHQzNWOvz0PEEogoMvbNj 8DPm42Lu4CTFSWdTc5XIdKSlz0QBRZJwdPcNtJKCCqFxaFE1op+uVkd6xl19uTmrMGfN JzywLTmMsNPDg7+YI4MKcHiqb9wuqlyLwv7+FkoFTCX4O1Ft26EIugVWO2dMV62Eu49S oAQQ== X-Gm-Message-State: ACrzQf3z9dpK+pxvpNcH52MloQtMpuvpvwz56jSbrgdEIu7lbaMVgsDP QL0pFr0l+09fcQj4goo0ByhoZ0N/E/U+bLv9 X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a05:6102:3d99:b0:3a9:5976:cd84 with SMTP id h25-20020a0561023d9900b003a95976cd84mr12345154vsv.4.1666370280478; Fri, 21 Oct 2022 09:38:00 -0700 (PDT) Date: Fri, 21 Oct 2022 16:37:03 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-48-jthoughton@google.com> Subject: [RFC PATCH v2 47/47] selftests/vm: add HGM UFFDIO_CONTINUE and hwpoison tests From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya 
Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747316838650480712?= X-GMAIL-MSGID: =?utf-8?q?1747316838650480712?= This tests that high-granularity CONTINUEs at all sizes work (exercising contiguous PTE sizes for arm64, when support is added). This also tests that collapse works and hwpoison works correctly (although we aren't yet testing high-granularity poison). Signed-off-by: James Houghton --- tools/testing/selftests/vm/Makefile | 1 + tools/testing/selftests/vm/hugetlb-hgm.c | 326 +++++++++++++++++++++++ 2 files changed, 327 insertions(+) create mode 100644 tools/testing/selftests/vm/hugetlb-hgm.c diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 00920cb8b499..da1e01a5ac9b 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -32,6 +32,7 @@ TEST_GEN_FILES += compaction_test TEST_GEN_FILES += gup_test TEST_GEN_FILES += hmm-tests TEST_GEN_FILES += hugetlb-madvise +TEST_GEN_FILES += hugetlb-hgm TEST_GEN_FILES += hugepage-mmap TEST_GEN_FILES += hugepage-mremap TEST_GEN_FILES += hugepage-shm diff --git a/tools/testing/selftests/vm/hugetlb-hgm.c b/tools/testing/selftests/vm/hugetlb-hgm.c new file mode 100644 index 000000000000..e36a1c988bb4 --- /dev/null +++ b/tools/testing/selftests/vm/hugetlb-hgm.c @@ -0,0 +1,326 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Test uncommon cases in HugeTLB high-granularity mapping: + * 1. Test all supported high-granularity page sizes (with MADV_COLLAPSE). + * 2. Test MADV_HWPOISON behavior. + */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#define PAGE_MASK ~(4096 - 1) + +#ifndef MADV_COLLAPSE +#define MADV_COLLAPSE 25 +#endif + +#define PREFIX " ... 
" + +int userfaultfd(int flags) +{ + return syscall(__NR_userfaultfd, flags); +} + +int map_range(int uffd, char *addr, uint64_t length) +{ + struct uffdio_continue cont = { + .range = (struct uffdio_range) { + .start = (uint64_t)addr, + .len = length, + }, + .mode = 0, + .mapped = 0, + }; + + if (ioctl(uffd, UFFDIO_CONTINUE, &cont) < 0) { + perror("UFFDIO_CONTINUE failed"); + return -1; + } + return 0; +} + +int check_equal(char *mapping, size_t length, char value) +{ + size_t i; + + for (i = 0; i < length; ++i) + if (mapping[i] != value) { + printf("mismatch at %p (%d != %d)\n", &mapping[i], + mapping[i], value); + return -1; + } + + return 0; +} + +int test_continues(int uffd, char *primary_map, char *secondary_map, size_t len, + bool verify) +{ + size_t offset = 0; + unsigned char iter = 0; + unsigned long pagesize = getpagesize(); + uint64_t size; + + for (size = len/2; size >= pagesize; + offset += size, size /= 2) { + iter++; + memset(secondary_map + offset, iter, size); + printf(PREFIX "UFFDIO_CONTINUE: %p -> %p = %d%s\n", + primary_map + offset, + primary_map + offset + size, + iter, + verify ? " (and verify)" : ""); + if (map_range(uffd, primary_map + offset, size)) + return -1; + if (verify && check_equal(primary_map + offset, size, iter)) + return -1; + } + return 0; +} + +int test_collapse(char *primary_map, size_t len, bool hwpoison) +{ + size_t offset; + int i; + uint64_t size; + + printf(PREFIX "collapsing %p -> %p\n", primary_map, primary_map + len); + if (madvise(primary_map, len, MADV_COLLAPSE) < 0) { + if (errno == EHWPOISON && hwpoison) { + /* this is expected for the hwpoison test. */ + printf(PREFIX "could not collapse due to poison\n"); + return 0; + } + perror("collapse failed"); + return -1; + } + + printf(PREFIX "verifying %p -> %p\n", primary_map, primary_map + len); + + offset = 0; + i = 0; + for (size = len/2; size > 4096; offset += size, size /= 2) { + if (check_equal(primary_map + offset, size, ++i)) + return -1; + } + /* expect the last 4K to be zero. 
*/ + if (check_equal(primary_map + len - 4096, 4096, 0)) + return -1; + + return 0; +} + +static void *poisoned_addr; + +void sigbus_handler(int signo, siginfo_t *info, void *context) +{ + if (info->si_code != BUS_MCEERR_AR) + goto kill; + poisoned_addr = info->si_addr; +kill: + pthread_exit(NULL); +} + +void *access_mem(void *addr) +{ + volatile char *ptr = addr; + + *ptr; + return NULL; +} + +int test_poison_sigbus(char *addr) +{ + int ret = 0; + pthread_t pthread; + + poisoned_addr = (void *)0xBADBADBAD; + ret = pthread_create(&pthread, NULL, &access_mem, addr); + if (ret) { + printf("failed to create thread: %s\n", strerror(ret)); + return ret; + } + + pthread_join(pthread, NULL); + if (poisoned_addr != addr) { + printf("got incorrect poisoned address: %p vs %p\n", + poisoned_addr, addr); + return -1; + } + return 0; +} + +int test_hwpoison(char *primary_map, size_t len) +{ + const unsigned long pagesize = getpagesize(); + const int num_poison_checks = 512; + unsigned long bytes_per_check = len/num_poison_checks; + struct sigaction new, old; + int i; + + printf(PREFIX "poisoning %p -> %p\n", primary_map, primary_map + len); + if (madvise(primary_map, len, MADV_HWPOISON) < 0) { + perror("MADV_HWPOISON failed"); + return -1; + } + + printf(PREFIX "checking that it was poisoned " + "(%d addresses within %p -> %p)\n", + num_poison_checks, primary_map, primary_map + len); + + new.sa_sigaction = &sigbus_handler; + new.sa_flags = SA_SIGINFO; + if (sigaction(SIGBUS, &new, &old) < 0) { + perror("could not setup SIGBUS handler"); + return -1; + } + + if (pagesize > bytes_per_check) + bytes_per_check = pagesize; + + for (i = 0; i < len; i += bytes_per_check) + if (test_poison_sigbus(primary_map + i) < 0) + return -1; + /* check very last byte, because we left it unmapped */ + if (test_poison_sigbus(primary_map + len - 1)) + return -1; + + return 0; +} + +int test_hgm(int fd, size_t hugepagesize, size_t len, bool hwpoison) +{ + int ret = 0; + int uffd; + char *primary_map, *secondary_map; + struct uffdio_api api; + struct uffdio_register reg; + + if (ftruncate(fd, len) < 0) { + perror("ftruncate failed"); + return -1; + } + + uffd = userfaultfd(O_CLOEXEC | O_NONBLOCK); + if (uffd < 0) { + perror("uffd not created"); + return -1; + } + + primary_map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + if (primary_map == MAP_FAILED) { + perror("mmap for primary mapping failed"); + ret = -1; + goto close_uffd; + } + secondary_map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + if (secondary_map == MAP_FAILED) { + perror("mmap for secondary mapping failed"); + ret = -1; + goto unmap_primary; + } + + printf(PREFIX "primary mapping: %p\n", primary_map); + printf(PREFIX "secondary mapping: %p\n", secondary_map); + + api.api = UFFD_API; + api.features = UFFD_FEATURE_MINOR_HUGETLBFS | + UFFD_FEATURE_MISSING_HUGETLBFS | + UFFD_FEATURE_MINOR_HUGETLBFS_HGM | UFFD_FEATURE_SIGBUS | + UFFD_FEATURE_EXACT_ADDRESS; + if (ioctl(uffd, UFFDIO_API, &api) == -1) { + perror("UFFDIO_API failed"); + ret = -1; + goto out; + } + if (!(api.features & UFFD_FEATURE_MINOR_HUGETLBFS_HGM)) { + puts("UFFD_FEATURE_MINOR_HUGETLBFS_HGM not present"); + ret = -1; + goto out; + } + + reg.range.start = (unsigned long)primary_map; + reg.range.len = len; + reg.mode = UFFDIO_REGISTER_MODE_MINOR | UFFDIO_REGISTER_MODE_MISSING; + reg.ioctls = 0; + if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1) { + perror("register failed"); + ret = -1; + goto out; + } + + if 
(test_continues(uffd, primary_map, secondary_map, len, !hwpoison) + || (hwpoison && test_hwpoison(primary_map, len)) + || test_collapse(primary_map, len, hwpoison)) { + ret = -1; + } + + if (ftruncate(fd, 0) < 0) { + perror("ftruncate back to 0 failed"); + ret = -1; + } + +out: + munmap(secondary_map, len); +unmap_primary: + munmap(primary_map, len); +close_uffd: + close(uffd); + return ret; +} + +int main(void) +{ + int fd; + struct statfs file_stat; + size_t hugepagesize; + size_t len; + + fd = memfd_create("hugetlb_tmp", MFD_HUGETLB); + if (fd < 0) { + perror("could not open hugetlbfs file"); + return -1; + } + + memset(&file_stat, 0, sizeof(file_stat)); + if (fstatfs(fd, &file_stat)) { + perror("fstatfs failed"); + goto close; + } + if (file_stat.f_type != HUGETLBFS_MAGIC) { + printf("not hugetlbfs file\n"); + goto close; + } + + hugepagesize = file_stat.f_bsize; + len = 2 * hugepagesize; + printf("HGM regular test...\n"); + printf("HGM regular test: %s\n", + test_hgm(fd, hugepagesize, len, false) + ? "FAILED" : "PASSED"); + printf("HGM hwpoison test...\n"); + printf("HGM hwpoison test: %s\n", + test_hgm(fd, hugepagesize, len, true) + ? "FAILED" : "PASSED"); +close: + close(fd); + + return 0; +}
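
A note on the layout exercised by the hugetlb-hgm test above. test_hgm() is run with len equal to two hugepages (4 MiB with 2 MiB hugepages); test_continues() issues UFFDIO_CONTINUE for the first half of that range at the full hugepage size, then halves the chunk size for each successive chunk of the second hugepage, down to a single base page, deliberately leaving the last 4 KiB unmapped. test_collapse() later expects that unmapped tail to read back as zero after MADV_COLLAPSE, and test_hwpoison() probes the very last byte separately for the same reason. The standalone helper below is not part of the patch; it only prints that CONTINUE plan, assuming 2 MiB hugepages and 4 KiB base pages.

#include <stdio.h>

/*
 * Illustrative only (not part of the patch): print the (offset, size)
 * pairs for which test_continues() issues UFFDIO_CONTINUE, assuming
 * len = two 2 MiB hugepages and 4 KiB base pages. The final base page
 * is intentionally left unmapped.
 */
int main(void)
{
	unsigned long len = 4UL << 20;   /* two 2 MiB hugepages */
	unsigned long pagesize = 4096;   /* base page size */
	unsigned long offset = 0, size;

	for (size = len / 2; size >= pagesize; offset += size, size /= 2)
		printf("UFFDIO_CONTINUE offset=0x%06lx size=0x%06lx\n",
		       offset, size);
	printf("left unmapped:    offset=0x%06lx size=0x%06lx\n",
	       offset, len - offset);
	return 0;
}

Running this prints ten CONTINUE operations (2 MiB, 1 MiB, 512 KiB, ..., 4 KiB) covering everything except the last base page, which is the "all sizes" coverage the commit message refers to.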
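For readers who want the minimal userspace sequence these selftests depend on in one place, here is a condensed sketch. It is not part of the patch and the helper name is made up; it assumes UFFD_FEATURE_MINOR_HUGETLBFS_HGM, which is added earlier in this series and does not exist in mainline kernels. The steps are: create a userfaultfd, negotiate the HGM feature, register the shared hugetlb range in minor mode, and issue one base-page-sized UFFDIO_CONTINUE, which is what creates the high-granularity mapping. As in the tests, the underlying hugepage must already be present in the page cache (e.g. written through a second mapping) before CONTINUE can succeed.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/*
 * Sketch only: map the first base page of an already-populated shared
 * hugetlb VMA via UFFDIO_CONTINUE. Relies on the HGM feature from this
 * series; on a kernel without it the UFFDIO_API check below fails.
 */
static int continue_first_page(char *hugetlb_va, size_t len)
{
	int uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
	struct uffdio_api api = {
		.api = UFFD_API,
		.features = UFFD_FEATURE_MINOR_HUGETLBFS |
			    UFFD_FEATURE_MINOR_HUGETLBFS_HGM,
	};
	struct uffdio_register reg = {
		.range = { .start = (uint64_t)hugetlb_va, .len = len },
		.mode = UFFDIO_REGISTER_MODE_MINOR,
	};
	struct uffdio_continue cont = {
		.range = { .start = (uint64_t)hugetlb_va,
			   .len = (uint64_t)getpagesize() },
	};

	if (uffd < 0)
		return -1;
	if (ioctl(uffd, UFFDIO_API, &api) ||
	    !(api.features & UFFD_FEATURE_MINOR_HUGETLBFS_HGM) ||
	    ioctl(uffd, UFFDIO_REGISTER, &reg) ||
	    ioctl(uffd, UFFDIO_CONTINUE, &cont)) {
		close(uffd);
		return -1;
	}
	close(uffd);
	return 0;
}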