From patchwork Fri Dec 9 17:00:52 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 31873
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Miaohe Lin, David Hildenbrand, Nadav Amit,
 peterx@redhat.com, Andrea Arcangeli, Jann Horn, John Hubbard,
 Mike Kravetz, James Houghton, Rik van Riel, Muchun Song
Subject: [PATCH v3 1/9] mm/hugetlb: Let vma_offset_start() to return start
Date: Fri, 9 Dec 2022 12:00:52 -0500
Message-Id: <20221209170100.973970-2-peterx@redhat.com>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20221209170100.973970-1-peterx@redhat.com>
References: <20221209170100.973970-1-peterx@redhat.com>

Despite its name, vma_offset_start() does not return "the start address
of the range"; it returns the offset that has to be added to
vma->vm_start.

Make it return the actual start vaddr instead. This also simplifies all
the callers, because wherever the return value is used it is ultimately
added to vma->vm_start anyway.

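As a rough illustration of the change described above (a user-space sketch, not kernel code): struct vma_model is a hypothetical stand-in for vm_area_struct that carries only the two fields the helper reads, and PAGE_SHIFT is assumed to be 12. The old helper returns a byte offset relative to vm_start; the new one returns the absolute start address, so callers no longer need to add vm_start themselves.

```c
#include <assert.h>

/* Assumption: 4 KiB base pages; the real PAGE_SHIFT is per-architecture. */
#define PAGE_SHIFT 12

typedef unsigned long pgoff_t;

/* Hypothetical stand-in with only the fields the helper reads. */
struct vma_model {
	unsigned long vm_start;	/* first virtual address of the mapping */
	pgoff_t vm_pgoff;	/* file offset of vm_start, in pages */
};

/* Behaviour before the patch: byte offset relative to vm_start. */
static unsigned long offset_start_old(const struct vma_model *vma, pgoff_t start)
{
	if (vma->vm_pgoff < start)
		return (start - vma->vm_pgoff) << PAGE_SHIFT;
	return 0;
}

/* Behaviour after the patch: absolute start virtual address. */
static unsigned long offset_start_new(const struct vma_model *vma, pgoff_t start)
{
	unsigned long offset = 0;

	if (vma->vm_pgoff < start)
		offset = (start - vma->vm_pgoff) << PAGE_SHIFT;

	return vma->vm_start + offset;
}
```

For example, with vm_start = 0x40000000 and vm_pgoff = 16, a range starting at page offset 20 gives 0x40004000 from the new helper, which is vm_start plus the old helper's result. A range starting before vm_pgoff simply gives vm_start.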
Reviewed-by: Mike Kravetz
Reviewed-by: David Hildenbrand
Reviewed-by: John Hubbard
Signed-off-by: Peter Xu
---
 fs/hugetlbfs/inode.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 790d2727141a..fdb16246f46e 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -412,10 +412,12 @@ static bool hugetlb_vma_maps_page(struct vm_area_struct *vma,
  */
 static unsigned long vma_offset_start(struct vm_area_struct *vma, pgoff_t start)
 {
+	unsigned long offset = 0;
+
 	if (vma->vm_pgoff < start)
-		return (start - vma->vm_pgoff) << PAGE_SHIFT;
-	else
-		return 0;
+		offset = (start - vma->vm_pgoff) << PAGE_SHIFT;
+
+	return vma->vm_start + offset;
 }
 
 static unsigned long vma_offset_end(struct vm_area_struct *vma, pgoff_t end)
@@ -457,7 +459,7 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
 		v_start = vma_offset_start(vma, start);
 		v_end = vma_offset_end(vma, end);
 
-		if (!hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page))
+		if (!hugetlb_vma_maps_page(vma, v_start, page))
 			continue;
 
 		if (!hugetlb_vma_trylock_write(vma)) {
@@ -473,8 +475,8 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
 			break;
 		}
 
-		unmap_hugepage_range(vma, vma->vm_start + v_start, v_end,
-				NULL, ZAP_FLAG_DROP_MARKER);
+		unmap_hugepage_range(vma, v_start, v_end, NULL,
+				     ZAP_FLAG_DROP_MARKER);
 		hugetlb_vma_unlock_write(vma);
 	}
 
@@ -507,10 +509,9 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
 		 */
 		v_start = vma_offset_start(vma, start);
 		v_end = vma_offset_end(vma, end);
-		if (hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page))
-			unmap_hugepage_range(vma, vma->vm_start + v_start,
-						v_end, NULL,
-						ZAP_FLAG_DROP_MARKER);
+		if (hugetlb_vma_maps_page(vma, v_start, page))
+			unmap_hugepage_range(vma, v_start, v_end, NULL,
+					     ZAP_FLAG_DROP_MARKER);
 
 		kref_put(&vma_lock->refs, hugetlb_vma_lock_release);
 		hugetlb_vma_unlock_write(vma);
@@ -540,8 +541,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
 		v_start = vma_offset_start(vma, start);
 		v_end = vma_offset_end(vma, end);
 
-		unmap_hugepage_range(vma, vma->vm_start + v_start, v_end,
-				     NULL, zap_flags);
+		unmap_hugepage_range(vma, v_start, v_end, NULL, zap_flags);
 
 		/*
 		 * Note that vma lock only exists for shared/non-private

From patchwork Fri Dec 9 17:00:53 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 31876
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Miaohe Lin, David Hildenbrand, Nadav Amit,
 peterx@redhat.com, Andrea Arcangeli, Jann Horn, John Hubbard,
 Mike Kravetz, James Houghton, Rik van Riel, Muchun Song
Subject: [PATCH v3 2/9] mm/hugetlb: Don't wait for migration entry during follow page
Date: Fri, 9 Dec 2022 12:00:53 -0500
Message-Id: <20221209170100.973970-3-peterx@redhat.com>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20221209170100.973970-1-peterx@redhat.com>
References: <20221209170100.973970-1-peterx@redhat.com>

Not waiting is what the code already does for !hugetlb pages, so
logically we should do the same for hugetlb: a migration entry is then
also treated as no page.

This is probably also the last piece of the follow_page code that may
sleep; the previous one should have been removed by cf994dd8af27
("mm/gup: remove FOLL_MIGRATION", 2022-11-16).

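The resulting lookup behaviour can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel function: enum entry_kind, struct pte_model, struct page_model and follow_page_model() are hypothetical names, and the real code's locking and FOLL_* flag checks are left out. The point it shows is that only a present entry yields a page, while migration and hwpoison entries report "no page" without any wait or retry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a huge PTE; not a kernel type. */
enum entry_kind { ENTRY_NONE, ENTRY_PRESENT, ENTRY_MIGRATION, ENTRY_HWPOISON };

struct page_model {
	int id;
};

struct pte_model {
	enum entry_kind kind;
	struct page_model *page;	/* meaningful only for ENTRY_PRESENT */
};

/*
 * Model of the lookup after the patch: a present entry returns its page,
 * everything else (none, migration, hwpoison) returns NULL. There is no
 * sleeping and no retry loop.
 */
static struct page_model *follow_page_model(const struct pte_model *pte)
{
	if (!pte)
		return NULL;

	if (pte->kind == ENTRY_PRESENT)
		return pte->page;

	return NULL;
}
```

A caller that needs the page behind a migration entry has to handle the NULL result itself, the same way it would for a non-hugetlb mapping.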
Reviewed-by: Mike Kravetz
Reviewed-by: David Hildenbrand
Reviewed-by: John Hubbard
Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1088f2f41c88..c8a6673fe5b4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6232,7 +6232,6 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	if (WARN_ON_ONCE(flags & FOLL_PIN))
 		return NULL;
 
-retry:
 	pte = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (!pte)
 		return NULL;
@@ -6255,16 +6254,6 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 			page = NULL;
 			goto out;
 		}
-	} else {
-		if (is_hugetlb_entry_migration(entry)) {
-			spin_unlock(ptl);
-			__migration_entry_wait_huge(pte, ptl);
-			goto retry;
-		}
-		/*
-		 * hwpoisoned entry is treated as no_page_table in
-		 * follow_page_mask().
-		 */
 	}
 out:
 	spin_unlock(ptl);

From patchwork Fri Dec 9 17:00:54 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 31874
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Andrew Morton, Miaohe Lin, David Hildenbrand, Nadav Amit,
 peterx@redhat.com, Andrea Arcangeli, Jann Horn, John Hubbard,
 Mike Kravetz, James Houghton, Rik van Riel, Muchun Song
Subject: [PATCH v3 3/9] mm/hugetlb: Document huge_pte_offset usage
Date: Fri, 9 Dec 2022 12:00:54 -0500
Message-Id: <20221209170100.973970-4-peterx@redhat.com>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20221209170100.973970-1-peterx@redhat.com>
References: <20221209170100.973970-1-peterx@redhat.com>

huge_pte_offset() is potentially a pgtable walker, looking up pte_t* for
a hugetlb address.

Normally, it's always safe to walk a generic pgtable as long as we're with the mmap lock held for either read or write, because that guarantees the pgtable pages will always be valid during the process. But it's not true for hugetlbfs, especially shared: hugetlbfs can have its pgtable freed by pmd unsharing, it means that even with mmap lock held for current mm, the PMD pgtable page can still go away from under us if pmd unsharing is possible during the walk. So we have two ways to make it safe even for a shared mapping: (1) If we're with the hugetlb vma lock held for either read/write, it's okay because pmd unshare cannot happen at all. (2) If we're with the i_mmap_rwsem lock held for either read/write, it's okay because even if pmd unshare can happen, the pgtable page cannot be freed from under us. Document it. Reviewed-by: John Hubbard Reviewed-by: David Hildenbrand Signed-off-by: Peter Xu --- include/linux/hugetlb.h | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 551834cd5299..d755e2a7c0db 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -192,6 +192,38 @@ extern struct list_head huge_boot_pages; pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long sz); +/* + * huge_pte_offset(): Walk the hugetlb pgtable until the last level PTE. + * Returns the pte_t* if found, or NULL if the address is not mapped. + * + * Since this function will walk all the pgtable pages (including not only + * high-level pgtable page, but also PUD entry that can be unshared + * concurrently for VM_SHARED), the caller of this function should be + * responsible of its thread safety. One can follow this rule: + * + * (1) For private mappings: pmd unsharing is not possible, so holding the + * mmap_lock for either read or write is sufficient. 
Most callers + * already hold the mmap_lock, so normally, no special action is + * required. + * + * (2) For shared mappings: pmd unsharing is possible (so the PUD-ranged + * pgtable page can go away from under us! It can be done by a pmd + * unshare with a follow up munmap() on the other process), then we + * need either: + * + * (2.1) hugetlb vma lock read or write held, to make sure pmd unshare + * won't happen upon the range (it also makes sure the pte_t we + * read is the right and stable one), or, + * + * (2.2) hugetlb mapping i_mmap_rwsem lock held read or write, to make + * sure even if unshare happened the racy unmap() will wait until + * i_mmap_rwsem is released. + * + * Option (2.1) is the safest, which guarantees pte stability from pmd + * sharing pov, until the vma lock released. Option (2.2) doesn't protect + * a concurrent pmd unshare, but it makes sure the pgtable page is safe to + * access. + */ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, unsigned long sz); unsigned long hugetlb_mask_last_page(struct hstate *h); From patchwork Fri Dec 9 17:00:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 31878 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp889227wrr; Fri, 9 Dec 2022 09:04:45 -0800 (PST) X-Google-Smtp-Source: AA0mqf52pNh8WL79j7griCZEQ0DdIPVUuc2BGmDTY+Plgw59sIPQV4v2S+Ta79zCEbivhbAqXc+8 X-Received: by 2002:a62:3846:0:b0:577:7cfb:a896 with SMTP id f67-20020a623846000000b005777cfba896mr7452094pfa.31.1670605484875; Fri, 09 Dec 2022 09:04:44 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670605484; cv=none; d=google.com; s=arc-20160816; b=0Z46tkZxYNUqvk/2GaFw0JsKdJgOTy3stVTkFloq/ArN59OJIK2KijSobW7X1yOdmb kB6jMpJi3z7YyEymUHk/tzTKj2sIwmjy/fKd8YCOCoLQXlhZ5Tpz6dvWuJr3LImjiQJG DIFGFPv5qC5RX2tX5LhlcsZOYkE8NPNY93MZYLmC5Bg53+k/55F9JeyEDLNvrNOR90mk 
RR1hExqrn+dfc9smj6wHDcu4k2OD1NmEJlr0CMIcv4IQcvkF+Bafyc+emjhJGM96Tk1Q MK8/cJnrJE8TA60ATauR068T5mcNUkd7O92YrYTCkTlavj/IeQ/uTaTC4YvRGyuaPsO/ qWMQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=xtv4yezaaEbF1bO1grzv/hp6hNuvFjuXz2keNbWJszA=; b=eWPckOpJg5l493h7OT7dTDVSFnHc842Nb6y4oA0SQFcj3JScAEEcE8oQDByIpkrKae B7CvPTjJ+ali4lWCTvUki4vgfaXjG1VmvluKSDftOt3cK1VxCYTUyzHOpyq6wTJo1HWp nKwY3vPq2guMagaeRXQlH9H6Y/wR+IlXwHHrlZmEoGfCD5ydWOjsD4QbNfg9ufx9M3ks pu+3lSnVoviFhPTJrShS1R6cCQ6RGzVs+VAHiO4g6ZG0wPc6VrsJ2dalf1X34AEWTGSM EbYrQlCJ35QRcp67stV2lKEz3NOpeauh7WF3WGNCuf0kniQzY9McnBlIWIMV6QtucJok Zm3g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=SzwF386c; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id p7-20020a056a000a0700b005726ac39d2esi2264659pfh.30.2022.12.09.09.04.28; Fri, 09 Dec 2022 09:04:44 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=SzwF386c; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230006AbiLIRCy (ORCPT + 99 others); Fri, 9 Dec 2022 12:02:54 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51464 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229901AbiLIRCY (ORCPT ); Fri, 9 Dec 2022 12:02:24 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 58A337D06C for ; Fri, 9 Dec 2022 09:01:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1670605281; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xtv4yezaaEbF1bO1grzv/hp6hNuvFjuXz2keNbWJszA=; b=SzwF386cJGBdeKTN2cX+P6IhpBeNqRC2JLfZm7jdN+9McljOZSuiRjd5B3QrCMqJpbv4bZ jvdrWeT7VXOsRRkvQEp7fhLmOMjp9UzpvBTESLf2yNx1CUrWTLmw1Ak6/QEr9iRAUYab6c Q/y5zbvMNCq0kkrVZGNKtbMRrJGPGSk= Received: from mail-oa1-f72.google.com (mail-oa1-f72.google.com [209.85.160.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id 
us-mta-605-JbRuyix0OF62SY8SwZm8lw-1; Fri, 09 Dec 2022 12:01:20 -0500 X-MC-Unique: JbRuyix0OF62SY8SwZm8lw-1 Received: by mail-oa1-f72.google.com with SMTP id 586e51a60fabf-143c7a3da8aso115247fac.23 for ; Fri, 09 Dec 2022 09:01:20 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=xtv4yezaaEbF1bO1grzv/hp6hNuvFjuXz2keNbWJszA=; b=MbYb0QCBRbRCw6+8GOFYdoAOm+HxnYGLGtvOEvI9vn4/Dpwvg3iVWF7yYVawtLmsaH vSuMWGThonO2T1rlkKYI8G5vqNR4FUYqObdw0qFU06Q1UHDr1lB+UQ33v0shL/Xenpcz UxPbIlwCtNsww+GBI7TntIFMvxS+bmqwWpbt7nmeLn8CbGUQXk+t5SgbfOnf4igi789Y ymbVnQRQhVrksc2llNqYsjVqAYRLiX8sXIM23SSn4IHRpe339XooMrbr6C/wPic6z3e6 a7LRnJPfYqrZJH43lu0NU8LhViapA7lYggAsEmPipuhw0S8UTBUP2RlRypPhzmmj/rgp 7qpQ== X-Gm-Message-State: ANoB5pn2O8tac7vWG/oZmqidvyjdsMkNUsdsYWQqKhbrfi8/VgE/oH/4 44loorzo/1bbG1Clm1yb0j9oT26SjVnqRk6rE5+Uey8LaV3TJ0sTGM+BKdKgpDEYgek0LyXC/tA mu20vLoQimPMujT7UTXkSZXUF X-Received: by 2002:a4a:aec6:0:b0:49f:96f:e6c0 with SMTP id v6-20020a4aaec6000000b0049f096fe6c0mr3915742oon.8.1670605279393; Fri, 09 Dec 2022 09:01:19 -0800 (PST) X-Received: by 2002:a4a:aec6:0:b0:49f:96f:e6c0 with SMTP id v6-20020a4aaec6000000b0049f096fe6c0mr3915632oon.8.1670605278011; Fri, 09 Dec 2022 09:01:18 -0800 (PST) Received: from x1n.redhat.com (bras-base-aurron9127w-grc-46-70-31-27-79.dsl.bell.ca. 
[70.31.27.79]) by smtp.gmail.com with ESMTPSA id q7-20020a05620a0d8700b006cf38fd659asm178907qkl.103.2022.12.09.09.01.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 09 Dec 2022 09:01:17 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Miaohe Lin , David Hildenbrand , Nadav Amit , peterx@redhat.com, Andrea Arcangeli , Jann Horn , John Hubbard , Mike Kravetz , James Houghton , Rik van Riel , Muchun Song Subject: [PATCH v3 4/9] mm/hugetlb: Move swap entry handling into vma lock when faulted Date: Fri, 9 Dec 2022 12:00:55 -0500 Message-Id: <20221209170100.973970-5-peterx@redhat.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221209170100.973970-1-peterx@redhat.com> References: <20221209170100.973970-1-peterx@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751756816604603319?= X-GMAIL-MSGID: =?utf-8?q?1751756816604603319?= In hugetlb_fault(), there used to be a special path that handled swap entries at the entrance using huge_pte_offset(). That's unsafe because huge_pte_offset() for a pmd sharable range can access freed pgtables when no lock protects the pgtable from being freed after pmd unshare. The simplest way to make it safe is to move the swap handling to after the vma lock is taken. We may now need to take the fault mutex on either migration or hwpoison entries (and also the vma lock, but that one is really needed); however, neither of them is a hot path.
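As a rough illustration of the lock ordering this patch sets up in the fault path, here is a single-threaded model in plain C. It is not kernel code: the struct, enums and function names are all hypothetical, and the "locks" are just ownership counters so that the hand-off can be checked with assertions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are kernel types or APIs. */
enum entry_kind { ENTRY_NONE, ENTRY_PRESENT, ENTRY_MIGRATION, ENTRY_HWPOISON };
enum fault_result { FAULT_HANDLED, FAULT_WAITED, FAULT_HWPOISON };

struct model {
    bool fault_mutex_held;   /* models the hugetlb fault mutex */
    int vma_readers;         /* models the vma lock, read side */
    enum entry_kind entry;   /* models the huge PTE contents */
};

/* Reading the entry is only legal while the vma lock is read-held. */
static enum entry_kind read_entry(const struct model *m)
{
    assert(m->vma_readers > 0);
    return m->entry;
}

/* Entered with the vma lock read-held; releases it before returning. */
static void wait_on_migration(struct model *m)
{
    assert(m->vma_readers > 0);
    /* The kernel would take the page-table lock here and then sleep. */
    m->vma_readers--;
}

static enum fault_result handle_fault(struct model *m)
{
    enum fault_result ret = FAULT_HANDLED;
    enum entry_kind entry;

    m->fault_mutex_held = true;      /* take the fault mutex */
    m->vma_readers++;                /* take the vma read lock */

    entry = read_entry(m);           /* inspected only under the lock */

    if (entry == ENTRY_MIGRATION) {
        m->fault_mutex_held = false; /* drop the mutex, keep the vma lock */
        wait_on_migration(m);        /* callee drops the vma lock */
        return FAULT_WAITED;
    }
    if (entry == ENTRY_HWPOISON)
        ret = FAULT_HWPOISON;

    m->vma_readers--;                /* common exit: drop both locks */
    m->fault_mutex_held = false;
    return ret;
}
```

Calling handle_fault() on a model whose entry is ENTRY_MIGRATION or ENTRY_HWPOISON should leave both counters back at their unlocked values, which is the property the real patch has to preserve on every exit.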
Note that the vma lock cannot be released in hugetlb_fault() when the migration entry is detected, because in migration_entry_wait_huge() the pgtable page will be used again (by taking the pgtable lock), so that access also needs to be protected by the vma lock. Modify migration_entry_wait_huge() so that it must be called with vma read lock held, and properly release the lock in __migration_entry_wait_huge(). Reviewed-by: Mike Kravetz Reviewed-by: John Hubbard Signed-off-by: Peter Xu --- include/linux/swapops.h | 6 ++++-- mm/hugetlb.c | 37 ++++++++++++++++--------------------- mm/migrate.c | 25 +++++++++++++++++++++---- 3 files changed, 41 insertions(+), 27 deletions(-) diff --git a/include/linux/swapops.h b/include/linux/swapops.h index a70b5c3a68d7..b134c5eb75cb 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -337,7 +337,8 @@ extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep, extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address); #ifdef CONFIG_HUGETLB_PAGE -extern void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl); +extern void __migration_entry_wait_huge(struct vm_area_struct *vma, + pte_t *ptep, spinlock_t *ptl); extern void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte); #endif /* CONFIG_HUGETLB_PAGE */ #else /* CONFIG_MIGRATION */ @@ -366,7 +367,8 @@ static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep, static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address) { } #ifdef CONFIG_HUGETLB_PAGE -static inline void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl) { } +static inline void __migration_entry_wait_huge(struct vm_area_struct *vma, + pte_t *ptep, spinlock_t *ptl) { } static inline void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte) { } #endif /* CONFIG_HUGETLB_PAGE */ static inline int is_writable_migration_entry(swp_entry_t entry) diff --git a/mm/hugetlb.c 
b/mm/hugetlb.c index c8a6673fe5b4..247702eb9f88 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5824,22 +5824,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, int need_wait_lock = 0; unsigned long haddr = address & huge_page_mask(h); - ptep = huge_pte_offset(mm, haddr, huge_page_size(h)); - if (ptep) { - /* - * Since we hold no locks, ptep could be stale. That is - * OK as we are only making decisions based on content and - * not actually modifying content here. - */ - entry = huge_ptep_get(ptep); - if (unlikely(is_hugetlb_entry_migration(entry))) { - migration_entry_wait_huge(vma, ptep); - return 0; - } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) - return VM_FAULT_HWPOISON_LARGE | - VM_FAULT_SET_HINDEX(hstate_index(h)); - } - /* * Serialize hugepage allocation and instantiation, so that we don't * get spurious allocation failures if two CPUs race to instantiate @@ -5854,10 +5838,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, * Acquire vma lock before calling huge_pte_alloc and hold * until finished with ptep. This prevents huge_pmd_unshare from * being called elsewhere and making the ptep no longer valid. - * - * ptep could have already be assigned via huge_pte_offset. That - * is OK, as huge_pte_alloc will return the same value unless - * something has changed. */ hugetlb_vma_lock_read(vma); ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h)); @@ -5886,8 +5866,23 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, * fault, and is_hugetlb_entry_(migration|hwpoisoned) check will * properly handle it. */ - if (!pte_present(entry)) + if (!pte_present(entry)) { + if (unlikely(is_hugetlb_entry_migration(entry))) { + /* + * Release the hugetlb fault lock now, but retain + * the vma lock, because it is needed to guard the + * huge_pte_lockptr() later in + * migration_entry_wait_huge(). The vma lock will + * be released there. 
+ */ + mutex_unlock(&hugetlb_fault_mutex_table[hash]); + migration_entry_wait_huge(vma, ptep); + return 0; + } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) + ret = VM_FAULT_HWPOISON_LARGE | + VM_FAULT_SET_HINDEX(hstate_index(h)); goto out_mutex; + } /* * If we are going to COW/unshare the mapping later, we examine the diff --git a/mm/migrate.c b/mm/migrate.c index 48584b032ea9..9c4e3a833449 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -333,24 +333,41 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, } #ifdef CONFIG_HUGETLB_PAGE -void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl) +/* + * The vma read lock must be held upon entry. Holding that lock prevents either + * the pte or the ptl from being freed. + * + * This function will release the vma lock before returning. + */ +void __migration_entry_wait_huge(struct vm_area_struct *vma, + pte_t *ptep, spinlock_t *ptl) { pte_t pte; + hugetlb_vma_assert_locked(vma); spin_lock(ptl); pte = huge_ptep_get(ptep); - if (unlikely(!is_hugetlb_entry_migration(pte))) + if (unlikely(!is_hugetlb_entry_migration(pte))) { spin_unlock(ptl); - else + hugetlb_vma_unlock_read(vma); + } else { + /* + * If migration entry existed, safe to release vma lock + * here because the pgtable page won't be freed without the + * pgtable lock released. See comment right above pgtable + * lock release in migration_entry_wait_on_locked(). 
+ */ + hugetlb_vma_unlock_read(vma); migration_entry_wait_on_locked(pte_to_swp_entry(pte), NULL, ptl); + } } void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte) { spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma->vm_mm, pte); - __migration_entry_wait_huge(pte, ptl); + __migration_entry_wait_huge(vma, pte, ptl); } #endif From patchwork Fri Dec 9 17:00:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 31877 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp888623wrr; Fri, 9 Dec 2022 09:03:47 -0800 (PST) X-Google-Smtp-Source: AA0mqf5wDWqtQyr8EhfyQa6ty/z8djW046dfzKo/CpCWDMhhCAOm5ImO0Q92r2yyDH0rsy9QduDR X-Received: by 2002:a05:6a00:2354:b0:572:725f:33ec with SMTP id j20-20020a056a00235400b00572725f33ecmr7888374pfj.12.1670605427294; Fri, 09 Dec 2022 09:03:47 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670605427; cv=none; d=google.com; s=arc-20160816; b=Jypot9egE4q2OSfDZbuIKy+thfxZ6khQOL8WweX+lkOvZ72A1F1Vci83Xdej2dDhRx OKOOcXSF3CpDLX0str05h0d14W+n/yWmPS8fFfnF4VLSda3wNu4k65BQ6TI6QmsTaNan 4gTBPuREEP5d8bmvAhpKM5m1dtyv3qKyNME7DiiQZb4Ne1PJxsQFOdvrq5LgRbUQ+oJR jdT1yEBo/UV4haxQ2IhKmuP9B+XmQHJIkATkzR6/baIsVuhGb6d8I5RtjTlz6y5vrFQL cG2Y2baWgXex7K08YEWQ5Ycq1Fo+vl5OxFKp/gAUwAvA9phFevIUQPrZS2accu4IQENn semg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=aMLwOwXmHw3aEdY2dAtBVU7yezK1ZrIU2NcKyDsSofc=; b=kf97yNWBWaQkwwANFtS7F0nqMvYpHitGgyLfrZww/NkFMJteqmYRxIoRVCM1OpfCjO kruzlg9eh19Mn5lUzX2gV/MY6xDs6SFfSMhi15HMEXKd+YnYnzP1jhY8po3XMyDW5/Sl fzb6o7DL/85Bpp3dnEqbWc3lpm6yEaAGRdg+KChhUR6GBCuIpYFGgFgnF39T2oLlTzhi glWXSIspuXjwiiUVtH17LID0LA1T+7h/6XiIeaeQS2mQtz7ZH7IZLL438EGYFoNhWRoN 
yla4Yw0zKdtt4d1VV2UQ9yClEpsjky920lyDJpFNKCgdiKbulPOharsknZK7a3nWf+4L 5RfQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ixxtqDBz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 8-20020a630308000000b00477d099bfadsi1906327pgd.129.2022.12.09.09.03.33; Fri, 09 Dec 2022 09:03:47 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ixxtqDBz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229985AbiLIRCu (ORCPT + 99 others); Fri, 9 Dec 2022 12:02:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51606 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229905AbiLIRCY (ORCPT ); Fri, 9 Dec 2022 12:02:24 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 02C6679C8D for ; Fri, 9 Dec 2022 09:01:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1670605283; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: 
in-reply-to:in-reply-to:references:references; bh=aMLwOwXmHw3aEdY2dAtBVU7yezK1ZrIU2NcKyDsSofc=; b=ixxtqDBz3KXFU6W6Rd51ro51xrtpek0V/FAklQ2MBKrfiB1dpf37iOiLNQx5ztGtBOKXtj xdYP0FerDvl1v/XoqBOHNuLl4YcEGNJtsykIefg7Jttua5FdOUUHd09yftg2YzM92zAaVH +T7n6G3CBh5s10r9Mt56CMx5qu9IadQ= Received: from mail-oo1-f72.google.com (mail-oo1-f72.google.com [209.85.161.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-198--M-jSTl9NvmeHNcRCz_TsQ-1; Fri, 09 Dec 2022 12:01:21 -0500 X-MC-Unique: -M-jSTl9NvmeHNcRCz_TsQ-1 Received: by mail-oo1-f72.google.com with SMTP id d3-20020a4a9cc3000000b004a06af5f883so1569304ook.17 for ; Fri, 09 Dec 2022 09:01:21 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=aMLwOwXmHw3aEdY2dAtBVU7yezK1ZrIU2NcKyDsSofc=; b=3z9tp4o7sWfhViG9o+mu5VPnmnCPdmD0Tk52eY75e18M8vUt2fnZVCQQSCGw7N2Js+ oLi5RTqvBJwLGFPGiBkhIkP4X0JN7erhQLCHjPfzpXROTIxL4P6oZ7d9C9t9fEhYrM9Y lln5u27DctxxiA04+wXVvrvQzl1mD4/tAvGAPv0Y6hG0jsZL1jUHPE5xYNTFN+nC8yWU has5wABEQCzv79lZc8vqCZAmbFcMfoTZaEEiFZ4ftVJ/s5kxdgOKBwhLWJ1nHofCO+z2 QqagrOx1yVVwMgSFZjCjqzQBlNG2F1KSCvICDBmb3hslR+oUTEiA+mHTGK0eOLBcCSY3 gnFA== X-Gm-Message-State: ANoB5pl4WqVuLQg0vb/BNApQKDuK8p6WggZDE1FNbaaKbptLL6ikmm8n sf+6Ja1rNpPzB78tDeZE335G2YhuX7HV1d5efkE+UMCvR4Jr7LXiVL97gL5hOgwGd6IObnOd6tw 6c+1r+F0Jz+5JznopTZ4pEoBV X-Received: by 2002:a9d:77c1:0:b0:66d:c8a2:b9b with SMTP id w1-20020a9d77c1000000b0066dc8a20b9bmr3276915otl.12.1670605281160; Fri, 09 Dec 2022 09:01:21 -0800 (PST) X-Received: by 2002:a9d:77c1:0:b0:66d:c8a2:b9b with SMTP id w1-20020a9d77c1000000b0066dc8a20b9bmr3276898otl.12.1670605280840; Fri, 09 Dec 2022 09:01:20 -0800 (PST) Received: from x1n.redhat.com (bras-base-aurron9127w-grc-46-70-31-27-79.dsl.bell.ca. 
[70.31.27.79]) by smtp.gmail.com with ESMTPSA id q7-20020a05620a0d8700b006cf38fd659asm178907qkl.103.2022.12.09.09.01.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 09 Dec 2022 09:01:20 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Miaohe Lin , David Hildenbrand , Nadav Amit , peterx@redhat.com, Andrea Arcangeli , Jann Horn , John Hubbard , Mike Kravetz , James Houghton , Rik van Riel , Muchun Song Subject: [PATCH v3 5/9] mm/hugetlb: Make userfaultfd_huge_must_wait() safe to pmd unshare Date: Fri, 9 Dec 2022 12:00:56 -0500 Message-Id: <20221209170100.973970-6-peterx@redhat.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221209170100.973970-1-peterx@redhat.com> References: <20221209170100.973970-1-peterx@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751756756746469079?= X-GMAIL-MSGID: =?utf-8?q?1751756756746469079?= userfaultfd_huge_must_wait() walks the hugetlb pgtable, so it needs the hugetlb walker lock; here we take the vma lock directly. 
Reviewed-by: David Hildenbrand Reviewed-by: Mike Kravetz Reviewed-by: John Hubbard Signed-off-by: Peter Xu --- fs/userfaultfd.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 07c81ab3fd4d..969f4be967c6 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -376,7 +376,8 @@ static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags) */ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) { - struct mm_struct *mm = vmf->vma->vm_mm; + struct vm_area_struct *vma = vmf->vma; + struct mm_struct *mm = vma->vm_mm; struct userfaultfd_ctx *ctx; struct userfaultfd_wait_queue uwq; vm_fault_t ret = VM_FAULT_SIGBUS; @@ -403,7 +404,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) */ mmap_assert_locked(mm); - ctx = vmf->vma->vm_userfaultfd_ctx.ctx; + ctx = vma->vm_userfaultfd_ctx.ctx; if (!ctx) goto out; @@ -493,6 +494,15 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) blocking_state = userfaultfd_get_blocking_state(vmf->flags); + /* + * Take the vma lock now, in order to safely call + * userfaultfd_huge_must_wait() later. Since acquiring the + * (sleepable) vma lock can modify the current task state, that + * must be before explicitly calling set_current_state(). 
+ */ + if (is_vm_hugetlb_page(vma)) + hugetlb_vma_lock_read(vma); + spin_lock_irq(&ctx->fault_pending_wqh.lock); /* * After the __add_wait_queue the uwq is visible to userland @@ -507,13 +517,15 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) set_current_state(blocking_state); spin_unlock_irq(&ctx->fault_pending_wqh.lock); - if (!is_vm_hugetlb_page(vmf->vma)) + if (!is_vm_hugetlb_page(vma)) must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags, reason); else - must_wait = userfaultfd_huge_must_wait(ctx, vmf->vma, + must_wait = userfaultfd_huge_must_wait(ctx, vma, vmf->address, vmf->flags, reason); + if (is_vm_hugetlb_page(vma)) + hugetlb_vma_unlock_read(vma); mmap_read_unlock(mm); if (likely(must_wait && !READ_ONCE(ctx->released))) { From patchwork Fri Dec 9 17:00:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 31879 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp891155wrr; Fri, 9 Dec 2022 09:07:49 -0800 (PST) X-Google-Smtp-Source: AA0mqf5iKlzJv/LVRW2Xgb67a7bmKFDKuwQB9ppMf3qQbuHBbOKn5pyVbgmJnLMOX/0S4kwSMDhq X-Received: by 2002:a05:6a00:324b:b0:574:3cde:385a with SMTP id bn11-20020a056a00324b00b005743cde385amr5594486pfb.32.1670605668825; Fri, 09 Dec 2022 09:07:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670605668; cv=none; d=google.com; s=arc-20160816; b=dv3TRyPgUfSJ3wcjtMGXADCMsgeZJwhJtN4zkUfeaKq3vLgYyK77IBUiI8wfEi1HDk WWI4y2k0+J52FGh7SYLIollC8yPCZDhALyzHNDzeD31BeP9labo0epfd8KRswVX6xKz1 GdgrjsEVvkpUDbhZ8h63xVplF9Rpc8gqgcQ17/WssJ2wtV6esRg5QwhOKHdTvMMp+szp p+tX7JxLDyPmVj/AAWG4L78ns+f32kR+6rGanDeNCo2wAvWLBOO1dqW69b+ca1Bgt8/N qhkHR2wd3ag4sHpGzya8CDffoDubhy5X021iVu5AxdYffNXf5gq0hv7QiNLe9QZFKUKN VKzg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version 
:references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=QzeyNG+xisx8vPYcMzT01YLXY3V0nLFHTQTVfPkpFHw=; b=cj1TUmsNWffU1BwM5GeiVsJkSLCJ+arbpGk9l4eIbVqc5X/DC+ExGJSAoxnwZ1muNZ i9ycf3qUI+PHPJtOA7lznWTzTp2SLvl/7TKJ0hWjYkiVelGVmhWeavN/sckO2epMVYyu d+uSNqO47vnRjf1Y68IOLldN1y14GPZxyAAw0ab2GhmMIztFMNuXn+3TvvMu3JUViQiQ VEz3QlPwezwbMdB5pNbqKRhdQhwn9KXUWU+Rqbp9kqhF35/9zIX2OLrShbJJm4k+/D0X O+LQR1sPlWGGf1vhnoBQF9SP467sCL8IZ5qL/9J9dd/e0XVDp78Ozogv1EcECxYRNVcE giTg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=egPMxOEL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id bv5-20020a056a00414500b00574f3820920si2048250pfb.331.2022.12.09.09.07.33; Fri, 09 Dec 2022 09:07:48 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=egPMxOEL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230011AbiLIRC6 (ORCPT + 99 others); Fri, 9 Dec 2022 12:02:58 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51638 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229861AbiLIRCY (ORCPT ); Fri, 9 Dec 2022 12:02:24 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com 
[170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A4D8A14D13 for ; Fri, 9 Dec 2022 09:01:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1670605285; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QzeyNG+xisx8vPYcMzT01YLXY3V0nLFHTQTVfPkpFHw=; b=egPMxOEL9VEKTZCHwGQlHUDVy1Qj4DNJlAz6zvnBvfNqmv0fU4nUSA7iVaxqxZka0v5AJ+ AfN1256Jb70mOjXsRX9X8bYTaRwr1cU/61/wSgo6pE5iGnvknIJ5GE28e1yA+ioEFxRITp bOi5JxY0p+Sb8rBg1+9IcXZbkuZZgc8= Received: from mail-ua1-f72.google.com (mail-ua1-f72.google.com [209.85.222.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-672-SWK1_QJZMOyDYZOZg4M2QQ-1; Fri, 09 Dec 2022 12:01:24 -0500 X-MC-Unique: SWK1_QJZMOyDYZOZg4M2QQ-1 Received: by mail-ua1-f72.google.com with SMTP id z44-20020a9f372f000000b00390af225beaso2088206uad.12 for ; Fri, 09 Dec 2022 09:01:24 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=QzeyNG+xisx8vPYcMzT01YLXY3V0nLFHTQTVfPkpFHw=; b=FLdwDIQvnhZWV2UuoIYMU83iH0Qoo16E80/mF2zaZ+i0wAIeG73odeMFU09nIBg/q6 QHbFtEBl/IOHkPoEqaGatWIxXRNtVUvZWsDN9g2Yz1kPfBEW3nWHNmWq/OSvSjLvODlb JekakgLLUA3wLZEhcq0BbShEtjMI7pi4IKja8A66PqXfHoQ5w7aCXBjp4jKFPhzdy52Q RvYdVwwJG40MbV1d1YN0p4yOaBxgRg7j64SbP02+fLf0VsPQwCodR6ZKVRKIEo9dstm6 z8E0UhQF2sxKlavuSg/DdOtSLhsNOLuCBd28X2rFG8v7t+Ota59KYMZerhtgIvnDA8zF 7DOA== X-Gm-Message-State: ANoB5plNsYQcjtbTZyXNpvm5yapsnS/9e/1YfWN/vjt70PuR7wS/B7Ov v/xQFya8u7wWst7OEIRZwRJYCvNu4RbuswG3CvePoTkQKMhGAxaY+2OR6ftYluBR3YX1kjgGTyT pmebaBVjEG8YeoMyz515Ms8x3 X-Received: by 
2002:a1f:ee4e:0:b0:3bd:f324:5500 with SMTP id m75-20020a1fee4e000000b003bdf3245500mr3198331vkh.2.1670605283871; Fri, 09 Dec 2022 09:01:23 -0800 (PST) X-Received: by 2002:a1f:ee4e:0:b0:3bd:f324:5500 with SMTP id m75-20020a1fee4e000000b003bdf3245500mr3198279vkh.2.1670605283615; Fri, 09 Dec 2022 09:01:23 -0800 (PST) Received: from x1n.redhat.com (bras-base-aurron9127w-grc-46-70-31-27-79.dsl.bell.ca. [70.31.27.79]) by smtp.gmail.com with ESMTPSA id q7-20020a05620a0d8700b006cf38fd659asm178907qkl.103.2022.12.09.09.01.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 09 Dec 2022 09:01:22 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Miaohe Lin , David Hildenbrand , Nadav Amit , peterx@redhat.com, Andrea Arcangeli , Jann Horn , John Hubbard , Mike Kravetz , James Houghton , Rik van Riel , Muchun Song Subject: [PATCH v3 6/9] mm/hugetlb: Make hugetlb_follow_page_mask() safe to pmd unshare Date: Fri, 9 Dec 2022 12:00:57 -0500 Message-Id: <20221209170100.973970-7-peterx@redhat.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221209170100.973970-1-peterx@redhat.com> References: <20221209170100.973970-1-peterx@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751757009682189806?= X-GMAIL-MSGID: =?utf-8?q?1751757009682189806?= Since hugetlb_follow_page_mask() walks the pgtable, it needs the vma lock to make sure the pgtable page will not be freed concurrently. 
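A minimal sketch of the pattern, assuming a single-threaded model in plain C rather than real kernel code: the read lock is taken before the walk starts, and the early "nothing mapped" exit goes through the same unlock label as the normal path. The struct and function names are hypothetical, and the lock is only an ownership counter so the invariant can be asserted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; these are not kernel types or APIs. */
struct table {
    int readers;   /* models the vma lock, read side */
    int *slot;     /* models the pte pointer; NULL when nothing is mapped */
};

/* Returns the value found, or -1; always exits with the read lock dropped. */
static int lookup(struct table *t)
{
    int value = -1;

    t->readers++;            /* lock before the walk starts */
    if (!t->slot)
        goto out_unlock;     /* early exit must still unlock */
    value = *t->slot;
out_unlock:
    t->readers--;
    return value;
}

/* Models pmd unshare: only legal when no reader is in the middle of a walk. */
static void unshare(struct table *t)
{
    assert(t->readers == 0);
    t->slot = NULL;
}
```

Replacing a bare early return with a jump to the unlock label is what keeps the reader count balanced on every path, so a later unshare() never sees a stale reader.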
Acked-by: David Hildenbrand Reviewed-by: Mike Kravetz Reviewed-by: John Hubbard Signed-off-by: Peter Xu --- mm/hugetlb.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 247702eb9f88..e3af347470ac 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6227,9 +6227,10 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma, if (WARN_ON_ONCE(flags & FOLL_PIN)) return NULL; + hugetlb_vma_lock_read(vma); pte = huge_pte_offset(mm, haddr, huge_page_size(h)); if (!pte) - return NULL; + goto out_unlock; ptl = huge_pte_lock(h, mm, pte); entry = huge_ptep_get(pte); @@ -6252,6 +6253,8 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma, } out: spin_unlock(ptl); +out_unlock: + hugetlb_vma_unlock_read(vma); return page; } From patchwork Fri Dec 9 17:00:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 31881 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp892427wrr; Fri, 9 Dec 2022 09:10:09 -0800 (PST) X-Google-Smtp-Source: AA0mqf4pgFFi92JT15VbF2V3YVElm+aPnfAfixl+LTA0FrTkVm3055/JP70AlqdV+na8uyzg1jL2 X-Received: by 2002:a17:906:168a:b0:7c1:10b4:4742 with SMTP id s10-20020a170906168a00b007c110b44742mr5277691ejd.55.1670605809217; Fri, 09 Dec 2022 09:10:09 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670605809; cv=none; d=google.com; s=arc-20160816; b=obFumJk/HRfOjOVA+VH9vm8KggKUF3t5bwp4l+7trZGDaWdgRl0yQm7BfsHwMBG/cd z6vQW2uhvzimhbV7gAFg8bmyhLQWnuNmRGe3aiHMzqaWFdLcKmbRt+N2sR1hh/8Oae2r pC9Y1/S+8GZBX2FA3fZ8sKsFgRHXARfH+/glUj1qeQ+mlz9gOrqF4bD12PdZgiBdVO8z gy5Mptz77Ttu+xlJnYoradIkG/U3orkRx/xYOuCNk3K/wU+6EzzjAWeIRSK2ZAym3yrN y8DQuV2Nzf0NhONPavD/ySjpqGePfcuB8Td4/mPMiCjGW0kjRqcTC9qF7NWMjYVzDx2p eW5A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version 
:references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=BJ1c+DnQwwOjSKsDq4iN/7ttySyX/vWFxv5LWeLTqBE=; b=MSDP1GulMD2P9jTqstGWbAv9mbWOJGN57vewbVsheUeIzixn9XEAXBV+DYlwGY2PcU 9l3aAyONYbqOsWKDKJT2D6zKI6AM4uDXjnUAfEn4BqnR8QLvHIEOIN3lHer/EmchMRGk aetiKurnIJLG7r/HvZhJ7ezMIFluCnFU/m2DPWQ7XdcOBzP/9FtmozfzRq2HtueStyJ2 dyrRB7Z8CESA3T+3Xx4IBQTwb4BUY7BommfLRIcK+yxBlTNi6iAXYsEriksn+Aa+RWpF NQWbjnLRt6C54KIJ5mGWxvo2qsJSfVR7lFlHJeRW6aIZ/Qa30rCzTszsTpzUeCI9ric9 oZiA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=SJrjuxu3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id hr40-20020a1709073fa800b0078034101c0esi201826ejc.978.2022.12.09.09.09.44; Fri, 09 Dec 2022 09:10:09 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=SJrjuxu3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229631AbiLIRDO (ORCPT + 99 others); Fri, 9 Dec 2022 12:03:14 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51752 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229917AbiLIRC0 (ORCPT ); Fri, 9 Dec 2022 12:02:26 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com 
[170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9A0DD7DA55 for ; Fri, 9 Dec 2022 09:01:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1670605289; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=BJ1c+DnQwwOjSKsDq4iN/7ttySyX/vWFxv5LWeLTqBE=; b=SJrjuxu3NMB3kEkq8Qw3lRHS2W5ZxqyMZaaTxZ/nNmF4dXHjD8l5PEskCALrr3oNaMfmqR hqUfiH7/KYwnRlX3/zjjSDUdR9DaUIbPw4lYYM7D7V/2iPSiwkeRhPtEf43cuBicYV9tcl 0REDPNoemPNiL1aK5UqL8apMhQL8CB4= Received: from mail-oa1-f71.google.com (mail-oa1-f71.google.com [209.85.160.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-335-66wvnKmrNUyjYvEJ3uhGTA-1; Fri, 09 Dec 2022 12:01:26 -0500 X-MC-Unique: 66wvnKmrNUyjYvEJ3uhGTA-1 Received: by mail-oa1-f71.google.com with SMTP id 586e51a60fabf-1441866fa6cso117101fac.22 for ; Fri, 09 Dec 2022 09:01:26 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=BJ1c+DnQwwOjSKsDq4iN/7ttySyX/vWFxv5LWeLTqBE=; b=QL2wApHU6vwvO6lRJuJxFlOOfOAYP6yaSGV7ywQ7rnT8gAUhOc1iJs0+PDYzT5wBgv SXFlTPqTtqVKe6KWZGOEdF1LbPb7pdgkbMAoYZ2MR+Kuh/1q+UdbN+JymqdvZVsvn0BY raU7yscUfY7hBRL1YOVmkipx9xM3nEYtOHxjrTGShgNXHmVO7X7NkPOCrAbY/ceVyKWl 6td/E/6OrgXcPcTf3XY765lU/21ubrQsD/ZOGyzfAKHaioZfps2nSXLsxDb5gXGLm55m YMrnqJdGwIk7ggl7i02n+V49MWG7F3LYbnnLXIyUfrM9Km/xrFgehkQx+Pkf4FevkeJ1 rpIg== X-Gm-Message-State: ANoB5pkljGASLu669wKNCML9a6vatzxSS+mUlx7E+yyR3aYsUvSadGNq bS9KEOcKL2ChBI9CUDJbabgDBkOtiDdwZ9daCphk0BHXRi2jqNEqUyOX6u0jqGSBIajSCzdl+dz LcTea5ngOVZ23y7T6+o/ydyep X-Received: by 2002:a05:6870:2f0a:b0:141:fe19:d4d0 with 
SMTP id qj10-20020a0568702f0a00b00141fe19d4d0mr2655998oab.50.1670605285751; Fri, 09 Dec 2022 09:01:25 -0800 (PST) X-Received: by 2002:a05:6870:2f0a:b0:141:fe19:d4d0 with SMTP id qj10-20020a0568702f0a00b00141fe19d4d0mr2655965oab.50.1670605285082; Fri, 09 Dec 2022 09:01:25 -0800 (PST) Received: from x1n.redhat.com (bras-base-aurron9127w-grc-46-70-31-27-79.dsl.bell.ca. [70.31.27.79]) by smtp.gmail.com with ESMTPSA id q7-20020a05620a0d8700b006cf38fd659asm178907qkl.103.2022.12.09.09.01.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 09 Dec 2022 09:01:24 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Miaohe Lin , David Hildenbrand , Nadav Amit , peterx@redhat.com, Andrea Arcangeli , Jann Horn , John Hubbard , Mike Kravetz , James Houghton , Rik van Riel , Muchun Song Subject: [PATCH v3 7/9] mm/hugetlb: Make follow_hugetlb_page() safe to pmd unshare Date: Fri, 9 Dec 2022 12:00:58 -0500 Message-Id: <20221209170100.973970-8-peterx@redhat.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221209170100.973970-1-peterx@redhat.com> References: <20221209170100.973970-1-peterx@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751757157088399554?= X-GMAIL-MSGID: =?utf-8?q?1751757157088399554?= Since follow_hugetlb_page() walks the pgtable, it needs the vma lock to make sure the pgtable page will not be freed concurrently. 
Acked-by: David Hildenbrand Reviewed-by: Mike Kravetz Reviewed-by: John Hubbard Signed-off-by: Peter Xu --- mm/hugetlb.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e3af347470ac..9d8bb6508288 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6285,6 +6285,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, break; } + hugetlb_vma_lock_read(vma); /* * Some archs (sparc64, sh*) have multiple pte_ts to * each hugepage. We have to make sure we get the @@ -6309,6 +6310,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, !hugetlbfs_pagecache_present(h, vma, vaddr)) { if (pte) spin_unlock(ptl); + hugetlb_vma_unlock_read(vma); remainder = 0; break; } @@ -6330,6 +6332,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, if (pte) spin_unlock(ptl); + hugetlb_vma_unlock_read(vma); + if (flags & FOLL_WRITE) fault_flags |= FAULT_FLAG_WRITE; else if (unshare) @@ -6389,6 +6393,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, remainder -= pages_per_huge_page(h); i += pages_per_huge_page(h); spin_unlock(ptl); + hugetlb_vma_unlock_read(vma); continue; } @@ -6416,6 +6421,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, if (WARN_ON_ONCE(!try_grab_folio(pages[i], refs, flags))) { spin_unlock(ptl); + hugetlb_vma_unlock_read(vma); remainder = 0; err = -ENOMEM; break; @@ -6427,6 +6433,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, i += refs; spin_unlock(ptl); + hugetlb_vma_unlock_read(vma); } *nr_pages = remainder; /* From patchwork Fri Dec 9 17:00:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 31882 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp894093wrr; Fri, 9 Dec 2022 09:13:35 -0800 (PST) X-Google-Smtp-Source: 
AA0mqf5N454V2SvT0e/1KnGUgbEpH7RUHLlw/d78uFFs1Cr5O635dzEI2CT1+Yf0uxROzohsg1EN X-Received: by 2002:a17:906:dd9:b0:7c0:dfba:54d3 with SMTP id p25-20020a1709060dd900b007c0dfba54d3mr5656641eji.20.1670606015317; Fri, 09 Dec 2022 09:13:35 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670606015; cv=none; d=google.com; s=arc-20160816; b=hvcKfk5heHgQCGGGJyplXp+yKwt7lLldIfQeaWW23/ZNCqM2BrVDNOVm/t/HAvgSWd UPO3JkGOrRPOVGsmCpOXjPszAgJbceDgLsdDEAEocI6JBctJr3l0elUfVtvl81RXKLFl QSXArbhZkOsv83b3wyMNOYwQIxYBgD/9Il/fwDsNyj1daxTY54GimoRoOQtpht3XlJnS ZikfbUY2tpM6p4V9jes7OR0Kd1N/1TvE41k58jXVn5nEiRjx5X6zamAeVPtb9nOY0Ack QakSwFuo4oS8HYjqcQWrIbY/C4NfTjrdsRu3k+F41JDkW5M2MxXhjwRZh5TUXvcW6pm5 +OaA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=SiCXWzCdYbU9QHTa4B1WTrTi25NBF8hxZPF60vWGZSs=; b=yThpLzQx6avuTKKq79Qh7cofJ+6c4yVjfOttmZi/Ay1EMMkAxdZD/948lo5yJet4m6 D4sefLGNTBTBhlRIA4ZQRXSFJkZFTKuNMK0nmOJU+Ta5Rot/n4hVEutVa8CkcigrLiHy nFIUfQFqyZ1QbKu6UYRsjuxj5ZuK5X2jZ5zACVGq4QaHwK2BuXEMFUN6l5vr208R0hhy CzV8nvYGOgJJH+QqX0DHRS7Lx2G3ZJvyxGS3OCAPF4pVhghiNf8AYEMGPS0/XVfuRnj2 WGLxk62k5lF/0VSYlqL4MjP3MC9jT1lLsbUJhpXlx0RJahDWqvxI1LAmtcvMQtXWuZbA i0xA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=EjKoCatL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id o16-20020a170906975000b007a6384d506csi222402ejy.643.2022.12.09.09.13.12; Fri, 09 Dec 2022 09:13:35 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=EjKoCatL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229580AbiLIRDY (ORCPT + 99 others); Fri, 9 Dec 2022 12:03:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51754 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229919AbiLIRC0 (ORCPT ); Fri, 9 Dec 2022 12:02:26 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D6E128D182 for ; Fri, 9 Dec 2022 09:01:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1670605291; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=SiCXWzCdYbU9QHTa4B1WTrTi25NBF8hxZPF60vWGZSs=; b=EjKoCatLCpCy2lrxJ71HIOBk7eegBAKleKL3NlgCsSmwUUMnzNe9ygaCYICLz+3C3ONEX6 bFwGtlurU6KnIkcy9ikMQ46Lqj/RKkAv87Ev2yST3mos05YPM2jCnbxz1coABZqEHIgR4a GFt5yuDNYeIy4Gr4BVgmjj+O4LTkxyo= Received: from mail-oi1-f198.google.com (mail-oi1-f198.google.com [209.85.167.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id 
us-mta-93-r_oC_N5yO9ysXI9T_kwNww-1; Fri, 09 Dec 2022 12:01:29 -0500 X-MC-Unique: r_oC_N5yO9ysXI9T_kwNww-1 Received: by mail-oi1-f198.google.com with SMTP id s18-20020a056808209200b0035be56b3f8dso2339468oiw.21 for ; Fri, 09 Dec 2022 09:01:29 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=SiCXWzCdYbU9QHTa4B1WTrTi25NBF8hxZPF60vWGZSs=; b=Ik7+DT0W7PIma+XxzHiQAHrHAjj3BftsmrlUmC/lqHgvc83RryXbbtfhnY4OuECPKy +PqqZbERNTB9ntqkBHncx5yL1Lla9IF8vioEhkiXNFZX2ucEHchBatFToy5/682RKwtW CQ0jd6FmqniSfBh/473UDqH3gcn/aOccUuPjMeKt2lujbIQqU4MC3Cmcnd5H4kNQvjgY KaL3lfOCeyiNw264+jzisNvRCUA7b0WfutrzFQDflHMBfvZgPYFQdGKbHgq1V3WI/P4c o0W1OM1/V/WgAhz6+E4Ufyl4cP11YBtW6SzvqF6UjIbOl835dJW7Q5Enux0eUlcoZde6 EX8A== X-Gm-Message-State: ANoB5pnfsD73+HDOceewsH/enELT7rruDowoOXIMT6iC9A7N1PeKm4vI L5fNJwB1kBqaJKuFeBWubNKt2KqYDVBT/gaeB+yLrRwVd0AATcpz+B2/7OLdj55Zom9cN1Mcle/ rF/O5T+TUQdcUJSAAqI1DUTz5 X-Received: by 2002:a05:6808:141:b0:35a:640d:300e with SMTP id h1-20020a056808014100b0035a640d300emr2638002oie.19.1670605288797; Fri, 09 Dec 2022 09:01:28 -0800 (PST) X-Received: by 2002:a05:6808:141:b0:35a:640d:300e with SMTP id h1-20020a056808014100b0035a640d300emr2637970oie.19.1670605288528; Fri, 09 Dec 2022 09:01:28 -0800 (PST) Received: from x1n.redhat.com (bras-base-aurron9127w-grc-46-70-31-27-79.dsl.bell.ca. 
[70.31.27.79]) by smtp.gmail.com with ESMTPSA id q7-20020a05620a0d8700b006cf38fd659asm178907qkl.103.2022.12.09.09.01.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 09 Dec 2022 09:01:27 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Miaohe Lin , David Hildenbrand , Nadav Amit , peterx@redhat.com, Andrea Arcangeli , Jann Horn , John Hubbard , Mike Kravetz , James Houghton , Rik van Riel , Muchun Song Subject: [PATCH v3 8/9] mm/hugetlb: Make walk_hugetlb_range() safe to pmd unshare Date: Fri, 9 Dec 2022 12:00:59 -0500 Message-Id: <20221209170100.973970-9-peterx@redhat.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221209170100.973970-1-peterx@redhat.com> References: <20221209170100.973970-1-peterx@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751757373092429991?= X-GMAIL-MSGID: =?utf-8?q?1751757373092429991?= Since walk_hugetlb_range() walks the pgtable, it needs the vma lock to make sure the pgtable page will not be freed concurrently. Reviewed-by: Mike Kravetz Signed-off-by: Peter Xu Reviewed-by: John Hubbard --- include/linux/pagewalk.h | 11 ++++++++++- mm/hmm.c | 15 ++++++++++++++- mm/pagewalk.c | 2 ++ 3 files changed, 26 insertions(+), 2 deletions(-) diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h index 959f52e5867d..27a6df448ee5 100644 --- a/include/linux/pagewalk.h +++ b/include/linux/pagewalk.h @@ -21,7 +21,16 @@ struct mm_walk; * depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD. 
* Any folded depths (where PTRS_PER_P?D is equal to 1) * are skipped. - * @hugetlb_entry: if set, called for each hugetlb entry + * @hugetlb_entry: if set, called for each hugetlb entry. This hook + * function is called with the vma lock held, in order to + * protect against a concurrent freeing of the pte_t* or + * the ptl. In some cases, the hook function needs to drop + * and retake the vma lock in order to avoid deadlocks + * while calling other functions. In such cases the hook + * function must either refrain from accessing the pte or + * ptl after dropping the vma lock, or else revalidate + * those items after re-acquiring the vma lock and before + * accessing them. * @test_walk: caller specific callback function to determine whether * we walk over the current vma or not. Returning 0 means * "do page table walk over the current vma", returning diff --git a/mm/hmm.c b/mm/hmm.c index 3850fb625dda..796de6866089 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -493,8 +493,21 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask, required_fault = hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags); if (required_fault) { + int ret; + spin_unlock(ptl); - return hmm_vma_fault(addr, end, required_fault, walk); + hugetlb_vma_unlock_read(vma); + /* + * Avoid deadlock: drop the vma lock before calling + * hmm_vma_fault(), which will itself potentially take and + * drop the vma lock. This is also correct from a + * protection point of view, because there is no further + * use here of either pte or ptl after dropping the vma + * lock. 
+ */ + ret = hmm_vma_fault(addr, end, required_fault, walk); + hugetlb_vma_lock_read(vma); + return ret; } pfn = pte_pfn(entry) + ((start & ~hmask) >> PAGE_SHIFT); diff --git a/mm/pagewalk.c b/mm/pagewalk.c index 7f1c9b274906..d98564a7be57 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -302,6 +302,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end, const struct mm_walk_ops *ops = walk->ops; int err = 0; + hugetlb_vma_lock_read(vma); do { next = hugetlb_entry_end(h, addr, end); pte = huge_pte_offset(walk->mm, addr & hmask, sz); @@ -314,6 +315,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end, if (err) break; } while (addr = next, addr != end); + hugetlb_vma_unlock_read(vma); return err; } From patchwork Fri Dec 9 17:01:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 31880 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp891850wrr; Fri, 9 Dec 2022 09:09:02 -0800 (PST) X-Google-Smtp-Source: AA0mqf7QhDlUJo5mvFMt1VgxnGRZCUZAA1UhxvJ/VYs8TMZYJUqvns4/bBwsrzP+Jrz6ewS6b2fb X-Received: by 2002:a17:906:5208:b0:7ad:699b:227c with SMTP id g8-20020a170906520800b007ad699b227cmr5332942ejm.53.1670605742025; Fri, 09 Dec 2022 09:09:02 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670605742; cv=none; d=google.com; s=arc-20160816; b=fLbYnSjyP8QlhXM0jth8cUAbV20aYOGaPiViSszYG047TgpAGku8IpJ45AkrvdVSB8 ML2IXch4aEBxDCgDgPVSNyeGFvnumaLJT49yDyu3HGJRBynszxuUNZd7xYlEsiOMngo1 7MZW6eY1OsaPma+qXufGLSrazTaLrQkHrGrLSVI+7KkTvldAw3u4tWqwY4PUsW3fnkWe Cu0mocVo+F4VDzbAVkpGSYU5F5bNTLtwfb/6aJIhcYQzYDowSxdBbrgnlhxe3YSTs0ex eGoABlLUD7lyjf02ZOp7ohEv5U1pKv+n9+UdPh0J/jukEXCODWXeg6cWys4ttE813Epg FnMA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from 
:dkim-signature; bh=L7MEdlvbVLjBM5+9KWX/Vi/7wZkpFuDB6TgvbozGtp8=; b=QHRoD0mUdk9MvcgxAiatmYqvJkuk/XTevvJclA/45zrJKoxxQlcQbcFnTh7E3+3saP 2oQhoJ0uwRNNGUyl273I+FulN1ruKi/djtMGlkes1UJ1Cg61qGFWSett12pyS1UXGizy ThIoTU0u/i5ZPpCqS0lBbV1MofNCvaa8KGx0SAPsDXknsiK92Z/deGAryQQZPZJ0YLVH Al1V2j30WKIPZGh5bceqhM+a4pKKfLTs+0QpAdr8Xv/oHYbSnu55YtvPYPRDL8LcMGX0 JSB8NFlk4foZT0zIu+8ZM3tAdVypdBEgpwN9GurMEpvHMrsG2kZKloHL5r9a45m5UL68 9aWA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ALLs3XnL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id wc16-20020a170907125000b007aea5ae3956si150352ejb.820.2022.12.09.09.08.38; Fri, 09 Dec 2022 09:09:02 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ALLs3XnL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230060AbiLIRDc (ORCPT + 99 others); Fri, 9 Dec 2022 12:03:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51800 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229954AbiLIRCf (ORCPT ); Fri, 9 Dec 2022 12:02:35 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 
570BE83EA3 for ; Fri, 9 Dec 2022 09:01:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1670605295; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=L7MEdlvbVLjBM5+9KWX/Vi/7wZkpFuDB6TgvbozGtp8=; b=ALLs3XnLcXLLa8WdL4YW2EIiGGQaPCd12kHJz24J6N4jC9s0/UV9hXJKMiExIYojMxY54p MJnSOoIUsS0TXf6PS5F5h+t0O732BEG3AFxnY0Dlw9IJLQOKVHhK2uqBGzrIZ+SQHkcfFw AtiwEZWwCCIRSmDC9LZElbg9e/wic3M= Received: from mail-pg1-f200.google.com (mail-pg1-f200.google.com [209.85.215.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-589-VRaWzlQQNaG6vpX4V6qVAw-1; Fri, 09 Dec 2022 12:01:34 -0500 X-MC-Unique: VRaWzlQQNaG6vpX4V6qVAw-1 Received: by mail-pg1-f200.google.com with SMTP id p7-20020a631e47000000b0047691854a86so3393896pgm.16 for ; Fri, 09 Dec 2022 09:01:34 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=L7MEdlvbVLjBM5+9KWX/Vi/7wZkpFuDB6TgvbozGtp8=; b=a2a/f1Tku/pXPNCDv7upF+fZmi/wOtZHN6c0ZHwddwESwaPc7iM/7ctxlBBAmhuGfk LL9f40K7RaBe/5PUUnFH5f9rX8PJK6C/ukMhLNZSeLXevjr3Qk0EYjZCWsR1aEM1W65Q C2l8122oKNUuml1BYJgJXUw4PUo3g8dErjpGRLJA0MnoJdvwisLVEl9x4xQW6dtQTJst YrCee06NUCVFVT7eB+vOeAyFCDgyE95cfuVwG0J5cjVMbh+ephSWwW5t1xMJKcsNBkSn BrzZSXVySUijl1PtBzqnn7B6t/G4mgWJ1MOZv9DSFRHgZErgU9btrgZrlSTMfXnV4ef0 gyGw== X-Gm-Message-State: ANoB5pmLm3PWNoPirPGlkVByzOU1peUtIrgaiA2BMpn4wrXW8ZBP/PnB nh500HbL2iFBRsmYjXXT0Gm3etfkXdW2Sr97bvdVeI9CGL2p6vUYvdz9O15VimcOUjYGWh8FN3C rQvqcofpMHHqgGdWNIaZOApWR X-Received: by 2002:a05:6a20:9f4a:b0:9d:efbf:6618 with SMTP id 
ml10-20020a056a209f4a00b0009defbf6618mr9037913pzb.38.1670605292396; Fri, 09 Dec 2022 09:01:32 -0800 (PST) X-Received: by 2002:a05:6a20:9f4a:b0:9d:efbf:6618 with SMTP id ml10-20020a056a209f4a00b0009defbf6618mr9037889pzb.38.1670605292058; Fri, 09 Dec 2022 09:01:32 -0800 (PST) Received: from x1n.redhat.com (bras-base-aurron9127w-grc-46-70-31-27-79.dsl.bell.ca. [70.31.27.79]) by smtp.gmail.com with ESMTPSA id q7-20020a05620a0d8700b006cf38fd659asm178907qkl.103.2022.12.09.09.01.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 09 Dec 2022 09:01:30 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Andrew Morton , Miaohe Lin , David Hildenbrand , Nadav Amit , peterx@redhat.com, Andrea Arcangeli , Jann Horn , John Hubbard , Mike Kravetz , James Houghton , Rik van Riel , Muchun Song Subject: [PATCH v3 9/9] mm/hugetlb: Introduce hugetlb_walk() Date: Fri, 9 Dec 2022 12:01:00 -0500 Message-Id: <20221209170100.973970-10-peterx@redhat.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221209170100.973970-1-peterx@redhat.com> References: <20221209170100.973970-1-peterx@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751757086311452345?= X-GMAIL-MSGID: =?utf-8?q?1751757086311452345?= huge_pte_offset() is the main walker function for hugetlb pgtables. The name is not really representing what it does, though. Instead of renaming it, introduce a wrapper function called hugetlb_walk() which will use huge_pte_offset() inside. 
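The wrapper idea can be sketched in user space as follows. This is an illustrative model under stated assumptions, not the kernel implementation: demo_table, demo_raw_lookup() and demo_checked_walk() are hypothetical names, the two "locks" are plain flags, and a counter replaces the kernel warning. The raw lookup plays the role of huge_pte_offset(), and the checked wrapper plays the role of hugetlb_walk().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a mapping; NOT a kernel structure. */
struct demo_table {
	bool shareable;		/* sharing possible, so locking matters */
	bool vma_lock_held;	/* stand-in for the vma lock */
	bool mapping_lock_held;	/* stand-in for the mapping-wide lock */
	int *entries;
	size_t nr;
	unsigned int warnings;	/* counts lock-rule violations */
};

/* Raw walker: returns the entry or NULL, with no lock checking. */
static int *demo_raw_lookup(struct demo_table *t, size_t idx)
{
	return idx < t->nr ? &t->entries[idx] : NULL;
}

/*
 * Checked wrapper: records a warning when the table is shareable and
 * neither lock is held, then delegates to the raw walker.
 */
static int *demo_checked_walk(struct demo_table *t, size_t idx)
{
	assert(t != NULL);

	if (t->shareable && !t->vma_lock_held && !t->mapping_lock_held)
		t->warnings++;

	return demo_raw_lookup(t, idx);
}
```

For a non-shareable table the check is a no-op, matching the note that the assertion does nothing for private mappings. Unlike this sketch, which counts every violation to make the behavior observable, the real helper in the diff warns only once and only when lockdep is configured.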
Assert on the locks when walking the pgtable. Note that the vma lock assertion will be a no-op for private mappings. Document the last special case in the page_vma_mapped_walk() path, where no additional lock is needed to call hugetlb_walk(). Taking the vma lock there is not needed because either: (1) potential callers of hugetlb pvmw already hold i_mmap_rwsem (from one rmap_walk()), or (2) the caller will not walk a hugetlb vma at all, so the hugetlb code path is not reachable (e.g. in ksm or uprobe paths). That lock requirement is slightly implicit for future page_vma_mapped_walk() callers. But if this rule is ever broken, lockdep will emit a straightforward warning in hugetlb_walk(), so the problem will be easy to spot. Reviewed-by: Mike Kravetz Signed-off-by: Peter Xu Reviewed-by: John Hubbard --- fs/hugetlbfs/inode.c | 4 +--- fs/userfaultfd.c | 6 ++---- include/linux/hugetlb.h | 39 +++++++++++++++++++++++++++++++++++++++ mm/hugetlb.c | 32 +++++++++++++------------------- mm/page_vma_mapped.c | 9 ++++++--- mm/pagewalk.c | 4 +--- 6 files changed, 62 insertions(+), 32 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index fdb16246f46e..48f1a8ad2243 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -388,9 +388,7 @@ static bool hugetlb_vma_maps_page(struct vm_area_struct *vma, { pte_t *ptep, pte; - ptep = huge_pte_offset(vma->vm_mm, addr, - huge_page_size(hstate_vma(vma))); - + ptep = hugetlb_walk(vma, addr, huge_page_size(hstate_vma(vma))); if (!ptep) return false; diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 969f4be967c6..6a278941ec84 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -237,14 +237,12 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx, unsigned long flags, unsigned long reason) { - struct mm_struct *mm = ctx->mm; pte_t *ptep, pte; bool ret = true; - mmap_assert_locked(mm); - - ptep = huge_pte_offset(mm, address, vma_mmu_pagesize(vma)); + mmap_assert_locked(ctx->mm); +
ptep = hugetlb_walk(vma, address, vma_mmu_pagesize(vma)); if (!ptep) goto out; diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index d755e2a7c0db..a5e87ec7fa6e 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -2,6 +2,7 @@ #ifndef _LINUX_HUGETLB_H #define _LINUX_HUGETLB_H +#include #include #include #include @@ -196,6 +197,11 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, * huge_pte_offset(): Walk the hugetlb pgtable until the last level PTE. * Returns the pte_t* if found, or NULL if the address is not mapped. * + * IMPORTANT: we should normally not directly call this function, instead + * this is only a common interface to implement arch-specific + * walker. Please use hugetlb_walk() instead, because that will attempt to + * verify the locking for you. + * * Since this function will walk all the pgtable pages (including not only * high-level pgtable page, but also PUD entry that can be unshared * concurrently for VM_SHARED), the caller of this function should be @@ -1229,4 +1235,37 @@ bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr); #define flush_hugetlb_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end) #endif +static inline bool +__vma_shareable_flags_pmd(struct vm_area_struct *vma) +{ + return vma->vm_flags & (VM_MAYSHARE | VM_SHARED) && + vma->vm_private_data; +} + +/* + * Safe version of huge_pte_offset() to check the locks. See comments + * above huge_pte_offset(). + */ +static inline pte_t * +hugetlb_walk(struct vm_area_struct *vma, unsigned long addr, unsigned long sz) +{ +#if defined(CONFIG_HUGETLB_PAGE) && \ + defined(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) && defined(CONFIG_LOCKDEP) + struct hugetlb_vma_lock *vma_lock = vma->vm_private_data; + + /* + * If pmd sharing possible, locking needed to safely walk the + * hugetlb pgtables. More information can be found at the comment + * above huge_pte_offset() in the same file. 
+ * + * NOTE: lockdep_is_held() is only defined with CONFIG_LOCKDEP. + */ + if (__vma_shareable_flags_pmd(vma)) + WARN_ON_ONCE(!lockdep_is_held(&vma_lock->rw_sema) && + !lockdep_is_held( + &vma->vm_file->f_mapping->i_mmap_rwsem)); +#endif + return huge_pte_offset(vma->vm_mm, addr, sz); +} + #endif /* _LINUX_HUGETLB_H */ diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 9d8bb6508288..b20120d14a71 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4814,7 +4814,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, } else { /* * For shared mappings the vma lock must be held before - * calling huge_pte_offset in the src vma. Otherwise, the + * calling hugetlb_walk() in the src vma. Otherwise, the * returned ptep could go away if part of a shared pmd and * another thread calls huge_pmd_unshare. */ @@ -4824,7 +4824,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, last_addr_mask = hugetlb_mask_last_page(h); for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) { spinlock_t *src_ptl, *dst_ptl; - src_pte = huge_pte_offset(src, addr, sz); + src_pte = hugetlb_walk(src_vma, addr, sz); if (!src_pte) { addr |= last_addr_mask; continue; @@ -5028,7 +5028,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma, hugetlb_vma_lock_write(vma); i_mmap_lock_write(mapping); for (; old_addr < old_end; old_addr += sz, new_addr += sz) { - src_pte = huge_pte_offset(mm, old_addr, sz); + src_pte = hugetlb_walk(vma, old_addr, sz); if (!src_pte) { old_addr |= last_addr_mask; new_addr |= last_addr_mask; @@ -5091,7 +5091,7 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct last_addr_mask = hugetlb_mask_last_page(h); address = start; for (; address < end; address += sz) { - ptep = huge_pte_offset(mm, address, sz); + ptep = hugetlb_walk(vma, address, sz); if (!ptep) { address |= last_addr_mask; continue; @@ -5404,7 +5404,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct 
*vma, mutex_lock(&hugetlb_fault_mutex_table[hash]); hugetlb_vma_lock_read(vma); spin_lock(ptl); - ptep = huge_pte_offset(mm, haddr, huge_page_size(h)); + ptep = hugetlb_walk(vma, haddr, huge_page_size(h)); if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) goto retry_avoidcopy; @@ -5442,7 +5442,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, * before the page tables are altered */ spin_lock(ptl); - ptep = huge_pte_offset(mm, haddr, huge_page_size(h)); + ptep = hugetlb_walk(vma, haddr, huge_page_size(h)); if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) { /* Break COW or unshare */ huge_ptep_clear_flush(vma, haddr, ptep); @@ -6228,7 +6228,7 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma, return NULL; hugetlb_vma_lock_read(vma); - pte = huge_pte_offset(mm, haddr, huge_page_size(h)); + pte = hugetlb_walk(vma, haddr, huge_page_size(h)); if (!pte) goto out_unlock; @@ -6293,8 +6293,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, * * Note that page table lock is not held when pte is null. 
*/ - pte = huge_pte_offset(mm, vaddr & huge_page_mask(h), - huge_page_size(h)); + pte = hugetlb_walk(vma, vaddr & huge_page_mask(h), + huge_page_size(h)); if (pte) ptl = huge_pte_lock(h, mm, pte); absent = !pte || huge_pte_none(huge_ptep_get(pte)); @@ -6480,7 +6480,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma, last_addr_mask = hugetlb_mask_last_page(h); for (; address < end; address += psize) { spinlock_t *ptl; - ptep = huge_pte_offset(mm, address, psize); + ptep = hugetlb_walk(vma, address, psize); if (!ptep) { address |= last_addr_mask; continue; @@ -6858,12 +6858,6 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma, *end = ALIGN(*end, PUD_SIZE); } -static bool __vma_shareable_flags_pmd(struct vm_area_struct *vma) -{ - return vma->vm_flags & (VM_MAYSHARE | VM_SHARED) && - vma->vm_private_data; -} - void hugetlb_vma_lock_read(struct vm_area_struct *vma) { if (__vma_shareable_flags_pmd(vma)) { @@ -7029,8 +7023,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, saddr = page_table_shareable(svma, vma, addr, idx); if (saddr) { - spte = huge_pte_offset(svma->vm_mm, saddr, - vma_mmu_pagesize(svma)); + spte = hugetlb_walk(svma, saddr, + vma_mmu_pagesize(svma)); if (spte) { get_page(virt_to_page(spte)); break; @@ -7388,7 +7382,7 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) hugetlb_vma_lock_write(vma); i_mmap_lock_write(vma->vm_file->f_mapping); for (address = start; address < end; address += PUD_SIZE) { - ptep = huge_pte_offset(mm, address, sz); + ptep = hugetlb_walk(vma, address, sz); if (!ptep) continue; ptl = huge_pte_lock(h, mm, ptep); diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 93e13fc17d3c..f3729b23dd0e 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -168,9 +168,12 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) /* The only possible mapping was handled on last iteration */ if (pvmw->pte) return not_found(pvmw); - - /* when pud 
is not present, pte will be NULL */ - pvmw->pte = huge_pte_offset(mm, pvmw->address, size); + /* + * All callers that get here will already hold the + * i_mmap_rwsem. Therefore, no additional locks need to be + * taken before calling hugetlb_walk(). + */ + pvmw->pte = hugetlb_walk(vma, pvmw->address, size); if (!pvmw->pte) return false; diff --git a/mm/pagewalk.c b/mm/pagewalk.c index d98564a7be57..cb23f8a15c13 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -305,13 +305,11 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end, hugetlb_vma_lock_read(vma); do { next = hugetlb_entry_end(h, addr, end); - pte = huge_pte_offset(walk->mm, addr & hmask, sz); - + pte = hugetlb_walk(vma, addr & hmask, sz); if (pte) err = ops->hugetlb_entry(pte, hmask, addr, next, walk); else if (ops->pte_hole) err = ops->pte_hole(addr, next, -1, walk); - if (err) break; } while (addr = next, addr != end);