From patchwork Fri Dec 16 15:50:52 2022
X-Patchwork-Id: 33983
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Muchun Song, Miaohe Lin, Andrea Arcangeli, Nadav Amit, James Houghton, peterx@redhat.com, Mike Kravetz, David Hildenbrand, Rik van Riel, John Hubbard, Andrew Morton, Jann Horn
Subject: [PATCH v4 1/9] mm/hugetlb: Let vma_offset_start() to return start
Date: Fri, 16 Dec 2022 10:50:52 -0500
Message-Id: <20221216155100.2043537-2-peterx@redhat.com>
In-Reply-To: <20221216155100.2043537-1-peterx@redhat.com>
References: <20221216155100.2043537-1-peterx@redhat.com>

Despite its name, vma_offset_start() does not return "the start address of the range"; it returns the offset that callers must add to vma->vm_start. Make it return the actual start virtual address instead. This also simplifies every caller, because wherever the return value is used it is ultimately added to vma->vm_start anyway.
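A minimal userspace sketch of the change, assuming simplified stand-ins for struct vm_area_struct and pgoff_t and a 4 KiB PAGE_SHIFT (these definitions are not the kernel's own). It contrasts the old offset-returning helper with the new address-returning one; vma_offset_start_old() and check_same_address() are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins: only the fields vma_offset_start()
 * reads are modelled, and PAGE_SHIFT assumes 4 KiB base pages. */
typedef unsigned long pgoff_t;

#define PAGE_SHIFT 12

struct vm_area_struct {
	unsigned long vm_start;	/* first virtual address of the mapping */
	pgoff_t vm_pgoff;	/* file offset of vm_start, in pages */
};

/* Before this patch: a byte offset that callers add to vma->vm_start. */
static unsigned long vma_offset_start_old(struct vm_area_struct *vma,
					  pgoff_t start)
{
	if (vma->vm_pgoff < start)
		return (start - vma->vm_pgoff) << PAGE_SHIFT;
	else
		return 0;
}

/* After this patch: the start virtual address itself. */
static unsigned long vma_offset_start(struct vm_area_struct *vma,
				      pgoff_t start)
{
	unsigned long offset = 0;

	if (vma->vm_pgoff < start)
		offset = (start - vma->vm_pgoff) << PAGE_SHIFT;

	return vma->vm_start + offset;
}

/* The two versions agree once vm_start is added back to the old result,
 * which is exactly what every old caller did. */
static void check_same_address(struct vm_area_struct *vma, pgoff_t start)
{
	assert(vma_offset_start(vma, start) ==
	       vma->vm_start + vma_offset_start_old(vma, start));
}
```

For example, with vm_start = 0x40000000, vm_pgoff = 16 and start = 20, the old helper returns 0x4000 and the new one returns 0x40004000; for start = 10, which lies before the VMA's file offset, the new helper simply returns vm_start.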
Reviewed-by: Mike Kravetz
Reviewed-by: David Hildenbrand
Reviewed-by: John Hubbard
Signed-off-by: Peter Xu
---
 fs/hugetlbfs/inode.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 790d2727141a..fdb16246f46e 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -412,10 +412,12 @@ static bool hugetlb_vma_maps_page(struct vm_area_struct *vma,
  */
 static unsigned long vma_offset_start(struct vm_area_struct *vma, pgoff_t start)
 {
+	unsigned long offset = 0;
+
 	if (vma->vm_pgoff < start)
-		return (start - vma->vm_pgoff) << PAGE_SHIFT;
-	else
-		return 0;
+		offset = (start - vma->vm_pgoff) << PAGE_SHIFT;
+
+	return vma->vm_start + offset;
 }
 
 static unsigned long vma_offset_end(struct vm_area_struct *vma, pgoff_t end)
@@ -457,7 +459,7 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
 		v_start = vma_offset_start(vma, start);
 		v_end = vma_offset_end(vma, end);
 
-		if (!hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page))
+		if (!hugetlb_vma_maps_page(vma, v_start, page))
 			continue;
 
 		if (!hugetlb_vma_trylock_write(vma)) {
@@ -473,8 +475,8 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
 			break;
 		}
 
-		unmap_hugepage_range(vma, vma->vm_start + v_start, v_end,
-				NULL, ZAP_FLAG_DROP_MARKER);
+		unmap_hugepage_range(vma, v_start, v_end, NULL,
+				     ZAP_FLAG_DROP_MARKER);
 		hugetlb_vma_unlock_write(vma);
 	}
 
@@ -507,10 +509,9 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
 		 */
 		v_start = vma_offset_start(vma, start);
 		v_end = vma_offset_end(vma, end);
-		if (hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page))
-			unmap_hugepage_range(vma, vma->vm_start + v_start,
-						v_end, NULL,
-						ZAP_FLAG_DROP_MARKER);
+		if (hugetlb_vma_maps_page(vma, v_start, page))
+			unmap_hugepage_range(vma, v_start, v_end, NULL,
+					     ZAP_FLAG_DROP_MARKER);
 
 		kref_put(&vma_lock->refs, hugetlb_vma_lock_release);
 		hugetlb_vma_unlock_write(vma);
@@ -540,8 +541,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
 		v_start = vma_offset_start(vma, start);
 		v_end = vma_offset_end(vma, end);
 
-		unmap_hugepage_range(vma, vma->vm_start + v_start, v_end,
-				     NULL, zap_flags);
+		unmap_hugepage_range(vma, v_start, v_end, NULL, zap_flags);
 
 		/*
 		 * Note that vma lock only exists for shared/non-private

From patchwork Fri Dec 16 15:50:53 2022
X-Patchwork-Id: 33986
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Muchun Song, Miaohe Lin, Andrea Arcangeli, Nadav Amit, James Houghton, peterx@redhat.com, Mike Kravetz, David Hildenbrand, Rik van Riel, John Hubbard, Andrew Morton, Jann Horn
Subject: [PATCH v4 2/9] mm/hugetlb: Don't wait for migration entry during follow page
Date: Fri, 16 Dec 2022 10:50:53 -0500
Message-Id: <20221216155100.2043537-3-peterx@redhat.com>
In-Reply-To: <20221216155100.2043537-1-peterx@redhat.com>
References: <20221216155100.2043537-1-peterx@redhat.com>

This is what the code already does for non-hugetlb pages, so hugetlb should logically do the same: a migration entry is now also treated as "no page".

This is probably also the last piece of follow_page code that may sleep; the other one was removed in cf994dd8af27 ("mm/gup: remove FOLL_MIGRATION", 2022-11-16).
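A rough model of the lookup behaviour after this change, written as userspace C with hypothetical types (enum hpte_kind, struct hpte and follow_hpte() are not kernel APIs). It ignores permission checks such as FOLL_WRITE and only shows that a migration or hwpoison entry now yields "no page" without sleeping, leaving the caller to decide what to do next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of what a hugetlb PTE can hold; these names do not
 * exist in the kernel. */
enum hpte_kind { HPTE_PRESENT, HPTE_MIGRATION, HPTE_HWPOISON };

struct page { int id; };	/* stand-in for the kernel's struct page */

struct hpte {
	enum hpte_kind kind;
	struct page *page;	/* meaningful only when kind == HPTE_PRESENT */
};

/*
 * Sketch of the lookup after this patch: if no page table exists, or the
 * entry is a non-present swap-type entry (migration or hwpoison), report
 * "no page" straight away. Nothing here sleeps or retries.
 */
static struct page *follow_hpte(const struct hpte *pte)
{
	if (pte == NULL)
		return NULL;	/* no page table for this address */
	if (pte->kind != HPTE_PRESENT)
		return NULL;	/* migration/hwpoison: treated as no page */
	assert(pte->page != NULL);
	return pte->page;
}
```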
Reviewed-by: Mike Kravetz
Reviewed-by: David Hildenbrand
Reviewed-by: John Hubbard
Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0dfe441f9f4d..8ccd55f9fbd3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6380,7 +6380,6 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	if (WARN_ON_ONCE(flags & FOLL_PIN))
 		return NULL;
 
-retry:
 	pte = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (!pte)
 		return NULL;
@@ -6403,16 +6402,6 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 			page = NULL;
 			goto out;
 		}
-	} else {
-		if (is_hugetlb_entry_migration(entry)) {
-			spin_unlock(ptl);
-			__migration_entry_wait_huge(pte, ptl);
-			goto retry;
-		}
-		/*
-		 * hwpoisoned entry is treated as no_page_table in
-		 * follow_page_mask().
-		 */
 	}
 out:
 	spin_unlock(ptl);

From patchwork Fri Dec 16 15:50:54 2022
X-Patchwork-Id: 33987
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Muchun Song, Miaohe Lin, Andrea Arcangeli, Nadav Amit, James Houghton, peterx@redhat.com, Mike Kravetz, David Hildenbrand, Rik van Riel, John Hubbard, Andrew Morton, Jann Horn
Subject: [PATCH v4 3/9] mm/hugetlb: Document huge_pte_offset usage
Date: Fri, 16 Dec 2022 10:50:54 -0500
Message-Id: <20221216155100.2043537-4-peterx@redhat.com>
In-Reply-To: <20221216155100.2043537-1-peterx@redhat.com>
References: <20221216155100.2043537-1-peterx@redhat.com>

huge_pte_offset() is potentially a pgtable walker that looks up the pte_t* for a hugetlb address.
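As a rough aid, the locking rules spelled out in the rest of this message can be condensed into two predicates. This is a hypothetical userspace model: struct walk_ctx, walk_is_safe() and pte_is_stable() are illustrative only, and real callers satisfy the rules by actually holding the locks, not by checking flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the caller's context; these flags are for
 * illustration only and do not exist in the kernel. */
struct walk_ctx {
	bool shared;		/* VM_SHARED: pmd sharing (and unsharing) possible */
	bool mmap_lock;		/* mm's mmap_lock held, read or write */
	bool vma_lock;		/* hugetlb vma lock held, read or write */
	bool i_mmap_rwsem;	/* mapping's i_mmap_rwsem held, read or write */
};

/* Is it safe to walk the page tables at all (no use-after-free)? */
static bool walk_is_safe(const struct walk_ctx *c)
{
	if (!c->shared)
		return c->mmap_lock;			/* rule (1) */
	return c->vma_lock || c->i_mmap_rwsem;		/* rule (2.1) or (2.2) */
}

/* Is the returned pte_t* also stable from the pmd-sharing point of view?
 * For shared mappings only the vma lock (2.1) gives that guarantee;
 * i_mmap_rwsem (2.2) only keeps the pgtable page from being freed. */
static bool pte_is_stable(const struct walk_ctx *c)
{
	bool stable = c->shared ? c->vma_lock : c->mmap_lock;

	assert(!stable || walk_is_safe(c));	/* stability implies safety */
	return stable;
}
```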
Normally it is always safe to walk a generic pgtable as long as the mmap lock is held for either read or write, because that guarantees the pgtable pages stay valid for the duration of the walk.

That does not hold for hugetlbfs, especially for shared mappings: a hugetlbfs pgtable can be freed by pmd unsharing. Even with the mmap lock of the current mm held, the PMD pgtable page can still go away from under us if pmd unsharing happens during the walk.

There are two ways to make the walk safe even for a shared mapping:

  (1) Hold the hugetlb vma lock for either read or write. This is safe because pmd unshare cannot happen at all.

  (2) Hold the i_mmap_rwsem for either read or write. This is safe because, even if pmd unshare happens, the pgtable page cannot be freed from under us.

Document it.

Reviewed-by: John Hubbard
Reviewed-by: David Hildenbrand
Signed-off-by: Peter Xu
---
 include/linux/hugetlb.h | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 551834cd5299..d755e2a7c0db 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -192,6 +192,38 @@ extern struct list_head huge_boot_pages;
 
 pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz);
+/*
+ * huge_pte_offset(): Walk the hugetlb pgtable until the last level PTE.
+ * Returns the pte_t* if found, or NULL if the address is not mapped.
+ *
+ * Since this function will walk all the pgtable pages (including not only
+ * high-level pgtable page, but also PUD entry that can be unshared
+ * concurrently for VM_SHARED), the caller of this function should be
+ * responsible for its thread safety. One can follow this rule:
+ *
+ *  (1) For private mappings: pmd unsharing is not possible, so holding the
+ *      mmap_lock for either read or write is sufficient. Most callers
+ *      already hold the mmap_lock, so normally, no special action is
+ *      required.
+ *
+ *  (2) For shared mappings: pmd unsharing is possible (so the PUD-ranged
+ *      pgtable page can go away from under us! It can be done by a pmd
+ *      unshare with a follow up munmap() on the other process), then we
+ *      need either:
+ *
+ *     (2.1) hugetlb vma lock read or write held, to make sure pmd unshare
+ *           won't happen upon the range (it also makes sure the pte_t we
+ *           read is the right and stable one), or,
+ *
+ *     (2.2) hugetlb mapping i_mmap_rwsem lock held read or write, to make
+ *           sure even if unshare happened the racy unmap() will wait until
+ *           i_mmap_rwsem is released.
+ *
+ * Option (2.1) is the safest, which guarantees pte stability from pmd
+ * sharing pov, until the vma lock is released. Option (2.2) doesn't protect
+ * a concurrent pmd unshare, but it makes sure the pgtable page is safe to
+ * access.
+ */
 pte_t *huge_pte_offset(struct mm_struct *mm,
 		       unsigned long addr, unsigned long sz);
 unsigned long hugetlb_mask_last_page(struct hstate *h);

From patchwork Fri Dec 16 15:50:55 2022
X-Patchwork-Id: 33985
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Muchun Song, Miaohe Lin, Andrea Arcangeli, Nadav Amit, James Houghton, peterx@redhat.com, Mike Kravetz, David Hildenbrand, Rik van Riel, John Hubbard, Andrew Morton, Jann Horn
Subject: [PATCH v4 4/9] mm/hugetlb: Move swap entry handling into vma lock when faulted
Date: Fri, 16 Dec 2022 10:50:55 -0500
Message-Id: <20221216155100.2043537-5-peterx@redhat.com>
In-Reply-To: <20221216155100.2043537-1-peterx@redhat.com>
References: <20221216155100.2043537-1-peterx@redhat.com>

In hugetlb_fault(), there used to be a special path at the entrance that handled swap entries using huge_pte_offset(). That is unsafe: for a pmd-sharable range, huge_pte_offset() can access freed pgtables, because no lock prevents the pgtable from being freed after a pmd unshare.

The simplest way to make it safe is to move the swap entry handling to after the vma lock is taken. As a result, we may now need to take the fault mutex for migration or hwpoison entries (and also the vma lock, but that one is really needed); however, neither of them is a hot path.
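To make the new lock ordering concrete, here is a minimal userspace sketch, assuming integer counters are an adequate stand-in for the hugetlb fault mutex and the vma read lock. Every `toy_*` name is hypothetical, not a kernel API; the model only tracks who takes and who releases each lock, including the handoff in which migration_entry_wait_huge() is entered with the vma lock held and drops it itself.

```c
#include <assert.h>

/* Hypothetical toy model of the reordered hugetlb_fault() path.
 * None of these names are kernel APIs; counters stand in for locks. */
enum toy_entry { TOY_PRESENT, TOY_MIGRATION, TOY_HWPOISON };

#define TOY_FAULT_HWPOISON 0x10 /* stand-in for VM_FAULT_HWPOISON_LARGE */

struct toy_vma {
	int vma_readers;      /* models the hugetlb vma read lock */
	int fault_mutex_held; /* models hugetlb_fault_mutex_table[hash] */
	enum toy_entry entry; /* current content of the huge pte */
};

static void toy_vma_lock_read(struct toy_vma *v) { v->vma_readers++; }

static void toy_vma_unlock_read(struct toy_vma *v)
{
	assert(v->vma_readers > 0);
	v->vma_readers--;
}

/* New contract: entered with the vma read lock held, always releases it. */
static void toy_migration_entry_wait_huge(struct toy_vma *v)
{
	assert(v->vma_readers > 0); /* cf. hugetlb_vma_assert_locked() */
	/* The pgtable lock would be looked up and taken here, still protected. */
	toy_vma_unlock_read(v);
	/* ...then the task would wait for the migration to finish (omitted). */
}

static int toy_hugetlb_fault(struct toy_vma *v)
{
	int ret = 0;

	v->fault_mutex_held = 1; /* mutex_lock(&hugetlb_fault_mutex_table[hash]) */
	toy_vma_lock_read(v);    /* only now is it safe to read the pte */

	if (v->entry != TOY_PRESENT) {
		if (v->entry == TOY_MIGRATION) {
			v->fault_mutex_held = 0;          /* drop the fault mutex... */
			toy_migration_entry_wait_huge(v); /* ...callee drops the vma lock */
			return 0;
		}
		ret = TOY_FAULT_HWPOISON;
		goto out_mutex;
	}
	/* Normal fault handling would go here. */
out_mutex:
	toy_vma_unlock_read(v);
	v->fault_mutex_held = 0;
	return ret;
}
```

Whether the entry is present, a migration entry, or a hwpoison entry, toy_hugetlb_fault() should return with both counters back at zero; the migration path just releases the vma lock in a different function. The real code also sleeps on the migration entry, which this model omits.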
Note that the vma lock cannot be released in hugetlb_fault() when a migration entry is detected, because migration_entry_wait_huge() uses the pgtable page again (by taking the pgtable lock), so that access also needs to be protected by the vma lock. Modify migration_entry_wait_huge() so that it must be called with the vma read lock held, and properly release the lock in __migration_entry_wait_huge().

Reviewed-by: Mike Kravetz
Reviewed-by: John Hubbard
Signed-off-by: Peter Xu
---
 include/linux/swapops.h |  6 ++++--
 mm/hugetlb.c            | 37 ++++++++++++++++---------------------
 mm/migrate.c            | 25 +++++++++++++++++++++----
 3 files changed, 41 insertions(+), 27 deletions(-)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index b982dd614572..3a451b7afcb3 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -337,7 +337,8 @@ extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
 extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 					unsigned long address);
 #ifdef CONFIG_HUGETLB_PAGE
-extern void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl);
+extern void __migration_entry_wait_huge(struct vm_area_struct *vma,
+					pte_t *ptep, spinlock_t *ptl);
 extern void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte);
 #endif /* CONFIG_HUGETLB_PAGE */
 #else /* CONFIG_MIGRATION */
@@ -366,7 +367,8 @@ static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
 static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 					 unsigned long address) { }
 #ifdef CONFIG_HUGETLB_PAGE
-static inline void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl) { }
+static inline void __migration_entry_wait_huge(struct vm_area_struct *vma,
+					       pte_t *ptep, spinlock_t *ptl) { }
 static inline void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte) { }
 #endif /* CONFIG_HUGETLB_PAGE */
 static inline int is_writable_migration_entry(swp_entry_t entry)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8ccd55f9fbd3..64512a151567 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5972,22 +5972,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	int need_wait_lock = 0;
 	unsigned long haddr = address & huge_page_mask(h);
 
-	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
-	if (ptep) {
-		/*
-		 * Since we hold no locks, ptep could be stale. That is
-		 * OK as we are only making decisions based on content and
-		 * not actually modifying content here.
-		 */
-		entry = huge_ptep_get(ptep);
-		if (unlikely(is_hugetlb_entry_migration(entry))) {
-			migration_entry_wait_huge(vma, ptep);
-			return 0;
-		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
-			return VM_FAULT_HWPOISON_LARGE |
-				VM_FAULT_SET_HINDEX(hstate_index(h));
-	}
-
 	/*
 	 * Serialize hugepage allocation and instantiation, so that we don't
 	 * get spurious allocation failures if two CPUs race to instantiate
@@ -6002,10 +5986,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * Acquire vma lock before calling huge_pte_alloc and hold
 	 * until finished with ptep. This prevents huge_pmd_unshare from
 	 * being called elsewhere and making the ptep no longer valid.
-	 *
-	 * ptep could have already be assigned via huge_pte_offset. That
-	 * is OK, as huge_pte_alloc will return the same value unless
-	 * something has changed.
 	 */
 	hugetlb_vma_lock_read(vma);
 	ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
@@ -6034,8 +6014,23 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * fault, and is_hugetlb_entry_(migration|hwpoisoned) check will
 	 * properly handle it.
 	 */
-	if (!pte_present(entry))
+	if (!pte_present(entry)) {
+		if (unlikely(is_hugetlb_entry_migration(entry))) {
+			/*
+			 * Release the hugetlb fault lock now, but retain
+			 * the vma lock, because it is needed to guard the
+			 * huge_pte_lockptr() later in
+			 * migration_entry_wait_huge(). The vma lock will
+			 * be released there.
+			 */
+			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+			migration_entry_wait_huge(vma, ptep);
+			return 0;
+		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
+			ret = VM_FAULT_HWPOISON_LARGE |
+			    VM_FAULT_SET_HINDEX(hstate_index(h));
 		goto out_mutex;
+	}
 
 	/*
 	 * If we are going to COW/unshare the mapping later, we examine the
diff --git a/mm/migrate.c b/mm/migrate.c
index a4d3fc65085f..98de7ce2b576 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -329,24 +329,41 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl)
+/*
+ * The vma read lock must be held upon entry. Holding that lock prevents either
+ * the pte or the ptl from being freed.
+ *
+ * This function will release the vma lock before returning.
+ */
+void __migration_entry_wait_huge(struct vm_area_struct *vma,
+				 pte_t *ptep, spinlock_t *ptl)
 {
 	pte_t pte;
 
+	hugetlb_vma_assert_locked(vma);
 	spin_lock(ptl);
 	pte = huge_ptep_get(ptep);
 
-	if (unlikely(!is_hugetlb_entry_migration(pte)))
+	if (unlikely(!is_hugetlb_entry_migration(pte))) {
 		spin_unlock(ptl);
-	else
+		hugetlb_vma_unlock_read(vma);
+	} else {
+		/*
+		 * If migration entry existed, safe to release vma lock
+		 * here because the pgtable page won't be freed without the
+		 * pgtable lock released. See comment right above pgtable
+		 * lock release in migration_entry_wait_on_locked().
+		 */
+		hugetlb_vma_unlock_read(vma);
 		migration_entry_wait_on_locked(pte_to_swp_entry(pte), NULL, ptl);
+	}
 }
 
 void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte)
 {
 	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma->vm_mm, pte);
 
-	__migration_entry_wait_huge(pte, ptl);
+	__migration_entry_wait_huge(vma, pte, ptl);
 }
 #endif

From patchwork Fri Dec 16 15:52:17 2022
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 33988
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: David Hildenbrand, John Hubbard, Muchun Song, Mike Kravetz, Nadav Amit, Andrea Arcangeli, Rik van Riel, peterx@redhat.com, Miaohe Lin, Jann Horn, James Houghton, Andrew Morton
Subject: [PATCH v4 5/9] mm/hugetlb: Make userfaultfd_huge_must_wait() safe to pmd unshare
Date: Fri, 16 Dec 2022 10:52:17 -0500
Message-Id: <20221216155217.2043700-1-peterx@redhat.com>
In-Reply-To: <20221216155100.2043537-1-peterx@redhat.com>
References: <20221216155100.2043537-1-peterx@redhat.com>

We can take the hugetlb walker lock; here that means taking the vma lock directly, so that the pgtable walk in userfaultfd_huge_must_wait() is safe against a concurrent pmd unshare.
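The comment this patch adds to handle_userfault() explains why the vma lock is taken before set_current_state(). The toy model below uses hypothetical `toy_*` names and assumes the worst case, in which acquiring the sleepable lock actually sleeps and returns with the task running again. Under that assumption, taking the lock after setting the blocking state silently undoes that state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the ordering rule; none of these are kernel APIs. */
enum toy_state { TOY_RUNNING, TOY_INTERRUPTIBLE };

struct toy_task { enum toy_state state; };
struct toy_rwsem { int readers; };

/* A sleepable lock: if contended, the task sleeps and comes back RUNNING,
 * overwriting whatever state the caller had set. Model that worst case. */
static void toy_down_read(struct toy_task *t, struct toy_rwsem *s)
{
	t->state = TOY_RUNNING;
	s->readers++;
}

/* Releasing a read lock does not sleep, so the task state is untouched. */
static void toy_up_read(struct toy_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}

/* Returns the state the task would be in right before calling schedule(). */
static enum toy_state toy_handle_userfault(struct toy_task *t,
					   struct toy_rwsem *vma_lock,
					   bool lock_first)
{
	if (lock_first)
		toy_down_read(t, vma_lock);   /* order used by the patch */
	t->state = TOY_INTERRUPTIBLE;         /* set_current_state(blocking_state) */
	if (!lock_first)
		toy_down_read(t, vma_lock);   /* wrong order: clobbers the state */
	/* userfaultfd_huge_must_wait() would walk the pgtable here. */
	toy_up_read(vma_lock);
	return t->state;
}
```

With `lock_first` true the task is still in the interruptible state when it would go to sleep; with `lock_first` false it has been reset to running, so it would not block as intended.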
Reviewed-by: David Hildenbrand
Reviewed-by: Mike Kravetz
Reviewed-by: John Hubbard
Signed-off-by: Peter Xu
---
 fs/userfaultfd.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 98ac37e34e3d..887e20472051 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -376,7 +376,8 @@ static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags)
  */
 vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 {
-	struct mm_struct *mm = vmf->vma->vm_mm;
+	struct vm_area_struct *vma = vmf->vma;
+	struct mm_struct *mm = vma->vm_mm;
 	struct userfaultfd_ctx *ctx;
 	struct userfaultfd_wait_queue uwq;
 	vm_fault_t ret = VM_FAULT_SIGBUS;
@@ -403,7 +404,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	 */
 	mmap_assert_locked(mm);
 
-	ctx = vmf->vma->vm_userfaultfd_ctx.ctx;
+	ctx = vma->vm_userfaultfd_ctx.ctx;
 	if (!ctx)
 		goto out;
 
@@ -493,6 +494,15 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	blocking_state = userfaultfd_get_blocking_state(vmf->flags);
 
+	/*
+	 * Take the vma lock now, in order to safely call
+	 * userfaultfd_huge_must_wait() later. Since acquiring the
+	 * (sleepable) vma lock can modify the current task state, that
+	 * must be before explicitly calling set_current_state().
+	 */
+	if (is_vm_hugetlb_page(vma))
+		hugetlb_vma_lock_read(vma);
+
 	spin_lock_irq(&ctx->fault_pending_wqh.lock);
 	/*
 	 * After the __add_wait_queue the uwq is visible to userland
@@ -507,13 +517,15 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	set_current_state(blocking_state);
 	spin_unlock_irq(&ctx->fault_pending_wqh.lock);
 
-	if (!is_vm_hugetlb_page(vmf->vma))
+	if (!is_vm_hugetlb_page(vma))
 		must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags,
 						  reason);
 	else
-		must_wait = userfaultfd_huge_must_wait(ctx, vmf->vma,
+		must_wait = userfaultfd_huge_must_wait(ctx, vma,
 						       vmf->address,
 						       vmf->flags, reason);
+	if (is_vm_hugetlb_page(vma))
+		hugetlb_vma_unlock_read(vma);
 	mmap_read_unlock(mm);
 
 	if (likely(must_wait && !READ_ONCE(ctx->released))) {

From patchwork Fri Dec 16 15:52:19 2022
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 33989
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: David Hildenbrand, John Hubbard, Muchun Song, Mike Kravetz, Nadav Amit, Andrea Arcangeli, Rik van Riel, peterx@redhat.com, Miaohe Lin, Jann Horn, James Houghton, Andrew Morton
Subject: [PATCH v4 6/9] mm/hugetlb: Make hugetlb_follow_page_mask() safe to pmd unshare
Date: Fri, 16 Dec 2022 10:52:19 -0500
Message-Id: <20221216155219.2043714-1-peterx@redhat.com>
In-Reply-To: <20221216155100.2043537-1-peterx@redhat.com>
References: <20221216155100.2043537-1-peterx@redhat.com>

Since hugetlb_follow_page_mask() walks the pgtable, it needs the vma lock to make sure the pgtable page will not be freed concurrently.
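Here is a minimal sketch of the single-exit pattern this patch uses. The early `return NULL` becomes `goto out_unlock`, so the vma lock is released on every path. The `toy_*` names, the integer array standing in for the pgtable, and the `ptl_held` flag are all hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; counters stand in for the kernel locks. */
struct toy_vma {
	int vma_readers; /* models the hugetlb vma read lock */
	int ptl_held;    /* models the pgtable spinlock */
};

static void toy_vma_lock_read(struct toy_vma *v) { v->vma_readers++; }

static void toy_vma_unlock_read(struct toy_vma *v)
{
	assert(v->vma_readers > 0);
	v->vma_readers--;
}

/*
 * Look up "addr" in a toy pgtable of n slots. A slot value > 0 means a
 * present page; an out-of-range addr models huge_pte_offset() returning NULL.
 */
static const int *toy_follow_page_mask(struct toy_vma *v, const int *table,
				       size_t n, size_t addr)
{
	const int *pte, *page = NULL;

	toy_vma_lock_read(v);                 /* taken before the walk */
	pte = addr < n ? &table[addr] : NULL; /* huge_pte_offset() stand-in */
	if (!pte)
		goto out_unlock;              /* a bare return would leak the lock */

	v->ptl_held = 1;                      /* huge_pte_lock() */
	if (*pte > 0)
		page = pte;
	v->ptl_held = 0;                      /* spin_unlock(ptl) at "out:" */
out_unlock:
	toy_vma_unlock_read(v);
	return page;
}
```

The lookup hits, misses, and out-of-range cases all leave `vma_readers` at zero. Funneling every exit through one label is what makes that easy to check.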
Acked-by: David Hildenbrand
Reviewed-by: Mike Kravetz
Reviewed-by: John Hubbard
Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 64512a151567..0bf0abea388d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6375,9 +6375,10 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	if (WARN_ON_ONCE(flags & FOLL_PIN))
 		return NULL;
 
+	hugetlb_vma_lock_read(vma);
 	pte = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (!pte)
-		return NULL;
+		goto out_unlock;
 
 	ptl = huge_pte_lock(h, mm, pte);
 	entry = huge_ptep_get(pte);
@@ -6400,6 +6401,8 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	}
 out:
 	spin_unlock(ptl);
+out_unlock:
+	hugetlb_vma_unlock_read(vma);
 	return page;
 }
 

From patchwork Fri Dec 16 15:52:23 2022
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 33992
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: David Hildenbrand, John Hubbard, Muchun Song, Mike Kravetz, Nadav Amit, Andrea Arcangeli, Rik van Riel, peterx@redhat.com, Miaohe Lin, Jann Horn, James Houghton, Andrew Morton
Subject: [PATCH v4 7/9] mm/hugetlb: Make follow_hugetlb_page() safe to pmd unshare
Date: Fri, 16 Dec 2022 10:52:23 -0500
Message-Id: <20221216155223.2043727-1-peterx@redhat.com>
In-Reply-To: <20221216155100.2043537-1-peterx@redhat.com>
References: <20221216155100.2043537-1-peterx@redhat.com>

Since follow_hugetlb_page() walks the pgtable, it needs the vma lock to make sure the pgtable page will not be freed concurrently.
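follow_hugetlb_page() loops over huge pages. The patch takes the vma lock at the top of each iteration and releases it on every way out of that iteration: break, continue, before faulting, and at the normal end. The hypothetical toy below models that shape. It assumes the fault handler takes the vma lock itself, as hugetlb_fault() does, which is presumably why the lock has to be dropped before faulting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of the loop's lock shape; not kernel APIs. */
enum toy_pte { TOY_ABSENT, TOY_PRESENT, TOY_BAD };

struct toy_vma { int vma_readers; }; /* models the hugetlb vma read lock */

static void toy_vma_lock_read(struct toy_vma *v) { v->vma_readers++; }

static void toy_vma_unlock_read(struct toy_vma *v)
{
	assert(v->vma_readers > 0);
	v->vma_readers--;
}

/* Stand-in for hugetlb_fault(): takes the vma lock on its own. */
static void toy_fault(struct toy_vma *v, enum toy_pte *pte)
{
	assert(v->vma_readers == 0); /* caller must have dropped the lock */
	toy_vma_lock_read(v);
	*pte = TOY_PRESENT;
	toy_vma_unlock_read(v);
}

/* Returns how many pages were "grabbed" before stopping. */
static size_t toy_follow_pages(struct toy_vma *v, enum toy_pte *ptes, size_t n)
{
	size_t i = 0;

	while (i < n) {
		toy_vma_lock_read(v);
		if (ptes[i] == TOY_BAD) {
			toy_vma_unlock_read(v); /* error exit: unlock, then break */
			break;
		}
		if (ptes[i] == TOY_ABSENT) {
			toy_vma_unlock_read(v); /* drop before faulting */
			toy_fault(v, &ptes[i]);
			continue;               /* retry the same index */
		}
		i++;                            /* "grab" the page */
		toy_vma_unlock_read(v);         /* normal end of iteration */
	}
	return i;
}
```

Each early exit carries its own unlock, mirroring the separate hugetlb_vma_unlock_read() calls the diff adds next to each existing spin_unlock(ptl), rather than routing everything through one label as in the previous patch.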
Acked-by: David Hildenbrand
Reviewed-by: Mike Kravetz
Reviewed-by: John Hubbard
Signed-off-by: Peter Xu
---
 mm/hugetlb.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0bf0abea388d..33fe73e1e589 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6433,6 +6433,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			break;
 		}
 
+		hugetlb_vma_lock_read(vma);
 		/*
 		 * Some archs (sparc64, sh*) have multiple pte_ts to
 		 * each hugepage. We have to make sure we get the
@@ -6457,6 +6458,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		    !hugetlbfs_pagecache_present(h, vma, vaddr)) {
 			if (pte)
 				spin_unlock(ptl);
+			hugetlb_vma_unlock_read(vma);
 			remainder = 0;
 			break;
 		}
@@ -6478,6 +6480,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 
 			if (pte)
 				spin_unlock(ptl);
+			hugetlb_vma_unlock_read(vma);
+
 			if (flags & FOLL_WRITE)
 				fault_flags |= FAULT_FLAG_WRITE;
 			else if (unshare)
@@ -6540,6 +6544,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			remainder -= pages_per_huge_page(h);
 			i += pages_per_huge_page(h);
 			spin_unlock(ptl);
+			hugetlb_vma_unlock_read(vma);
 			continue;
 		}
 
@@ -6569,6 +6574,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			if (WARN_ON_ONCE(!try_grab_folio(pages[i], refs,
 							 flags))) {
 				spin_unlock(ptl);
+				hugetlb_vma_unlock_read(vma);
 				remainder = 0;
 				err = -ENOMEM;
 				break;
@@ -6580,6 +6586,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		i += refs;
 
 		spin_unlock(ptl);
+		hugetlb_vma_unlock_read(vma);
 	}
 	*nr_pages = remainder;
 	/*

From patchwork Fri Dec 16 15:52:26 2022
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 33991
AA0mqf5+WkjYQetbPRQk4VGBdY63VhR/4buqcC6XvprE+VeaJ76KQp6jfkpbZxJCfbMNgttrAEWc X-Received: by 2002:a17:903:2012:b0:189:d3dc:a9c6 with SMTP id s18-20020a170903201200b00189d3dca9c6mr31287950pla.19.1671206129228; Fri, 16 Dec 2022 07:55:29 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1671206129; cv=none; d=google.com; s=arc-20160816; b=S1bRKQrnJfDhXuvoId5lYoRvUcbZKQcZ/IRaPGcT+oK2YxxOhNboR9sz/tJTOjSlCM 5irjqIpcMHwQK2+X2L/fkY0QQskhha8QfcqNpR/cvpgbXTWDSTIZdaa0srL84A0B7YXw XTqCAvc8cntvb+dCnq2bT0YZzHIHbdWShbK62iWkbx1+uhzlf5QJ0ZwThx4YmoE+WFa8 mY7M5uwtPxltD9KuFkSIuHVHw+UUjUDxNsOnvsVYXkDb+Oo2SUqTj0ORGWL9zBxJDnQh AX+zXiXarcLqZ8B0l5UUWNc2tI18Xt7NnwLTs/bDu/FAgfweaA69NUvMCgRMxJQfudvk eSOg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=SUHaufgIZGXg7uWbITubevKDDOKyb4JchyF1bpwW4KQ=; b=wStQTLmqaeqIKnTBfSlvIlJEHTWpYixjjKmQdeDZqvt74UvJgzPIg3eXB8oa/1+ZSS QV/ixjyAFA0q003z2Gp4ALmazAzZsh8lQ9Tb3p6OmOwag+ERdgoa2vSIB87Yi9DyqURC CsyCDosX19ZvO3DbSOXmWZ9fL3RX//pw+drdge5ScJRhcs0AYB2o4QgXQAXUq5SVNQXy /PlzGkWp/G1sxYmbdoovmFWBcxiihzYzkiYr390Qsx+dayFpBlBm84bLGyX3eoS7risE ZynDNNdNcYgBwXAxIpY/uMoLDgEqJpFvmkpC9R1pKz0TsHJ3C7ZM0BeZ/LyyqeKFNVMW WaYA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=F6ZrN9Ea; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id q17-20020a17090311d100b00172cb948c68si3105111plh.227.2022.12.16.07.55.16; Fri, 16 Dec 2022 07:55:29 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=F6ZrN9Ea; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231579AbiLPPyH (ORCPT + 99 others); Fri, 16 Dec 2022 10:54:07 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60156 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231524AbiLPPxi (ORCPT ); Fri, 16 Dec 2022 10:53:38 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0AC936A742 for ; Fri, 16 Dec 2022 07:52:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1671205952; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=SUHaufgIZGXg7uWbITubevKDDOKyb4JchyF1bpwW4KQ=; b=F6ZrN9Ea59ydeOaDqamuI0mZto7frQNq7V3oMvINPhSNOcn4THcOzjpuQpCXtl8o1uUXvt fhLbDinsBToFxyQo6exq1pA81e2nZwS6bR5FPHmEhknHzvYR+Ab2B0CyHxgV4VNfc84V+d XMQEue8xmU23pPQdUTQRmgBtzA6Qj9w= Received: from mail-qv1-f71.google.com (mail-qv1-f71.google.com [209.85.219.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-29-bAInidp2OnWKmmLUcle8DA-1; Fri, 16 
Dec 2022 10:52:31 -0500 X-MC-Unique: bAInidp2OnWKmmLUcle8DA-1 Received: by mail-qv1-f71.google.com with SMTP id c10-20020a05621401ea00b004c72d0e92bcso1661841qvu.12 for ; Fri, 16 Dec 2022 07:52:31 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=SUHaufgIZGXg7uWbITubevKDDOKyb4JchyF1bpwW4KQ=; b=VJ+1b90fqMNFI7JZpMIAfKaAix/mMNw+ScYuk6vtFMVdmrVE9mVujmpoOTSn9wXcPZ DFZ6Swgan/ubN5sdzad3+K4IC0oHJLb+3j8TZhNylRNrDs+UsvDTUcQCAlDuFj+rDQdO bC4PzHpT7i4BWpwazSI+pRgDp8dqrPZDzRRlRfjrPyioenOIR/iJM7MO80S7zpGnJe9m /4s6zCTleQbBzfRz7/rcwRhSkj0/nOMcWwBkSyAXmyR93uJRhm7Nim+7Ho/jap1DuhFO PLRlxf5dJg8gsG4P+GTgKr4EDRiAPv8+3bks81FSGav3LeflmsM6FSpdILR3uMd7xxYe /+1A== X-Gm-Message-State: ANoB5pm4BcLohGW8IVNuT+UeTImFAvIvcMXQPTPDtEoFWv5haYiXcJPt 3vJkAONnVqJ1PmjIoQqJIVYpMcniYk6E6PQw6auUpgya8s1vQB6S5/j9GF18l9KCiOzTwKV3YC7 /l0O3SJ89+yfUnzWkmFKTM4ksdUQgzCs/kKX1PrjxzYxPIpgveg5b9J+SJga6EoD6E93gaGFOHA == X-Received: by 2002:a05:622a:1652:b0:3a8:1600:e60f with SMTP id y18-20020a05622a165200b003a81600e60fmr42089086qtj.14.1671205949581; Fri, 16 Dec 2022 07:52:29 -0800 (PST) X-Received: by 2002:a05:622a:1652:b0:3a8:1600:e60f with SMTP id y18-20020a05622a165200b003a81600e60fmr42089044qtj.14.1671205949310; Fri, 16 Dec 2022 07:52:29 -0800 (PST) Received: from x1n.redhat.com (bras-base-aurron9127w-grc-45-70-31-26-132.dsl.bell.ca. 
[70.31.26.132]) by smtp.gmail.com with ESMTPSA id h9-20020ac81389000000b003a530a32f67sm1472717qtj.65.2022.12.16.07.52.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 16 Dec 2022 07:52:28 -0800 (PST) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: David Hildenbrand , John Hubbard , Muchun Song , Mike Kravetz , Nadav Amit , Andrea Arcangeli , Rik van Riel , peterx@redhat.com, Miaohe Lin , Jann Horn , James Houghton , Andrew Morton Subject: [PATCH v4 8/9] mm/hugetlb: Make walk_hugetlb_range() safe to pmd unshare Date: Fri, 16 Dec 2022 10:52:26 -0500 Message-Id: <20221216155226.2043738-1-peterx@redhat.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221216155100.2043537-1-peterx@redhat.com> References: <20221216155100.2043537-1-peterx@redhat.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1752386638299120920?= X-GMAIL-MSGID: =?utf-8?q?1752386638299120920?= Since walk_hugetlb_range() walks the pgtable, it needs the vma lock to make sure the pgtable page will not be freed concurrently. Reviewed-by: Mike Kravetz Reviewed-by: John Hubbard Signed-off-by: Peter Xu --- include/linux/pagewalk.h | 11 ++++++++++- mm/hmm.c | 15 ++++++++++++++- mm/pagewalk.c | 2 ++ 3 files changed, 26 insertions(+), 2 deletions(-) diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h index 959f52e5867d..27a6df448ee5 100644 --- a/include/linux/pagewalk.h +++ b/include/linux/pagewalk.h @@ -21,7 +21,16 @@ struct mm_walk; * depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD. 
  *			Any folded depths (where PTRS_PER_P?D is equal to 1)
  *			are skipped.
- * @hugetlb_entry:	if set, called for each hugetlb entry
+ * @hugetlb_entry:	if set, called for each hugetlb entry. This hook
+ *			function is called with the vma lock held, in order to
+ *			protect against a concurrent freeing of the pte_t* or
+ *			the ptl. In some cases, the hook function needs to drop
+ *			and retake the vma lock in order to avoid deadlocks
+ *			while calling other functions. In such cases the hook
+ *			function must either refrain from accessing the pte or
+ *			ptl after dropping the vma lock, or else revalidate
+ *			those items after re-acquiring the vma lock and before
+ *			accessing them.
  * @test_walk:		caller specific callback function to determine whether
  *			we walk over the current vma or not. Returning 0 means
  *			"do page table walk over the current vma", returning
diff --git a/mm/hmm.c b/mm/hmm.c
index 3850fb625dda..796de6866089 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -493,8 +493,21 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 	required_fault =
 		hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags);
 	if (required_fault) {
+		int ret;
+
 		spin_unlock(ptl);
-		return hmm_vma_fault(addr, end, required_fault, walk);
+		hugetlb_vma_unlock_read(vma);
+		/*
+		 * Avoid deadlock: drop the vma lock before calling
+		 * hmm_vma_fault(), which will itself potentially take and
+		 * drop the vma lock. This is also correct from a
+		 * protection point of view, because there is no further
+		 * use here of either pte or ptl after dropping the vma
+		 * lock.
+		 */
+		ret = hmm_vma_fault(addr, end, required_fault, walk);
+		hugetlb_vma_lock_read(vma);
+		return ret;
 	}
 
 	pfn = pte_pfn(entry) + ((start & ~hmask) >> PAGE_SHIFT);
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 7f1c9b274906..d98564a7be57 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -302,6 +302,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
 	const struct mm_walk_ops *ops = walk->ops;
 	int err = 0;
 
+	hugetlb_vma_lock_read(vma);
 	do {
 		next = hugetlb_entry_end(h, addr, end);
 		pte = huge_pte_offset(walk->mm, addr & hmask, sz);
@@ -314,6 +315,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
 		if (err)
 			break;
 	} while (addr = next, addr != end);
+	hugetlb_vma_unlock_read(vma);
 
 	return err;
 }

From patchwork Fri Dec 16 15:52:29 2022
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 33990
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: David Hildenbrand, John Hubbard, Muchun Song, Mike Kravetz,
 Nadav Amit, Andrea Arcangeli, Rik van Riel, peterx@redhat.com,
 Miaohe Lin, Jann Horn, James Houghton, Andrew Morton
Subject: [PATCH v4 9/9] mm/hugetlb: Introduce hugetlb_walk()
Date: Fri, 16 Dec 2022 10:52:29 -0500
Message-Id: <20221216155229.2043750-1-peterx@redhat.com>
In-Reply-To: <20221216155100.2043537-1-peterx@redhat.com>
References: <20221216155100.2043537-1-peterx@redhat.com>

huge_pte_offset() is the main walker function for hugetlb pgtables.
Its name does not really reflect what it does, though.

Instead of renaming it, introduce a wrapper function called
hugetlb_walk() that calls huge_pte_offset() internally, and assert on
the locks when walking the pgtable.
Note that the vma lock assertion will be a no-op for private mappings.

Document the last special case, in the page_vma_mapped_walk() path,
where no additional lock is needed to call hugetlb_walk(). Taking the
vma lock there is not needed because either:

  (1) potential callers of hugetlb pvmw already hold i_mmap_rwsem
      (from one rmap_walk()), or

  (2) the caller will not walk a hugetlb vma at all, so the hugetlb
      code path is not reachable (e.g. in the ksm or uprobe paths).

This lock requirement is somewhat implicit for future
page_vma_mapped_walk() callers. But if the rule is ever broken, lockdep
will produce a straightforward warning in hugetlb_walk(), which points
the way to a fix.

Reviewed-by: Mike Kravetz
Reviewed-by: John Hubbard
Signed-off-by: Peter Xu
Reviewed-by: David Hildenbrand
---
 fs/hugetlbfs/inode.c    |  4 +---
 fs/userfaultfd.c        |  6 ++----
 include/linux/hugetlb.h | 37 +++++++++++++++++++++++++++++++++++++
 mm/hugetlb.c            | 31 +++++++++++++------------------
 mm/page_vma_mapped.c    |  9 ++++++---
 mm/pagewalk.c           |  4 +---
 6 files changed, 60 insertions(+), 31 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index fdb16246f46e..48f1a8ad2243 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -388,9 +388,7 @@ static bool hugetlb_vma_maps_page(struct vm_area_struct *vma,
 {
 	pte_t *ptep, pte;
 
-	ptep = huge_pte_offset(vma->vm_mm, addr,
-			huge_page_size(hstate_vma(vma)));
-
+	ptep = hugetlb_walk(vma, addr, huge_page_size(hstate_vma(vma)));
 	if (!ptep)
 		return false;
 
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 887e20472051..4e27ff526873 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -237,14 +237,12 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 					 unsigned long flags,
 					 unsigned long reason)
 {
-	struct mm_struct *mm = ctx->mm;
 	pte_t *ptep, pte;
 	bool ret = true;
 
-	mmap_assert_locked(mm);
-
-	ptep = huge_pte_offset(mm, address, vma_mmu_pagesize(vma));
+	mmap_assert_locked(ctx->mm);
 
+	ptep = hugetlb_walk(vma, address, vma_mmu_pagesize(vma));
 	if (!ptep)
 		goto out;
 
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d755e2a7c0db..b6b10101bea7 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -2,6 +2,7 @@
 #ifndef _LINUX_HUGETLB_H
 #define _LINUX_HUGETLB_H
 
+#include
 #include
 #include
 #include
@@ -196,6 +197,11 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
  * huge_pte_offset(): Walk the hugetlb pgtable until the last level PTE.
  * Returns the pte_t* if found, or NULL if the address is not mapped.
  *
+ * IMPORTANT: we should normally not directly call this function, instead
+ * this is only a common interface to implement arch-specific
+ * walker. Please use hugetlb_walk() instead, because that will attempt to
+ * verify the locking for you.
+ *
  * Since this function will walk all the pgtable pages (including not only
  * high-level pgtable page, but also PUD entry that can be unshared
  * concurrently for VM_SHARED), the caller of this function should be
@@ -1229,4 +1235,35 @@ bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr);
 #define flush_hugetlb_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
 #endif
 
+static inline bool __vma_shareable_lock(struct vm_area_struct *vma)
+{
+	return (vma->vm_flags & VM_MAYSHARE) && vma->vm_private_data;
+}
+
+/*
+ * Safe version of huge_pte_offset() to check the locks. See comments
+ * above huge_pte_offset().
+ */
+static inline pte_t *
+hugetlb_walk(struct vm_area_struct *vma, unsigned long addr, unsigned long sz)
+{
+#if defined(CONFIG_HUGETLB_PAGE) && \
+	defined(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) && defined(CONFIG_LOCKDEP)
+	struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+	/*
+	 * If pmd sharing possible, locking needed to safely walk the
+	 * hugetlb pgtables. More information can be found at the comment
+	 * above huge_pte_offset() in the same file.
+	 *
+	 * NOTE: lockdep_is_held() is only defined with CONFIG_LOCKDEP.
+	 */
+	if (__vma_shareable_lock(vma))
+		WARN_ON_ONCE(!lockdep_is_held(&vma_lock->rw_sema) &&
+			     !lockdep_is_held(
+				 &vma->vm_file->f_mapping->i_mmap_rwsem));
+#endif
+	return huge_pte_offset(vma->vm_mm, addr, sz);
+}
+
 #endif /* _LINUX_HUGETLB_H */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 33fe73e1e589..21dc37ff0896 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -258,11 +258,6 @@ static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma)
 /*
  * hugetlb vma_lock helper routines
  */
-static bool __vma_shareable_lock(struct vm_area_struct *vma)
-{
-	return vma->vm_flags & VM_MAYSHARE && vma->vm_private_data;
-}
-
 void hugetlb_vma_lock_read(struct vm_area_struct *vma)
 {
 	if (__vma_shareable_lock(vma)) {
@@ -4959,7 +4954,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	} else {
 		/*
 		 * For shared mappings the vma lock must be held before
-		 * calling huge_pte_offset in the src vma. Otherwise, the
+		 * calling hugetlb_walk() in the src vma. Otherwise, the
 		 * returned ptep could go away if part of a shared pmd and
 		 * another thread calls huge_pmd_unshare.
 		 */
@@ -4969,7 +4964,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	last_addr_mask = hugetlb_mask_last_page(h);
 	for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) {
 		spinlock_t *src_ptl, *dst_ptl;
-		src_pte = huge_pte_offset(src, addr, sz);
+		src_pte = hugetlb_walk(src_vma, addr, sz);
 		if (!src_pte) {
 			addr |= last_addr_mask;
 			continue;
@@ -5176,7 +5171,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 	hugetlb_vma_lock_write(vma);
 	i_mmap_lock_write(mapping);
 	for (; old_addr < old_end; old_addr += sz, new_addr += sz) {
-		src_pte = huge_pte_offset(mm, old_addr, sz);
+		src_pte = hugetlb_walk(vma, old_addr, sz);
 		if (!src_pte) {
 			old_addr |= last_addr_mask;
 			new_addr |= last_addr_mask;
@@ -5239,7 +5234,7 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 	last_addr_mask = hugetlb_mask_last_page(h);
 	address = start;
 	for (; address < end; address += sz) {
-		ptep = huge_pte_offset(mm, address, sz);
+		ptep = hugetlb_walk(vma, address, sz);
 		if (!ptep) {
 			address |= last_addr_mask;
 			continue;
@@ -5552,7 +5547,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 				mutex_lock(&hugetlb_fault_mutex_table[hash]);
 				hugetlb_vma_lock_read(vma);
 				spin_lock(ptl);
-				ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
+				ptep = hugetlb_walk(vma, haddr, huge_page_size(h));
 				if (likely(ptep &&
 					   pte_same(huge_ptep_get(ptep), pte)))
 					goto retry_avoidcopy;
@@ -5590,7 +5585,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * before the page tables are altered
 	 */
 	spin_lock(ptl);
-	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
+	ptep = hugetlb_walk(vma, haddr, huge_page_size(h));
 	if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) {
 		/* Break COW or unshare */
 		huge_ptep_clear_flush(vma, haddr, ptep);
@@ -6376,7 +6371,7 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 		return NULL;
 
 	hugetlb_vma_lock_read(vma);
-	pte = huge_pte_offset(mm, haddr, huge_page_size(h));
+	pte = hugetlb_walk(vma, haddr, huge_page_size(h));
 	if (!pte)
 		goto out_unlock;
 
@@ -6441,8 +6436,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 *
 		 * Note that page table lock is not held when pte is null.
 		 */
-		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h),
-				      huge_page_size(h));
+		pte = hugetlb_walk(vma, vaddr & huge_page_mask(h),
+				   huge_page_size(h));
 		if (pte)
 			ptl = huge_pte_lock(h, mm, pte);
 		absent = !pte || huge_pte_none(huge_ptep_get(pte));
@@ -6633,7 +6628,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	last_addr_mask = hugetlb_mask_last_page(h);
 	for (; address < end; address += psize) {
 		spinlock_t *ptl;
-		ptep = huge_pte_offset(mm, address, psize);
+		ptep = hugetlb_walk(vma, address, psize);
 		if (!ptep) {
 			address |= last_addr_mask;
 			continue;
@@ -7040,8 +7035,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 
 		saddr = page_table_shareable(svma, vma, addr, idx);
 		if (saddr) {
-			spte = huge_pte_offset(svma->vm_mm, saddr,
-					       vma_mmu_pagesize(svma));
+			spte = hugetlb_walk(svma, saddr,
+					    vma_mmu_pagesize(svma));
 			if (spte) {
 				get_page(virt_to_page(spte));
 				break;
@@ -7358,7 +7353,7 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
 	hugetlb_vma_lock_write(vma);
 	i_mmap_lock_write(vma->vm_file->f_mapping);
 	for (address = start; address < end; address += PUD_SIZE) {
-		ptep = huge_pte_offset(mm, address, sz);
+		ptep = hugetlb_walk(vma, address, sz);
 		if (!ptep)
 			continue;
 		ptl = huge_pte_lock(h, mm, ptep);
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 93e13fc17d3c..f3729b23dd0e 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -168,9 +168,12 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 		/* The only possible mapping was handled on last iteration */
 		if (pvmw->pte)
 			return not_found(pvmw);
-
-		/* when pud is not present, pte will be NULL */
-		pvmw->pte = huge_pte_offset(mm, pvmw->address, size);
+		/*
+		 * All callers that get here will already hold the
+		 * i_mmap_rwsem. Therefore, no additional locks need to be
+		 * taken before calling hugetlb_walk().
+		 */
+		pvmw->pte = hugetlb_walk(vma, pvmw->address, size);
 		if (!pvmw->pte)
 			return false;
 
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index d98564a7be57..cb23f8a15c13 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -305,13 +305,11 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
 	hugetlb_vma_lock_read(vma);
 	do {
 		next = hugetlb_entry_end(h, addr, end);
-		pte = huge_pte_offset(walk->mm, addr & hmask, sz);
-
+		pte = hugetlb_walk(vma, addr & hmask, sz);
 		if (pte)
 			err = ops->hugetlb_entry(pte, hmask, addr, next, walk);
 		else if (ops->pte_hole)
 			err = ops->pte_hole(addr, next, -1, walk);
-
 		if (err)
 			break;
 	} while (addr = next, addr != end);