From patchwork Wed Dec 7 20:30:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 31019 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp392635wrr; Wed, 7 Dec 2022 12:34:28 -0800 (PST) X-Google-Smtp-Source: AA0mqf4LbzAR6yvKC87eNbfuNNId6rAr2cVXkdp+bOcVCDeVveJYrqolN0Lmr4KRxWHwL3XczZ1g X-Received: by 2002:a17:906:c7d2:b0:7c1:266e:85be with SMTP id dc18-20020a170906c7d200b007c1266e85bemr1620792ejb.681.1670445268246; Wed, 07 Dec 2022 12:34:28 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670445268; cv=none; d=google.com; s=arc-20160816; b=izClTH8yAIxSbbZ4wVbM/FRMCOd+DYOUdV7l5S3+llEhPQmcr9sFLGuirF65+JNhV6 +XqMxsCanDHIO7b8G6qUpgmH07WbqDuv6ZuvoavvJODOQJUy5aklJFkcADVBMf1mGcA5 BWV1KK6DxApaiJSqnh8IpHy5hSHBkBMMugNky3KjDAjmOMYGDe5t78iVdgIH0otxoaUi qlumHLMk7B1s51+4NtNoz2SLQfGAmzKKOpgKB/9KcEjTYFLSJXEBHNrDsEDEsXYCZkDR aa0+N79qsKean6Gp17qdc8cFNb6hKePwz+9qNtEaj5SzXkJE3OqxDixB0MO2+bKlo/ta d+BA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=kXTqB5FubF2+0tcQcQ/EoMxi6C2oCNf5KVcHgua8JL0=; b=puc/ak99MuBskQ9GYbIMV1oEu5uUzEeTRFr7Xw6OiNoAYmJnQt4eAgP9YDFlLJJzn1 tLGcVIKpO6644gUBtWPYgjie3C45pFwsukpBiKlfmX0exdEQ7rDTtnuWFrvvGUAy9ekX OPZ0hvHa3r9eszBLYmgNnhYE3PmVA0hhZ/nMqALQi8B6USO14M4XKHpMJmzGrG52wA0B SJFA6cvlJHcWrHG89Gc2E+BpwslsidRcEYG/Qc/jZKPUzLQS2+X38kWE3zz3MzOtyxqa rzNn8Y7pZgW//1Aaj4Yx4OqIvHvp1yS4R1KwY4S79OEt1f1qr9ZM47wsqEQHsrGuzH/J O1Qw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Im3ALiQK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) 
header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id w2-20020a05640234c200b0046b392e8c56si5408902edc.10.2022.12.07.12.34.05; Wed, 07 Dec 2022 12:34:28 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Im3ALiQK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229850AbiLGUbp (ORCPT + 99 others); Wed, 7 Dec 2022 15:31:45 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49180 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229705AbiLGUbn (ORCPT ); Wed, 7 Dec 2022 15:31:43 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 810997BC0B for ; Wed, 7 Dec 2022 12:30:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1670445041; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kXTqB5FubF2+0tcQcQ/EoMxi6C2oCNf5KVcHgua8JL0=; b=Im3ALiQKcTTxBuHeDHt7J8QeADRqBKXkTtD81mw8FxP/KL79b6Fk6G501blrJcgW83y9k3 QdUiTcvHqb08JEhl2VydHaBA2+QEgWLiOq39EK+YpgTBMotHiM7VNuxnJDySTyrWFs6/tG JGoT5aztdo5inxoPlUMgfnsQ3jfYXoc= Received: from mail-qk1-f200.google.com (mail-qk1-f200.google.com [209.85.222.200]) by relay.mimecast.com with ESMTP with STARTTLS 
(version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-655-kel4CzXMMU6vcV0mWT6bLQ-1; Wed, 07 Dec 2022 15:30:40 -0500 X-MC-Unique: kel4CzXMMU6vcV0mWT6bLQ-1 Received: by mail-qk1-f200.google.com with SMTP id ay43-20020a05620a17ab00b006fa30ed61fdso25687997qkb.5 for ; Wed, 07 Dec 2022 12:30:40 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=kXTqB5FubF2+0tcQcQ/EoMxi6C2oCNf5KVcHgua8JL0=; b=hFVuHw1cfvGPCF2eHFcd6MdR2y2Z95qN7G/RAaxDjdbD3gi6bFciLNU6LAyW+cAN/Z oixlf4DH/AYd1sRRzdXQz7HDK+zrVTnqsevl0PX+igtp+pRt0USNOc/oBi1rk47TWxok t1mOAsxNYloe7nHu5D0iBso1Grtx1KkMDNnPFfPgvwedQ1zjRUDTortwox8m9F5SRfQh ygmilgHfQR8ZnQwa2AC1++mo5xtq3rtgrDnV/tdehsZo6ZLpGjV+0XpJdGkEdyNTZnqd pn1G28LY0GIG5bmY8P9zSGIeAaIYc4b9iBK2YKLFC1YBipikrc0mEVVsHJqS+O90Rq4E poDw== X-Gm-Message-State: ANoB5pmC6ukSypW8I9IIpjmJdlC+nXMu0qLW5HX8SLIVrsI1CzZzodki bUnpTmEHTsiFEYnv6CPcmRTKGupXZajTEk6R7fkL7Mt/SoCTRDwpCpjMQxCDOE62wO0vFn7AreX kYAt6o0D79mfu1wfYbBJpGxRQ X-Received: by 2002:ac8:4e47:0:b0:3a5:63ef:cf4e with SMTP id e7-20020ac84e47000000b003a563efcf4emr1902947qtw.16.1670445040113; Wed, 07 Dec 2022 12:30:40 -0800 (PST) X-Received: by 2002:ac8:4e47:0:b0:3a5:63ef:cf4e with SMTP id e7-20020ac84e47000000b003a563efcf4emr1902931qtw.16.1670445039881; Wed, 07 Dec 2022 12:30:39 -0800 (PST) Received: from x1n.redhat.com (bras-base-aurron9127w-grc-46-70-31-27-79.dsl.bell.ca. 
[70.31.27.79]) by smtp.gmail.com with ESMTPSA id dc53-20020a05620a523500b006fefa5f7fcesm855594qkb.10.2022.12.07.12.30.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Dec 2022 12:30:37 -0800 (PST)
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, John Hubbard, Andrea Arcangeli, James Houghton, Jann Horn, Rik van Riel, Miaohe Lin, Andrew Morton, Mike Kravetz, peterx@redhat.com, David Hildenbrand, Nadav Amit
Subject: [PATCH v2 01/10] mm/hugetlb: Let vma_offset_start() to return start
Date: Wed, 7 Dec 2022 15:30:25 -0500
Message-Id: <20221207203034.650899-2-peterx@redhat.com>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20221207203034.650899-1-peterx@redhat.com>
References: <20221207203034.650899-1-peterx@redhat.com>
MIME-Version: 1.0
Content-type: text/plain
X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1751588817792607168?=
X-GMAIL-MSGID: =?utf-8?q?1751588817792607168?=

Even though vma_offset_start() is named like that, it's not returning "the
start address of the range" but rather the offset we should use to offset
the vma->vm_start address.

Make it return the real value of the start vaddr. This also helps all the
callers, because whenever the retval is used it is ultimately added to
vma->vm_start anyway.

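For illustration only (not part of the patch): a minimal userspace sketch of the before/after contract of vma_offset_start(), assuming 4 KiB base pages. struct model_vma, MODEL_PAGE_SHIFT, the old_/new_ prefixes and check_equivalence() are hypothetical names introduced here; only the arithmetic mirrors the diff in this patch.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

/* Hypothetical stand-in for the two vm_area_struct fields used here. */
struct model_vma {
	unsigned long vm_start;	/* first virtual address of the mapping */
	unsigned long vm_pgoff;	/* file offset of vm_start, in base pages */
};

/* Pre-patch behaviour: a byte offset relative to vma->vm_start. */
static unsigned long old_vma_offset_start(const struct model_vma *vma,
					  unsigned long start)
{
	if (vma->vm_pgoff < start)
		return (start - vma->vm_pgoff) << MODEL_PAGE_SHIFT;
	return 0;
}

/* Post-patch behaviour: the absolute start virtual address. */
static unsigned long new_vma_offset_start(const struct model_vma *vma,
					  unsigned long start)
{
	unsigned long offset = 0;

	if (vma->vm_pgoff < start)
		offset = (start - vma->vm_pgoff) << MODEL_PAGE_SHIFT;

	return vma->vm_start + offset;
}

/*
 * The property the callers rely on: the new return value equals
 * vm_start plus the old return value, so each call site can drop its
 * own "vma->vm_start +" without changing the address it passes on.
 */
static void check_equivalence(const struct model_vma *vma, unsigned long start)
{
	assert(new_vma_offset_start(vma, start) ==
	       vma->vm_start + old_vma_offset_start(vma, start));
}
```

Calling check_equivalence() with a start page offset both above and below vm_pgoff exercises the two branches.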
Reviewed-by: Mike Kravetz
Reviewed-by: David Hildenbrand
Signed-off-by: Peter Xu
Reviewed-by: John Hubbard
---
 fs/hugetlbfs/inode.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 790d2727141a..fdb16246f46e 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -412,10 +412,12 @@ static bool hugetlb_vma_maps_page(struct vm_area_struct *vma,
  */
 static unsigned long vma_offset_start(struct vm_area_struct *vma, pgoff_t start)
 {
+	unsigned long offset = 0;
+
 	if (vma->vm_pgoff < start)
-		return (start - vma->vm_pgoff) << PAGE_SHIFT;
-	else
-		return 0;
+		offset = (start - vma->vm_pgoff) << PAGE_SHIFT;
+
+	return vma->vm_start + offset;
 }
 
 static unsigned long vma_offset_end(struct vm_area_struct *vma, pgoff_t end)
@@ -457,7 +459,7 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
 		v_start = vma_offset_start(vma, start);
 		v_end = vma_offset_end(vma, end);
 
-		if (!hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page))
+		if (!hugetlb_vma_maps_page(vma, v_start, page))
 			continue;
 
 		if (!hugetlb_vma_trylock_write(vma)) {
@@ -473,8 +475,8 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
 			break;
 		}
 
-		unmap_hugepage_range(vma, vma->vm_start + v_start, v_end,
-				NULL, ZAP_FLAG_DROP_MARKER);
+		unmap_hugepage_range(vma, v_start, v_end, NULL,
+				     ZAP_FLAG_DROP_MARKER);
 		hugetlb_vma_unlock_write(vma);
 	}
 
@@ -507,10 +509,9 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
 		 */
 		v_start = vma_offset_start(vma, start);
 		v_end = vma_offset_end(vma, end);
-		if (hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page))
-			unmap_hugepage_range(vma, vma->vm_start + v_start,
-						v_end, NULL,
-						ZAP_FLAG_DROP_MARKER);
+		if (hugetlb_vma_maps_page(vma, v_start, page))
+			unmap_hugepage_range(vma, v_start, v_end, NULL,
+					     ZAP_FLAG_DROP_MARKER);
 
 		kref_put(&vma_lock->refs, hugetlb_vma_lock_release);
 		hugetlb_vma_unlock_write(vma);
@@ -540,8 +541,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
 		v_start = vma_offset_start(vma, start);
 		v_end = vma_offset_end(vma, end);
 
-		unmap_hugepage_range(vma, vma->vm_start + v_start, v_end,
-				     NULL, zap_flags);
+		unmap_hugepage_range(vma, v_start, v_end, NULL, zap_flags);
 
 		/*
 		 * Note that vma lock only exists for shared/non-private

From patchwork Wed Dec 7 20:30:26 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 31023
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, John Hubbard, Andrea Arcangeli, James Houghton, Jann Horn, Rik van Riel, Miaohe Lin, Andrew Morton, Mike Kravetz, peterx@redhat.com, David Hildenbrand, Nadav Amit
Subject: [PATCH v2 02/10] mm/hugetlb: Don't wait for migration entry during follow page
Date: Wed, 7 Dec 2022 15:30:26 -0500
Message-Id: <20221207203034.650899-3-peterx@redhat.com>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20221207203034.650899-1-peterx@redhat.com>
References: <20221207203034.650899-1-peterx@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

That's what the code does with !hugetlb pages, so we should logically do
the same for hugetlb, so migration entry will also be treated as no page.

This is probably also the last piece in follow_page code that may sleep,
the last one should be removed in cf994dd8af27 ("mm/gup: remove
FOLL_MIGRATION", 2022-11-16).

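For illustration only (not part of the patch): a minimal userspace sketch of the lookup behaviour after this change, where anything other than a present entry, including a migration entry, yields "no page" immediately instead of sleeping and retrying. The enum, struct model_entry and model_follow_page() are hypothetical names introduced here; the sketch ignores the page table lock, FOLL_* flags and page reference counting that the real hugetlb_follow_page_mask() deals with.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry kinds; the kernel encodes these in the huge PTE itself. */
enum model_entry_kind {
	MODEL_ENTRY_NONE,
	MODEL_ENTRY_PRESENT,
	MODEL_ENTRY_MIGRATION,
	MODEL_ENTRY_HWPOISON,
};

struct model_entry {
	enum model_entry_kind kind;
	int page;	/* stand-in for struct page; meaningful only when present */
};

/*
 * Post-patch behaviour: only a present entry yields a page. Migration
 * and hwpoison entries are reported as "no page" (NULL), with no wait
 * and no retry loop, so the lookup never sleeps.
 */
static const int *model_follow_page(const struct model_entry *entry)
{
	if (entry->kind == MODEL_ENTRY_PRESENT)
		return &entry->page;
	return NULL;
}

/* Small usage check covering the case this patch changes. */
static void model_self_check(void)
{
	const struct model_entry present = { MODEL_ENTRY_PRESENT, 42 };
	const struct model_entry migrating = { MODEL_ENTRY_MIGRATION, 0 };

	assert(model_follow_page(&present) == &present.page);
	assert(model_follow_page(&migrating) == NULL);
}
```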
Reviewed-by: Mike Kravetz
Reviewed-by: David Hildenbrand
Signed-off-by: Peter Xu
Reviewed-by: John Hubbard
---
 mm/hugetlb.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1088f2f41c88..c8a6673fe5b4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6232,7 +6232,6 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	if (WARN_ON_ONCE(flags & FOLL_PIN))
 		return NULL;
 
-retry:
 	pte = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (!pte)
 		return NULL;
@@ -6255,16 +6254,6 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 			page = NULL;
 			goto out;
 		}
-	} else {
-		if (is_hugetlb_entry_migration(entry)) {
-			spin_unlock(ptl);
-			__migration_entry_wait_huge(pte, ptl);
-			goto retry;
-		}
-		/*
-		 * hwpoisoned entry is treated as no_page_table in
-		 * follow_page_mask().
-		 */
 	}
 out:
 	spin_unlock(ptl);

From patchwork Wed Dec 7 20:30:27 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 31020
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, John Hubbard, Andrea Arcangeli, James Houghton, Jann Horn, Rik van Riel, Miaohe Lin, Andrew Morton, Mike Kravetz, peterx@redhat.com, David Hildenbrand, Nadav Amit
Subject: [PATCH v2 03/10] mm/hugetlb: Document huge_pte_offset usage
Date: Wed, 7 Dec 2022 15:30:27 -0500
Message-Id: <20221207203034.650899-4-peterx@redhat.com>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20221207203034.650899-1-peterx@redhat.com>
References: <20221207203034.650899-1-peterx@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

huge_pte_offset() is potentially a pgtable walker, looking up pte_t* for a
hugetlb address.

Normally, it's always safe to walk a generic pgtable as long as we're with
the mmap lock held for either read or write, because that guarantees the
pgtable pages will always be valid during the process.

But it's not true for hugetlbfs, especially shared: hugetlbfs can have its
pgtable freed by pmd unsharing, which means that even with the mmap lock
held for the current mm, the PMD pgtable page can still go away from under
us if pmd unsharing is possible during the walk.

So we have two ways to make it safe even for a shared mapping:

  (1) If we're with the hugetlb vma lock held for either read or write,
      it's okay because pmd unshare cannot happen at all.

  (2) If we're with the i_mmap_rwsem lock held for either read or write,
      it's okay because even if pmd unshare can happen, the pgtable page
      cannot be freed from under us.

Document it.

Signed-off-by: Peter Xu
Reviewed-by: John Hubbard
Reviewed-by: David Hildenbrand
---
 include/linux/hugetlb.h | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 551834cd5299..81efd9b9baa2 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -192,6 +192,38 @@ extern struct list_head huge_boot_pages;
 
 pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz);
+/*
+ * huge_pte_offset(): Walk the hugetlb pgtable until the last level PTE.
+ * Returns the pte_t* if found, or NULL if the address is not mapped.
+ *
+ * Since this function will walk all the pgtable pages (including not only
+ * high-level pgtable page, but also PUD entry that can be unshared
+ * concurrently for VM_SHARED), the caller of this function should be
+ * responsible of its thread safety.  One can follow this rule:
+ *
+ *  (1) For private mappings: pmd unsharing is not possible, so it'll
+ *      always be safe if we're with the mmap sem for either read or write.
+ * This is normally always the case, IOW we don't need to do anything + * special. + * + * (2) For shared mappings: pmd unsharing is possible (so the PUD-ranged + * pgtable page can go away from under us! It can be done by a pmd + * unshare with a follow up munmap() on the other process), then we + * need either: + * + * (2.1) hugetlb vma lock read or write held, to make sure pmd unshare + * won't happen upon the range (it also makes sure the pte_t we + * read is the right and stable one), or, + * + * (2.2) hugetlb mapping i_mmap_rwsem lock held read or write, to make + * sure even if unshare happened the racy unmap() will wait until + * i_mmap_rwsem is released. + * + * Option (2.1) is the safest, which guarantees pte stability from pmd + * sharing pov, until the vma lock released. Option (2.2) doesn't protect + * a concurrent pmd unshare, but it makes sure the pgtable page is safe to + * access. + */ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, unsigned long sz); unsigned long hugetlb_mask_last_page(struct hstate *h); From patchwork Wed Dec 7 20:30:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 31024 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp392978wrr; Wed, 7 Dec 2022 12:35:16 -0800 (PST) X-Google-Smtp-Source: AA0mqf6SWCtycEdy4aQbIISu34RU/ZHA8FtAPs2LZYzuBdGwbP/P18ublNpItJ8gBFahLieaGa/W X-Received: by 2002:a17:906:2693:b0:7aa:57c3:3f26 with SMTP id t19-20020a170906269300b007aa57c33f26mr19937739ejc.195.1670445316680; Wed, 07 Dec 2022 12:35:16 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670445316; cv=none; d=google.com; s=arc-20160816; b=BiOECkhAt97a8F40Db36nfY5IcfOVGcWb9XNB+xmo1JTLGmguD+0rV7MohQ+B2ExZQ m9NkipSNuTDINzKRPPrCRpMTB9KfZW8+/7MseTCn/0SIiYqb3VI+X+xnNd08Y0ShlH/n SoIl+auspzAjH9nzgA6ut2GOGPDsnnvWs6JPQCF4o2/tD9wfLeJMN9ps4sDbgdbUbPtG 
7TGA2HfuTEPPAcwIjDs73b8VL/h0+oNPWYB0p4bz6f0Lyq7pFAg9Yu2/Slxvn2jssvAg 7+2gnLabWvkZn1j7D9rtzwz3/ZtHg1BUtHbrZriPs+YoYvEEj9jYMKhQq7U5p3u0KwxS lN0g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=XK/5vqUd1QlikRxebc+rXesIul2oExyAtkR9smyGVX0=; b=0oYBsxvVerfqBKfyXpCQdDV5+CZfm3uH1US0EuILdbwTm7N1avjlg6rKrMQ38vOLnW idxeunuLvTRnSAjVRCK+c0BTczruCLpk56syUbwowXpYFhpxvWGhrqzam494hX1Q6HHx 8JTPqgYKSTN2UbPo9VqpUIS12EkpmId8Bjwk/TVMqZA0n0hyWtX+P4mNMhIh5uD+7VHK AudO3r1RKxMYi/gYfRKPNP4AhPEKuYR7F6Qd90WqPfBiyOz0987H5VVV2wx9ZD+enojO CfMCYZW8XQMBQ6mORTi7nUmhsnDnm/CUPdJj9a2pSQwdhwynvY9HHhLvPRsADAxAaDVS eakg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=MJ9GIlhH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id nb36-20020a1709071ca400b007c10638840asi5415860ejc.75.2022.12.07.12.34.54; Wed, 07 Dec 2022 12:35:16 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=MJ9GIlhH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229945AbiLGUcE (ORCPT + 99 others); Wed, 7 Dec 2022 15:32:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49224 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229865AbiLGUbt (ORCPT ); Wed, 7 Dec 2022 15:31:49 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 36BF924BC1 for ; Wed, 7 Dec 2022 12:30:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1670445046; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=XK/5vqUd1QlikRxebc+rXesIul2oExyAtkR9smyGVX0=; b=MJ9GIlhHC0W0yphgEWSNW8wi9F3K6AnBmB08kLAgZ8GI4/7otMvRiiAxKZZKDwuZI/tzZD 83pOlvm487qYVOwR1oJP5+bGw0TvdydguztNieu7IrEGFu4Ym+rXbnzrGReERDIBxYeKws r27sn3WY2TQz4/hq7JCPtj4eTU0X0OM= Received: from mail-qv1-f69.google.com (mail-qv1-f69.google.com [209.85.219.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id 
us-mta-324-EgF7b9GtO0Cg090C2qIeFQ-1; Wed, 07 Dec 2022 15:30:45 -0500 X-MC-Unique: EgF7b9GtO0Cg090C2qIeFQ-1 Received: by mail-qv1-f69.google.com with SMTP id jh2-20020a0562141fc200b004c74bbb0affso24904196qvb.21 for ; Wed, 07 Dec 2022 12:30:45 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=XK/5vqUd1QlikRxebc+rXesIul2oExyAtkR9smyGVX0=; b=Rt9cXZCJ4yPKSx2wrfp/fFqLSoJacIQkF9tibIs0IJ1wzDivbhie1brSd/lXzVjrgs QofU8Kqh2uBFaaDZsxrAWq72IYdQSUaIQ20f2bm94hfOCE+9jTNAxac7yG5b3W9l6WbJ cfoIxwtSzS0u3jcEzmQsJ0R2cdlTn20Ho5UmniFywfopRUBeMsYf5X8b0GuNBig6GwC7 0soGWivmn7VHdhnctIqwxFiu7VHZabBWnZF6jAcufn4vI9a/hWvuOiOUUpOMyO2nRST/ 4PS2NcT/eKZFup4YptshKZRjh3JePwq3jkuv8IgWT+GcBSgn5O/IxlfxR13mfq0jcFCW 4Xkw== X-Gm-Message-State: ANoB5pnM6WBR4waytnOibtEcrraIIma07HWtFE6FplpXCGrLFUBwUcr+ 47oUN4AHavwfDfdWMyNkoSWKMz8z+4blqsh5mrz5M54rjpTS4AqeQcBb85b1H6GXxsbfz9vOu+Q doxmRhPjHbihge8qKzly0ij/G X-Received: by 2002:ac8:57cd:0:b0:3a6:7b53:9b20 with SMTP id w13-20020ac857cd000000b003a67b539b20mr2152370qta.12.1670445044512; Wed, 07 Dec 2022 12:30:44 -0800 (PST) X-Received: by 2002:ac8:57cd:0:b0:3a6:7b53:9b20 with SMTP id w13-20020ac857cd000000b003a67b539b20mr2152353qta.12.1670445044228; Wed, 07 Dec 2022 12:30:44 -0800 (PST) Received: from x1n.redhat.com (bras-base-aurron9127w-grc-46-70-31-27-79.dsl.bell.ca. 
[70.31.27.79]) by smtp.gmail.com with ESMTPSA id dc53-20020a05620a523500b006fefa5f7fcesm855594qkb.10.2022.12.07.12.30.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Dec 2022 12:30:43 -0800 (PST)
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, John Hubbard, Andrea Arcangeli, James Houghton, Jann Horn, Rik van Riel, Miaohe Lin, Andrew Morton, Mike Kravetz, peterx@redhat.com, David Hildenbrand, Nadav Amit
Subject: [PATCH v2 04/10] mm/hugetlb: Move swap entry handling into vma lock when faulted
Date: Wed, 7 Dec 2022 15:30:28 -0500
Message-Id: <20221207203034.650899-5-peterx@redhat.com>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20221207203034.650899-1-peterx@redhat.com>
References: <20221207203034.650899-1-peterx@redhat.com>
MIME-Version: 1.0
Content-type: text/plain

In hugetlb_fault(), there used to be a special path that handled swap
entries at the entrance using huge_pte_offset(). That is unsafe because
huge_pte_offset() on a pmd-sharable range can access freed pgtables when
no lock protects the pgtable from being freed after pmd unshare.

The simplest way to make it safe is to move the swap handling to after
the vma lock is taken. We may now need to take the fault mutex on either
migration or hwpoison entries (and also the vma lock, but that one is
really needed); however, neither of them is a hot path.
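As a rough illustration of the ordering described above, here is a
userspace sketch, not kernel code. Every name in it (model_vma,
model_fault(), model_wait_migration() and the counters that stand in for
the locks) is hypothetical and exists only for this example. It assumes
the read lock is taken before the entry is inspected and that the
migration waiter is the one that drops it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum model_entry { MODEL_PRESENT, MODEL_MIGRATION, MODEL_HWPOISON };

struct model_vma {
	int readers;            /* stands in for the read side of the vma lock */
	bool ptl_held;          /* stands in for the page table spinlock */
	enum model_entry entry; /* what the huge pte currently holds */
};

static void model_lock_read(struct model_vma *v)
{
	v->readers++;
}

static void model_unlock_read(struct model_vma *v)
{
	assert(v->readers > 0);
	v->readers--;
}

/* Must be entered with the read lock held; always drops it before returning. */
static void model_wait_migration(struct model_vma *v)
{
	assert(v->readers > 0);
	v->ptl_held = true;
	if (v->entry != MODEL_MIGRATION) {
		v->ptl_held = false;
		model_unlock_read(v);
		return;
	}
	/* While the ptl is held the pgtable page stays, so the vma lock can go. */
	model_unlock_read(v);
	/* The real code releases the ptl in migration_entry_wait_on_locked(). */
	v->ptl_held = false;
}

/* Returns 0 for "retry the fault", 1 for hwpoison, 2 for a present entry. */
static int model_fault(struct model_vma *v)
{
	int ret = 2;

	model_lock_read(v);	/* taken before the entry is inspected */
	if (v->entry == MODEL_MIGRATION) {
		model_wait_migration(v);	/* releases the read lock for us */
		return 0;
	}
	if (v->entry == MODEL_HWPOISON)
		ret = 1;
	model_unlock_read(v);
	return ret;
}
```

Calling model_fault() on a model_vma whose entry is MODEL_MIGRATION goes
through the waiter and returns with no lock held; the other two entry
kinds release the read lock in model_fault() itself.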

Note that the vma lock cannot be released in hugetlb_fault() when the
migration entry is detected, because in migration_entry_wait_huge() the
pgtable page will be used again (by taking the pgtable lock), so that
also needs to be protected by the vma lock. Modify
migration_entry_wait_huge() so that it must be called with the vma read
lock held, and release the lock properly in
__migration_entry_wait_huge().

Reviewed-by: Mike Kravetz
Signed-off-by: Peter Xu
Reviewed-by: John Hubbard
---
 include/linux/swapops.h |  6 ++++--
 mm/hugetlb.c            | 36 +++++++++++++++---------------------
 mm/migrate.c            | 25 +++++++++++++++++++++----
 3 files changed, 40 insertions(+), 27 deletions(-)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index a70b5c3a68d7..b134c5eb75cb 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -337,7 +337,8 @@ extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
 extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 					unsigned long address);
 #ifdef CONFIG_HUGETLB_PAGE
-extern void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl);
+extern void __migration_entry_wait_huge(struct vm_area_struct *vma,
+					pte_t *ptep, spinlock_t *ptl);
 extern void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte);
 #endif /* CONFIG_HUGETLB_PAGE */
 #else /* CONFIG_MIGRATION */
@@ -366,7 +367,8 @@ static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
 static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 					unsigned long address) { }
 #ifdef CONFIG_HUGETLB_PAGE
-static inline void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl) { }
+static inline void __migration_entry_wait_huge(struct vm_area_struct *vma,
+					       pte_t *ptep, spinlock_t *ptl) { }
 static inline void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte) { }
 #endif /* CONFIG_HUGETLB_PAGE */
 static inline int is_writable_migration_entry(swp_entry_t entry)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c8a6673fe5b4..49f73677a418 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5824,22 +5824,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	int need_wait_lock = 0;
 	unsigned long haddr = address & huge_page_mask(h);
 
-	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
-	if (ptep) {
-		/*
-		 * Since we hold no locks, ptep could be stale. That is
-		 * OK as we are only making decisions based on content and
-		 * not actually modifying content here.
-		 */
-		entry = huge_ptep_get(ptep);
-		if (unlikely(is_hugetlb_entry_migration(entry))) {
-			migration_entry_wait_huge(vma, ptep);
-			return 0;
-		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
-			return VM_FAULT_HWPOISON_LARGE |
-				VM_FAULT_SET_HINDEX(hstate_index(h));
-	}
-
 	/*
 	 * Serialize hugepage allocation and instantiation, so that we don't
 	 * get spurious allocation failures if two CPUs race to instantiate
@@ -5854,10 +5838,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * Acquire vma lock before calling huge_pte_alloc and hold
 	 * until finished with ptep. This prevents huge_pmd_unshare from
 	 * being called elsewhere and making the ptep no longer valid.
-	 *
-	 * ptep could have already be assigned via huge_pte_offset. That
-	 * is OK, as huge_pte_alloc will return the same value unless
-	 * something has changed.
 	 */
 	hugetlb_vma_lock_read(vma);
 	ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
@@ -5886,8 +5866,22 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * fault, and is_hugetlb_entry_(migration|hwpoisoned) check will
 	 * properly handle it.
 	 */
-	if (!pte_present(entry))
+	if (!pte_present(entry)) {
+		if (unlikely(is_hugetlb_entry_migration(entry))) {
+			/*
+			 * Release fault lock first because the vma lock is
+			 * needed to guard the huge_pte_lockptr() later in
+			 * migration_entry_wait_huge(). The vma lock will
+			 * be released there.
+			 */
+			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+			migration_entry_wait_huge(vma, ptep);
+			return 0;
+		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
+			ret = VM_FAULT_HWPOISON_LARGE |
+			    VM_FAULT_SET_HINDEX(hstate_index(h));
 		goto out_mutex;
+	}
 
 	/*
 	 * If we are going to COW/unshare the mapping later, we examine the
diff --git a/mm/migrate.c b/mm/migrate.c
index 48584b032ea9..d14f1f3ab073 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -333,24 +333,41 @@ void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl)
+void __migration_entry_wait_huge(struct vm_area_struct *vma,
+				 pte_t *ptep, spinlock_t *ptl)
 {
 	pte_t pte;
 
+	/*
+	 * The vma read lock must be taken, which will be released before
+	 * the function returns. It makes sure the pgtable page (along
+	 * with its spin lock) not be freed in parallel.
+	 */
+	hugetlb_vma_assert_locked(vma);
+
 	spin_lock(ptl);
 	pte = huge_ptep_get(ptep);
 
-	if (unlikely(!is_hugetlb_entry_migration(pte)))
+	if (unlikely(!is_hugetlb_entry_migration(pte))) {
 		spin_unlock(ptl);
-	else
+		hugetlb_vma_unlock_read(vma);
+	} else {
+		/*
+		 * If migration entry existed, safe to release vma lock
+		 * here because the pgtable page won't be freed without the
+		 * pgtable lock released. See comment right above pgtable
+		 * lock release in migration_entry_wait_on_locked().
+		 */
+		hugetlb_vma_unlock_read(vma);
 		migration_entry_wait_on_locked(pte_to_swp_entry(pte), NULL, ptl);
+	}
 }
 
 void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte)
 {
 	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma->vm_mm, pte);
 
-	__migration_entry_wait_huge(pte, ptl);
+	__migration_entry_wait_huge(vma, pte, ptl);
 }
 
 #endif

From patchwork Wed Dec 7 20:30:29 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 31027
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, John Hubbard, Andrea Arcangeli, James Houghton, Jann Horn, Rik van Riel, Miaohe Lin, Andrew Morton, Mike Kravetz, peterx@redhat.com, David Hildenbrand, Nadav Amit
Subject: [PATCH v2 05/10] mm/hugetlb: Make userfaultfd_huge_must_wait() safe to pmd unshare
Date: Wed, 7 Dec 2022 15:30:29 -0500
Message-Id: <20221207203034.650899-6-peterx@redhat.com>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20221207203034.650899-1-peterx@redhat.com>
References: <20221207203034.650899-1-peterx@redhat.com>

We can take the hugetlb walker lock; here that is done by taking the vma
lock directly.

Reviewed-by: David Hildenbrand
Reviewed-by: Mike Kravetz
Signed-off-by: Peter Xu
Reviewed-by: John Hubbard
---
 fs/userfaultfd.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 07c81ab3fd4d..a602f008dde5 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -376,7 +376,8 @@ static inline unsigned int userfaultfd_get_blocking_state(unsigned int flags)
  */
 vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 {
-	struct mm_struct *mm = vmf->vma->vm_mm;
+	struct vm_area_struct *vma = vmf->vma;
+	struct mm_struct *mm = vma->vm_mm;
 	struct userfaultfd_ctx *ctx;
 	struct userfaultfd_wait_queue uwq;
 	vm_fault_t ret = VM_FAULT_SIGBUS;
@@ -403,7 +404,7 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	 */
 	mmap_assert_locked(mm);
 
-	ctx = vmf->vma->vm_userfaultfd_ctx.ctx;
+	ctx = vma->vm_userfaultfd_ctx.ctx;
 	if (!ctx)
 		goto out;
 
@@ -493,6 +494,13 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	blocking_state = userfaultfd_get_blocking_state(vmf->flags);
 
+	/*
+	 * This stabilizes pgtable for hugetlb on e.g. pmd unsharing. Need
+	 * to be before setting current state.
+	 */
+	if (is_vm_hugetlb_page(vma))
+		hugetlb_vma_lock_read(vma);
+
 	spin_lock_irq(&ctx->fault_pending_wqh.lock);
 	/*
 	 * After the __add_wait_queue the uwq is visible to userland
@@ -507,13 +515,15 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	set_current_state(blocking_state);
 	spin_unlock_irq(&ctx->fault_pending_wqh.lock);
 
-	if (!is_vm_hugetlb_page(vmf->vma))
+	if (!is_vm_hugetlb_page(vma))
 		must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags,
 						  reason);
 	else
-		must_wait = userfaultfd_huge_must_wait(ctx, vmf->vma,
+		must_wait = userfaultfd_huge_must_wait(ctx, vma,
 						       vmf->address,
 						       vmf->flags, reason);
+	if (is_vm_hugetlb_page(vma))
+		hugetlb_vma_unlock_read(vma);
 	mmap_read_unlock(mm);
 
 	if (likely(must_wait && !READ_ONCE(ctx->released))) {

From patchwork Wed Dec 7 20:30:30 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 31025
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, John Hubbard, Andrea Arcangeli, James Houghton, Jann Horn, Rik van Riel, Miaohe Lin, Andrew Morton, Mike Kravetz, peterx@redhat.com, David Hildenbrand, Nadav Amit
Subject: [PATCH v2 06/10] mm/hugetlb: Make hugetlb_follow_page_mask() safe to pmd unshare
Date: Wed, 7 Dec 2022 15:30:30 -0500
Message-Id: <20221207203034.650899-7-peterx@redhat.com>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20221207203034.650899-1-peterx@redhat.com>
References: <20221207203034.650899-1-peterx@redhat.com>

Since hugetlb_follow_page_mask() walks the pgtable, it needs the vma lock
to make sure the pgtable page will not be freed concurrently.
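The shape of such a change can be sketched in plain C as below. This is
only an illustration under assumptions: model_walk and model_follow()
are hypothetical names, and simple counters stand in for the vma lock
and the page table lock. What it shows is the single exit label that
releases the read lock on every path, including the early one where no
pte is found.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; these names do not exist in the kernel. */
struct model_walk {
	int readers;    /* stands in for the read side of the vma lock */
	int ptl_held;   /* stands in for the page table spinlock */
	const int *pte; /* NULL models a walk that finds no pte */
};

static const int *model_follow(struct model_walk *w)
{
	const int *page = NULL;

	w->readers++;		/* take the read lock before walking */
	if (!w->pte)
		goto out_unlock;	/* nothing mapped: skip the ptl, keep the unlock */

	w->ptl_held = 1;	/* the entry is only read under the ptl */
	page = w->pte;
	w->ptl_held = 0;
out_unlock:
	w->readers--;		/* every path leaves without the read lock */
	return page;
}
```

An early "return NULL" placed after the lock is taken would leak the
read lock in this model, which is why the early exit jumps to the label
instead.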

Acked-by: David Hildenbrand
Reviewed-by: Mike Kravetz
Signed-off-by: Peter Xu
Reviewed-by: John Hubbard
---
 mm/hugetlb.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 49f73677a418..3fbbd599d015 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6226,9 +6226,10 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	if (WARN_ON_ONCE(flags & FOLL_PIN))
 		return NULL;
 
+	hugetlb_vma_lock_read(vma);
 	pte = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (!pte)
-		return NULL;
+		goto out_unlock;
 
 	ptl = huge_pte_lock(h, mm, pte);
 	entry = huge_ptep_get(pte);
@@ -6251,6 +6252,8 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	}
 out:
 	spin_unlock(ptl);
+out_unlock:
+	hugetlb_vma_unlock_read(vma);
 	return page;
 }
 

From patchwork Wed Dec 7 20:30:31 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 31026
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Muchun Song, John Hubbard, Andrea Arcangeli, James Houghton, Jann Horn, Rik van Riel, Miaohe Lin, Andrew Morton, Mike Kravetz, peterx@redhat.com, David Hildenbrand, Nadav Amit
Subject: [PATCH v2 07/10] mm/hugetlb: Make follow_hugetlb_page() safe to pmd unshare
Date: Wed, 7 Dec 2022 15:30:31 -0500
Message-Id: <20221207203034.650899-8-peterx@redhat.com>
X-Mailer: git-send-email 2.37.3
In-Reply-To: <20221207203034.650899-1-peterx@redhat.com>
References: <20221207203034.650899-1-peterx@redhat.com>

Since follow_hugetlb_page() walks the pgtable, it needs the vma lock to
make sure the pgtable page will not be freed concurrently.

Acked-by: David Hildenbrand
Reviewed-by: Mike Kravetz
Signed-off-by: Peter Xu
Reviewed-by: John Hubbard
---
 mm/hugetlb.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3fbbd599d015..f42399522805 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6284,6 +6284,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			break;
 		}
 
+		hugetlb_vma_lock_read(vma);
 		/*
 		 * Some archs (sparc64, sh*) have multiple pte_ts to
 		 * each hugepage. We have to make sure we get the
@@ -6308,6 +6309,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		    !hugetlbfs_pagecache_present(h, vma, vaddr)) {
 			if (pte)
 				spin_unlock(ptl);
+			hugetlb_vma_unlock_read(vma);
 			remainder = 0;
 			break;
 		}
@@ -6329,6 +6331,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 
 			if (pte)
 				spin_unlock(ptl);
+			hugetlb_vma_unlock_read(vma);
+
 			if (flags & FOLL_WRITE)
 				fault_flags |= FAULT_FLAG_WRITE;
 			else if (unshare)
@@ -6388,6 +6392,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			remainder -= pages_per_huge_page(h);
 			i += pages_per_huge_page(h);
 			spin_unlock(ptl);
+			hugetlb_vma_unlock_read(vma);
 			continue;
 		}
 
@@ -6415,6 +6420,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			if (WARN_ON_ONCE(!try_grab_folio(pages[i], refs,
 							 flags))) {
 				spin_unlock(ptl);
+				hugetlb_vma_unlock_read(vma);
 				remainder = 0;
 				err = -ENOMEM;
 				break;
@@ -6426,6 +6432,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		i += refs;
 
 		spin_unlock(ptl);
+		hugetlb_vma_unlock_read(vma);
 	}
 	*nr_pages = remainder;
 	/*

From patchwork Wed Dec 7 20:30:32 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 31028 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp393361wrr; Wed, 7 Dec 2022 12:36:25 -0800 (PST) X-Google-Smtp-Source:
[70.31.27.79]) by smtp.gmail.com with ESMTPSA id dc53-20020a05620a523500b006fefa5f7fcesm855594qkb.10.2022.12.07.12.30.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Dec 2022 12:30:50 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Muchun Song , John Hubbard , Andrea Arcangeli , James Houghton , Jann Horn , Rik van Riel , Miaohe Lin , Andrew Morton , Mike Kravetz , peterx@redhat.com, David Hildenbrand , Nadav Amit Subject: [PATCH v2 08/10] mm/hugetlb: Make walk_hugetlb_range() safe to pmd unshare Date: Wed, 7 Dec 2022 15:30:32 -0500 Message-Id: <20221207203034.650899-9-peterx@redhat.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221207203034.650899-1-peterx@redhat.com> References: <20221207203034.650899-1-peterx@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751588940643588683?= X-GMAIL-MSGID: =?utf-8?q?1751588940643588683?= Since walk_hugetlb_range() walks the pgtable, it needs the vma lock to make sure the pgtable page will not be freed concurrently. 
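The contract between the walker and its per-entry hook can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel implementation: all `demo_*` names are hypothetical, the page table is a plain array, and the lock is a single-threaded counter. The walker holds the lock for the whole range; the hook may drop and retake it, but only after it has finished using the entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded model of the vma read lock: counts holders. */
struct demo_vma {
    int readers;
};

struct demo_walk {
    struct demo_vma *vma;
    const int *table;   /* stand-in for the page table */
    size_t len;
    long sum;           /* result accumulated by the hook */
};

typedef int (*demo_entry_fn)(size_t idx, struct demo_walk *walk);

static void demo_lock_read(struct demo_vma *v)   { v->readers++; }
static void demo_unlock_read(struct demo_vma *v) { assert(v->readers > 0); v->readers--; }

/* Walker: the lock is held across the whole loop, including hook calls. */
static int demo_walk_range(struct demo_walk *walk, demo_entry_fn hook)
{
    int err = 0;

    demo_lock_read(walk->vma);
    for (size_t i = 0; i < walk->len; i++) {
        err = hook(i, walk);
        /* The hook must return with the lock held again. */
        assert(walk->vma->readers == 1);
        if (err)
            break;
    }
    demo_unlock_read(walk->vma);
    return err;
}

/* Hook: uses the entry under the lock, then drops and retakes the lock. */
static int demo_sum_entry(size_t idx, struct demo_walk *walk)
{
    int val = walk->table[idx];   /* entry is read while the lock is held */

    if (val < 0)
        return -1;
    walk->sum += val;

    demo_unlock_read(walk->vma);
    /* A blocking or rescheduling call would go here; the table is not touched. */
    demo_lock_read(walk->vma);
    return 0;
}
```

The hook copies what it needs out of the table before dropping the lock and does not look at the table again afterwards. If it did need the entry again, it would have to look it up afresh after retaking the lock.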
Reviewed-by: Mike Kravetz Signed-off-by: Peter Xu Reviewed-by: John Hubbard --- arch/s390/mm/gmap.c | 2 ++ fs/proc/task_mmu.c | 2 ++ include/linux/pagewalk.h | 11 ++++++++++- mm/hmm.c | 15 ++++++++++++++- mm/pagewalk.c | 2 ++ 5 files changed, 30 insertions(+), 2 deletions(-) diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c index 8947451ae021..292a54c490d4 100644 --- a/arch/s390/mm/gmap.c +++ b/arch/s390/mm/gmap.c @@ -2643,7 +2643,9 @@ static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr, end = start + HPAGE_SIZE - 1; __storage_key_init_range(start, end); set_bit(PG_arch_1, &page->flags); + hugetlb_vma_unlock_read(walk->vma); cond_resched(); + hugetlb_vma_lock_read(walk->vma); return 0; } diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index e35a0398db63..cf3887fb2905 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -1613,7 +1613,9 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask, frame++; } + hugetlb_vma_unlock_read(walk->vma); cond_resched(); + hugetlb_vma_lock_read(walk->vma); return err; } diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h index 959f52e5867d..27a6df448ee5 100644 --- a/include/linux/pagewalk.h +++ b/include/linux/pagewalk.h @@ -21,7 +21,16 @@ struct mm_walk; * depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD. * Any folded depths (where PTRS_PER_P?D is equal to 1) * are skipped. - * @hugetlb_entry: if set, called for each hugetlb entry + * @hugetlb_entry: if set, called for each hugetlb entry. This hook + * function is called with the vma lock held, in order to + * protect against a concurrent freeing of the pte_t* or + * the ptl. In some cases, the hook function needs to drop + * and retake the vma lock in order to avoid deadlocks + * while calling other functions. 
In such cases the hook + * function must either refrain from accessing the pte or + * ptl after dropping the vma lock, or else revalidate + * those items after re-acquiring the vma lock and before + * accessing them. * @test_walk: caller specific callback function to determine whether * we walk over the current vma or not. Returning 0 means * "do page table walk over the current vma", returning diff --git a/mm/hmm.c b/mm/hmm.c index 3850fb625dda..796de6866089 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -493,8 +493,21 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask, required_fault = hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags); if (required_fault) { + int ret; + spin_unlock(ptl); - return hmm_vma_fault(addr, end, required_fault, walk); + hugetlb_vma_unlock_read(vma); + /* + * Avoid deadlock: drop the vma lock before calling + * hmm_vma_fault(), which will itself potentially take and + * drop the vma lock. This is also correct from a + * protection point of view, because there is no further + * use here of either pte or ptl after dropping the vma + * lock. 
+ */ + ret = hmm_vma_fault(addr, end, required_fault, walk); + hugetlb_vma_lock_read(vma); + return ret; } pfn = pte_pfn(entry) + ((start & ~hmask) >> PAGE_SHIFT); diff --git a/mm/pagewalk.c b/mm/pagewalk.c index 7f1c9b274906..d98564a7be57 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -302,6 +302,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end, const struct mm_walk_ops *ops = walk->ops; int err = 0; + hugetlb_vma_lock_read(vma); do { next = hugetlb_entry_end(h, addr, end); pte = huge_pte_offset(walk->mm, addr & hmask, sz); @@ -314,6 +315,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end, if (err) break; } while (addr = next, addr != end); + hugetlb_vma_unlock_read(vma); return err; } From patchwork Wed Dec 7 20:31:56 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 31030 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp394809wrr; Wed, 7 Dec 2022 12:40:48 -0800 (PST) X-Google-Smtp-Source: AA0mqf7PTPCGlVkURPKchejVT24zWaFCT6MQWmQDpZ6BDNK2V1audyWRy5yyIjOKAKodG3Y2a0ur X-Received: by 2002:a17:906:4a5a:b0:7c0:e6d7:dabc with SMTP id a26-20020a1709064a5a00b007c0e6d7dabcmr14528579ejv.227.1670445647884; Wed, 07 Dec 2022 12:40:47 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670445647; cv=none; d=google.com; s=arc-20160816; b=sng08Ad9u4FP+uCqe+HC6AINSm1IYUVjbOTM/Shn8l6ZPOSusV2d9hG7TQ0u9rhFSL optOb4AQWqIC0Y/ibcjtyvHUo7ZwXCTmksLzOv8UUvLbeaRQWj+Nh05hqADImnVHfLSn qhR0I1jxuILFI4BVa/ejxEm7KgvvfNZW+8k+3Lylnx7e77K1DyrqCxTmbp1oE8EQWUFC gF48LT7FfS4KZSyi2ylmS+GgdbTdcwEwkSKsCf6bG9+kwAmu2KETOG8LT7ipFrhqSXtB t/olKhyDaKGFN7sU2r2EpFB8dhkQDVAEzc/9bUSz2Esdf6moIu2tDkPtvvn1GWQhHXH0 k4AA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from 
z18-20020ac87112000000b003a688c82c36mr1643149qto.49.1670445118804; Wed, 07 Dec 2022 12:31:58 -0800 (PST) X-Received: by 2002:ac8:7112:0:b0:3a6:88c8:2c36 with SMTP id z18-20020ac87112000000b003a688c82c36mr1643131qto.49.1670445118505; Wed, 07 Dec 2022 12:31:58 -0800 (PST) Received: from x1n.redhat.com (bras-base-aurron9127w-grc-46-70-31-27-79.dsl.bell.ca. [70.31.27.79]) by smtp.gmail.com with ESMTPSA id u17-20020a05620a455100b006fa22f0494bsm17916710qkp.117.2022.12.07.12.31.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Dec 2022 12:31:58 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Jann Horn , Andrea Arcangeli , peterx@redhat.com, James Houghton , Rik van Riel , Miaohe Lin , Nadav Amit , John Hubbard , Mike Kravetz , David Hildenbrand , Andrew Morton , Muchun Song Subject: [PATCH v2 09/10] mm/hugetlb: Introduce hugetlb_walk() Date: Wed, 7 Dec 2022 15:31:56 -0500 Message-Id: <20221207203156.651077-1-peterx@redhat.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221207203034.650899-1-peterx@redhat.com> References: <20221207203034.650899-1-peterx@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751589216038626609?= X-GMAIL-MSGID: =?utf-8?q?1751589216038626609?= huge_pte_offset() is the main walker function for hugetlb pgtables. The name is not really representing what it does, though. Instead of renaming it, introduce a wrapper function called hugetlb_walk() which will use huge_pte_offset() inside. 
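The wrapper idea can be illustrated with a small user-space sketch. It is an analogy only: the `demo_*` names are hypothetical, the two integer fields stand in for "vma lock held" and "i_mmap_rwsem held" (which the kernel would query through lockdep), and a counter replaces the kernel warning. Unlike a warn-once macro, the counter here increments on every violation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a mapping; none of these fields are kernel fields. */
struct demo_vma {
    bool shareable;          /* models "pmd sharing is possible" */
    int vma_lock_held;       /* models the vma lock being held */
    int mapping_lock_held;   /* models the mapping-wide lock being held */
    int *table;              /* stand-in for the page table */
    size_t len;
};

/* Counts locking violations instead of emitting a kernel warning. */
static int demo_warnings;

/* Raw lookup: no locking checks, like a low-level arch walker. */
static int *demo_raw_lookup(struct demo_vma *v, size_t idx)
{
    return idx < v->len ? &v->table[idx] : NULL;
}

/*
 * Checked wrapper: for shareable mappings, complain unless at least one
 * of the two locks is held, then delegate to the raw lookup either way.
 */
static int *demo_walk(struct demo_vma *v, size_t idx)
{
    if (v->shareable && !v->vma_lock_held && !v->mapping_lock_held)
        demo_warnings++;
    return demo_raw_lookup(v, idx);
}
```

Callers switch from `demo_raw_lookup()` to `demo_walk()` without any change in the returned pointer; the only difference is that a caller holding neither lock on a shareable mapping is now flagged.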
Assert on the locks when walking the pgtable. Note, the vma lock assertion will be a no-op for private mappings. Signed-off-by: Peter Xu Reviewed-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 4 +--- fs/userfaultfd.c | 6 ++---- include/linux/hugetlb.h | 39 +++++++++++++++++++++++++++++++++++++++ mm/hugetlb.c | 32 +++++++++++++------------------- mm/page_vma_mapped.c | 2 +- mm/pagewalk.c | 4 +--- 6 files changed, 57 insertions(+), 30 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index fdb16246f46e..48f1a8ad2243 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -388,9 +388,7 @@ static bool hugetlb_vma_maps_page(struct vm_area_struct *vma, { pte_t *ptep, pte; - ptep = huge_pte_offset(vma->vm_mm, addr, - huge_page_size(hstate_vma(vma))); - + ptep = hugetlb_walk(vma, addr, huge_page_size(hstate_vma(vma))); if (!ptep) return false; diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index a602f008dde5..f31fe1a9f4c5 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -237,14 +237,12 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx, unsigned long flags, unsigned long reason) { - struct mm_struct *mm = ctx->mm; pte_t *ptep, pte; bool ret = true; - mmap_assert_locked(mm); - - ptep = huge_pte_offset(mm, address, vma_mmu_pagesize(vma)); + mmap_assert_locked(ctx->mm); + ptep = hugetlb_walk(vma, address, vma_mmu_pagesize(vma)); if (!ptep) goto out; diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 81efd9b9baa2..1c20cbbf3d22 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -2,6 +2,7 @@ #ifndef _LINUX_HUGETLB_H #define _LINUX_HUGETLB_H +#include #include #include #include @@ -196,6 +197,11 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, * huge_pte_offset(): Walk the hugetlb pgtable until the last level PTE. * Returns the pte_t* if found, or NULL if the address is not mapped. 
* + * IMPORTANT: we should normally not directly call this function, instead + * this is only a common interface to implement arch-specific walker. + * Please consider using the hugetlb_walk() helper to make sure of the + * correct locking is satisfied. + * * Since this function will walk all the pgtable pages (including not only * high-level pgtable page, but also PUD entry that can be unshared * concurrently for VM_SHARED), the caller of this function should be @@ -1229,4 +1235,37 @@ bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr); #define flush_hugetlb_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end) #endif +static inline bool +__vma_shareable_flags_pmd(struct vm_area_struct *vma) +{ + return vma->vm_flags & (VM_MAYSHARE | VM_SHARED) && + vma->vm_private_data; +} + +/* + * Safe version of huge_pte_offset() to check the locks. See comments + * above huge_pte_offset(). + */ +static inline pte_t * +hugetlb_walk(struct vm_area_struct *vma, unsigned long addr, unsigned long sz) +{ +#if defined(CONFIG_HUGETLB_PAGE) && \ + defined(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) && defined(CONFIG_LOCKDEP) + struct hugetlb_vma_lock *vma_lock = vma->vm_private_data; + + /* + * If pmd sharing possible, locking needed to safely walk the + * hugetlb pgtables. More information can be found at the comment + * above huge_pte_offset() in the same file. + * + * NOTE: lockdep_is_held() is only defined with CONFIG_LOCKDEP. 
+ */ + if (__vma_shareable_flags_pmd(vma)) + WARN_ON_ONCE(!lockdep_is_held(&vma_lock->rw_sema) && + !lockdep_is_held( + &vma->vm_file->f_mapping->i_mmap_rwsem)); +#endif + return huge_pte_offset(vma->vm_mm, addr, sz); +} + #endif /* _LINUX_HUGETLB_H */ diff --git a/mm/hugetlb.c b/mm/hugetlb.c index f42399522805..e3500c087893 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4814,7 +4814,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, } else { /* * For shared mappings the vma lock must be held before - * calling huge_pte_offset in the src vma. Otherwise, the + * calling hugetlb_walk() in the src vma. Otherwise, the * returned ptep could go away if part of a shared pmd and * another thread calls huge_pmd_unshare. */ @@ -4824,7 +4824,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, last_addr_mask = hugetlb_mask_last_page(h); for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) { spinlock_t *src_ptl, *dst_ptl; - src_pte = huge_pte_offset(src, addr, sz); + src_pte = hugetlb_walk(src_vma, addr, sz); if (!src_pte) { addr |= last_addr_mask; continue; @@ -5028,7 +5028,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma, hugetlb_vma_lock_write(vma); i_mmap_lock_write(mapping); for (; old_addr < old_end; old_addr += sz, new_addr += sz) { - src_pte = huge_pte_offset(mm, old_addr, sz); + src_pte = hugetlb_walk(vma, old_addr, sz); if (!src_pte) { old_addr |= last_addr_mask; new_addr |= last_addr_mask; @@ -5091,7 +5091,7 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct last_addr_mask = hugetlb_mask_last_page(h); address = start; for (; address < end; address += sz) { - ptep = huge_pte_offset(mm, address, sz); + ptep = hugetlb_walk(vma, address, sz); if (!ptep) { address |= last_addr_mask; continue; @@ -5404,7 +5404,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, mutex_lock(&hugetlb_fault_mutex_table[hash]); 
hugetlb_vma_lock_read(vma); spin_lock(ptl); - ptep = huge_pte_offset(mm, haddr, huge_page_size(h)); + ptep = hugetlb_walk(vma, haddr, huge_page_size(h)); if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) goto retry_avoidcopy; @@ -5442,7 +5442,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, * before the page tables are altered */ spin_lock(ptl); - ptep = huge_pte_offset(mm, haddr, huge_page_size(h)); + ptep = hugetlb_walk(vma, haddr, huge_page_size(h)); if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) { /* Break COW or unshare */ huge_ptep_clear_flush(vma, haddr, ptep); @@ -6227,7 +6227,7 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma, return NULL; hugetlb_vma_lock_read(vma); - pte = huge_pte_offset(mm, haddr, huge_page_size(h)); + pte = hugetlb_walk(vma, haddr, huge_page_size(h)); if (!pte) goto out_unlock; @@ -6292,8 +6292,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, * * Note that page table lock is not held when pte is null. 
*/ - pte = huge_pte_offset(mm, vaddr & huge_page_mask(h), - huge_page_size(h)); + pte = hugetlb_walk(vma, vaddr & huge_page_mask(h), + huge_page_size(h)); if (pte) ptl = huge_pte_lock(h, mm, pte); absent = !pte || huge_pte_none(huge_ptep_get(pte)); @@ -6479,7 +6479,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma, last_addr_mask = hugetlb_mask_last_page(h); for (; address < end; address += psize) { spinlock_t *ptl; - ptep = huge_pte_offset(mm, address, psize); + ptep = hugetlb_walk(vma, address, psize); if (!ptep) { address |= last_addr_mask; continue; @@ -6857,12 +6857,6 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma, *end = ALIGN(*end, PUD_SIZE); } -static bool __vma_shareable_flags_pmd(struct vm_area_struct *vma) -{ - return vma->vm_flags & (VM_MAYSHARE | VM_SHARED) && - vma->vm_private_data; -} - void hugetlb_vma_lock_read(struct vm_area_struct *vma) { if (__vma_shareable_flags_pmd(vma)) { @@ -7028,8 +7022,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, saddr = page_table_shareable(svma, vma, addr, idx); if (saddr) { - spte = huge_pte_offset(svma->vm_mm, saddr, - vma_mmu_pagesize(svma)); + spte = hugetlb_walk(svma, saddr, + vma_mmu_pagesize(svma)); if (spte) { get_page(virt_to_page(spte)); break; @@ -7387,7 +7381,7 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) hugetlb_vma_lock_write(vma); i_mmap_lock_write(vma->vm_file->f_mapping); for (address = start; address < end; address += PUD_SIZE) { - ptep = huge_pte_offset(mm, address, sz); + ptep = hugetlb_walk(vma, address, sz); if (!ptep) continue; ptl = huge_pte_lock(h, mm, ptep); diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 93e13fc17d3c..e97b2e23bd28 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -170,7 +170,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) return not_found(pvmw); /* when pud is not present, pte will be NULL */ - pvmw->pte = huge_pte_offset(mm, 
pvmw->address, size); + pvmw->pte = hugetlb_walk(vma, pvmw->address, size); if (!pvmw->pte) return false; diff --git a/mm/pagewalk.c b/mm/pagewalk.c index d98564a7be57..cb23f8a15c13 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -305,13 +305,11 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end, hugetlb_vma_lock_read(vma); do { next = hugetlb_entry_end(h, addr, end); - pte = huge_pte_offset(walk->mm, addr & hmask, sz); - + pte = hugetlb_walk(vma, addr & hmask, sz); if (pte) err = ops->hugetlb_entry(pte, hmask, addr, next, walk); else if (ops->pte_hole) err = ops->pte_hole(addr, next, -1, walk); - if (err) break; } while (addr = next, addr != end); From patchwork Wed Dec 7 20:31:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 31029 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp394676wrr; Wed, 7 Dec 2022 12:40:23 -0800 (PST) X-Google-Smtp-Source: AA0mqf4gX+nvuE2hDdDm8c656TJT0seUDEvo/Y1D+TYxL4W7huy8Wd4xpHVYF3ydXQvXlENhlKCD X-Received: by 2002:a17:90a:b945:b0:21a:1f5f:e797 with SMTP id f5-20020a17090ab94500b0021a1f5fe797mr2253624pjw.14.1670445622803; Wed, 07 Dec 2022 12:40:22 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670445622; cv=none; d=google.com; s=arc-20160816; b=vfPtMOK23BroBlnL/UbtRsN4P/2B5ktsFf+HtSCpkEyq6IohCudYf4cKwfBXMi/JfV pFpms4EBTOFo9rHlTYDTTKAgVq5QDF+v8sosk0vKe3z7XZEgMdJXAMLshZeTGp4uFore 3ia652TeSiECL2r+ORbYXj/o+C/CZixkkxiOOF9P6LTePjxaEAwWxazbfmWFMlIK76fh B0gu0rtt/nLOLG4g6BMe2EIRJLa+ETBa8N/h0NMiOSLHrq7JQxkRiI1OP7c+LKWQpGIQ ICXhiuX001IMTmFnIRkhQBKwRxT8TugmB3MU/YazAZItONJcZoKyUShUJc8ZkE94WYrZ yNpQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=IJ6TW2E1hLLncShjpLbEpjeIsJ0SxMbIuMnmoV2Wa4Y=; 
X-Received: by 2002:ad4:4709:0:b0:4c7:629e:7a70 with SMTP id qb9-20020ad44709000000b004c7629e7a70mr1636758qvb.44.1670445120489; Wed, 07 Dec 2022 12:32:00 -0800 (PST) Received: from x1n.redhat.com (bras-base-aurron9127w-grc-46-70-31-27-79.dsl.bell.ca. [70.31.27.79]) by smtp.gmail.com with ESMTPSA id bm8-20020a05620a198800b006fa8299b4d5sm18007118qkb.100.2022.12.07.12.31.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 07 Dec 2022 12:32:00 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Jann Horn , Andrea Arcangeli , peterx@redhat.com, James Houghton , Rik van Riel , Miaohe Lin , Nadav Amit , John Hubbard , Mike Kravetz , David Hildenbrand , Andrew Morton , Muchun Song Subject: [PATCH v2 10/10] mm/hugetlb: Document why page_vma_mapped_walk() is safe to walk Date: Wed, 7 Dec 2022 15:31:58 -0500 Message-Id: <20221207203158.651092-1-peterx@redhat.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20221207203034.650899-1-peterx@redhat.com> References: <20221207203034.650899-1-peterx@redhat.com> MIME-Version: 1.0 Content-type: text/plain X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751589189712769753?= X-GMAIL-MSGID: =?utf-8?q?1751589189712769753?= Taking vma lock here is not needed for now because all potential hugetlb walkers here should have i_mmap_rwsem held. Document the fact. 
Reviewed-by: Mike Kravetz Signed-off-by: Peter Xu --- mm/page_vma_mapped.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index e97b2e23bd28..2e59a0419d22 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -168,8 +168,14 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) /* The only possible mapping was handled on last iteration */ if (pvmw->pte) return not_found(pvmw); - - /* when pud is not present, pte will be NULL */ + /* + * NOTE: we don't need explicit lock here to walk the + * hugetlb pgtable because either (1) potential callers of + * hugetlb pvmw currently holds i_mmap_rwsem, or (2) the + * caller will not walk a hugetlb vma (e.g. ksm or uprobe). + * When one day this rule breaks, one will get a warning + * in hugetlb_walk(), and then we'll figure out what to do. + */ pvmw->pte = hugetlb_walk(vma, pvmw->address, size); if (!pvmw->pte) return false;