Message ID | 20230213104323.1792839-1-usama.anjum@collabora.com |
---|---|
State | New |
Headers |
From: Muhammad Usama Anjum <usama.anjum@collabora.com>
To: peterx@redhat.com, david@redhat.com, Andrew Morton <akpm@linux-foundation.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>, kernel@collabora.com, Paul Gofman <pgofman@codeweavers.com>, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH] mm/userfaultfd: Support operation on multiple VMAs
Date: Mon, 13 Feb 2023 15:43:23 +0500
Message-Id: <20230213104323.1792839-1-usama.anjum@collabora.com>
X-Mailer: git-send-email 2.39.1
X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
mm/userfaultfd: Support operation on multiple VMAs
|
|
Commit Message
Muhammad Usama Anjum
Feb. 13, 2023, 10:43 a.m. UTC
mwriteprotect_range() errors out if [start, end) doesn't fall in one
VMA. We are facing a use case where multiple VMAs are present in one
range of interest. For example, the following steps reproduce the
error we are trying to fix:
- Allocate memory of size 16 pages with PROT_NONE with mmap
- Register userfaultfd
- Change the protection of the first half (pages 1 to 8) of the memory
to PROT_READ | PROT_WRITE. This splits the memory area into two VMAs.
- Now UFFDIO_WRITEPROTECT_MODE_WP on the whole memory of 16 pages errors
out.
This is a simple use case where the user may or may not know if the
memory area has been divided into multiple VMAs.
Reported-by: Paul Gofman <pgofman@codeweavers.com>
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
---
mm/userfaultfd.c | 36 +++++++++++++++++++-----------------
1 file changed, 19 insertions(+), 17 deletions(-)
Comments
On 13.02.23 11:43, Muhammad Usama Anjum wrote:
> mwriteprotect_range() errors out if [start, end) doesn't fall in one
> VMA. We are facing a use case where multiple VMAs are present in one
> range of interest.
[...]
> +		uffd_wp_range(dst_mm, dst_vma, start, len, enable_wp);

I suspect you should be adjusting the range to only cover that specific
VMA here.
Hi David,

Thank you for the quick review!

On 2/13/23 4:44 PM, David Hildenbrand wrote:
> On 13.02.23 11:43, Muhammad Usama Anjum wrote:
[...]
>> +		uffd_wp_range(dst_mm, dst_vma, start, len, enable_wp);
>
> I suspect you should be adjusting the range to only cover that specific
> VMA here.
Sorry, you are right. I don't know why it is still working with the
blunder. Will send a v2.

Thanks,
Usama
On 13.02.23 16:04, Muhammad Usama Anjum wrote:
> On 2/13/23 4:44 PM, David Hildenbrand wrote:
>> On 13.02.23 11:43, Muhammad Usama Anjum wrote:
[...]
>>> +		uffd_wp_range(dst_mm, dst_vma, start, len, enable_wp);
>>
>> I suspect you should be adjusting the range to only cover that specific
>> VMA here.
> Sorry, you are right. I don't know why it is still working with the
> blunder. Will send a v2.

Maybe worth adding some sanity checks (VM_WARN_ONCE()) in there (e.g.,
change_protection()) to catch that.
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 65ad172add27..46e0a014af68 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -738,9 +738,11 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 			unsigned long len, bool enable_wp,
 			atomic_t *mmap_changing)
 {
+	unsigned long end = start + len;
 	struct vm_area_struct *dst_vma;
 	unsigned long page_mask;
 	int err;
+	VMA_ITERATOR(vmi, dst_mm, start);
 
 	/*
 	 * Sanitize the command parameters:
@@ -762,26 +764,26 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 	if (mmap_changing && atomic_read(mmap_changing))
 		goto out_unlock;
 
-	err = -ENOENT;
-	dst_vma = find_dst_vma(dst_mm, start, len);
-
-	if (!dst_vma)
-		goto out_unlock;
-	if (!userfaultfd_wp(dst_vma))
-		goto out_unlock;
-	if (!vma_can_userfault(dst_vma, dst_vma->vm_flags))
-		goto out_unlock;
+	for_each_vma_range(vmi, dst_vma, end) {
+		err = -ENOENT;
 
-	if (is_vm_hugetlb_page(dst_vma)) {
-		err = -EINVAL;
-		page_mask = vma_kernel_pagesize(dst_vma) - 1;
-		if ((start & page_mask) || (len & page_mask))
-			goto out_unlock;
-	}
+		if (!dst_vma->vm_userfaultfd_ctx.ctx)
+			break;
+		if (!userfaultfd_wp(dst_vma))
+			break;
+		if (!vma_can_userfault(dst_vma, dst_vma->vm_flags))
+			break;
 
-	uffd_wp_range(dst_mm, dst_vma, start, len, enable_wp);
+		if (is_vm_hugetlb_page(dst_vma)) {
+			err = -EINVAL;
+			page_mask = vma_kernel_pagesize(dst_vma) - 1;
+			if ((start & page_mask) || (len & page_mask))
+				break;
+		}
 
-	err = 0;
+		uffd_wp_range(dst_mm, dst_vma, start, len, enable_wp);
+		err = 0;
+	}
 
 out_unlock:
 	mmap_read_unlock(dst_mm);
 	return err;