Message ID | d2748bc4077b53c60bcb06fccaf976cb2afee345.1696709413.git.lstoakes@gmail.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a888:0:b0:403:3b70:6f57 with SMTP id x8csp1098952vqo; Sat, 7 Oct 2023 13:52:10 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEmQWUOb601uasiglff70L/D3aulJK4vMrryfhf4UtqlIkoUwM7kS6v/wcwuqeNgSp+OtRR X-Received: by 2002:a05:6a20:258a:b0:161:aef5:6395 with SMTP id k10-20020a056a20258a00b00161aef56395mr13417304pzd.24.1696711930640; Sat, 07 Oct 2023 13:52:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1696711930; cv=none; d=google.com; s=arc-20160816; b=cKaEkyPdTHl55yI/Zb9g7C2kDLtLJoEyyU3zPhZeHtNLGBOeWkFZAIJ/THMomrmfVZ mCbEEAmq4UrGjclCzsGkXKLoGYa9c5Gl1Xdao21GSdJo+WthSBxH6B0Jldu7PVqSedf+ S9SRNY7YYFvcCgzBJOW7xNGzzoQrdIwALhtYWXUzww37sz6F8dqU0J9FEiBMaJ+srTBj jhxrwY7YFibuBRMMp9MYsVu6ZRf6R1c/R4NLiDE1PC/0UZLCA82Y12+M0ad7MM0PX2IM JXRmfCwi9JqRwIkae4TAKbYJJjSHcG00N3SZym1pnMNcxa9bFdQNQUrVuHMJiB+L7yhC lcvw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=2R/SPrp+9X8K4zqdkQ2nu1IGtJt6e/ubH1aV+LRG8uo=; fh=On8i7MftmLnii/x6szJvRZF6tmYBE7/T6K0SDqpksdA=; b=gyQqaGpdsQB+0wzREHapt9zMbAeA+v6EKBT4rGSvgWpTzpykrBvBEdGHRc/ygAwfL9 v1/qzT5xsmopoigwEbPbgfKUU3ZEO6zjAVP4mpEmUCV5jxsmbSzhsqNvG7jWRn1krQiM VRcZue8V43zfNNguGmt4lEy6ExhitGocndm2D/REEKPZOiYmhPH5exC8UDfgGuZ3Bht8 IoO6fopeknGRe2ujwqgMkFyzrRpeCgvIYH3rOC/ktcpWO5NKZ5sMthoKXFde4XpnlHxB 1pMux+ButXNEFi1SNgMyPBEJRY5Y2aF97C9NXWlnAcQfib/jGk7HXrNl4gxrNGWg6rL4 x+3w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b="cZfbV49/"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from pete.vger.email (pete.vger.email. [2620:137:e000::3:6]) by mx.google.com with ESMTPS id h192-20020a636cc9000000b0058986c61b99si4549412pgc.43.2023.10.07.13.52.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 07 Oct 2023 13:52:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) client-ip=2620:137:e000::3:6; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b="cZfbV49/"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by pete.vger.email (Postfix) with ESMTP id 339B98041ED8; Sat, 7 Oct 2023 13:52:08 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at pete.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344218AbjJGUv1 (ORCPT <rfc822;pusanteemu@gmail.com> + 17 others); Sat, 7 Oct 2023 16:51:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56328 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344180AbjJGUvQ (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Sat, 7 Oct 2023 16:51:16 -0400 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 89FE893; Sat, 7 Oct 2023 13:51:14 -0700 (PDT) Received: by mail-wr1-x42a.google.com with SMTP id ffacd0b85a97d-32003aae100so2663451f8f.0; Sat, 07 Oct 2023 13:51:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1696711873; x=1697316673; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=2R/SPrp+9X8K4zqdkQ2nu1IGtJt6e/ubH1aV+LRG8uo=; b=cZfbV49/owq2IZis6MxULDn/VPdCTmrw6fvoNcQwk9O6jniLVimzHHp5rf2k0h9qI6 +37ciYjDill9y9n+PFryG9O5PJ9MqxUpVJz0kOFjB9/POOuXqFiO1iQBbVPiiWNqrgMd RkeDDTnAQE4BMT2TRwEfHS7QgvCGrNddiHHtoCzoIOu2Lddcx362S5xn4RP2PQ/6c6k0 pqAblOpX3xDWbsv01txeebfO+dhr8mg2EegTUKaVhpwp0RiSQazhBIZ8rJBtRgP6t5Jh YlGMb9CaebZlBQUktQ8Rqzt50uFuov8LcMfRA0lE1S0j/zmMVMuRnrLWPKDykrcccfZP GwRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696711873; x=1697316673; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=2R/SPrp+9X8K4zqdkQ2nu1IGtJt6e/ubH1aV+LRG8uo=; b=uWKhNqdnKRzLDiV5j+1a+chbm0WJmYTUhBTVHDQsbWWle6V/8Bs+Xac/QEz5RrKzxC E7F21Imo2ra+LBW4k9nZVZp6TEUtBSbxPEOlra3kwz/7mZ7f7ANPcczZT10+bxWIcHzQ o1VXetFsULFgPb8nXW+zXBF5nOLpd0SsYhx6GvuNYlnWU6d/XMJWMlt/aDn/s/7Y4cCO EfTSYqMzccXXo5OYbZrAEzrfb1qKrjtvW+//p1Q/Rg88MHpQFcY9V/8yljJU+BI8msUc YgBrYDG58HiAlJ4v6ZpfBZVzL0Hqa00Rchzz8ihnwjkRQKO4yUkHyB63I79CkCP0M8Zj YW3g== X-Gm-Message-State: AOJu0Yy8tSHdS/Py9vIxau+EVfpKIvthFFjj2RkZBXaGAZAaDdwg60GE y6hIWGe+Y+vyn/TuQYHa3UQ= X-Received: by 2002:a5d:4805:0:b0:314:3369:df57 with SMTP id l5-20020a5d4805000000b003143369df57mr6970810wrq.5.1696711872784; Sat, 07 Oct 2023 13:51:12 -0700 (PDT) Received: from lucifer.home ([2a00:23c5:dc8c:8701:1663:9a35:5a7b:1d76]) by smtp.googlemail.com with ESMTPSA id g7-20020adfe407000000b003232d122dbfsm5120550wrm.66.2023.10.07.13.51.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 07 Oct 2023 13:51:11 -0700 (PDT) From: Lorenzo Stoakes <lstoakes@gmail.com> To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org> Cc: Mike Kravetz <mike.kravetz@oracle.com>, Muchun Song <muchun.song@linux.dev>, Alexander Viro <viro@zeniv.linux.org.uk>, Christian Brauner <brauner@kernel.org>, Matthew Wilcox <willy@infradead.org>, Hugh Dickins <hughd@google.com>, Andy Lutomirski <luto@kernel.org>, Jan Kara <jack@suse.cz>, linux-fsdevel@vger.kernel.org, bpf@vger.kernel.org, Lorenzo Stoakes <lstoakes@gmail.com> Subject: [PATCH v3 3/3] mm: enforce the mapping_map_writable() check after call_mmap() Date: Sat, 7 Oct 2023 21:51:01 +0100 Message-ID: <d2748bc4077b53c60bcb06fccaf976cb2afee345.1696709413.git.lstoakes@gmail.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <cover.1696709413.git.lstoakes@gmail.com> References: <cover.1696709413.git.lstoakes@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=3.0 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_SBL_CSS, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on pete.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (pete.vger.email [0.0.0.0]); Sat, 07 Oct 2023 13:52:08 -0700 (PDT) X-Spam-Level: ** X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1779131409439813379 X-GMAIL-MSGID: 1779131409439813379 |
Series |
permit write-sealed memfd read-only shared mappings
|
|
Commit Message
Lorenzo Stoakes
Oct. 7, 2023, 8:51 p.m. UTC
In order for an F_SEAL_WRITE sealed memfd mapping to have an opportunity to
clear VM_MAYWRITE in seal_check_write() we must be able to invoke either
the shmem_mmap() or hugetlbfs_file_mmap() f_ops->mmap() handler to do so.
We would otherwise fail the mapping_map_writable() check before we had
the opportunity to clear VM_MAYWRITE.
However, the existing logic in mmap_region() performs this check BEFORE
calling call_mmap() (which invokes file->f_ops->mmap()). We must enforce
this check AFTER the function call.
In order to avoid any risk of breaking call_mmap() handlers which assume
this will have been done first, we continue to mark the file writable
first, simply deferring enforcement of it failing until afterwards.
This enables mmap(..., PROT_READ, MAP_SHARED, fd, 0) mappings for memfd's
sealed via F_SEAL_WRITE to succeed, whereas previously they were not
permitted.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217238
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
mm/mmap.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
Comments
On Sat 07-10-23 21:51:01, Lorenzo Stoakes wrote: > In order for an F_SEAL_WRITE sealed memfd mapping to have an opportunity to > clear VM_MAYWRITE in seal_check_write() we must be able to invoke either > the shmem_mmap() or hugetlbfs_file_mmap() f_ops->mmap() handler to do so. > > We would otherwise fail the mapping_map_writable() check before we had > the opportunity to clear VM_MAYWRITE. > > However, the existing logic in mmap_region() performs this check BEFORE > calling call_mmap() (which invokes file->f_ops->mmap()). We must enforce > this check AFTER the function call. > > In order to avoid any risk of breaking call_mmap() handlers which assume > this will have been done first, we continue to mark the file writable > first, simply deferring enforcement of it failing until afterwards. > > This enables mmap(..., PROT_READ, MAP_SHARED, fd, 0) mappings for memfd's > sealed via F_SEAL_WRITE to succeed, whereas previously they were not > permitted. > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217238 > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> ... > diff --git a/mm/mmap.c b/mm/mmap.c > index 6f6856b3267a..9fbee92aaaee 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -2767,17 +2767,25 @@ unsigned long mmap_region(struct file *file, unsigned long addr, > vma->vm_pgoff = pgoff; > > if (file) { > - if (is_shared_maywrite(vm_flags)) { > - error = mapping_map_writable(file->f_mapping); > - if (error) > - goto free_vma; > - } > + int writable_error = 0; > + > + if (vma_is_shared_maywrite(vma)) > + writable_error = mapping_map_writable(file->f_mapping); > > vma->vm_file = get_file(file); > error = call_mmap(file, vma); > if (error) > goto unmap_and_free_vma; > > + /* > + * call_mmap() may have changed VMA flags, so retry this check > + * if it failed before. > + */ > + if (writable_error && vma_is_shared_maywrite(vma)) { > + error = writable_error; > + goto close_and_free_vma; > + } Hum, this doesn't quite give me a peace of mind ;). One bug I can see is that if call_mmap() drops the VM_MAYWRITE flag, we seem to forget to drop i_mmap_writeable counter here? I've checked why your v2 version broke i915 and I think the reason maybe has nothing to do with i915. Just in case call_mmap() failed, it ended up jumping to unmap_and_free_vma which calls mapping_unmap_writable() but we didn't call mapping_map_writable() yet so the counter became imbalanced. So I'd be for returning to v2 version, just fix up the error handling paths... Honza > + > /* > * Expansion is handled above, merging is handled below. > * Drivers should not alter the address of the VMA. > -- > 2.42.0 >
On Wed, Oct 11, 2023 at 11:46:27AM +0200, Jan Kara wrote: > On Sat 07-10-23 21:51:01, Lorenzo Stoakes wrote: > > In order for an F_SEAL_WRITE sealed memfd mapping to have an opportunity to > > clear VM_MAYWRITE in seal_check_write() we must be able to invoke either > > the shmem_mmap() or hugetlbfs_file_mmap() f_ops->mmap() handler to do so. > > > > We would otherwise fail the mapping_map_writable() check before we had > > the opportunity to clear VM_MAYWRITE. > > > > However, the existing logic in mmap_region() performs this check BEFORE > > calling call_mmap() (which invokes file->f_ops->mmap()). We must enforce > > this check AFTER the function call. > > > > In order to avoid any risk of breaking call_mmap() handlers which assume > > this will have been done first, we continue to mark the file writable > > first, simply deferring enforcement of it failing until afterwards. > > > > This enables mmap(..., PROT_READ, MAP_SHARED, fd, 0) mappings for memfd's > > sealed via F_SEAL_WRITE to succeed, whereas previously they were not > > permitted. > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217238 > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> > > ... > > > diff --git a/mm/mmap.c b/mm/mmap.c > > index 6f6856b3267a..9fbee92aaaee 100644 > > --- a/mm/mmap.c > > +++ b/mm/mmap.c > > @@ -2767,17 +2767,25 @@ unsigned long mmap_region(struct file *file, unsigned long addr, > > vma->vm_pgoff = pgoff; > > > > if (file) { > > - if (is_shared_maywrite(vm_flags)) { > > - error = mapping_map_writable(file->f_mapping); > > - if (error) > > - goto free_vma; > > - } > > + int writable_error = 0; > > + > > + if (vma_is_shared_maywrite(vma)) > > + writable_error = mapping_map_writable(file->f_mapping); > > > > vma->vm_file = get_file(file); > > error = call_mmap(file, vma); > > if (error) > > goto unmap_and_free_vma; > > > > + /* > > + * call_mmap() may have changed VMA flags, so retry this check > > + * if it failed before. > > + */ > > + if (writable_error && vma_is_shared_maywrite(vma)) { > > + error = writable_error; > > + goto close_and_free_vma; > > + } > > Hum, this doesn't quite give me a peace of mind ;). One bug I can see is > that if call_mmap() drops the VM_MAYWRITE flag, we seem to forget to drop > i_mmap_writeable counter here? This wouldn't be applicable in the F_SEAL_WRITE case, as the i_mmap_writable counter would already have been decremented, and thus an error would arise causing no further decrement, and everything would work fine. It'd be very odd for something to be writable here but the driver to make it not writable. But we do need to account for this. > > I've checked why your v2 version broke i915 and I think the reason maybe > has nothing to do with i915. Just in case call_mmap() failed, it ended up > jumping to unmap_and_free_vma which calls mapping_unmap_writable() but we > didn't call mapping_map_writable() yet so the counter became imbalanced. yeah that must be the cause, I thought perhaps somehow __remove_shared_vm_struct() got invoked by i915_gem_mmap() but I didn't trace it through to see if it was possible. Looking at it again, i don't think that is possible, as we hold a mmap/vma write lock, and the only operations that can cause __remove_shared_vm_struct() to run are things that would not be able to do so with this lock held. > > So I'd be for returning to v2 version, just fix up the error handling > paths... So in conclusion, I agree, this is the better approach. Will respin in v4. > > Honza > > > > + > > /* > > * Expansion is handled above, merging is handled below. > > * Drivers should not alter the address of the VMA. > > -- > > 2.42.0 > > > -- > Jan Kara <jack@suse.com> > SUSE Labs, CR
On Wed 11-10-23 19:14:10, Lorenzo Stoakes wrote: > On Wed, Oct 11, 2023 at 11:46:27AM +0200, Jan Kara wrote: > > On Sat 07-10-23 21:51:01, Lorenzo Stoakes wrote: > > > In order for an F_SEAL_WRITE sealed memfd mapping to have an opportunity to > > > clear VM_MAYWRITE in seal_check_write() we must be able to invoke either > > > the shmem_mmap() or hugetlbfs_file_mmap() f_ops->mmap() handler to do so. > > > > > > We would otherwise fail the mapping_map_writable() check before we had > > > the opportunity to clear VM_MAYWRITE. > > > > > > However, the existing logic in mmap_region() performs this check BEFORE > > > calling call_mmap() (which invokes file->f_ops->mmap()). We must enforce > > > this check AFTER the function call. > > > > > > In order to avoid any risk of breaking call_mmap() handlers which assume > > > this will have been done first, we continue to mark the file writable > > > first, simply deferring enforcement of it failing until afterwards. > > > > > > This enables mmap(..., PROT_READ, MAP_SHARED, fd, 0) mappings for memfd's > > > sealed via F_SEAL_WRITE to succeed, whereas previously they were not > > > permitted. > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217238 > > > Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> > > > > ... > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > index 6f6856b3267a..9fbee92aaaee 100644 > > > --- a/mm/mmap.c > > > +++ b/mm/mmap.c > > > @@ -2767,17 +2767,25 @@ unsigned long mmap_region(struct file *file, unsigned long addr, > > > vma->vm_pgoff = pgoff; > > > > > > if (file) { > > > - if (is_shared_maywrite(vm_flags)) { > > > - error = mapping_map_writable(file->f_mapping); > > > - if (error) > > > - goto free_vma; > > > - } > > > + int writable_error = 0; > > > + > > > + if (vma_is_shared_maywrite(vma)) > > > + writable_error = mapping_map_writable(file->f_mapping); > > > > > > vma->vm_file = get_file(file); > > > error = call_mmap(file, vma); > > > if (error) > > > goto unmap_and_free_vma; > > > > > > + /* > > > + * call_mmap() may have changed VMA flags, so retry this check > > > + * if it failed before. > > > + */ > > > + if (writable_error && vma_is_shared_maywrite(vma)) { > > > + error = writable_error; > > > + goto close_and_free_vma; > > > + } > > > > Hum, this doesn't quite give me a peace of mind ;). One bug I can see is > > that if call_mmap() drops the VM_MAYWRITE flag, we seem to forget to drop > > i_mmap_writeable counter here? > > This wouldn't be applicable in the F_SEAL_WRITE case, as the > i_mmap_writable counter would already have been decremented, and thus an > error would arise causing no further decrement, and everything would work > fine. > > It'd be very odd for something to be writable here but the driver to make > it not writable. But we do need to account for this. Yeah, it may be odd but this is indeed what i915 driver appears to be doing in i915_gem_object_mmap(): if (i915_gem_object_is_readonly(obj)) { if (vma->vm_flags & VM_WRITE) { i915_gem_object_put(obj); return -EINVAL; } vm_flags_clear(vma, VM_MAYWRITE); } > > I've checked why your v2 version broke i915 and I think the reason maybe > > has nothing to do with i915. Just in case call_mmap() failed, it ended up > > jumping to unmap_and_free_vma which calls mapping_unmap_writable() but we > > didn't call mapping_map_writable() yet so the counter became imbalanced. > > yeah that must be the cause, I thought perhaps somehow > __remove_shared_vm_struct() got invoked by i915_gem_mmap() but I didn't > trace it through to see if it was possible. > > Looking at it again, i don't think that is possible, as we hold a mmap/vma > write lock, and the only operations that can cause > __remove_shared_vm_struct() to run are things that would not be able to do > so with this lock held. > > > So I'd be for returning to v2 version, just fix up the error handling > > paths... > > So in conclusion, I agree, this is the better approach. Will respin in v4. Thanks! Honza
diff --git a/mm/mmap.c b/mm/mmap.c index 6f6856b3267a..9fbee92aaaee 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2767,17 +2767,25 @@ unsigned long mmap_region(struct file *file, unsigned long addr, vma->vm_pgoff = pgoff; if (file) { - if (is_shared_maywrite(vm_flags)) { - error = mapping_map_writable(file->f_mapping); - if (error) - goto free_vma; - } + int writable_error = 0; + + if (vma_is_shared_maywrite(vma)) + writable_error = mapping_map_writable(file->f_mapping); vma->vm_file = get_file(file); error = call_mmap(file, vma); if (error) goto unmap_and_free_vma; + /* + * call_mmap() may have changed VMA flags, so retry this check + * if it failed before. + */ + if (writable_error && vma_is_shared_maywrite(vma)) { + error = writable_error; + goto close_and_free_vma; + } + /* * Expansion is handled above, merging is handled below. * Drivers should not alter the address of the VMA.