Message ID | cover.1680560277.git.lstoakes@gmail.com |
---|---|
Headers | From: Lorenzo Stoakes <lstoakes@gmail.com>; To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>; Cc: Matthew Wilcox <willy@infradead.org>, Mike Kravetz <mike.kravetz@oracle.com>, Muchun Song <muchun.song@linux.dev>, Alexander Viro <viro@zeniv.linux.org.uk>, Christian Brauner <brauner@kernel.org>, Andy Lutomirski <luto@amacapital.net>, Lorenzo Stoakes <lstoakes@gmail.com>; Subject: [RFC PATCH 0/3] permit write-sealed memfd read-only shared mappings; Date: Mon, 3 Apr 2023 23:28:29 +0100; Message-Id: <cover.1680560277.git.lstoakes@gmail.com> |
Series | permit write-sealed memfd read-only shared mappings |
|
Message
Lorenzo Stoakes
April 3, 2023, 10:28 p.m. UTC
This patch series is in two parts:-

1. Currently there are a number of places in the kernel where we assume VM_SHARED implies that a mapping is writable. Let's be slightly less strict and relax this restriction in the case that VM_MAYWRITE is not set.

   This should have no noticeable impact, as the lack of VM_MAYWRITE implies that the mapping cannot be made writable via mprotect() or any other means.

2. Align the behaviour of F_SEAL_WRITE and F_SEAL_FUTURE_WRITE on mmap(). The latter already clears the VM_MAYWRITE flag for a sealed read-only mapping; we simply extend this to F_SEAL_WRITE too.

   For this to have effect, we must also invoke call_mmap() before mapping_map_writable().

As this is quite a fundamental change to the assumptions around VM_SHARED, and since it causes a visible change to userland (permitting read-only shared mappings on F_SEAL_WRITE mappings), I am putting it forward as an RFC to see if there is anything terribly wrong with it.

I suspect that even if the patch series as a whole is unpalatable, there are probably things we can salvage from it in any case.

Thanks to Andy Lutomirski, who inspired the series!

Lorenzo Stoakes (3):
  mm: drop the assumption that VM_SHARED always implies writable
  mm: update seal_check_[future_]write() to include F_SEAL_WRITE as well
  mm: perform the mapping_map_writable() check after call_mmap()

 fs/hugetlbfs/inode.c |  2 +-
 include/linux/fs.h   |  4 ++--
 include/linux/mm.h   | 24 ++++++++++++++++++------
 kernel/fork.c        |  2 +-
 mm/filemap.c         |  2 +-
 mm/madvise.c         |  2 +-
 mm/mmap.c            | 22 +++++++++++-----------
 mm/shmem.c           |  2 +-
 8 files changed, 36 insertions(+), 24 deletions(-)
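The two rules described in the cover letter can be modelled in a few lines of userspace C. This is a sketch under stated assumptions, not the patches themselves: the VM_* values are intended to mirror the kernel's include/linux/mm.h, the SEAL_* values stand in for F_SEAL_WRITE and F_SEAL_FUTURE_WRITE from <linux/fcntl.h>, and every model_* function is a hypothetical name chosen for illustration. model_seal_check_write() gives F_SEAL_WRITE the treatment F_SEAL_FUTURE_WRITE already gets, and model_mprotect_add_write() shows why a shared mapping without VM_MAYWRITE can never become writable.

```c
#include <errno.h>
#include <stdbool.h>

/* Local stand-ins; values intended to mirror include/linux/mm.h. */
typedef unsigned long vm_flags_t;
#define VM_READ     0x00000001UL
#define VM_WRITE    0x00000002UL
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL

/* Local stand-ins for F_SEAL_WRITE / F_SEAL_FUTURE_WRITE. */
#define SEAL_WRITE        0x0008
#define SEAL_FUTURE_WRITE 0x0010

/* Hypothetical helper (part 1): a shared mapping only counts as writable
 * if it may ever become writable, i.e. VM_MAYWRITE is also set. */
static bool model_is_shared_maywrite(vm_flags_t flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == (VM_SHARED | VM_MAYWRITE);
}

/* Hypothetical model of the seal check (part 2): both seals reject new
 * shared writable mappings and strip VM_MAYWRITE from shared read-only
 * ones. Private mappings are left alone so they stay COW-writable. */
static int model_seal_check_write(int seals, vm_flags_t *flags)
{
	if (!(seals & (SEAL_WRITE | SEAL_FUTURE_WRITE)))
		return 0;

	if ((*flags & VM_SHARED) && (*flags & VM_WRITE))
		return -EPERM;

	if (*flags & VM_SHARED)
		*flags &= ~VM_MAYWRITE;

	return 0;
}

/* Model of the mprotect() rule: VM_WRITE can only be added when
 * VM_MAYWRITE is present; otherwise the request fails with -EACCES. */
static int model_mprotect_add_write(vm_flags_t *flags)
{
	if (!(*flags & VM_MAYWRITE))
		return -EACCES;

	*flags |= VM_WRITE;
	return 0;
}
```

Under these rules, a read-only MAP_SHARED mapping of a write-sealed memfd ends up with VM_SHARED set but VM_MAYWRITE clear, so model_is_shared_maywrite() reports it as not writable. That is the relaxation part 1 of the series relies on.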
Comments
Hi!

On Mon 03-04-23 23:28:29, Lorenzo Stoakes wrote:
> This patch series is in two parts:-
> [...]

So what I miss in this series is what the motivation is. Is it that you need to map F_SEAL_WRITE read-only? Why?

Honza
On Fri, Apr 21, 2023 at 11:01:26AM +0200, Jan Kara wrote:
> So what I miss in this series is what the motivation is. Is it that you need
> to map F_SEAL_WRITE read-only? Why?

This originated from the discussion in [1], which refers to the bug reported in [2]. Essentially the user is write-sealing a memfd and then trying to mmap it read-only, but receives an -EPERM error.

F_SEAL_FUTURE_WRITE _does_ explicitly permit this, but F_SEAL_WRITE does not.

The fcntl() man page states:

    Furthermore, trying to create new shared, writable memory-mappings via
    mmap(2) will also fail with EPERM.

So the kernel does not behave as the documentation states: a read-only shared mapping is not a writable one, yet it is still rejected with EPERM.

I took the user-supplied repro and slightly modified it (enclosed below). After this patch series, this code works correctly.
I think there's definitely a case for the VM_MAYWRITE part of this patch series even if the memfd bits are not considered useful, as we do seem to make the implicit assumption that MAP_SHARED == writable even if !VM_MAYWRITE which seems odd.

Reproducer:-

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = memfd_create("test", MFD_ALLOW_SEALING);
        if (fd == -1) {
            perror("memfd_create");
            return EXIT_FAILURE;
        }

        if (write(fd, "test", 4) != 4) {
            perror("write");
            return EXIT_FAILURE;
        }

        if (fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) == -1) {
            perror("fcntl");
            return EXIT_FAILURE;
        }

        /* Read-only shared mapping: fails with EPERM without this series. */
        void *ret = mmap(NULL, 4, PROT_READ, MAP_SHARED, fd, 0);
        if (ret == MAP_FAILED) {
            perror("mmap");
            return EXIT_FAILURE;
        }

        return EXIT_SUCCESS;
    }

[1]: https://lore.kernel.org/all/20230324133646.16101dfa666f253c4715d965@linux-foundation.org/
[2]: https://bugzilla.kernel.org/show_bug.cgi?id=217238
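The ordering point from the cover letter (invoking call_mmap() before mapping_map_writable()) is what turns the reproducer's EPERM into success. The following is a simplified, single-threaded userspace model, assuming that F_SEAL_WRITE leaves the mapping's i_mmap_writable counter negative and that mapping_map_writable() refuses to increment a negative counter, failing with -EPERM. struct model_mapping and all model_* functions are hypothetical and are not kernel API; model_call_mmap() stands in for the shmem mmap hook with the series applied.

```c
#include <errno.h>

/* Local stand-ins; values intended to mirror include/linux/mm.h. */
#define VM_WRITE    0x00000002UL
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL

/* Hypothetical, simplified stand-in for struct address_space. */
struct model_mapping {
	int i_mmap_writable; /* > 0: writable maps exist, < 0: writes denied */
	int write_sealed;    /* nonzero once F_SEAL_WRITE has been applied */
};

/* Like mapping_map_writable(): refuse to increment a negative counter
 * (non-atomic here, since the model is single-threaded). */
static int model_map_writable(struct model_mapping *m)
{
	if (m->i_mmap_writable < 0)
		return -EPERM;
	m->i_mmap_writable++;
	return 0;
}

/* Stand-in for call_mmap() reaching the shmem hook with the series
 * applied: a write seal rejects shared writable mappings and strips
 * VM_MAYWRITE from shared read-only ones. */
static int model_call_mmap(const struct model_mapping *m, unsigned long *flags)
{
	if (!m->write_sealed || !(*flags & VM_SHARED))
		return 0;
	if (*flags & VM_WRITE)
		return -EPERM;
	*flags &= ~VM_MAYWRITE;
	return 0;
}

/* Old order: writable accounting keyed on VM_SHARED alone, done before
 * the mmap hook has had a chance to clear VM_MAYWRITE. */
static int model_mmap_old_order(struct model_mapping *m, unsigned long flags)
{
	int err;

	if (flags & VM_SHARED) {
		err = model_map_writable(m);
		if (err)
			return err;
	}
	err = model_call_mmap(m, &flags);
	if (err && (flags & VM_SHARED))
		m->i_mmap_writable--; /* undo the accounting on failure */
	return err;
}

/* New order: run the hook first, then account only shared mappings
 * that may still become writable. */
static int model_mmap_new_order(struct model_mapping *m, unsigned long flags)
{
	int err = model_call_mmap(m, &flags);

	if (err)
		return err;
	if ((flags & (VM_SHARED | VM_MAYWRITE)) == (VM_SHARED | VM_MAYWRITE))
		return model_map_writable(m);
	return 0;
}
```

In the old order the read-only request fails in the writable accounting before the seal-aware hook ever runs. In the new order the hook clears VM_MAYWRITE first, so no writable accounting is attempted and the mapping succeeds, while a shared writable request on the sealed file is still refused.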
On Fri 21-04-23 22:23:12, Lorenzo Stoakes wrote:
> I think there's definitely a case for the VM_MAYWRITE part of this patch
> series even if the memfd bits are not considered useful, as we do seem to
> make the implicit assumption that MAP_SHARED == writable even if
> !VM_MAYWRITE which seems odd.

Thanks for the explanation! Could you please include this information in the cover letter (perhaps in a form of a short note and reference to the mailing list) for future reference? Thanks!

Honza
On Mon, Apr 24, 2023 at 02:19:36PM +0200, Jan Kara wrote:
> Thanks for the explanation! Could you please include this information in
> the cover letter (perhaps in a form of a short note and reference to the
> mailing list) for future reference? Thanks!

Sure, apologies for not being clear about that :)

I may respin this as a non-RFC (with an updated description, of course), as it's received very little attention as an RFC and I don't think it's so insane/huge a concept as to warrant remaining one.