Message ID | 20230504170942.822147-1-revest@chromium.org |
---|---|
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp490665vqo; Thu, 4 May 2023 10:28:44 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ536yi+ggMu272VlwPgWQ2/ydTJLvMFpnc7ZQWuM74qpzOwEFvlNJf5EsJEswZBvLamHjTy X-Received: by 2002:a17:902:b949:b0:1a8:18ae:1b36 with SMTP id h9-20020a170902b94900b001a818ae1b36mr4286044pls.18.1683221323937; Thu, 04 May 2023 10:28:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1683221323; cv=none; d=google.com; s=arc-20160816; b=CltZQputr05g1dlt71HqsYCel3MGKkgs1Rzxet6kBoagbRsXjIPGsMOmxZus7SRDTe gs33pJeWmSMnPGF2EzCSjVvZXW8o/VdfQRBlMurTFucfJzMBm+ojSlR2t3XpZIBnEOz1 YZ5WC9jxoNM4nFa1AHi3iHWZfzsvEUDWt2GnTWJtpXoWi0tcgYDjKUlrfRQwfDhd8g4J 06PH5mpvsWyLxddrZL4rBerZ2VXdLmnxOfZ2qMXoZAX6KM4HWd8QikqKcGOdwXnFXo+i oylfxJa8i+aLild13xeGQinlXhTRSGbW1WbSZ2HxI6SjlO91CHpyIRsaTFSGkm1ulW5d 0H1A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=cTpvEiosV1q0JWnkoeAHKjXmXojYH9L1RV1Ai4BYwtc=; b=cysxH94JIUowA1mwba99kL6wraYKBmzrr8BLcegW9ui+z8FmNIofrsbrmGSSS2rFAR 9zUVI1yNCmbSWtsGWC7EISQ2OB9+up96KctG8Ej/kYsE4W8ZzxqMZImPB6WJmw2cDOJm 1v/e3qKdJfJBBd8Qlaf7nOGK06WObKUtLZEDSQd6yyc750BPfq+YDrj3TkEI6xYBqoH0 c46EX+7nO69ePfk0oiKRjUfL6RAnvgGmvhMN6/JQGrkQ+RZeEExYHvJGJyfXEpfFDFsT dysfKr/xfG9JE+49PQQ+V5on0hLKewzb3cViBgvfDHz8c3ZadEyuGlt8gm0MSciNag3+ 352w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b="HMvvC/c+"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id c1-20020a170903234100b001a52dd0195csi4351719plh.549.2023.05.04.10.28.27; Thu, 04 May 2023 10:28:43 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b="HMvvC/c+"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229745AbjEDRKM (ORCPT <rfc822;b08248@gmail.com> + 99 others); Thu, 4 May 2023 13:10:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41932 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229643AbjEDRKL (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 4 May 2023 13:10:11 -0400 Received: from mail-wm1-x335.google.com (mail-wm1-x335.google.com [IPv6:2a00:1450:4864:20::335]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DF7A43A9E for <linux-kernel@vger.kernel.org>; Thu, 4 May 2023 10:10:09 -0700 (PDT) Received: by mail-wm1-x335.google.com with SMTP id 5b1f17b1804b1-3f1958d3a53so8170085e9.0 for <linux-kernel@vger.kernel.org>; Thu, 04 May 2023 10:10:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1683220208; x=1685812208; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=cTpvEiosV1q0JWnkoeAHKjXmXojYH9L1RV1Ai4BYwtc=; b=HMvvC/c+u6SUB3jBk7xrcdUvj5q2RToJwmXxDE5z6sVZmnbqTbv6vuNLC4ZI3JY0+L Ft0j0CdooqcxCP7vQxASBQkyoqw5+JMCXmg2oztQ1c4Di3cLCTSHd6k74c4tB/C0niGL Mj2wqb7AKJ1ReiqLLRqU4KbU4zCvXM+pqvc7U= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683220208; x=1685812208; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=cTpvEiosV1q0JWnkoeAHKjXmXojYH9L1RV1Ai4BYwtc=; b=e8TzxvkivAuaWH6WpQTBHA4+jHxOv2gc5LH+Nyu9r8PFPpewYEyGAdGc4avPp2MpkR qjqJJRgdGriy5yMii5UZ0ugCpxzXCY/FG8wfSY067bogVMdj5S4KA+BrDT9HtIRDOvPE HQFRy9CYSYONnZBFcs4+qxiHT3mAfO3l1s1gzUKK8/XDxNLKmt1mIXttgEBrivjHvCG6 usaEjQyLE5Y1J/GnWm+Y9DHYJ24q5ub1ZCoKEsLff1NY+o5Y8iDTX6TEbjTfOngtv6vZ x1IMrmY+y7dh4f9+NBeYeoqpF05hfvSkuwd6WplzX/mxDR1FnLz1BfS4hqWlWkbf3NH2 ZUMw== X-Gm-Message-State: AC+VfDy7HQVnBQ9eQ5/gHrP+L0DTrYwnQ9LA6m7sahMGuUyfu9bFRuMr TjDxbvhhjm+SOaTEozVFiBXnIjZGoddjMWX8TPE= X-Received: by 2002:a7b:c8c3:0:b0:3f2:5028:a558 with SMTP id f3-20020a7bc8c3000000b003f25028a558mr336713wml.0.1683220207691; Thu, 04 May 2023 10:10:07 -0700 (PDT) Received: from revest.zrh.corp.google.com ([2a00:79e0:9d:6:c740:f74d:132c:ca99]) by smtp.gmail.com with ESMTPSA id q3-20020a1cf303000000b003f3157988f8sm5447895wmq.26.2023.05.04.10.10.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 04 May 2023 10:10:07 -0700 (PDT) From: Florent Revest <revest@chromium.org> To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: akpm@linux-foundation.org, catalin.marinas@arm.com, anshuman.khandual@arm.com, joey.gouly@arm.com, mhocko@suse.com, keescook@chromium.org, david@redhat.com, peterx@redhat.com, izbyshev@ispras.ru, nd@arm.com, broonie@kernel.org, szabolcs.nagy@arm.com, Florent Revest <revest@chromium.org> Subject: [PATCH 0/4] MDWE without inheritance Date: Thu, 4 May 2023 19:09:38 +0200 Message-ID: <20230504170942.822147-1-revest@chromium.org> X-Mailer: git-send-email 2.40.1.495.gc816e09b53d-goog MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1764985482987598289?= X-GMAIL-MSGID: =?utf-8?q?1764985482987598289?= |
Series | MDWE without inheritance | |
Message
Florent Revest
May 4, 2023, 5:09 p.m. UTC
Joey recently introduced a Memory-Deny-Write-Executable (MDWE) prctl which tags current with a flag that prevents pages that were previously not executable from becoming executable. This tag always gets inherited by children tasks. (it's in MMF_INIT_MASK) At Google, we've been using a somewhat similar downstream patch for a few years now. To make the adoption of this feature easier, we've had it support a mode in which the W^X flag does not propagate to children. For example, this is handy if a C process which wants W^X protection suspects it could start children processes that would use a JIT. I'd like to align our features with the upstream prctl. This series proposes a new NO_INHERIT flag to the MDWE prctl to make this kind of adoption easier. It sets a different flag in current that is not in MMF_INIT_MASK and which does not propagate. As part of looking into MDWE, I also fixed a couple of things in the MDWE test. Florent Revest (4): kselftest: vm: Fix tabs/spaces inconsistency in the mdwe test kselftest: vm: Fix mdwe's mmap_FIXED test case mm: Add a NO_INHERIT flag to the PR_SET_MDWE prctl kselftest: vm: Add tests for no-inherit memory-deny-write-execute include/linux/mman.h | 8 +- include/linux/sched/coredump.h | 1 + include/uapi/linux/prctl.h | 1 + kernel/sys.c | 29 +++++-- tools/include/uapi/linux/prctl.h | 1 + tools/testing/selftests/mm/mdwe_test.c | 110 +++++++++++++++++++++---- 6 files changed, 128 insertions(+), 22 deletions(-)
Comments
On Thu, May 04, 2023 at 07:09:38PM +0200, Florent Revest wrote: > Joey recently introduced a Memory-Deny-Write-Executable (MDWE) prctl which tags > current with a flag that prevents pages that were previously not executable from > becoming executable. > This tag always gets inherited by children tasks. (it's in MMF_INIT_MASK) > > At Google, we've been using a somewhat similar downstream patch for a few years > now. To make the adoption of this feature easier, we've had it support a mode in > which the W^X flag does not propagate to children. For example, this is handy if > a C process which wants W^X protection suspects it could start children > processes that would use a JIT. > > I'd like to align our features with the upstream prctl. This series proposes a > new NO_INHERIT flag to the MDWE prctl to make this kind of adoption easier. It > sets a different flag in current that is not in MMF_INIT_MASK and which does not > propagate. I don't think I have enough context, so sorry if I'm going to ask a naive question.. I can understand how current MDWE helps on not allowing any modifi-able content from becoming executable. How could NO_INHERIT help if it won't inherit and not in MMF_INIT_MASK? IIUC it means the restriction will only apply to the current process. Then I assume the process can escape from this rule simply by a fork(). If so, what's the point to protect at all? And, what's the difference of this comparing to disabling MDWE after being enabled (which seems to be forbidden for now, but it seems fork() can play a similar role of disabling it)? Thanks,
On Thu, May 4, 2023 at 10:06 PM Peter Xu <peterx@redhat.com> wrote: > > On Thu, May 04, 2023 at 07:09:38PM +0200, Florent Revest wrote: > > Joey recently introduced a Memory-Deny-Write-Executable (MDWE) prctl which tags > > current with a flag that prevents pages that were previously not executable from > > becoming executable. > > This tag always gets inherited by children tasks. (it's in MMF_INIT_MASK) > > > > At Google, we've been using a somewhat similar downstream patch for a few years > > now. To make the adoption of this feature easier, we've had it support a mode in > > which the W^X flag does not propagate to children. For example, this is handy if > > a C process which wants W^X protection suspects it could start children > > processes that would use a JIT. > > > > I'd like to align our features with the upstream prctl. This series proposes a > > new NO_INHERIT flag to the MDWE prctl to make this kind of adoption easier. It > > sets a different flag in current that is not in MMF_INIT_MASK and which does not > > propagate. > > I don't think I have enough context, so sorry if I'm going to ask a naive > question.. Not at all! :) You're absolutely right, it's important to address these points. > I can understand how current MDWE helps on not allowing any modifi-able > content from becoming executable. How could NO_INHERIT help if it won't > inherit and not in MMF_INIT_MASK? The way I see it, enabling MDWE is just a small step towards hardening a binary anyway. It can possibly make exploitation a bit harder in the case where the attacker has _just_: a write primitive they can use to write a shellcode somewhere and a primitive to make that page executable later. It's a fairly narrow protection already and I think it only really helps as part of a broader "defense in depth" strategy. > IIUC it means the restriction will only apply to the current process. Then > I assume the process can escape from this rule simply by a fork(). If so, > what's the point to protect at all? If we assume enough control from the attacker, then MDWE is already useless since it can be bypassed by writing to a file and then mmapping that file with PROT_EXEC. I think that's a good example of how "perfect can be the enemy of good" in security hardening. MDWE isn't a silver-bullet but it's a cheap trick and it makes a small dent in reducing the attack surface so it seems worth having anyway ? But indeed, to address your question, if you choose to use this NO_INHERIT flag: you're no longer protected if the attacker can fork() as part of their exploitation. I think it's been a useful trade-off for our internal users since, on the other hand, it also makes adoption a lot easier: our C++ services developers can trivially opt into a potpourri of hardening features without having to think too much about how they work under-the-hood. The default behavior has been to use a NO_INHERIT strategy so users don't get bad surprises the day when they try to spawn a JITted subcommand. In the meantime, their C++ service still gets a little bit of extra protection. > And, what's the difference of this comparing to disabling MDWE after being > enabled (which seems to be forbidden for now, but it seems fork() can play > a similar role of disabling it)? That would be functionally somewhat similar, yes. I think it mostly comes down to ease of adoption. I imagine that users who would opt into NO_INHERIT are those who are interested in MDWE for the binary they are writing but aren't 100% confident in what subprocesses they will run and so they don't have to think about disabling it after every fork.
On Fri, May 05, 2023 at 06:42:08PM +0200, Florent Revest wrote: > On Thu, May 4, 2023 at 10:06 PM Peter Xu <peterx@redhat.com> wrote: > > > > On Thu, May 04, 2023 at 07:09:38PM +0200, Florent Revest wrote: > > > Joey recently introduced a Memory-Deny-Write-Executable (MDWE) prctl which tags > > > current with a flag that prevents pages that were previously not executable from > > > becoming executable. > > > This tag always gets inherited by children tasks. (it's in MMF_INIT_MASK) > > > > > > At Google, we've been using a somewhat similar downstream patch for a few years > > > now. To make the adoption of this feature easier, we've had it support a mode in > > > which the W^X flag does not propagate to children. For example, this is handy if > > > a C process which wants W^X protection suspects it could start children > > > processes that would use a JIT. > > > > > > I'd like to align our features with the upstream prctl. This series proposes a > > > new NO_INHERIT flag to the MDWE prctl to make this kind of adoption easier. It > > > sets a different flag in current that is not in MMF_INIT_MASK and which does not > > > propagate. > > > > I don't think I have enough context, so sorry if I'm going to ask a naive > > question.. > > Not at all! :) You're absolutely right, it's important to address these points. > > > I can understand how current MDWE helps on not allowing any modifi-able > > content from becoming executable. How could NO_INHERIT help if it won't > > inherit and not in MMF_INIT_MASK? > > The way I see it, enabling MDWE is just a small step towards hardening > a binary anyway. It can possibly make exploitation a bit harder in the > case where the attacker has _just_: a write primitive they can use to > write a shellcode somewhere and a primitive to make that page > executable later. It's a fairly narrow protection already and I think > it only really helps as part of a broader "defense in depth" strategy. > > > IIUC it means the restriction will only apply to the current process. Then > > I assume the process can escape from this rule simply by a fork(). If so, > > what's the point to protect at all? > > If we assume enough control from the attacker, then MDWE is already > useless since it can be bypassed by writing to a file and then > mmapping that file with PROT_EXEC. I think that's a good example of > how "perfect can be the enemy of good" in security hardening. MDWE > isn't a silver-bullet but it's a cheap trick and it makes a small dent > in reducing the attack surface so it seems worth having anyway ? > > But indeed, to address your question, if you choose to use this > NO_INHERIT flag: you're no longer protected if the attacker can fork() > as part of their exploitation. I think it's been a useful trade-off > for our internal users since, on the other hand, it also makes > adoption a lot easier: our C++ services developers can trivially opt > into a potpourri of hardening features without having to think too > much about how they work under-the-hood. The default behavior has been > to use a NO_INHERIT strategy so users don't get bad surprises the day > when they try to spawn a JITted subcommand. In the meantime, their C++ > service still gets a little bit of extra protection. > > > And, what's the difference of this comparing to disabling MDWE after being > > enabled (which seems to be forbidden for now, but it seems fork() can play > > a similar role of disabling it)? > > That would be functionally somewhat similar, yes. I think it mostly > comes down to ease of adoption. I imagine that users who would opt > into NO_INHERIT are those who are interested in MDWE for the binary > they are writing but aren't 100% confident in what subprocesses they > will run and so they don't have to think about disabling it after > every fork. Okay, that makes sense to me. Thanks. Since the original MDWE was for systemd, I'm wondering what will happen if some program like what you said is invoked by systemd and with MDWE enabled already. Currently in your patch IIUC MDWE_NO_INHERIT will fail directly on MDWE enabled process, but then it makes me think whether it makes more sense to allow converting MDWE->MDWE_NO_INHERIT in this case. It seems to provide a most broad coverage on system daemons using MDWE starting from systemd initial process, meanwhile allows specific daemons to fork into anything like a JIT process so it can make itself NO_INHERIT. Attackers won't leverage this because MDWE_NO_INHERIT also means MDWE enabled.
On Mon, May 8, 2023 at 3:29 AM Peter Xu <peterx@redhat.com> wrote: > > On Fri, May 05, 2023 at 06:42:08PM +0200, Florent Revest wrote: > > On Thu, May 4, 2023 at 10:06 PM Peter Xu <peterx@redhat.com> wrote: > > > > > > On Thu, May 04, 2023 at 07:09:38PM +0200, Florent Revest wrote: > > > > Joey recently introduced a Memory-Deny-Write-Executable (MDWE) prctl which tags > > > > current with a flag that prevents pages that were previously not executable from > > > > becoming executable. > > > > This tag always gets inherited by children tasks. (it's in MMF_INIT_MASK) > > > > > > > > At Google, we've been using a somewhat similar downstream patch for a few years > > > > now. To make the adoption of this feature easier, we've had it support a mode in > > > > which the W^X flag does not propagate to children. For example, this is handy if > > > > a C process which wants W^X protection suspects it could start children > > > > processes that would use a JIT. > > > > > > > > I'd like to align our features with the upstream prctl. This series proposes a > > > > new NO_INHERIT flag to the MDWE prctl to make this kind of adoption easier. It > > > > sets a different flag in current that is not in MMF_INIT_MASK and which does not > > > > propagate. > > > > > > I don't think I have enough context, so sorry if I'm going to ask a naive > > > question.. > > > > Not at all! :) You're absolutely right, it's important to address these points. > > > > > I can understand how current MDWE helps on not allowing any modifi-able > > > content from becoming executable. How could NO_INHERIT help if it won't > > > inherit and not in MMF_INIT_MASK? > > > > The way I see it, enabling MDWE is just a small step towards hardening > > a binary anyway. It can possibly make exploitation a bit harder in the > > case where the attacker has _just_: a write primitive they can use to > > write a shellcode somewhere and a primitive to make that page > > executable later. It's a fairly narrow protection already and I think > > it only really helps as part of a broader "defense in depth" strategy. > > > > > IIUC it means the restriction will only apply to the current process. Then > > > I assume the process can escape from this rule simply by a fork(). If so, > > > what's the point to protect at all? > > > > If we assume enough control from the attacker, then MDWE is already > > useless since it can be bypassed by writing to a file and then > > mmapping that file with PROT_EXEC. I think that's a good example of > > how "perfect can be the enemy of good" in security hardening. MDWE > > isn't a silver-bullet but it's a cheap trick and it makes a small dent > > in reducing the attack surface so it seems worth having anyway ? > > > > But indeed, to address your question, if you choose to use this > > NO_INHERIT flag: you're no longer protected if the attacker can fork() > > as part of their exploitation. I think it's been a useful trade-off > > for our internal users since, on the other hand, it also makes > > adoption a lot easier: our C++ services developers can trivially opt > > into a potpourri of hardening features without having to think too > > much about how they work under-the-hood. The default behavior has been > > to use a NO_INHERIT strategy so users don't get bad surprises the day > > when they try to spawn a JITted subcommand. In the meantime, their C++ > > service still gets a little bit of extra protection. > > > > > And, what's the difference of this comparing to disabling MDWE after being > > > enabled (which seems to be forbidden for now, but it seems fork() can play > > > a similar role of disabling it)? > > > > That would be functionally somewhat similar, yes. I think it mostly > > comes down to ease of adoption. I imagine that users who would opt > > into NO_INHERIT are those who are interested in MDWE for the binary > > they are writing but aren't 100% confident in what subprocesses they > > will run and so they don't have to think about disabling it after > > every fork. > > Okay, that makes sense to me. Thanks. > > Since the original MDWE was for systemd, I'm wondering what will happen if > some program like what you said is invoked by systemd and with MDWE enabled > already. Good question > Currently in your patch IIUC MDWE_NO_INHERIT will fail directly on MDWE > enabled process, Yes, I tried to stay close to the spirit of the existing logic (which doesn't allow any sort of privilege gains) but this is not particularly a requirement on our side so I'm quite flexible here. Maybe Joey has an input here ? > but then it makes me think whether it makes more sense to > allow converting MDWE->MDWE_NO_INHERIT in this case. It seems to provide a > most broad coverage on system daemons using MDWE starting from systemd > initial process, meanwhile allows specific daemons to fork into anything > like a JIT process so it can make itself NO_INHERIT. Attackers won't > leverage this because MDWE_NO_INHERIT also means MDWE enabled. I should have cc-ed systemd folks who could have opinions on this. I will for v2. + cc Topi & Lennart
On Mon, May 08, 2023 at 02:12:21PM +0200, Florent Revest wrote: > On Mon, May 8, 2023 at 3:29 AM Peter Xu <peterx@redhat.com> wrote: > > On Fri, May 05, 2023 at 06:42:08PM +0200, Florent Revest wrote: > > > On Thu, May 4, 2023 at 10:06 PM Peter Xu <peterx@redhat.com> wrote: > > > > And, what's the difference of this comparing to disabling MDWE after being > > > > enabled (which seems to be forbidden for now, but it seems fork() can play > > > > a similar role of disabling it)? > > > > > > That would be functionally somewhat similar, yes. I think it mostly > > > comes down to ease of adoption. I imagine that users who would opt > > > into NO_INHERIT are those who are interested in MDWE for the binary > > > they are writing but aren't 100% confident in what subprocesses they > > > will run and so they don't have to think about disabling it after > > > every fork. > > > > Okay, that makes sense to me. Thanks. > > > > Since the original MDWE was for systemd, I'm wondering what will happen if > > some program like what you said is invoked by systemd and with MDWE enabled > > already. > > Good question I think JITs work around this by creating two separate mappings of the same pages, one RX and the other RW (rather than toggling the permission with mprotect()). I had the impression Chromium can use memfd to do something similar but I never checked. > > Currently in your patch IIUC MDWE_NO_INHERIT will fail directly on MDWE > > enabled process, > > Yes, I tried to stay close to the spirit of the existing logic (which > doesn't allow any sort of privilege gains) but this is not > particularly a requirement on our side so I'm quite flexible here. I think we should keep the original behaviour of systemd here, otherwise they won't transition to the new interface and keep using the SECCOMP BPF approach (which, in addition, prevents glibc from setting PROT_BTI on an already executable mapping). To me MDWE is not about preventing JITs but rather ensuring buggy programs don't end up with WX mappings. We ended up this way because of the SECCOMP BPF limitations (just guessing, I haven't been involved in its design). With a no-inherit MDWE, one can introduce an additional policy for systemd. It would be a sysadmin decision which one to enable and maybe current (inherit) MDWE will disappear in time. x86 has protection keys and arm64 will soon have permission overlays that allow user-space to toggle between RX and RW (Joey is looking at the arm64 support). I'm not sure how we'll end up implemented this on arm64 (and haven't looked at x86) but I have a suspicion MDWE will get in the way as the base page table permission will probably need PROT_WRITE|PROT_EXEC.
On 8.5.2023 17.10, Catalin Marinas wrote: > I think we should keep the original behaviour of systemd here, otherwise > they won't transition to the new interface and keep using the SECCOMP > BPF approach (which, in addition, prevents glibc from setting PROT_BTI > on an already executable mapping). Systemd has transitioned to prctl(PR_SET_MDWE) method since release of v253, so the original behaviour definitely should be kept. > To me MDWE is not about preventing JITs but rather ensuring buggy > programs don't end up with WX mappings. We ended up this way because of > the SECCOMP BPF limitations (just guessing, I haven't been involved in > its design). With a no-inherit MDWE, one can introduce an additional > policy for systemd. It would be a sysadmin decision which one to enable > and maybe current (inherit) MDWE will disappear in time. There could be a new setting for this, like MemoryDenyWriteExecute=no-inherit. I'd only use it for those special cases where MemoryDenyWriteExecute=yes can't be used. > x86 has protection keys and arm64 will soon have permission overlays > that allow user-space to toggle between RX and RW (Joey is looking at > the arm64 support). I'm not sure how we'll end up implemented this on > arm64 (and haven't looked at x86) but I have a suspicion MDWE will get > in the way as the base page table permission will probably need > PROT_WRITE|PROT_EXEC. Wouldn't those features defeat any gains from MDWE? The features probably should be forbidden with MemoryDenyWriteExecute=yes. -Topi
On Mon, May 08, 2023 at 08:21:16PM +0300, Topi Miettinen wrote: > On 8.5.2023 17.10, Catalin Marinas wrote: > > I think we should keep the original behaviour of systemd here, otherwise > > they won't transition to the new interface and keep using the SECCOMP > > BPF approach (which, in addition, prevents glibc from setting PROT_BTI > > on an already executable mapping). > > Systemd has transitioned to prctl(PR_SET_MDWE) method since release of v253, > so the original behaviour definitely should be kept. That's great. So yes, no ABI changes allowed anymore. > > x86 has protection keys and arm64 will soon have permission overlays > > that allow user-space to toggle between RX and RW (Joey is looking at > > the arm64 support). I'm not sure how we'll end up implemented this on > > arm64 (and haven't looked at x86) but I have a suspicion MDWE will get > > in the way as the base page table permission will probably need > > PROT_WRITE|PROT_EXEC. > > Wouldn't those features defeat any gains from MDWE? The features probably > should be forbidden with MemoryDenyWriteExecute=yes. The permission overlays (controlled by the user) can only further restrict the mmap() permissions. So MDWE would still work as expected. If one wants to toggle between RW and RX with overlays, the overall mmap() needs to be RWX and it won't work if MDWE=yes. No need to explicitly disable the overlays feature. On arm64 at least, with the introduction of permission overlays we also have the notion of "Read, Execute if not Write". This permission automatically disables Exec if the mapping becomes writable (overlays can disable writable, allowing exec). We could have a new MDWE policy which allows this, though I'm not that keen on using it in Linux since background permission changes done by the kernel can lead to an unexpected executable permission (e.g. marking a page read-only for clean/dirty tracking or in preparation for CoW after fork()).