From patchwork Mon Aug 14 08:40:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 135323 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b824:0:b0:3f2:4152:657d with SMTP id z4csp2650932vqi; Mon, 14 Aug 2023 03:36:24 -0700 (PDT) X-Google-Smtp-Source: AGHT+IG7MQN0pfZ0DvSP1jZRgMsBzpz4tDm9Mkajacj4gwYc9eVxkIU2J2zwocW1vdGLUiz+eJWh X-Received: by 2002:a17:903:2342:b0:1bc:e6a:205a with SMTP id c2-20020a170903234200b001bc0e6a205amr12621797plh.4.1692009384244; Mon, 14 Aug 2023 03:36:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1692009384; cv=none; d=google.com; s=arc-20160816; b=slUxbAsRezSfqS3HqV79EPYeIpEyYzbmz36YiCkfe7lQWIoUN8K+v7smJ+V/QMLPj+ eqX0I5rvXszGqFlst/CCVM7i5O7zlOq7z4rqTimj68ITotUF/DwF8SXByLbz2qmENVFs ADmMdDRBATiaG+7LxHawwtKY849gWqDSwOe6yBMpi8Fyz8+5ZoATLLcAkLzFkwZTBgEc CVY5dPMg2kjNlJGENTbhDW3mFpTZYykBhaaVlPSYkEQprbALYB6258j6mNcA4mYIB01G DSgekmYrH1C9Deg7nQb+p4PlJ7AYx1/aELGzoSSPKAr2KJK7Ml/DJHa9oApIhoiZPouj 8UqQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:in-reply-to:references:message-id :content-transfer-encoding:mime-version:subject:date:from :dkim-signature; bh=7gQyeuP4S02FF5qxYTiiXRiidByA4CAzLTryBvJ7G7w=; fh=2nZKxd+TZqDRZrcTvF52H2/zIyJN8fpOeSuAynFSd6k=; b=kvrc0gzjxCQ848qlKO7HuIyHgBhnUZSFUEB/Ij7FMNn3nLb850s4meQFmeQ2+K5AuQ Z8ynE3Tkosu8kRny0UqRCuz1Wm1QVWD0bsY6iBLPF/tfDuyI48YSiegl/n8Q9hRjeIRV KERn7O+S3MFb/hxHWCcBSh+dZSQDgaFOSd1+MHbGvjJTWi23Gz42bcElmzYh/JIuwMt0 zHXx47KsXAlimVRfyZSSFaZZg8THEh7Gl0kZFRaXRwTYfBELx/XgMcItHRyNVQuHKGyB yqqxRZVmQzDj5TWY4o+b66NiYZbp9vBmNX6egt3UH0JlZ6U5vjdRvv6wY3xnUCF70ykR kqeA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cyphar.com header.s=MBO0001 header.b=PWInDqWM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cyphar.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n12-20020a170902f60c00b001b9d2010c39si8090368plg.192.2023.08.14.03.36.11; Mon, 14 Aug 2023 03:36:24 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@cyphar.com header.s=MBO0001 header.b=PWInDqWM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cyphar.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234828AbjHNImF (ORCPT + 99 others); Mon, 14 Aug 2023 04:42:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52246 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234860AbjHNIlk (ORCPT ); Mon, 14 Aug 2023 04:41:40 -0400 Received: from mout-p-101.mailbox.org (mout-p-101.mailbox.org [80.241.56.151]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1337410EA; Mon, 14 Aug 2023 01:41:35 -0700 (PDT) Received: from smtp202.mailbox.org (smtp202.mailbox.org [10.196.197.202]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-384) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-p-101.mailbox.org (Postfix) with ESMTPS id 4RPSYR2bs0z9sWB; Mon, 14 Aug 2023 10:41:31 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cyphar.com; s=MBO0001; t=1692002491; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=7gQyeuP4S02FF5qxYTiiXRiidByA4CAzLTryBvJ7G7w=; b=PWInDqWMMKzI/ONQxGwzOm9Nqjj0QWXmLPZUkhDGsxbUFUikujZWbEY1MqZPB1WBCvi77e kiKJ8pi/2PteW88+lQBK8ltXeZV9mb2bYVBIT+xToxGb7CsmZSjDBOtgWYsBzPZwis7wXQ RtHIduy6h2n0Y08EgBWVCaB3NB4595F6cDxfr1hTZLGJAVUq5dt4TkT/kmSYd5o/jIhMC4 ijPwqZ/3W9OYLr4Ca7Mt3QrZIxPfcKZbkuVIeM0k0uGsxknhulEqFewn6BGmOkqAjyH3EK i++D6TrN/b3E0iNQOwpKhc/n00nHwaN4m35+mYh5wEz06ygRPqfkMrBizyhlDw== From: Aleksa Sarai Date: Mon, 14 Aug 2023 18:40:57 +1000 Subject: [PATCH v2 1/5] selftests: memfd: error out test process when child test fails MIME-Version: 1.0 Message-Id: <20230814-memfd-vm-noexec-uapi-fixes-v2-1-7ff9e3e10ba6@cyphar.com> References: <20230814-memfd-vm-noexec-uapi-fixes-v2-0-7ff9e3e10ba6@cyphar.com> In-Reply-To: <20230814-memfd-vm-noexec-uapi-fixes-v2-0-7ff9e3e10ba6@cyphar.com> To: Andrew Morton , Shuah Khan , Jeff Xu , Kees Cook , Daniel Verkamp Cc: Christian Brauner , Dominique Martinet , stable@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Aleksa Sarai X-Developer-Signature: v=1; a=openpgp-sha256; l=1390; i=cyphar@cyphar.com; h=from:subject:message-id; bh=+Up0ghrBW/nfRj4bC3B4yqL0dFwv9KTB+GgoZwyV0aw=; b=owGbwMvMwCWmMf3Xpe0vXfIZT6slMaTcfLFiw/TsfmWDxoNz8nnmrvi1iHnrP6m7AWasOo8m3 1wekyNc2FHKwiDGxSArpsiyzc8zdNP8xVeSP61kg5nDygQyhIGLUwAmkpfFyDBRc+qvi5+lrvwq Uu9s9nvj/PqGRbrzrqg2Y4FZ6bO7O9cw/NPdWbdxP9tDl7XxbgXfxdZNaXf791I7X+J3+4PgVBu 9RXwA X-Developer-Key: i=cyphar@cyphar.com; a=openpgp; fpr=C9C370B246B09F6DBCFC744C34401015D1D2D386 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1774200432060762698 X-GMAIL-MSGID: 1774200432060762698 Before this change, a test runner using this self test would see a return code of 0 when the tests using a child process (namely the MFD_NOEXEC_SEAL and MFD_EXEC tests) failed, masking test failures. Fixes: 11f75a01448f ("selftests/memfd: add tests for MFD_NOEXEC_SEAL MFD_EXEC") Reviewed-by: Jeff Xu Signed-off-by: Aleksa Sarai --- tools/testing/selftests/memfd/memfd_test.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c index dbdd9ec5e397..8eb49204f9ea 100644 --- a/tools/testing/selftests/memfd/memfd_test.c +++ b/tools/testing/selftests/memfd/memfd_test.c @@ -1207,7 +1207,24 @@ static pid_t spawn_newpid_thread(unsigned int flags, int (*fn)(void *)) static void join_newpid_thread(pid_t pid) { - waitpid(pid, NULL, 0); + int wstatus; + + if (waitpid(pid, &wstatus, 0) < 0) { + printf("newpid thread: waitpid() failed: %m\n"); + abort(); + } + + if (WIFEXITED(wstatus) && WEXITSTATUS(wstatus) != 0) { + printf("newpid thread: exited with non-zero error code %d\n", + WEXITSTATUS(wstatus)); + abort(); + } + + if (WIFSIGNALED(wstatus)) { + printf("newpid thread: killed by signal %d\n", + WTERMSIG(wstatus)); + abort(); + } } /* From patchwork Mon Aug 14 08:40:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 135315 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b824:0:b0:3f2:4152:657d with SMTP id z4csp2645736vqi; Mon, 14 Aug 2023 03:24:14 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGg1FEldLACgYhtlCifG2kUdLGQuJVk2JP6RZ1PAfM4hTFiHIN2XOgyvWgEKChI2plQ6VE+ X-Received: by 2002:a05:6871:5c8:b0:1be:f720:bb7b with SMTP id v8-20020a05687105c800b001bef720bb7bmr10915399oan.16.1692008653845; Mon, 14 Aug 2023 03:24:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1692008653; cv=none; d=google.com; s=arc-20160816; b=iYg0yAbT40bXqAWdQ74SZd7ryz2ZWDo/UqXgKOSsY8cp3JqqnWrO1Sk6ctzzYF8UP2 Ruu6usS4aosa2RGJM14ZqhiSJqO5BrylCuCF3Iqd2jK9yFLGpbgS4CbmKpcHZLueLU4q JlVzO+UTcgV5tfyIxAhl/iKVdX/FKcEjDnouo3knv6gM9blKuOUVZimzvsh6CSeelvu5 kgzRje5ucAPVwttJQYdfxkdSC3qXtgsP3OA647lSAy+gMODZ0nxJIuh8LYFfu+n/mxlQ mPF27Bw3gEq2a/A2UV5GSrve74c3dQzJAoCXivjnTNlqnSxM4RdczUbnMgw8fpeptVoM 2P6A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:in-reply-to:references:message-id :content-transfer-encoding:mime-version:subject:date:from :dkim-signature; bh=8KHnXfVFe2HzozOZ++k3K0yZYQRi3T1iqdVW5ddEXr8=; fh=2nZKxd+TZqDRZrcTvF52H2/zIyJN8fpOeSuAynFSd6k=; b=0DZgFYDTsrAQpAZaCJLcL0dS2CrX2LWXJZ0hTOm90lg8xqhZVFR0xh7M4I1wgBiV2E 18mtOsIweM1kJHKJDYUztupH0rmZQMP4KTrzGhseWNAwt4NJe9V+i6oBadNFeCyIXazf UsUSak06v8JGN4SDotg6HIBSOSyOzZdC5BEAOP5tAINhHmWn6GJRS/7LnNdAfTqTMdw5 BlSPqyhAV6/VkVqbPawWAtA0IAQR/ojFEAZzeBcaV6kC0QyAj3I7+obTG7Y24dsPqIMK z7HFWe3CHob02fDqe+X5aEI8WRQgUldiVimu7xtcwYom2ZSRrjjXKJgiwv9oW+n2+l3P 4BoA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cyphar.com header.s=MBO0001 header.b=mkHgS1T1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cyphar.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id gm16-20020a17090b101000b002682dc64492si7866897pjb.185.2023.08.14.03.24.01; Mon, 14 Aug 2023 03:24:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@cyphar.com header.s=MBO0001 header.b=mkHgS1T1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cyphar.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234845AbjHNImG (ORCPT + 99 others); Mon, 14 Aug 2023 04:42:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47790 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234889AbjHNIlq (ORCPT ); Mon, 14 Aug 2023 04:41:46 -0400 Received: from mout-p-202.mailbox.org (mout-p-202.mailbox.org [80.241.56.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 590BA12E; Mon, 14 Aug 2023 01:41:40 -0700 (PDT) Received: from smtp202.mailbox.org (smtp202.mailbox.org [10.196.197.202]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-384) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-p-202.mailbox.org (Postfix) with ESMTPS id 4RPSYX6LYkz9sV8; Mon, 14 Aug 2023 10:41:36 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cyphar.com; s=MBO0001; t=1692002496; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8KHnXfVFe2HzozOZ++k3K0yZYQRi3T1iqdVW5ddEXr8=; b=mkHgS1T1saMuQar2+hLvU869NXgYOqmVxqGItOAqDywdCYkf+YWmIvZvqXQ1SBOXWxGuri z93zrsoK0Cj8bIXSlfhGixvtUc41ZaUdmQ33jWm5GX/PhRIaYPBM2xOKWAjWjt7iKXOuSp cdkg+J36AMISU9xvCR/ymXcYfbqrBKjjqMe8CS/6BSKryd1PeDjLHqrqfxgJXnS6mOnuw7 VGa+VOLZHVdVZw4pftsvVsKbgEGQR2sKNlxkvQqUF2QLN7UEW+vpJ5ZLh6Mheo+OAE8yR3 MzgG0xTPbS5PMTklW09fnxmgJhThTcR47109IlAv79ze0UER9zp0IlFsBGOBmA== From: Aleksa Sarai Date: Mon, 14 Aug 2023 18:40:58 +1000 Subject: [PATCH v2 2/5] memfd: do not -EACCES old memfd_create() users with vm.memfd_noexec=2 MIME-Version: 1.0 Message-Id: <20230814-memfd-vm-noexec-uapi-fixes-v2-2-7ff9e3e10ba6@cyphar.com> References: <20230814-memfd-vm-noexec-uapi-fixes-v2-0-7ff9e3e10ba6@cyphar.com> In-Reply-To: <20230814-memfd-vm-noexec-uapi-fixes-v2-0-7ff9e3e10ba6@cyphar.com> To: Andrew Morton , Shuah Khan , Jeff Xu , Kees Cook , Daniel Verkamp Cc: Christian Brauner , Dominique Martinet , stable@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Aleksa Sarai X-Developer-Signature: v=1; a=openpgp-sha256; l=6045; i=cyphar@cyphar.com; h=from:subject:message-id; bh=Oi08rzV4zlV5QA1O8wafoqXVIjsJnVHyeP4wDfF54wA=; b=owGbwMvMwCWmMf3Xpe0vXfIZT6slMaTcfLFS1FRjkR3j8lvbuN47TmXl+1bCoFiVbf3R3WTBC n3tA5dMO0pZGMS4GGTFFFm2+XmGbpq/+Eryp5VsMHNYmUCGMHBxCsBEbt9l+O/W9O/Pzh2L/Hdf l+5+En+Lk4cleX5P3688r/3Mkb/X56oz/Ga9VZCQltttufi1/uNLe312cCSlnN3yuVogwfjKV9F 34WwA X-Developer-Key: i=cyphar@cyphar.com; a=openpgp; fpr=C9C370B246B09F6DBCFC744C34401015D1D2D386 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1774199666044225749 X-GMAIL-MSGID: 1774199666044225749 Given the difficulty of auditing all of userspace to figure out whether every memfd_create() user has switched to passing MFD_EXEC and MFD_NOEXEC_SEAL flags, it seems far less distruptive to make it possible for older programs that don't make use of executable memfds to run under vm.memfd_noexec=2. Otherwise, a small dependency change can result in spurious errors. For programs that don't use executable memfds, passing MFD_NOEXEC_SEAL is functionally a no-op and thus having the same In addition, every failure under vm.memfd_noexec=2 needs to print to the kernel log so that userspace can figure out where the error came from. The concerns about pr_warn_ratelimited() spam that caused the switch to pr_warn_once()[1,2] do not apply to the vm.memfd_noexec=2 case. This is a user-visible API change, but as it allows programs to do something that would be blocked before, and the sysctl itself was broken and recently released, it seems unlikely this will cause any issues. [1]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/ [2]: https://lore.kernel.org/202212161233.85C9783FB@keescook/ Cc: Dominique Martinet Cc: Christian Brauner Cc: stable@vger.kernel.org # v6.3+ Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") Signed-off-by: Aleksa Sarai --- include/linux/pid_namespace.h | 16 ++++------------ mm/memfd.c | 30 +++++++++++------------------- tools/testing/selftests/memfd/memfd_test.c | 22 +++++++++++++++++----- 3 files changed, 32 insertions(+), 36 deletions(-) diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h index c758809d5bcf..53974d79d98e 100644 --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -17,18 +17,10 @@ struct fs_pin; #if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE) -/* - * sysctl for vm.memfd_noexec - * 0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL - * acts like MFD_EXEC was set. - * 1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL - * acts like MFD_NOEXEC_SEAL was set. - * 2: memfd_create() without MFD_NOEXEC_SEAL will be - * rejected. - */ -#define MEMFD_NOEXEC_SCOPE_EXEC 0 -#define MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL 1 -#define MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED 2 +/* modes for vm.memfd_noexec sysctl */ +#define MEMFD_NOEXEC_SCOPE_EXEC 0 /* MFD_EXEC implied if unset */ +#define MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL 1 /* MFD_NOEXEC_SEAL implied if unset */ +#define MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED 2 /* same as 1, except MFD_EXEC rejected */ #endif struct pid_namespace { diff --git a/mm/memfd.c b/mm/memfd.c index 0bdbd2335af7..d65485c762de 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -271,30 +271,22 @@ long memfd_fcntl(struct file *file, unsigned int cmd, unsigned int arg) static int check_sysctl_memfd_noexec(unsigned int *flags) { #ifdef CONFIG_SYSCTL - char comm[TASK_COMM_LEN]; - int sysctl = MEMFD_NOEXEC_SCOPE_EXEC; - struct pid_namespace *ns; - - ns = task_active_pid_ns(current); - if (ns) - sysctl = ns->memfd_noexec_scope; + int sysctl = task_active_pid_ns(current)->memfd_noexec_scope; if (!(*flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) { - if (sysctl == MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL) + if (sysctl >= MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL) *flags |= MFD_NOEXEC_SEAL; else *flags |= MFD_EXEC; } - if (*flags & MFD_EXEC && sysctl >= MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED) { - pr_warn_once( - "memfd_create(): MFD_NOEXEC_SEAL is enforced, pid=%d '%s'\n", - task_pid_nr(current), get_task_comm(comm, current)); - + if (!(*flags & MFD_NOEXEC_SEAL) && sysctl >= MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED) { + pr_err_ratelimited( + "%s[%d]: memfd_create() requires MFD_NOEXEC_SEAL with vm.memfd_noexec=%d\n", + current->comm, task_pid_nr(current), sysctl); return -EACCES; } #endif - return 0; } @@ -302,7 +294,6 @@ SYSCALL_DEFINE2(memfd_create, const char __user *, uname, unsigned int, flags) { - char comm[TASK_COMM_LEN]; unsigned int *file_seals; struct file *file; int fd, error; @@ -325,12 +316,13 @@ SYSCALL_DEFINE2(memfd_create, if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) { pr_warn_once( - "memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=%d '%s'\n", - task_pid_nr(current), get_task_comm(comm, current)); + "%s[%d]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set\n", + current->comm, task_pid_nr(current)); } - if (check_sysctl_memfd_noexec(&flags) < 0) - return -EACCES; + error = check_sysctl_memfd_noexec(&flags); + if (error < 0) + return error; /* length includes terminating zero */ len = strnlen_user(uname, MFD_NAME_MAX_LEN + 1); diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c index 8eb49204f9ea..8b7390ad81d1 100644 --- a/tools/testing/selftests/memfd/memfd_test.c +++ b/tools/testing/selftests/memfd/memfd_test.c @@ -1145,11 +1145,23 @@ static void test_sysctl_child(void) printf("%s sysctl 2\n", memfd_str); sysctl_assert_write("2"); - mfd_fail_new("kern_memfd_sysctl_2", - MFD_CLOEXEC | MFD_ALLOW_SEALING); - mfd_fail_new("kern_memfd_sysctl_2_MFD_EXEC", - MFD_CLOEXEC | MFD_EXEC); - fd = mfd_assert_new("", 0, MFD_NOEXEC_SEAL); + mfd_fail_new("kern_memfd_sysctl_2_exec", + MFD_EXEC | MFD_CLOEXEC | MFD_ALLOW_SEALING); + + fd = mfd_assert_new("kern_memfd_sysctl_2_dfl", + mfd_def_size, + MFD_CLOEXEC | MFD_ALLOW_SEALING); + mfd_assert_mode(fd, 0666); + mfd_assert_has_seals(fd, F_SEAL_EXEC); + mfd_fail_chmod(fd, 0777); + close(fd); + + fd = mfd_assert_new("kern_memfd_sysctl_2_noexec_seal", + mfd_def_size, + MFD_NOEXEC_SEAL | MFD_CLOEXEC | MFD_ALLOW_SEALING); + mfd_assert_mode(fd, 0666); + mfd_assert_has_seals(fd, F_SEAL_EXEC); + mfd_fail_chmod(fd, 0777); close(fd); sysctl_fail_write("0"); From patchwork Mon Aug 14 08:40:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 135273 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b824:0:b0:3f2:4152:657d with SMTP id z4csp2626609vqi; Mon, 14 Aug 2023 02:38:21 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHRrNx7Y//6uYGMSX9XvGZWs0hATDlIOMiq9kTungqRgau1iQZ8vFyYg0e9/UP3jh6daNb2 X-Received: by 2002:a05:6358:7e0c:b0:135:43da:b16d with SMTP id o12-20020a0563587e0c00b0013543dab16dmr4857950rwm.11.1692005901432; Mon, 14 Aug 2023 02:38:21 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1692005901; cv=none; d=google.com; s=arc-20160816; b=qOTrpKGLO4+psqfZFTaF2WFbJdkV+N3UGmk0nhR6V56gg+ngOLU+byFKXa0g7kVpRZ XwI6NUOcVii7fe6igKd36uvOkcXpREKYkFD0lpUs+llehVFcbDVC09LvNEGUNBwlzUYs wpYKs9bcd1GAlK5SOqskQQj0Pl2SvPdXnWIhxDN6FS8U6KjZ4n1bFsTBtOjS7S0lyG9w 12nsLSvtYX+9YLQbJFlRMCbjId8WdGv99ShwlX7d4KH6UAsiJyLbRJv7XHq6zGItyz8H 1Z78Kot37uddQCoNdldjpD7/K/HJ7uCwSd+VDFdWUN51NLqSlIKfKWg67xxHBT20yI6z JQ7w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:in-reply-to:references:message-id :content-transfer-encoding:mime-version:subject:date:from :dkim-signature; bh=79RoJoR/fwEl868XiYEwCuZo1mWc1anAic2p8X8NglE=; fh=2nZKxd+TZqDRZrcTvF52H2/zIyJN8fpOeSuAynFSd6k=; b=End8SU33Gfugy8TfslUwXwuMj7S0qntAeT4lENo9SrdtPMN0ucOKmXVlW5FjZRzaf6 4MIPfYR4lkFhCpRSXnjCcc7elHVRr2u24/P5h6AdyPptIb4iin8CGMXFU62HysivhrCL KIlYpnPDKVr9FK8orF918O3sLrFdMB+c1YlijgFHSs2/CqPu2EHozauknZiz+ZKUgVdw 9y1pHqtOp+ntKl6MeiPczv2N7vCW1mROEvGDyXtuxLi77biZw5GHn/UBHEcJ8Jkrzvor uvFizvyKHipyGFvLMDTWQdzqzxcYQyaiePT9RoEF8MQOfW5NB2S7aPpUMvSESsD3xnDh p8Lg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cyphar.com header.s=MBO0001 header.b=p1mH3+eN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cyphar.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id l185-20020a6391c2000000b00563a0c1bf09si7827485pge.208.2023.08.14.02.38.08; Mon, 14 Aug 2023 02:38:21 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@cyphar.com header.s=MBO0001 header.b=p1mH3+eN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cyphar.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234854AbjHNImI (ORCPT + 99 others); Mon, 14 Aug 2023 04:42:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47800 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234904AbjHNIls (ORCPT ); Mon, 14 Aug 2023 04:41:48 -0400 Received: from mout-p-202.mailbox.org (mout-p-202.mailbox.org [IPv6:2001:67c:2050:0:465::202]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B3927127; Mon, 14 Aug 2023 01:41:45 -0700 (PDT) Received: from smtp202.mailbox.org (smtp202.mailbox.org [IPv6:2001:67c:2050:b231:465::202]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-384) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-p-202.mailbox.org (Postfix) with ESMTPS id 4RPSYd70PJz9slh; Mon, 14 Aug 2023 10:41:41 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cyphar.com; s=MBO0001; t=1692002502; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=79RoJoR/fwEl868XiYEwCuZo1mWc1anAic2p8X8NglE=; b=p1mH3+eN9eCVXdi/TxEUdL6OIDyTpQYTMvtKaZZl79SDtqiMak9oJhVRBoeaG94oNA8btJ PXKhAtBfpm7RHIgirGHwamL0Dqr5iJzMUBPRM7gjZR47hIhN/BQmCcoL7ufFIkmwRd7HqN mVSznBoc06lgSX95l1aKE8O5Iqd0oGrgrweoODmiJ5QxpRealAixDosNM60Yo40IYyHmgg UYfjbMmbOD9Hz5BIR3vTbMmxgQcmoaVDW7Tiy9rJ0hv/5isMvaFDnVF17zljfHsxF5j7Ls UyjsKXdyA9wOQ660D6kuX8HSkD2lACAmd416Q3kl9z/0UdxSYT4GZVOm+puhqA== From: Aleksa Sarai Date: Mon, 14 Aug 2023 18:40:59 +1000 Subject: [PATCH v2 3/5] memfd: improve userspace warnings for missing exec-related flags MIME-Version: 1.0 Message-Id: <20230814-memfd-vm-noexec-uapi-fixes-v2-3-7ff9e3e10ba6@cyphar.com> References: <20230814-memfd-vm-noexec-uapi-fixes-v2-0-7ff9e3e10ba6@cyphar.com> In-Reply-To: <20230814-memfd-vm-noexec-uapi-fixes-v2-0-7ff9e3e10ba6@cyphar.com> To: Andrew Morton , Shuah Khan , Jeff Xu , Kees Cook , Daniel Verkamp Cc: Christian Brauner , Dominique Martinet , stable@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Aleksa Sarai X-Developer-Signature: v=1; a=openpgp-sha256; l=2138; i=cyphar@cyphar.com; h=from:subject:message-id; bh=RhSvSYmzoCDTkzH6eBS7S4EFPaKzfTKNkGx9zoaFDXg=; b=owGbwMvMwCWmMf3Xpe0vXfIZT6slMaTcfLHKkqnzauO+Q1m9W/cV6QoIafZdM/7bxXw//nBq8 PzZ8dumdpSyMIhxMciKKbJs8/MM3TR/8ZXkTyvZYOawMoEMYeDiFICJPFNl+F/zy9EjaynTW+l7 ofv84tUT6+bn3924QbCv9tcRc+nsOj5GhjfxJfaqHXPWTy+PW/w4KtRJ8Llqn1Da/fm3D9eIrua pYQcA X-Developer-Key: i=cyphar@cyphar.com; a=openpgp; fpr=C9C370B246B09F6DBCFC744C34401015D1D2D386 X-Rspamd-Queue-Id: 4RPSYd70PJz9slh X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1774196779989761289 X-GMAIL-MSGID: 1774196779989761289 In order to incentivise userspace to switch to passing MFD_EXEC and MFD_NOEXEC_SEAL, we need to provide a warning on each attempt to call memfd_create() without the new flags. pr_warn_once() is not useful because on most systems the one warning is burned up during the boot process (on my system, systemd does this within the first second of boot) and thus userspace will in practice never see the warnings to push them to switch to the new flags. The original patchset[1] used pr_warn_ratelimited(), however there were concerns about the degree of spam in the kernel log[2,3]. The resulting inability to detect every case was flagged as an issue at the time[4]. While we could come up with an alternative rate-limiting scheme such as only outputting the message if vm.memfd_noexec has been modified, or only outputting the message once for a given task, these alternatives have downsides that don't make sense given how low-stakes a single kernel warning message is. Switching to pr_info_ratelimited() instead should be fine -- it's possible some monitoring tool will be unhappy with a stream of warning-level messages but there's already plenty of info-level message spam in dmesg. [1]: https://lore.kernel.org/20221215001205.51969-4-jeffxu@google.com/ [2]: https://lore.kernel.org/202212161233.85C9783FB@keescook/ [3]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/ [4]: https://lore.kernel.org/f185bb42-b29c-977e-312e-3349eea15383@linuxfoundation.org/ Cc: stable@vger.kernel.org # v6.3+ Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") Signed-off-by: Aleksa Sarai Reviewed-by: Christian Brauner --- mm/memfd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/memfd.c b/mm/memfd.c index d65485c762de..aa46521057ab 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -315,7 +315,7 @@ SYSCALL_DEFINE2(memfd_create, return -EINVAL; if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) { - pr_warn_once( + pr_info_ratelimited( "%s[%d]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set\n", current->comm, task_pid_nr(current)); } From patchwork Mon Aug 14 08:41:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 135319 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b824:0:b0:3f2:4152:657d with SMTP id z4csp2646396vqi; Mon, 14 Aug 2023 03:26:07 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGY/C9SHkuWHdqKlQiH+F/QL/pX07PMqvOWANU01OE5vjjAbjElqN1Qf9/g0GxIYoSYxqnO X-Received: by 2002:a17:90a:5a4d:b0:269:1e3f:a54d with SMTP id m13-20020a17090a5a4d00b002691e3fa54dmr8023162pji.10.1692008767099; Mon, 14 Aug 2023 03:26:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1692008767; cv=none; d=google.com; s=arc-20160816; b=ow3wNGJcYBrtivi1vPpcdRJzmltct2Rv7wQ+6d5B7wUw2XHJQQU6eSq2cIue84ulro qXNeEyXGHdMW7uILvwr3QLsGgS82/Y38wvY++JZOrAU1rN2QLjXWMg1bK5PimHO0i47x TPpRvp88QiaYYA0M+Wi54VwzlNlETuUE/PV/mXGJ2g4r15ls429Xld0uDdTcQrsg4sWB 6dY0EUXVv/+wcyLWJUE15ID7Mr8Dml7Zr7bfjFz2d8uQD6NNRAiJUI/6SHUSOfTDTLEX ObwE5uhQhrNCOpOg++zqTaaTdG3ZLJvRF4X4rFOSV652BG/u7e3JUCYO57IBK5i1NKBM GCNw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:in-reply-to:references:message-id :content-transfer-encoding:mime-version:subject:date:from :dkim-signature; bh=YYw3VsM//wPuCzVBeDBi0yiXj1DgwyxLCg5CNbEnouM=; fh=2nZKxd+TZqDRZrcTvF52H2/zIyJN8fpOeSuAynFSd6k=; b=X16pBPK7Ythgo1IeVBoK/xirQ0jSFt8bU/FE1iBGYj84ozNMnhpeMjk35Lovv2H3Hi ACPVyCw7BIRTLsqBtjXi+PJjLaHKIUl7zyitI0UtaI8dN5sutwtvWUccdTrHydCtORnV twtM+7WOWEZDEuSG77BsJzpjLzVeBuzs66bnGmIIPtC4tn0DoP9g2uTCut2R0KqAw1XJ JnnMaVAdzTY5mOpfOxEnJs+iBTiPPcPlrAZo9DvuZOUU7aEXQ1Rc8sbYAphMOapgibV+ ft75hHXegWD0L7hKN/zF8rS7XdoX3hstkaFLgXy68xRtjUuUep59lee5S3Y9bh0FGZ/h cY+A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cyphar.com header.s=MBO0001 header.b=FW4pAGCj; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cyphar.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id i2-20020a17090ac40200b0026b466905adsi3050938pjt.71.2023.08.14.03.25.54; Mon, 14 Aug 2023 03:26:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@cyphar.com header.s=MBO0001 header.b=FW4pAGCj; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cyphar.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234820AbjHNImY (ORCPT + 99 others); Mon, 14 Aug 2023 04:42:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47798 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234283AbjHNIlx (ORCPT ); Mon, 14 Aug 2023 04:41:53 -0400 Received: from mout-p-103.mailbox.org (mout-p-103.mailbox.org [80.241.56.161]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 968FE129; Mon, 14 Aug 2023 01:41:50 -0700 (PDT) Received: from smtp202.mailbox.org (smtp202.mailbox.org [IPv6:2001:67c:2050:b231:465::202]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-384) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-p-103.mailbox.org (Postfix) with ESMTPS id 4RPSYl1sPZz9sSk; Mon, 14 Aug 2023 10:41:47 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cyphar.com; s=MBO0001; t=1692002507; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YYw3VsM//wPuCzVBeDBi0yiXj1DgwyxLCg5CNbEnouM=; b=FW4pAGCjQrxS9kPO8UYvY2wfmvyzfNE3ar8tAhYT8AGZbuNMo3b1ODPydY1aGAH/x6LhX4 6jB1DEBjKA9mW8gjTHmKZDuQjEFV1Ld0MeCYsSdvcJ5jDvlZahmyqXlYJu+bB1cHTZjR3R 1RB+CQPRho3REB3s13nRV+Dh2sR6yKNBKA++UGvK3B+kN/Jju9zriOkLqiO9YUDU6qE3tV oZrb9a3lSzcun2V7BNPrWtpVzjfmlnHrx7smaVNTSHPWfJsG/9UNHSkf9XPZATyTsyurvc ST1H8jrxOmhNk3e4vW0vdgILUT3YKEsLxGdZIOKw0kJ23wcPum1m1VfCLeoF/g== From: Aleksa Sarai Date: Mon, 14 Aug 2023 18:41:00 +1000 Subject: [PATCH v2 4/5] memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy MIME-Version: 1.0 Message-Id: <20230814-memfd-vm-noexec-uapi-fixes-v2-4-7ff9e3e10ba6@cyphar.com> References: <20230814-memfd-vm-noexec-uapi-fixes-v2-0-7ff9e3e10ba6@cyphar.com> In-Reply-To: <20230814-memfd-vm-noexec-uapi-fixes-v2-0-7ff9e3e10ba6@cyphar.com> To: Andrew Morton , Shuah Khan , Jeff Xu , Kees Cook , Daniel Verkamp Cc: Christian Brauner , Dominique Martinet , stable@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Aleksa Sarai X-Developer-Signature: v=1; a=openpgp-sha256; l=10124; i=cyphar@cyphar.com; h=from:subject:message-id; bh=U7ufwXeR3BywMTVYzMKLH+ocS470Rxhe9PUxTLcsuDk=; b=owGbwMvMwCWmMf3Xpe0vXfIZT6slMaTcfLEmXWFyS8qGo5za91//3FAcP4OVm1d+JpNYzjUHD eYH+xjiOkpZGMS4GGTFFFm2+XmGbpq/+Eryp5VsMHNYmUCGMHBxCsBE/s5g+Gch39W/f80Jq8Qd K2Jb0u73q+yUWZJ6uXjCllft3Ze4gkQZGX49jo+etthCvmLRglfvPhh+Ot27+ojTmuMTlKqaKsz tfvMCAA== X-Developer-Key: i=cyphar@cyphar.com; a=openpgp; fpr=C9C370B246B09F6DBCFC744C34401015D1D2D386 X-Rspamd-Queue-Id: 4RPSYl1sPZz9sSk X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1774199784868973436 X-GMAIL-MSGID: 1774199784868973436 This sysctl has the very unusual behaviour of not allowing any user (even CAP_SYS_ADMIN) to reduce the restriction setting, meaning that if you were to set this sysctl to a more restrictive option in the host pidns you would need to reboot your machine in order to reset it. The justification given in [1] is that this is a security feature and thus it should not be possible to disable. Aside from the fact that we have plenty of security-related sysctls that can be disabled after being enabled (fs.protected_symlinks for instance), the protection provided by the sysctl is to stop users from being able to create a binary and then execute it. A user with CAP_SYS_ADMIN can trivially do this without memfd_create(2): % cat mount-memfd.c #include #include #include #include #include #include #define SHELLCODE "#!/bin/echo this file was executed from this totally private tmpfs:" int main(void) { int fsfd = fsopen("tmpfs", FSOPEN_CLOEXEC); assert(fsfd >= 0); assert(!fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 2)); int dfd = fsmount(fsfd, FSMOUNT_CLOEXEC, 0); assert(dfd >= 0); int execfd = openat(dfd, "exe", O_CREAT | O_RDWR | O_CLOEXEC, 0782); assert(execfd >= 0); assert(write(execfd, SHELLCODE, strlen(SHELLCODE)) == strlen(SHELLCODE)); assert(!close(execfd)); char *execpath = NULL; char *argv[] = { "bad-exe", NULL }, *envp[] = { NULL }; execfd = openat(dfd, "exe", O_PATH | O_CLOEXEC); assert(execfd >= 0); assert(asprintf(&execpath, "/proc/self/fd/%d", execfd) > 0); assert(!execve(execpath, argv, envp)); } % ./mount-memfd this file was executed from this totally private tmpfs: /proc/self/fd/5 % Given that it is possible for CAP_SYS_ADMIN users to create executable binaries without memfd_create(2) and without touching the host filesystem (not to mention the many other things a CAP_SYS_ADMIN process would be able to do that would be equivalent or worse), it seems strange to cause a fair amount of headache to admins when there doesn't appear to be an actual security benefit to blocking this. There appear to be concerns about confused-deputy-esque attacks[2] but a confused deputy that can write to arbitrary sysctls is a bigger security issue than executable memfds. /* New API */ The primary requirement from the original author appears to be more based on the need to be able to restrict an entire system in a hierarchical manner[3], such that child namespaces cannot re-enable executable memfds. So, implement that behaviour explicitly -- the vm.memfd_noexec scope is evaluated up the pidns tree to &init_pid_ns and you have the most restrictive value applied to you. The new lower limit you can set vm.memfd_noexec is whatever limit applies to your parent. Note that a pidns will inherit a copy of the parent pidns's effective vm.memfd_noexec setting at unshare() time. This matches the existing behaviour, and it also ensures that a pidns will never have its vm.memfd_noexec setting *lowered* behind its back (but it will be raised if the parent raises theirs). /* Backwards Compatibility */ As the previous version of the sysctl didn't allow you to lower the setting at all, there are no backwards compatibility issues with this aspect of the change. However it should be noted that now that the setting is completely hierarchical. Previously, a cloned pidns would just copy the current pidns setting, meaning that if the parent's vm.memfd_noexec was changed it wouldn't propoagate to existing pid namespaces. Now, the restriction applies recursively. This is a uAPI change, however: * The sysctl is very new, having been merged in 6.3. * Several aspects of the sysctl were broken up until this patchset and the other patchset by Jeff Xu last month. And thus it seems incredibly unlikely that any real users would run into this issue. In the worst case, if this causes userspace isues we could make it so that modifying the setting follows the hierarchical rules but the restriction checking uses the cached copy. [1]: https://lore.kernel.org/CABi2SkWnAgHK1i6iqSqPMYuNEhtHBkO8jUuCvmG3RmUB5TKHJw@mail.gmail.com/ [2]: https://lore.kernel.org/CALmYWFs_dNCzw_pW1yRAo4bGCPEtykroEQaowNULp7svwMLjOg@mail.gmail.com/ [3]: https://lore.kernel.org/CALmYWFuahdUF7cT4cm7_TGLqPanuHXJ-hVSfZt7vpTnc18DPrw@mail.gmail.com/ Cc: Dominique Martinet Cc: Christian Brauner Cc: stable@vger.kernel.org # v6.3+ Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") Signed-off-by: Aleksa Sarai --- include/linux/pid_namespace.h | 23 ++++++++++++++++++++++- kernel/pid.c | 3 +++ kernel/pid_namespace.c | 6 +++--- kernel/pid_sysctl.h | 28 ++++++++++++---------------- mm/memfd.c | 3 ++- 5 files changed, 42 insertions(+), 21 deletions(-) diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h index 53974d79d98e..f9f9931e02d6 100644 --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -39,7 +39,6 @@ struct pid_namespace { int reboot; /* group exit code if this pidns was rebooted */ struct ns_common ns; #if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE) - /* sysctl for vm.memfd_noexec */ int memfd_noexec_scope; #endif } __randomize_layout; @@ -56,6 +55,23 @@ static inline struct pid_namespace *get_pid_ns(struct pid_namespace *ns) return ns; } +#if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE) +static inline int pidns_memfd_noexec_scope(struct pid_namespace *ns) +{ + int scope = MEMFD_NOEXEC_SCOPE_EXEC; + + for (; ns; ns = ns->parent) + scope = max(scope, READ_ONCE(ns->memfd_noexec_scope)); + + return scope; +} +#else +static inline int pidns_memfd_noexec_scope(struct pid_namespace *ns) +{ + return 0; +} +#endif + extern struct pid_namespace *copy_pid_ns(unsigned long flags, struct user_namespace *user_ns, struct pid_namespace *ns); extern void zap_pid_ns_processes(struct pid_namespace *pid_ns); @@ -70,6 +86,11 @@ static inline struct pid_namespace *get_pid_ns(struct pid_namespace *ns) return ns; } +static inline int pidns_memfd_noexec_scope(struct pid_namespace *ns) +{ + return 0; +} + static inline struct pid_namespace *copy_pid_ns(unsigned long flags, struct user_namespace *user_ns, struct pid_namespace *ns) { diff --git a/kernel/pid.c b/kernel/pid.c index 6a1d23a11026..fee14a4486a3 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -83,6 +83,9 @@ struct pid_namespace init_pid_ns = { #ifdef CONFIG_PID_NS .ns.ops = &pidns_operations, #endif +#if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE) + .memfd_noexec_scope = MEMFD_NOEXEC_SCOPE_EXEC, +#endif }; EXPORT_SYMBOL_GPL(init_pid_ns); diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index 0bf44afe04dd..619972c78774 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -110,9 +110,9 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns ns->user_ns = get_user_ns(user_ns); ns->ucounts = ucounts; ns->pid_allocated = PIDNS_ADDING; - - initialize_memfd_noexec_scope(ns); - +#if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE) + ns->memfd_noexec_scope = pidns_memfd_noexec_scope(parent_pid_ns); +#endif return ns; out_free_idr: diff --git a/kernel/pid_sysctl.h b/kernel/pid_sysctl.h index b26e027fc9cd..2ee41a3a1dfd 100644 --- a/kernel/pid_sysctl.h +++ b/kernel/pid_sysctl.h @@ -5,33 +5,30 @@ #include #if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE) -static inline void initialize_memfd_noexec_scope(struct pid_namespace *ns) -{ - ns->memfd_noexec_scope = - task_active_pid_ns(current)->memfd_noexec_scope; -} - static int pid_mfd_noexec_dointvec_minmax(struct ctl_table *table, int write, void *buf, size_t *lenp, loff_t *ppos) { struct pid_namespace *ns = task_active_pid_ns(current); struct ctl_table table_copy; + int err, scope, parent_scope; if (write && !ns_capable(ns->user_ns, CAP_SYS_ADMIN)) return -EPERM; table_copy = *table; - if (ns != &init_pid_ns) - table_copy.data = &ns->memfd_noexec_scope; - /* - * set minimum to current value, the effect is only bigger - * value is accepted. - */ - if (*(int *)table_copy.data > *(int *)table_copy.extra1) - table_copy.extra1 = table_copy.data; + /* You cannot set a lower enforcement value than your parent. */ + parent_scope = pidns_memfd_noexec_scope(ns->parent); + /* Equivalent to pidns_memfd_noexec_scope(ns). */ + scope = max(READ_ONCE(ns->memfd_noexec_scope), parent_scope); + + table_copy.data = &scope; + table_copy.extra1 = &parent_scope; - return proc_dointvec_minmax(&table_copy, write, buf, lenp, ppos); + err = proc_dointvec_minmax(&table_copy, write, buf, lenp, ppos); + if (!err && write) + WRITE_ONCE(ns->memfd_noexec_scope, scope); + return err; } static struct ctl_table pid_ns_ctl_table_vm[] = { @@ -51,7 +48,6 @@ static inline void register_pid_ns_sysctl_table_vm(void) register_sysctl("vm", pid_ns_ctl_table_vm); } #else -static inline void initialize_memfd_noexec_scope(struct pid_namespace *ns) {} static inline void register_pid_ns_sysctl_table_vm(void) {} #endif diff --git a/mm/memfd.c b/mm/memfd.c index aa46521057ab..1cad1904fc26 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -271,7 +271,8 @@ long memfd_fcntl(struct file *file, unsigned int cmd, unsigned int arg) static int check_sysctl_memfd_noexec(unsigned int *flags) { #ifdef CONFIG_SYSCTL - int sysctl = task_active_pid_ns(current)->memfd_noexec_scope; + struct pid_namespace *ns = task_active_pid_ns(current); + int sysctl = pidns_memfd_noexec_scope(ns); if (!(*flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) { if (sysctl >= MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL) From patchwork Mon Aug 14 08:41:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aleksa Sarai X-Patchwork-Id: 135247 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b824:0:b0:3f2:4152:657d with SMTP id z4csp2621853vqi; Mon, 14 Aug 2023 02:25:52 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHaoZu1GJKZYyJeiafl3ch2G7TK/oDbYrOgRl8ivEfq2tczp/6t5Ax22ueWbdpsecK7y9Fx X-Received: by 2002:a05:6402:790:b0:523:3e5d:8aa2 with SMTP id d16-20020a056402079000b005233e5d8aa2mr6608841edy.14.1692005151983; Mon, 14 Aug 2023 02:25:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1692005151; cv=none; d=google.com; s=arc-20160816; b=H2a0BiIf5XS/Csq/yjdTDkZ7tpvgltgOssSzgxaKRf5VaAhnT6vMzwu0Y9cZ+558xZ gFFmJ1FAN5vUjezrHVQo1FMr17+JZihDqLqw9YFmP5x5WhqZIxyty3TjRqFJ7qlBRpV8 2ZPc+fKGkXHOCYn05fUpZBNEmX2SnS/lfM6AfBJYyobOzPphAl25pDJale6R5zI8t5vv vMVE0c9+BSEe/EWKzrmdeSdPxh49595udwQOKfRWMRygJSEb/gMlecKz0/qQ5hHwA36X A5XCWAwXivE5WwhF9JeEhS5TIBwknI7GDwk2IowpdBqYbPIitbecUsd6ht7gsB0dfbn+ u/ow== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:in-reply-to:references:message-id :content-transfer-encoding:mime-version:subject:date:from :dkim-signature; bh=W7cNopV9QtwDfWAl6SLw4jWogCD7aoRQLSKJtTeB2Z4=; fh=2nZKxd+TZqDRZrcTvF52H2/zIyJN8fpOeSuAynFSd6k=; b=UlKl1WwXefFKt28VUXcmBjqj5FqIop2HYvvMggWfiCzVlElp490ITcJbsAwYAuhcHe zI0ljikXOXatyrhtCdOhdepO32afBzXIKSRMIe1/E0Th4Bt1+3P4n8LM6eH8PR8Fqv4r pjR3tAcLQbk742fEukrwFwSAn00yw3oyq3GjhG7Rir86go1L+NozL+bz1LKCFd5Hdw5t Z4VoLP/fiKs9Uea9IetwZsXd1jZ8CrelmcbUodqtp5Z9inMuIgPzRzwvB0rflcHcwGCz MKWk6v65yxfLnDeNkfiyfj/MnAJaqwdt8YgYqcwEBOwdERvGVBcFWZ1WX7jrNp60eHg6 OIiQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cyphar.com header.s=MBO0001 header.b=RMj4CUJ8; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cyphar.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id v7-20020aa7d9c7000000b0052336222abdsi7936391eds.510.2023.08.14.02.25.28; Mon, 14 Aug 2023 02:25:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@cyphar.com header.s=MBO0001 header.b=RMj4CUJ8; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cyphar.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234889AbjHNIm1 (ORCPT + 99 others); Mon, 14 Aug 2023 04:42:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46792 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234757AbjHNIl5 (ORCPT ); Mon, 14 Aug 2023 04:41:57 -0400 Received: from mout-p-101.mailbox.org (mout-p-101.mailbox.org [IPv6:2001:67c:2050:0:465::101]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A0E3A12E; Mon, 14 Aug 2023 01:41:55 -0700 (PDT) Received: from smtp202.mailbox.org (smtp202.mailbox.org [10.196.197.202]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-384) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by mout-p-101.mailbox.org (Postfix) with ESMTPS id 4RPSYr29qcz9sn2; Mon, 14 Aug 2023 10:41:52 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cyphar.com; s=MBO0001; t=1692002512; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=W7cNopV9QtwDfWAl6SLw4jWogCD7aoRQLSKJtTeB2Z4=; b=RMj4CUJ8Rvd/ZNQXo32QMV7F9lurzqTpsl02OwCu1jTFQ0n1d0fmf72WxHZ1ua+c7jCd9S D1E9rniYp4s34lLn7rTOmuQpmh3X6pRzrzURyHouUZnlH8th5qc4KVaHELYz6gQeWZpln9 Vgr+Uwo20A0DzCdzDAKx/PsWiZrylfeBS0edlCGMrrdZhKBQMuJCc+nW+saFy+eUxwGf0d /C3mvRd8NQMshleiI648XFnhOm+WbbnHPwtFAXk+GOcWLpgSGxvim4H+QKq7B+9A3O8LTY AMH1dOh1rk34EaAWtAIFNW6K9gvaZ1cf5MF5xsgbBghS5oC5H3C0dlRfJeTWbA== From: Aleksa Sarai Date: Mon, 14 Aug 2023 18:41:01 +1000 Subject: [PATCH v2 5/5] selftests: improve vm.memfd_noexec sysctl tests MIME-Version: 1.0 Message-Id: <20230814-memfd-vm-noexec-uapi-fixes-v2-5-7ff9e3e10ba6@cyphar.com> References: <20230814-memfd-vm-noexec-uapi-fixes-v2-0-7ff9e3e10ba6@cyphar.com> In-Reply-To: <20230814-memfd-vm-noexec-uapi-fixes-v2-0-7ff9e3e10ba6@cyphar.com> To: Andrew Morton , Shuah Khan , Jeff Xu , Kees Cook , Daniel Verkamp Cc: Christian Brauner , Dominique Martinet , stable@vger.kernel.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Aleksa Sarai X-Developer-Signature: v=1; a=openpgp-sha256; l=13000; i=cyphar@cyphar.com; h=from:subject:message-id; bh=0sdx3MhmB3vX/NM+/SK9/yhr9yRSi3Geq1KMMGDcLxI=; b=owGbwMvMwCWmMf3Xpe0vXfIZT6slMaTcfLG2x/3+jGwuzwQtTl/LxLKg180rat3ZkqJalPZ28 m4xPnS+o5SFQYyLQVZMkWWbn2fopvmLryR/WskGM4eVCWQIAxenAEzk2QFGhs4PXJ+fSNazrA34 9lOx7d2F0lMb0t2/9PwKXp7O0yOxYwkjwxtGGx9O5lVLjRPsOj1U3HqnnHOabjbXuDJo82yHIOl 6XgA= X-Developer-Key: i=cyphar@cyphar.com; a=openpgp; fpr=C9C370B246B09F6DBCFC744C34401015D1D2D386 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1774195994030506295 X-GMAIL-MSGID: 1774195994030506295 This adds proper tests for the nesting functionality of vm.memfd_noexec as well as some minor cleanups to spawn_*_thread(). Signed-off-by: Aleksa Sarai --- tools/testing/selftests/memfd/memfd_test.c | 339 +++++++++++++++++++++-------- 1 file changed, 254 insertions(+), 85 deletions(-) diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c index 8b7390ad81d1..3df008677239 100644 --- a/tools/testing/selftests/memfd/memfd_test.c +++ b/tools/testing/selftests/memfd/memfd_test.c @@ -18,6 +18,7 @@ #include #include #include +#include #include "common.h" @@ -43,7 +44,6 @@ */ static size_t mfd_def_size = MFD_DEF_SIZE; static const char *memfd_str = MEMFD_STR; -static pid_t spawn_newpid_thread(unsigned int flags, int (*fn)(void *)); static int newpid_thread_fn2(void *arg); static void join_newpid_thread(pid_t pid); @@ -96,12 +96,12 @@ static void sysctl_assert_write(const char *val) int fd = open("/proc/sys/vm/memfd_noexec", O_WRONLY | O_CLOEXEC); if (fd < 0) { - printf("open sysctl failed\n"); + printf("open sysctl failed: %m\n"); abort(); } if (write(fd, val, strlen(val)) < 0) { - printf("write sysctl failed\n"); + printf("write sysctl %s failed: %m\n", val); abort(); } } @@ -111,7 +111,7 @@ static void sysctl_fail_write(const char *val) int fd = open("/proc/sys/vm/memfd_noexec", O_WRONLY | O_CLOEXEC); if (fd < 0) { - printf("open sysctl failed\n"); + printf("open sysctl failed: %m\n"); abort(); } @@ -122,6 +122,33 @@ static void sysctl_fail_write(const char *val) } } +static void sysctl_assert_equal(const char *val) +{ + char *p, buf[128] = {}; + int fd = open("/proc/sys/vm/memfd_noexec", O_RDONLY | O_CLOEXEC); + + if (fd < 0) { + printf("open sysctl failed: %m\n"); + abort(); + } + + if (read(fd, buf, sizeof(buf)) < 0) { + printf("read sysctl failed: %m\n"); + abort(); + } + + /* Strip trailing whitespace. */ + p = buf; + while (!isspace(*p)) + p++; + *p = '\0'; + + if (strcmp(buf, val) != 0) { + printf("unexpected sysctl value: expected %s, got %s\n", val, buf); + abort(); + } +} + static int mfd_assert_reopen_fd(int fd_in) { int fd; @@ -736,7 +763,7 @@ static int idle_thread_fn(void *arg) return 0; } -static pid_t spawn_idle_thread(unsigned int flags) +static pid_t spawn_thread(unsigned int flags, int (*fn)(void *), void *arg) { uint8_t *stack; pid_t pid; @@ -747,10 +774,7 @@ static pid_t spawn_idle_thread(unsigned int flags) abort(); } - pid = clone(idle_thread_fn, - stack + STACK_SIZE, - SIGCHLD | flags, - NULL); + pid = clone(fn, stack + STACK_SIZE, SIGCHLD | flags, arg); if (pid < 0) { printf("clone() failed: %m\n"); abort(); @@ -759,6 +783,33 @@ static pid_t spawn_idle_thread(unsigned int flags) return pid; } +static void join_thread(pid_t pid) +{ + int wstatus; + + if (waitpid(pid, &wstatus, 0) < 0) { + printf("newpid thread: waitpid() failed: %m\n"); + abort(); + } + + if (WIFEXITED(wstatus) && WEXITSTATUS(wstatus) != 0) { + printf("newpid thread: exited with non-zero error code %d\n", + WEXITSTATUS(wstatus)); + abort(); + } + + if (WIFSIGNALED(wstatus)) { + printf("newpid thread: killed by signal %d\n", + WTERMSIG(wstatus)); + abort(); + } +} + +static pid_t spawn_idle_thread(unsigned int flags) +{ + return spawn_thread(flags, idle_thread_fn, NULL); +} + static void join_idle_thread(pid_t pid) { kill(pid, SIGTERM); @@ -1111,42 +1162,69 @@ static void test_noexec_seal(void) close(fd); } -static void test_sysctl_child(void) +static void test_sysctl_sysctl0(void) { int fd; - int pid; - printf("%s sysctl 0\n", memfd_str); - sysctl_assert_write("0"); - fd = mfd_assert_new("kern_memfd_sysctl_0", + sysctl_assert_equal("0"); + + fd = mfd_assert_new("kern_memfd_sysctl_0_dfl", mfd_def_size, MFD_CLOEXEC | MFD_ALLOW_SEALING); - mfd_assert_mode(fd, 0777); mfd_assert_has_seals(fd, 0); mfd_assert_chmod(fd, 0644); close(fd); +} - printf("%s sysctl 1\n", memfd_str); - sysctl_assert_write("1"); - fd = mfd_assert_new("kern_memfd_sysctl_1", +static void test_sysctl_set_sysctl0(void) +{ + sysctl_assert_write("0"); + test_sysctl_sysctl0(); +} + +static void test_sysctl_sysctl1(void) +{ + int fd; + + sysctl_assert_equal("1"); + + fd = mfd_assert_new("kern_memfd_sysctl_1_dfl", mfd_def_size, MFD_CLOEXEC | MFD_ALLOW_SEALING); + mfd_assert_mode(fd, 0666); + mfd_assert_has_seals(fd, F_SEAL_EXEC); + mfd_fail_chmod(fd, 0777); + close(fd); - printf("%s child ns\n", memfd_str); - pid = spawn_newpid_thread(CLONE_NEWPID, newpid_thread_fn2); - join_newpid_thread(pid); + fd = mfd_assert_new("kern_memfd_sysctl_1_exec", + mfd_def_size, + MFD_CLOEXEC | MFD_EXEC | MFD_ALLOW_SEALING); + mfd_assert_mode(fd, 0777); + mfd_assert_has_seals(fd, 0); + mfd_assert_chmod(fd, 0644); + close(fd); + fd = mfd_assert_new("kern_memfd_sysctl_1_noexec", + mfd_def_size, + MFD_CLOEXEC | MFD_NOEXEC_SEAL | MFD_ALLOW_SEALING); mfd_assert_mode(fd, 0666); mfd_assert_has_seals(fd, F_SEAL_EXEC); mfd_fail_chmod(fd, 0777); - sysctl_fail_write("0"); close(fd); +} - printf("%s sysctl 2\n", memfd_str); - sysctl_assert_write("2"); - mfd_fail_new("kern_memfd_sysctl_2_exec", - MFD_EXEC | MFD_CLOEXEC | MFD_ALLOW_SEALING); +static void test_sysctl_set_sysctl1(void) +{ + sysctl_assert_write("1"); + test_sysctl_sysctl1(); +} + +static void test_sysctl_sysctl2(void) +{ + int fd; + + sysctl_assert_equal("2"); fd = mfd_assert_new("kern_memfd_sysctl_2_dfl", mfd_def_size, @@ -1156,98 +1234,188 @@ static void test_sysctl_child(void) mfd_fail_chmod(fd, 0777); close(fd); - fd = mfd_assert_new("kern_memfd_sysctl_2_noexec_seal", + mfd_fail_new("kern_memfd_sysctl_2_exec", + MFD_CLOEXEC | MFD_EXEC | MFD_ALLOW_SEALING); + + fd = mfd_assert_new("kern_memfd_sysctl_2_noexec", mfd_def_size, - MFD_NOEXEC_SEAL | MFD_CLOEXEC | MFD_ALLOW_SEALING); + MFD_CLOEXEC | MFD_NOEXEC_SEAL | MFD_ALLOW_SEALING); mfd_assert_mode(fd, 0666); mfd_assert_has_seals(fd, F_SEAL_EXEC); mfd_fail_chmod(fd, 0777); close(fd); - - sysctl_fail_write("0"); - sysctl_fail_write("1"); } -static int newpid_thread_fn(void *arg) +static void test_sysctl_set_sysctl2(void) { - test_sysctl_child(); - return 0; + sysctl_assert_write("2"); + test_sysctl_sysctl2(); } -static void test_sysctl_child2(void) +static int sysctl_simple_child(void *arg) { int fd; + int pid; - sysctl_fail_write("0"); - fd = mfd_assert_new("kern_memfd_sysctl_1", - mfd_def_size, - MFD_CLOEXEC | MFD_ALLOW_SEALING); + printf("%s sysctl 0\n", memfd_str); + test_sysctl_set_sysctl0(); - mfd_assert_mode(fd, 0666); - mfd_assert_has_seals(fd, F_SEAL_EXEC); - mfd_fail_chmod(fd, 0777); - close(fd); + printf("%s sysctl 1\n", memfd_str); + test_sysctl_set_sysctl1(); + + printf("%s sysctl 0\n", memfd_str); + test_sysctl_set_sysctl0(); + + printf("%s sysctl 2\n", memfd_str); + test_sysctl_set_sysctl2(); + + printf("%s sysctl 1\n", memfd_str); + test_sysctl_set_sysctl1(); + + printf("%s sysctl 0\n", memfd_str); + test_sysctl_set_sysctl0(); + + return 0; +} + +/* + * Test sysctl + * A very basic test to make sure the core sysctl semantics work. + */ +static void test_sysctl_simple(void) +{ + int pid = spawn_thread(CLONE_NEWPID, sysctl_simple_child, NULL); + + join_thread(pid); } -static int newpid_thread_fn2(void *arg) +static int sysctl_nested(void *arg) { - test_sysctl_child2(); + void (*fn)(void) = arg; + + fn(); return 0; } -static pid_t spawn_newpid_thread(unsigned int flags, int (*fn)(void *)) + +static int sysctl_nested_wait(void *arg) { - uint8_t *stack; - pid_t pid; + /* Wait for a SIGCONT. */ + kill(getpid(), SIGSTOP); + return sysctl_nested(arg); +} - stack = malloc(STACK_SIZE); - if (!stack) { - printf("malloc(STACK_SIZE) failed: %m\n"); - abort(); - } +static void test_sysctl_sysctl1_failset(void) +{ + sysctl_fail_write("0"); + test_sysctl_sysctl1(); +} - pid = clone(fn, - stack + STACK_SIZE, - SIGCHLD | flags, - NULL); - if (pid < 0) { - printf("clone() failed: %m\n"); - abort(); - } +static void test_sysctl_sysctl2_failset(void) +{ + sysctl_fail_write("1"); + test_sysctl_sysctl2(); - return pid; + sysctl_fail_write("0"); + test_sysctl_sysctl2(); } -static void join_newpid_thread(pid_t pid) +static int sysctl_nested_child(void *arg) { - int wstatus; + int fd; + int pid; - if (waitpid(pid, &wstatus, 0) < 0) { - printf("newpid thread: waitpid() failed: %m\n"); - abort(); - } + printf("%s nested sysctl 0\n", memfd_str); + sysctl_assert_write("0"); + /* A further nested pidns works the same. */ + pid = spawn_thread(CLONE_NEWPID, sysctl_simple_child, NULL); + join_thread(pid); - if (WIFEXITED(wstatus) && WEXITSTATUS(wstatus) != 0) { - printf("newpid thread: exited with non-zero error code %d\n", - WEXITSTATUS(wstatus)); - abort(); - } + printf("%s nested sysctl 1\n", memfd_str); + sysctl_assert_write("1"); + /* Child inherits our setting. */ + pid = spawn_thread(CLONE_NEWPID, sysctl_nested, test_sysctl_sysctl1); + join_thread(pid); + /* Child cannot raise the setting. */ + pid = spawn_thread(CLONE_NEWPID, sysctl_nested, + test_sysctl_sysctl1_failset); + join_thread(pid); + /* Child can lower the setting. */ + pid = spawn_thread(CLONE_NEWPID, sysctl_nested, + test_sysctl_set_sysctl2); + join_thread(pid); + /* Child lowering the setting has no effect on our setting. */ + test_sysctl_sysctl1(); + + printf("%s nested sysctl 2\n", memfd_str); + sysctl_assert_write("2"); + /* Child inherits our setting. */ + pid = spawn_thread(CLONE_NEWPID, sysctl_nested, test_sysctl_sysctl2); + join_thread(pid); + /* Child cannot raise the setting. */ + pid = spawn_thread(CLONE_NEWPID, sysctl_nested, + test_sysctl_sysctl2_failset); + join_thread(pid); + + /* Verify that the rules are actually inherited after fork. */ + printf("%s nested sysctl 0 -> 1 after fork\n", memfd_str); + sysctl_assert_write("0"); - if (WIFSIGNALED(wstatus)) { - printf("newpid thread: killed by signal %d\n", - WTERMSIG(wstatus)); - abort(); - } + pid = spawn_thread(CLONE_NEWPID, sysctl_nested_wait, + test_sysctl_sysctl1_failset); + sysctl_assert_write("1"); + kill(pid, SIGCONT); + join_thread(pid); + + printf("%s nested sysctl 0 -> 2 after fork\n", memfd_str); + sysctl_assert_write("0"); + + pid = spawn_thread(CLONE_NEWPID, sysctl_nested_wait, + test_sysctl_sysctl2_failset); + sysctl_assert_write("2"); + kill(pid, SIGCONT); + join_thread(pid); + + /* + * Verify that the current effective setting is saved on fork, meaning + * that the parent lowering the sysctl doesn't affect already-forked + * children. + */ + printf("%s nested sysctl 2 -> 1 after fork\n", memfd_str); + sysctl_assert_write("2"); + pid = spawn_thread(CLONE_NEWPID, sysctl_nested_wait, + test_sysctl_sysctl2); + sysctl_assert_write("1"); + kill(pid, SIGCONT); + join_thread(pid); + + printf("%s nested sysctl 2 -> 0 after fork\n", memfd_str); + sysctl_assert_write("2"); + pid = spawn_thread(CLONE_NEWPID, sysctl_nested_wait, + test_sysctl_sysctl2); + sysctl_assert_write("0"); + kill(pid, SIGCONT); + join_thread(pid); + + printf("%s nested sysctl 1 -> 0 after fork\n", memfd_str); + sysctl_assert_write("1"); + pid = spawn_thread(CLONE_NEWPID, sysctl_nested_wait, + test_sysctl_sysctl1); + sysctl_assert_write("0"); + kill(pid, SIGCONT); + join_thread(pid); + + return 0; } /* - * Test sysctl - * A very basic sealing test to see whether setting/retrieving seals works. + * Test sysctl with nested pid namespaces + * Make sure that the sysctl nesting semantics work correctly. */ -static void test_sysctl(void) +static void test_sysctl_nested(void) { - int pid = spawn_newpid_thread(CLONE_NEWPID, newpid_thread_fn); + int pid = spawn_thread(CLONE_NEWPID, sysctl_nested_child, NULL); - join_newpid_thread(pid); + join_thread(pid); } /* @@ -1433,6 +1601,9 @@ int main(int argc, char **argv) test_seal_grow(); test_seal_resize(); + test_sysctl_simple(); + test_sysctl_nested(); + test_share_dup("SHARE-DUP", ""); test_share_mmap("SHARE-MMAP", ""); test_share_open("SHARE-OPEN", ""); @@ -1447,8 +1618,6 @@ int main(int argc, char **argv) test_share_fork("SHARE-FORK", SHARED_FT_STR); join_idle_thread(pid); - test_sysctl(); - printf("memfd: DONE\n"); return 0;