Message ID | 20230125155557.37816-1-mjguzik@gmail.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp353426wrn; Wed, 25 Jan 2023 08:07:40 -0800 (PST) X-Google-Smtp-Source: AK7set903b2Z77iUvanz60b+flXnaR36qQXOd5I7uSY++hKm09Id3nIFCUKS7AcGh2RJgbOkqxSi X-Received: by 2002:aa7:c697:0:b0:4a0:b6c8:2555 with SMTP id n23-20020aa7c697000000b004a0b6c82555mr327992edq.10.1674662860071; Wed, 25 Jan 2023 08:07:40 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674662860; cv=none; d=google.com; s=arc-20160816; b=CI4fg+tNzcC1BQGza89cbqCe3ybenFycMuWiyZQrmayPHed9rhORHnpsqH7RBu8yXi aq2aXdHNO1H0F9SQ638cJPmkvQ8CeN1AtLGlAxXVmhgnreaadQAfN0EoBCea3af0oA8a ehvosaSduQEOUJzRKezChmd4UWZx9oDJOM/+esMRcUFpv6mQADQg/mmbI1q3XTx4ATK/ dBlwmx4f0tXGZ+zHtwkmVJSoShyHpLu5d5dqrw5GtE1E4+JXFhtGvR7nF0CXJeV9+7Md w72d8y/dlax6vFRdwxd8otVK8goRUa2FJcVXSXCouHbRPFoR+wX5ApT/LKkumKaGBfLc PygQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=YwP2tm0CAOWY073suUhF0J/FRKYyYwUasCH1kI3YMoo=; b=NCGnAx1XwQ//bIaqKW+L+wnZmbmL9vINeEEv1bDp3npqPXy/JCgTdxKewpYw/PWqRK VDr/elzLqxxcPWUvTSJAuUmPsuUWApvx/YWMouFwH363DKcrMjbSzqhxwJL/FkWDA/09 t+ZCauPMOppAZVBrQlJE1Paqoxb3t9VQDQALMOFBC40S9HaQpdX4eqmuhsW/r45aaT8Y 5yStK5sVWhRPZvuW4Y9F1e5z6IMSguyx8us/+lmMTd6/OC9qC2CwT2Zfs2im706yfLjo M4udV0ME9jXZ6UZumOt454PepZBcXiX7m0iSOBosXEUFKoWJzKKtt8XcjHm5LZyMpR85 m1Uw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=b4Hz3cF3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id f29-20020a50d55d000000b0049f512ad602si7589129edj.336.2023.01.25.08.07.15; Wed, 25 Jan 2023 08:07:40 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=b4Hz3cF3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235651AbjAYP4T (ORCPT <rfc822;rust.linux@gmail.com> + 99 others); Wed, 25 Jan 2023 10:56:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38540 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235046AbjAYP4G (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 25 Jan 2023 10:56:06 -0500 Received: from mail-ed1-x52b.google.com (mail-ed1-x52b.google.com [IPv6:2a00:1450:4864:20::52b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4EBD4113CE; Wed, 25 Jan 2023 07:56:04 -0800 (PST) Received: by mail-ed1-x52b.google.com with SMTP id u21so363019edv.3; Wed, 25 Jan 2023 07:56:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=YwP2tm0CAOWY073suUhF0J/FRKYyYwUasCH1kI3YMoo=; b=b4Hz3cF3NnfPCbJomxECRfpWj1EP+pcb7wwtSC3FfiHAtis7Fa5wHww841JNA8SfCs KI5ydzziw1GyKjGph9B0R2bUY7w9eOIMCiTMtCSSwWRTFgBMIt4vtyya0bEhltICTtVT HXPXkm03w94K7eVlE/046M5lHlrZWMorer8C2PysP7EH16d7r+bD88618oouazgF7DyM lq1b9BR2r7hz8Vt/cNVQBdXlV+5k/zhxf3vFE1K3xOBT4iR+LgBpsWRdaDwt70ZzENJ8 E17gitlojSEGk8tS2+lhF2KG1aa0M6ldVeBY+eRTMlnaMTv7ehMB8GLnbQ/ZCcBtGVkX VXFQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=YwP2tm0CAOWY073suUhF0J/FRKYyYwUasCH1kI3YMoo=; b=zttSkI4LdfEFVPiVdkjO6vCwoLb91LO6Y92BsBwYQhR9a8BgBiccBdFQY6V2c4Ewyi 1B1Re6vMRYDtFvkUd2PJRhLRcMWZ3XwASA1auypK/TAxbrMjWBFFhlq7Ck+TmX+Le9rs 7R4qzhMsHycJo7iGiGmnmj/LLooIHaKST3XmlZoz1kvHrjopPwCYYFNZbhp0HQsmiyHC 40AGWLmsmxkWdZFL2SgTGa4susBh+QnI3Pa/zusQKakG6ezTt5witsfVZixuNaO1Y6DO 91J0AvpPHnF0rjy0Tnyd51OSJ/dTzM3UU5eU4QMb9+aIbaBymFwJsY2dSq9JUF7FOTFS YAKg== X-Gm-Message-State: AFqh2krYGRCz0iBEkBtJUsaWBcU6F/lW15jjUZK6CQJagvdPrcppj9if AEILbNy0obDH7bzHTQCd10I= X-Received: by 2002:a05:6402:25cb:b0:49d:6ebe:e9dc with SMTP id x11-20020a05640225cb00b0049d6ebee9dcmr43014072edb.25.1674662162788; Wed, 25 Jan 2023 07:56:02 -0800 (PST) Received: from f.. (cst-prg-88-122.cust.vodafone.cz. [46.135.88.122]) by smtp.gmail.com with ESMTPSA id d24-20020a056402517800b0049e249c0e56sm2539287ede.56.2023.01.25.07.56.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Jan 2023 07:56:02 -0800 (PST) From: Mateusz Guzik <mjguzik@gmail.com> To: viro@zeniv.linux.org.uk Cc: serge@hallyn.com, torvalds@linux-foundation.org, paul@paul-moore.com, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org, Mateusz Guzik <mjguzik@gmail.com> Subject: [PATCH v3 1/2] capability: add cap_isidentical Date: Wed, 25 Jan 2023 16:55:56 +0100 Message-Id: <20230125155557.37816-1-mjguzik@gmail.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1756011283097111069?= X-GMAIL-MSGID: =?utf-8?q?1756011283097111069?= |
Series |
[v3,1/2] capability: add cap_isidentical
|
|
Commit Message
Mateusz Guzik
Jan. 25, 2023, 3:55 p.m. UTC
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Serge Hallyn <serge@hallyn.com> --- include/linux/capability.h | 10 ++++++++++ 1 file changed, 10 insertions(+)
Comments
On Wed, Jan 25, 2023 at 7:56 AM Mateusz Guzik <mjguzik@gmail.com> wrote: > > +static inline bool cap_isidentical(const kernel_cap_t a, const kernel_cap_t b) > +{ > + unsigned __capi; > + CAP_FOR_EACH_U32(__capi) { > + if (a.cap[__capi] != b.cap[__capi]) > + return false; > + } > + return true; > +} > + Side note, and this is not really related to this particular patch other than because it just brought up the issue once more.. Our "kernel_cap_t" thing is disgusting. It's been a structure containing __u32 cap[_KERNEL_CAPABILITY_U32S]; basically forever, and it's not likely to change in the future. I would object to any crazy capability expansion, considering how useless and painful they've been anyway, and I don't think anybody really is even remotely planning anything like that anyway. And what is _KERNEL_CAPABILITY_U32S anyway? It's the "third version" of that size: #define _KERNEL_CAPABILITY_U32S _LINUX_CAPABILITY_U32S_3 which happens to be the same number as the second version of said #define, which happens to be "2". In other words, that fancy array is just 64 bits. And we'd probably be better off just treating it as such, and just doing typedef u64 kernel_cap_t; since we have to do the special "convert from user space format" _anyway_, and this isn't something that is shared to user space as-is. Then that "cap_isidentical()" would literally be just "a == b" instead of us playing games with for-loops that are just two wide, and a compiler that may or may not realize. It would literally remove some of the insanity in <linux/capability.h> - look for CAP_TO_MASK() and CAP_TO_INDEX and CAP_FS_MASK_B0 and CAP_FS_MASK_B1 and just plain ugliness that comes from this entirely historical oddity. Yes, yes, we started out having it be a single-word array, and yes, the code is written to think that it might some day be expanded past the two words it then in 2008 it expanded to two words and 64 bits. And now, fifteen years later, we use 40 of those 64 bits, and hopefully we'll never add another one. So we have historical reasons for why our kernel_cap_t is so odd. But it *is* odd. Hmm? Linus
On 2/27/2023 5:14 PM, Linus Torvalds wrote: > On Wed, Jan 25, 2023 at 7:56 AM Mateusz Guzik <mjguzik@gmail.com> wrote: >> +static inline bool cap_isidentical(const kernel_cap_t a, const kernel_cap_t b) >> +{ >> + unsigned __capi; >> + CAP_FOR_EACH_U32(__capi) { >> + if (a.cap[__capi] != b.cap[__capi]) >> + return false; >> + } >> + return true; >> +} >> + > Side note, and this is not really related to this particular patch > other than because it just brought up the issue once more.. > > Our "kernel_cap_t" thing is disgusting. > > It's been a structure containing > > __u32 cap[_KERNEL_CAPABILITY_U32S]; > > basically forever, and it's not likely to change in the future. I > would object to any crazy capability expansion, considering how > useless and painful they've been anyway, and I don't think anybody > really is even remotely planning anything like that anyway. > > And what is _KERNEL_CAPABILITY_U32S anyway? It's the "third version" > of that size: > > #define _KERNEL_CAPABILITY_U32S _LINUX_CAPABILITY_U32S_3 > > which happens to be the same number as the second version of said > #define, which happens to be "2". > > In other words, that fancy array is just 64 bits. And we'd probably be > better off just treating it as such, and just doing > > typedef u64 kernel_cap_t; > > since we have to do the special "convert from user space format" > _anyway_, and this isn't something that is shared to user space as-is. > > Then that "cap_isidentical()" would literally be just "a == b" instead > of us playing games with for-loops that are just two wide, and a > compiler that may or may not realize. > > It would literally remove some of the insanity in <linux/capability.h> > - look for CAP_TO_MASK() and CAP_TO_INDEX and CAP_FS_MASK_B0 and > CAP_FS_MASK_B1 and just plain ugliness that comes from this entirely > historical oddity. > > Yes, yes, we started out having it be a single-word array, and yes, > the code is written to think that it might some day be expanded past > the two words it then in 2008 it expanded to two words and 64 bits. > And now, fifteen years later, we use 40 of those 64 bits, and > hopefully we'll never add another one. I agree that the addition of 24 more capabilities is unlikely. The two reasons presented recently for adding capabilities are to implement boutique policies (CAP_MYHARDWAREISSPECIAL) or to break up CAP_SYS_ADMIN. Neither of these is sustainable with a finite number of capabilities, nor do they fit the security model capabilities implement. It's possible that a small number of additional capabilities will be approved, but even that seems unlikely. > So we have historical reasons for why our kernel_cap_t is so odd. But > it *is* odd. > > Hmm? I don't see any reason that kernel_cap_t shouldn't be a u64. If by some amazing change in mindset we develop need for 65 capabilities, someone can dredge up the old code, shout "I told you so!" and put it back the way it was. Or maybe by then we'll have u128, and can just switch to that. > Linus
On 2/28/23, Casey Schaufler <casey@schaufler-ca.com> wrote: > On 2/27/2023 5:14 PM, Linus Torvalds wrote: >> On Wed, Jan 25, 2023 at 7:56 AM Mateusz Guzik <mjguzik@gmail.com> wrote: >>> +static inline bool cap_isidentical(const kernel_cap_t a, const >>> kernel_cap_t b) >>> +{ >>> + unsigned __capi; >>> + CAP_FOR_EACH_U32(__capi) { >>> + if (a.cap[__capi] != b.cap[__capi]) >>> + return false; >>> + } >>> + return true; >>> +} >>> + >> Side note, and this is not really related to this particular patch >> other than because it just brought up the issue once more.. >> >> Our "kernel_cap_t" thing is disgusting. >> >> It's been a structure containing >> >> __u32 cap[_KERNEL_CAPABILITY_U32S]; >> >> basically forever, and it's not likely to change in the future. I >> would object to any crazy capability expansion, considering how >> useless and painful they've been anyway, and I don't think anybody >> really is even remotely planning anything like that anyway. >> >> And what is _KERNEL_CAPABILITY_U32S anyway? It's the "third version" >> of that size: >> >> #define _KERNEL_CAPABILITY_U32S _LINUX_CAPABILITY_U32S_3 >> >> which happens to be the same number as the second version of said >> #define, which happens to be "2". >> >> In other words, that fancy array is just 64 bits. And we'd probably be >> better off just treating it as such, and just doing >> >> typedef u64 kernel_cap_t; >> >> since we have to do the special "convert from user space format" >> _anyway_, and this isn't something that is shared to user space as-is. >> >> Then that "cap_isidentical()" would literally be just "a == b" instead >> of us playing games with for-loops that are just two wide, and a >> compiler that may or may not realize. >> >> It would literally remove some of the insanity in <linux/capability.h> >> - look for CAP_TO_MASK() and CAP_TO_INDEX and CAP_FS_MASK_B0 and >> CAP_FS_MASK_B1 and just plain ugliness that comes from this entirely >> historical oddity. >> >> Yes, yes, we started out having it be a single-word array, and yes, >> the code is written to think that it might some day be expanded past >> the two words it then in 2008 it expanded to two words and 64 bits. >> And now, fifteen years later, we use 40 of those 64 bits, and >> hopefully we'll never add another one. > > I agree that the addition of 24 more capabilities is unlikely. The > two reasons presented recently for adding capabilities are to implement > boutique policies (CAP_MYHARDWAREISSPECIAL) or to break up CAP_SYS_ADMIN. > Neither of these is sustainable with a finite number of capabilities, nor > do they fit the security model capabilities implement. It's possible that > a small number of additional capabilities will be approved, but even that > seems unlikely. > > >> So we have historical reasons for why our kernel_cap_t is so odd. But >> it *is* odd. >> >> Hmm? > > I don't see any reason that kernel_cap_t shouldn't be a u64. If by some > amazing change in mindset we develop need for 65 capabilities, someone can > dredge up the old code, shout "I told you so!" and put it back the way it > was. Or maybe by then we'll have u128, and can just switch to that. > Premature generalization is the root of all evil (or however the saying goes), as evidenced above. The fact that this is an array of u32 escaped the confines of capability.h and as a result there would be unpleasant churn to sort it out, and more importantly this requires a lot more testing than you would normally expect. Personally I would only touch it as a result of losing a bet (and I'm not taking any with this in play), but that's just my $0.05 (adjusted for inflation).
On Mon, Feb 27, 2023 at 06:46:12PM -0800, Casey Schaufler wrote: > On 2/27/2023 5:14 PM, Linus Torvalds wrote: > > On Wed, Jan 25, 2023 at 7:56 AM Mateusz Guzik <mjguzik@gmail.com> wrote: > >> +static inline bool cap_isidentical(const kernel_cap_t a, const kernel_cap_t b) > >> +{ > >> + unsigned __capi; > >> + CAP_FOR_EACH_U32(__capi) { > >> + if (a.cap[__capi] != b.cap[__capi]) > >> + return false; > >> + } > >> + return true; > >> +} > >> + > > Side note, and this is not really related to this particular patch > > other than because it just brought up the issue once more.. > > > > Our "kernel_cap_t" thing is disgusting. > > > > It's been a structure containing > > > > __u32 cap[_KERNEL_CAPABILITY_U32S]; > > > > basically forever, and it's not likely to change in the future. I > > would object to any crazy capability expansion, considering how > > useless and painful they've been anyway, and I don't think anybody > > really is even remotely planning anything like that anyway. > > > > And what is _KERNEL_CAPABILITY_U32S anyway? It's the "third version" > > of that size: > > > > #define _KERNEL_CAPABILITY_U32S _LINUX_CAPABILITY_U32S_3 > > > > which happens to be the same number as the second version of said > > #define, which happens to be "2". > > > > In other words, that fancy array is just 64 bits. And we'd probably be > > better off just treating it as such, and just doing > > > > typedef u64 kernel_cap_t; > > > > since we have to do the special "convert from user space format" > > _anyway_, and this isn't something that is shared to user space as-is. > > > > Then that "cap_isidentical()" would literally be just "a == b" instead > > of us playing games with for-loops that are just two wide, and a > > compiler that may or may not realize. > > > > It would literally remove some of the insanity in <linux/capability.h> > > - look for CAP_TO_MASK() and CAP_TO_INDEX and CAP_FS_MASK_B0 and > > CAP_FS_MASK_B1 and just plain ugliness that comes from this entirely > > historical oddity. > > > > Yes, yes, we started out having it be a single-word array, and yes, > > the code is written to think that it might some day be expanded past > > the two words it then in 2008 it expanded to two words and 64 bits. > > And now, fifteen years later, we use 40 of those 64 bits, and > > hopefully we'll never add another one. > > I agree that the addition of 24 more capabilities is unlikely. The > two reasons presented recently for adding capabilities are to implement > boutique policies (CAP_MYHARDWAREISSPECIAL) or to break up CAP_SYS_ADMIN. FWIW IMO breaking up CAP_SYS_ADMIN is a good thing, so long as we continue to do it in the "you can use either CAP_SYS_ADMIN or CAP_NEW_FOO" way. But there haven't been many such patchsets :) > Neither of these is sustainable with a finite number of capabilities, nor > do they fit the security model capabilities implement. It's possible that > a small number of additional capabilities will be approved, but even that > seems unlikely. > > > > So we have historical reasons for why our kernel_cap_t is so odd. But > > it *is* odd. > > > > Hmm? > > I don't see any reason that kernel_cap_t shouldn't be a u64. If by some > amazing change in mindset we develop need for 65 capabilities, someone can > dredge up the old code, shout "I told you so!" and put it back the way it > was. Or maybe by then we'll have u128, and can just switch to that. > > > Linus
On 2/28/2023 9:32 AM, Serge E. Hallyn wrote: > On Mon, Feb 27, 2023 at 06:46:12PM -0800, Casey Schaufler wrote: >> On 2/27/2023 5:14 PM, Linus Torvalds wrote: >>> On Wed, Jan 25, 2023 at 7:56 AM Mateusz Guzik <mjguzik@gmail.com> wrote: >>>> +static inline bool cap_isidentical(const kernel_cap_t a, const kernel_cap_t b) >>>> +{ >>>> + unsigned __capi; >>>> + CAP_FOR_EACH_U32(__capi) { >>>> + if (a.cap[__capi] != b.cap[__capi]) >>>> + return false; >>>> + } >>>> + return true; >>>> +} >>>> + >>> Side note, and this is not really related to this particular patch >>> other than because it just brought up the issue once more.. >>> >>> Our "kernel_cap_t" thing is disgusting. >>> >>> It's been a structure containing >>> >>> __u32 cap[_KERNEL_CAPABILITY_U32S]; >>> >>> basically forever, and it's not likely to change in the future. I >>> would object to any crazy capability expansion, considering how >>> useless and painful they've been anyway, and I don't think anybody >>> really is even remotely planning anything like that anyway. >>> >>> And what is _KERNEL_CAPABILITY_U32S anyway? It's the "third version" >>> of that size: >>> >>> #define _KERNEL_CAPABILITY_U32S _LINUX_CAPABILITY_U32S_3 >>> >>> which happens to be the same number as the second version of said >>> #define, which happens to be "2". >>> >>> In other words, that fancy array is just 64 bits. And we'd probably be >>> better off just treating it as such, and just doing >>> >>> typedef u64 kernel_cap_t; >>> >>> since we have to do the special "convert from user space format" >>> _anyway_, and this isn't something that is shared to user space as-is. >>> >>> Then that "cap_isidentical()" would literally be just "a == b" instead >>> of us playing games with for-loops that are just two wide, and a >>> compiler that may or may not realize. >>> >>> It would literally remove some of the insanity in <linux/capability.h> >>> - look for CAP_TO_MASK() and CAP_TO_INDEX and CAP_FS_MASK_B0 and >>> CAP_FS_MASK_B1 and just plain ugliness that comes from this entirely >>> historical oddity. >>> >>> Yes, yes, we started out having it be a single-word array, and yes, >>> the code is written to think that it might some day be expanded past >>> the two words it then in 2008 it expanded to two words and 64 bits. >>> And now, fifteen years later, we use 40 of those 64 bits, and >>> hopefully we'll never add another one. >> I agree that the addition of 24 more capabilities is unlikely. The >> two reasons presented recently for adding capabilities are to implement >> boutique policies (CAP_MYHARDWAREISSPECIAL) or to break up CAP_SYS_ADMIN. > FWIW IMO breaking up CAP_SYS_ADMIN is a good thing, so long as we continue > to do it in the "you can use either CAP_SYS_ADMIN or CAP_NEW_FOO" way. You need to have a security policy to reference to add a capability. Telling the disc to spin in the opposite direction, while important to control, is not something that will fall under a security policy. It is also rare for programs to need CAP_SYS_ADMIN for only one thing. While I agree that we shouldn't be allowing a program to reverse the spin of a drive just because it needs to adjust memory use on a network interface, I don't believe that capabilities are the right approach. Capabilities haven't proven popular for their intended purpose, so I don't see them as a good candidate for extension. There were good reasons for capabilities to work the way they do, but they have not all stood the test of time. I do have a proposed implementation, but it's stuck behind LSM stacking. > > But there haven't been many such patchsets :) > >> Neither of these is sustainable with a finite number of capabilities, nor >> do they fit the security model capabilities implement. It's possible that >> a small number of additional capabilities will be approved, but even that >> seems unlikely. >> >> >>> So we have historical reasons for why our kernel_cap_t is so odd. But >>> it *is* odd. >>> >>> Hmm? >> I don't see any reason that kernel_cap_t shouldn't be a u64. If by some >> amazing change in mindset we develop need for 65 capabilities, someone can >> dredge up the old code, shout "I told you so!" and put it back the way it >> was. Or maybe by then we'll have u128, and can just switch to that. >> >>> Linus
On Tue, Feb 28, 2023 at 6:47 AM Mateusz Guzik <mjguzik@gmail.com> wrote: > > Personally I would only touch it as a result of losing a bet (and I'm > not taking any with this in play), but that's just my $0.05 (adjusted > for inflation). Heh. I took that as a challenge. It wasn't actually all that bad (famous last words). For type safety reasons I decided to use a struct wrapper typedef struct { u64 val; } kernel_cap_t; to avoid any nasty silent integer value conversions. But then it was literally just a matter of cleaning up some of the insanity. I think the fs/proc/array.c modification is an example of just how bad things used to be: --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -300,13 +300,8 @@ static inline void task_sig(struct seq_file *m, static void render_cap_t(struct seq_file *m, const char *header, kernel_cap_t *a) { - unsigned __capi; - seq_puts(m, header); - CAP_FOR_EACH_U32(__capi) { - seq_put_hex_ll(m, NULL, - a->cap[CAP_LAST_U32 - __capi], 8); - } + seq_put_hex_ll(m, NULL, a->val, 16); seq_putc(m, '\n'); } note how the code literally did that odd CAP_LAST_U32 - __capi in there just to get the natural "high word first" that is exactly what you get if you just print out the value as a hex number. Now, the actual user mode conversions still exist, but even those often got simplified. Have I actually *tested* this? Of course not. I'm lazy, and everything I write obviously always works on the first try anyway, so why should I? So take this patch with a healthy dose of salt, but it looks sane to me, and it does build cleanly (and with our type system, that actually does say quite a lot). This actually looks sane enough that I might even boot it. Call me crazy. Linus
On Tue, Feb 28, 2023 at 11:39 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > This actually looks sane enough that I might even boot it. Call me crazy. Oh, and while I haven't actually booted it or tested it in any way, I did verify that it changes - movq 48(%rbx), %rax - movq 56(%rbx), %rcx - cmpl %eax, %ecx - jne .LBB58_13 - shrq $32, %rax - shrq $32, %rcx - cmpl %eax, %ecx - jne .LBB58_13 into + movq 56(%rbx), %rax + cmpq 48(%rbx), %rax + jne .LBB58_12 because it looks like clang was smart enough to unroll the silly fixed-size loop and do the two adjacent 32-bit loads of the old cap[] array as one 64-bit load, but then was _not_ smart enough to combine the two 32-bit compares into one 64-bit one. And gcc didn't do the load optimization (which is questionable since it then just results in extra shifts and extra registers), so it just kept it as two 32-bit loads and compares. Again, with the patch, gcc obviously does the sane "one 64-bit load, one 64-bit compare" too. There's a lot to be said for compiler optimizations fixing up silly source code, but I personally would just prefer to make the source code DTRT. Could the compiler have been even smarter and generated the same code? Yes it could. We shouldn't expect that, though. Particularly when the sane code is much more legible to humans too. Linus
On Tue, Feb 28, 2023 at 11:39 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > This actually looks sane enough that I might even boot it. Call me crazy. LOL. I had to go through the patch with a find comb, because everything worked except for some reason network name resolution failed: systemd-resolved got a permission error on Failed to listen on UDP socket 127.0.0.53:53: Permission denied Spot the insufficient fixup in my cut-and-paste capget() patch: kdata[0].effective = pE.val; kdata[1].effective = pE.val >> 32; kdata[0].permitted = pP.val; kdata[1].permitted = pP.val >> 32; kdata[0].inheritable = pI.val; kdata[0].inheritable = pI.val >> 32; Oops. But with that fixed, that patch actually does seem to work. Linus
On 2/28/23, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Tue, Feb 28, 2023 at 11:39 AM Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> >> Call me crazy. > Hello crazy, > I had to go through the patch with a find comb, because everything > worked except for some reason network name resolution failed: > systemd-resolved got a permission error on > > Failed to listen on UDP socket 127.0.0.53:53: Permission denied > > Spot the insufficient fixup in my cut-and-paste capget() patch: > > kdata[0].effective = pE.val; > kdata[1].effective = pE.val >> 32; > kdata[0].permitted = pP.val; > kdata[1].permitted = pP.val >> 32; > kdata[0].inheritable = pI.val; > kdata[0].inheritable = pI.val >> 32; > > Oops. > > But with that fixed, that patch actually does seem to work. > This is part of the crap which made me unwilling to do the clean up. Unless there is a test suite (which I'm guessing there is not), I think this warrants a prog which iterates over all methods with a bunch of randomly generated capsets (+ maybe handpicked corner cases?) and compares results new vs old. Otherwise I would feel very uneasy signing off on the patch. That said, nice cleanup if it works out :)
On Tue, Feb 28, 2023 at 1:21 PM Mateusz Guzik <mjguzik@gmail.com> wrote: > > This is part of the crap which made me unwilling to do the clean up. Yeah, it's not pretty. That said, the old code was worse. The only redeeming feature of the old code was that "nobody has touched it in ages", so it was at least stable. Linus
On Tue, Feb 28, 2023 at 1:29 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > That said, the old code was worse. The only redeeming feature of the > old code was that "nobody has touched it in ages", so it was at least > stable. Bah. I've walked through that patch something like ten times now, and decided that there's no way it breaks anything. Famous last words. It also means that I don't want to look at that ugly old code when I have the fix for it all, so I just moved it over from my experimental tree to the main tree, since it's still the merge window. Quod licet Iovi, non licet bovi, or something. Linus
diff --git a/include/linux/capability.h b/include/linux/capability.h index 65efb74c3585..736a973c677a 100644 --- a/include/linux/capability.h +++ b/include/linux/capability.h @@ -156,6 +156,16 @@ static inline bool cap_isclear(const kernel_cap_t a) return true; } +static inline bool cap_isidentical(const kernel_cap_t a, const kernel_cap_t b) +{ + unsigned __capi; + CAP_FOR_EACH_U32(__capi) { + if (a.cap[__capi] != b.cap[__capi]) + return false; + } + return true; +} + /* * Check if "a" is a subset of "set". * return true if ALL of the capabilities in "a" are also in "set"