Message ID | 20230726141026.307690-4-aleksandr.mikhalitsyn@canonical.com |
---|---|
State | New |
Headers |
From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
To: xiubli@redhat.com
Cc: brauner@kernel.org, stgraber@ubuntu.com, linux-fsdevel@vger.kernel.org, Jeff Layton <jlayton@kernel.org>, Ilya Dryomov <idryomov@gmail.com>, ceph-devel@vger.kernel.org, Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>, Christian Brauner <christian.brauner@ubuntu.com>, linux-kernel@vger.kernel.org
Subject: [PATCH v7 03/11] ceph: handle idmapped mounts in create_request_message()
Date: Wed, 26 Jul 2023 16:10:18 +0200
Message-Id: <20230726141026.307690-4-aleksandr.mikhalitsyn@canonical.com>
In-Reply-To: <20230726141026.307690-1-aleksandr.mikhalitsyn@canonical.com>
References: <20230726141026.307690-1-aleksandr.mikhalitsyn@canonical.com>
X-Mailer: git-send-email 2.34.1
List-ID: <linux-kernel.vger.kernel.org> |
Series |
ceph: support idmapped mounts
|
|
Commit Message
Aleksandr Mikhalitsyn
July 26, 2023, 2:10 p.m. UTC
Inode operations that create a new filesystem object, such as ->mknod(), ->create(), ->mkdir() and others, don't take a {g,u}id argument explicitly. Instead, the caller's fs{g,u}id is used for the {g,u}id of the new filesystem object.

In order to ensure that the correct {g,u}id is used, map the caller's fs{g,u}id for creation requests. This doesn't require complex changes. It suffices to pass in the relevant idmapping recorded in the request message. If this request message was triggered from an inode operation that creates filesystem objects, it will have passed down the relevant idmapping. If this is a request message that was triggered from an inode operation that doesn't need to take idmappings into account, the initial idmapping is passed down, which is an identity mapping.

This change uses a new cephfs protocol extension, CEPHFS_FEATURE_HAS_OWNER_UIDGID, which adds two new fields (owner_{u,g}id) to the request head structure. So we need to ensure that the MDS supports it; otherwise we need to fail any IO that comes through an idmapped mount, because we can't process it properly. An MDS without this extension will use the caller_{u,g}id fields to set the new inode's owner UID/GID, which is incorrect because the caller_{u,g}id values are unmapped. At the same time, we can't map these fields with an idmapping, as that can break the UID/GID-based permission check logic on the MDS side. This problem is described in detail in [1] and [2].
[1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
[2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/

Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org
Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
v7:
	- reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
---
 fs/ceph/mds_client.c         | 20 ++++++++++++++++++++
 fs/ceph/mds_client.h         |  5 ++++-
 include/linux/ceph/ceph_fs.h |  4 +++-
 3 files changed, 27 insertions(+), 2 deletions(-)
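The mapping described in the commit message can be modelled in userspace. What follows is only a minimal sketch under stated assumptions, not the kernel implementation: `struct sketch_idmap`, `sketch_map_into_fs()` and `sketch_fill_head()` are hypothetical names, and a single contiguous range stands in for a real mount idmapping. It illustrates the direction taken by from_vfsuid()/from_vfsgid(): the caller's fs{u,g}id, as seen through the mount, is translated back into the filesystem-side id that goes into owner_{u,g}id, while caller_{u,g}id stays unmapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's INVALID_UID / INVALID_GID. */
#define SKETCH_INVALID_ID UINT32_MAX

/*
 * Hypothetical single-range idmapping: filesystem ids
 * [fs_first, fs_first + count) appear through the mount as
 * [mnt_first, mnt_first + count).
 */
struct sketch_idmap {
	uint32_t fs_first;
	uint32_t mnt_first;
	uint32_t count;
};

/*
 * Translate an id seen through the mount into the filesystem-side id.
 * A NULL map models the identity ("nop") idmapping.
 */
static uint32_t sketch_map_into_fs(const struct sketch_idmap *map, uint32_t id)
{
	if (map == NULL)
		return id;
	assert(map->count > 0); /* an empty range is a bug in this sketch */
	if (id < map->mnt_first || id - map->mnt_first >= map->count)
		return SKETCH_INVALID_ID; /* id has no mapping */
	return map->fs_first + (id - map->mnt_first);
}

/* Only the fields relevant to this discussion. */
struct sketch_request_head {
	uint32_t caller_uid, caller_gid; /* unmapped, used for permission checks */
	uint32_t owner_uid, owner_gid;   /* mapped, used for the new inode's owner */
};

static struct sketch_request_head
sketch_fill_head(const struct sketch_idmap *map, uint32_t fsuid, uint32_t fsgid)
{
	struct sketch_request_head head = {
		.caller_uid = fsuid,
		.caller_gid = fsgid,
		.owner_uid = sketch_map_into_fs(map, fsuid),
		.owner_gid = sketch_map_into_fs(map, fsgid),
	};
	return head;
}
```

With a map of { .fs_first = 0, .mnt_first = 100000, .count = 65536 }, a caller whose fsuid is 101000 would end up with owner_uid 1000 while caller_uid stays 101000, which is the split the two new fields make possible.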
Comments
On 7/26/23 22:10, Alexander Mikhalitsyn wrote:
> [...]
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index c641ab046e98..ac095a95f3d0 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2923,6 +2923,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>  {
>  	int mds = session->s_mds;
>  	struct ceph_mds_client *mdsc = session->s_mdsc;
> +	struct ceph_client *cl = mdsc->fsc->client;
>  	struct ceph_msg *msg;
>  	struct ceph_mds_request_head_legacy *lhead;
>  	const char *path1 = NULL;
> @@ -3028,6 +3029,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>  	lhead = find_legacy_request_head(msg->front.iov_base,
>  					 session->s_con.peer_features);
>
> +	if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> +	    !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> +		pr_err_ratelimited_client(cl,
> +			"idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> +			" is not supported by MDS. Fail request with -EIO.\n");
> +
> +		ret = -EIO;
> +		goto out_err;
> +	}
> +

I think this couldn't fail the mounting operation, right?

IMO we should fail the mounting from the beginning.

Thanks

- Xiubo

>  	/*
>  	 * The ceph_mds_request_head_legacy didn't contain a version field, and
>  	 * one was added when we moved the message version from 3->4.
> @@ -3043,10 +3054,19 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>  		p = msg->front.iov_base + sizeof(*ohead);
>  	} else {
>  		struct ceph_mds_request_head *nhead = msg->front.iov_base;
> +		kuid_t owner_fsuid;
> +		kgid_t owner_fsgid;
>
>  		msg->hdr.version = cpu_to_le16(6);
>  		nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
>  		p = msg->front.iov_base + sizeof(*nhead);
> +
> +		owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> +					  VFSUIDT_INIT(req->r_cred->fsuid));
> +		owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> +					  VFSGIDT_INIT(req->r_cred->fsgid));
> +		nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
> +		nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
>  	}
>
>  	end = msg->front.iov_base + msg->front.iov_len;
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index e3bbf3ba8ee8..8f683e8203bd 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -33,8 +33,10 @@ enum ceph_feature_type {
>  	CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
>  	CEPHFS_FEATURE_OP_GETVXATTR,
>  	CEPHFS_FEATURE_32BITS_RETRY_FWD,
> +	CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
> +	CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>
> -	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
> +	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>  };
>
>  #define CEPHFS_FEATURES_CLIENT_SUPPORTED {	\
> @@ -49,6 +51,7 @@ enum ceph_feature_type {
>  	CEPHFS_FEATURE_NOTIFY_SESSION_STATE,	\
>  	CEPHFS_FEATURE_OP_GETVXATTR,		\
>  	CEPHFS_FEATURE_32BITS_RETRY_FWD,	\
> +	CEPHFS_FEATURE_HAS_OWNER_UIDGID,	\
>  }
>
>  /*
> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> index 5f2301ee88bc..6eb83a51341c 100644
> --- a/include/linux/ceph/ceph_fs.h
> +++ b/include/linux/ceph/ceph_fs.h
> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
>  	union ceph_mds_request_args args;
>  } __attribute__ ((packed));
>
> -#define CEPH_MDS_REQUEST_HEAD_VERSION 2
> +#define CEPH_MDS_REQUEST_HEAD_VERSION 3
>
>  struct ceph_mds_request_head_old {
>  	__le16 version;		/* struct version */
> @@ -530,6 +530,8 @@ struct ceph_mds_request_head {
>
>  	__le32 ext_num_retry;	/* new count retry attempts */
>  	__le32 ext_num_fwd;	/* new count fwd attempts */
> +
> +	__le32 owner_uid, owner_gid;	/* used for OPs which create inodes */
>  } __attribute__ ((packed));
>
>  /* cap/lease release record */
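The guard that the patch adds to create_request_message() can be sketched outside the kernel as follows. This is a simplified model under stated assumptions, not the ceph client code: `struct sketch_session`, `struct sketch_request`, the bit index `SKETCH_FEATURE_HAS_OWNER_UIDGID` and both helpers are hypothetical, and a 64-bit mask stands in for the session feature bitmap tested with test_bit(). It shows that only a request carrying a non-identity idmapping is refused when the MDS session lacks the feature.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit index; the real value comes from enum ceph_feature_type. */
enum { SKETCH_FEATURE_HAS_OWNER_UIDGID = 0 };

/* Features negotiated with one MDS session, modelled as a 64-bit mask. */
struct sketch_session {
	uint64_t features;
};

/*
 * A request either carries a real (non-identity) mount idmapping or the
 * identity one; the latter corresponds to &nop_mnt_idmap in the patch.
 */
struct sketch_request {
	bool has_mnt_idmap;
};

static bool sketch_has_feature(const struct sketch_session *s, unsigned int bit)
{
	assert(bit < 64);
	return ((s->features >> bit) & 1u) != 0;
}

/* Returns 0 if the request may be sent, -EIO if it must be failed. */
static int sketch_check_request(const struct sketch_session *s,
				const struct sketch_request *req)
{
	if (req->has_mnt_idmap &&
	    !sketch_has_feature(s, SKETCH_FEATURE_HAS_OWNER_UIDGID))
		return -EIO;
	return 0;
}
```

In this model a request without an idmapping is accepted by any session, so an ordinary mount keeps working against an older MDS; only idmapped requests depend on the new feature bit.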
On Thu, Jul 27, 2023 at 7:30 AM Xiubo Li <xiubli@redhat.com> wrote:
>
> On 7/26/23 22:10, Alexander Mikhalitsyn wrote:
> > [...]
> > +	if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> > +	    !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> > +		pr_err_ratelimited_client(cl,
> > +			"idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> > +			" is not supported by MDS. Fail request with -EIO.\n");
> > +
> > +		ret = -EIO;
> > +		goto out_err;
> > +	}
> > +
>
> I think this couldn't fail the mounting operation, right ?

This won't fail mounting. First of all, an idmapped mount is always an additional mount: you always start with a "normal" mount, and only after that can you use this mount to create an idmapped one.
(example: https://github.com/brauner/mount-idmapped/tree/master)

> IMO we should fail the mounting from the beginning.

Unfortunately, we can't fail the mount from the beginning. The creation of idmapped mounts is handled not at the filesystem level but at the VFS level
(source: https://github.com/torvalds/linux/blob/0a8db05b571ad5b8d5c8774a004c0424260a90bd/fs/namespace.c#L4277)

The kernel performs all the required checks, such as:
- the filesystem type has declared that it supports idmappings (fs_type->fs_flags & FS_ALLOW_IDMAP)
- the user who creates the idmapped mount must have CAP_SYS_ADMIN in the user namespace that owns the superblock of the filesystem (for cephfs it's always init_user_ns => the user must be root on the host)

So I would like to go this way, for the reasons mentioned above:
- the root user is someone who understands what he is doing.
- idmapped mounts are never "first" mounts. They are always created after a "normal" mount.
- effectively, this check lets the "normal" mount work normally and fails only requests that come through an idmapped mount, with a reasonable error message. Obviously, all read operations will work perfectly well; only the operations that create new inodes will fail.

Btw, we already have analogous semantics at the VFS level for users who have no UID/GID mapping to the host. Filesystem requests for such users will fail with -EOVERFLOW. Here we have something close.

I think we can take a look at this in the future, when some other filesystem requires the same ability to check idmapped mount creation at the filesystem level. (We could introduce an extra callback at the superblock level or something like that.) But I think it makes sense to do that once cephfs is allowed to be mounted in a user namespace.

I hope that Christian Brauner will add something here. :-)

Kind regards,
Alex

> Thanks
>
> - Xiubo
>
> > [...]
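The two VFS-level checks listed in the reply above can be modelled as below. This is a rough sketch, not the code in fs/namespace.c: `sketch_can_idmap_mount()`, both structs and the flag value `SKETCH_FS_ALLOW_IDMAP` are hypothetical, and the specific error codes are an assumption made for illustration. The thing to notice is what the function does not look at: nothing about the MDS or its feature bits is available at this stage, which is why the patch has to check per request instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value standing in for FS_ALLOW_IDMAP. */
#define SKETCH_FS_ALLOW_IDMAP 0x1u

struct sketch_fs_type {
	unsigned int fs_flags;
};

struct sketch_caller {
	/* CAP_SYS_ADMIN in the user namespace that owns the superblock. */
	bool cap_sys_admin_in_sb_userns;
};

/*
 * Models only the two checks named in the thread. Returns 0 when an
 * idmapped mount may be created; error codes are assumed for the sketch.
 */
static int sketch_can_idmap_mount(const struct sketch_fs_type *type,
				  const struct sketch_caller *caller)
{
	assert(type != NULL && caller != NULL);

	/* The filesystem type must have opted in to idmapped mounts. */
	if (!(type->fs_flags & SKETCH_FS_ALLOW_IDMAP))
		return -EINVAL;

	/* For cephfs the owning namespace is init_user_ns, i.e. host root. */
	if (!caller->cap_sys_admin_in_sb_userns)
		return -EPERM;

	return 0;
}
```

Because the decision depends only on the filesystem type and the caller's capabilities, a filesystem-specific veto at mount-creation time would need a new hook, along the lines of the extra superblock callback suggested in the reply.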
On Wed, Jul 26, 2023 at 4:10 PM Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> wrote:
>

Oops, I have just noticed that the author of this commit should be Christian Brauner. This happened because I squashed this commit into the previous one (which was the commit that updated struct ceph_mds_request_head). I'll fix that next time.

> [...]
On Thu, Jul 27, 2023 at 08:36:40AM +0200, Aleksandr Mikhalitsyn wrote:
> On Thu, Jul 27, 2023 at 7:30 AM Xiubo Li <xiubli@redhat.com> wrote:
> >
> >
> > On 7/26/23 22:10, Alexander Mikhalitsyn wrote:
> > > Inode operations that create a new filesystem object such as ->mknod,
> > > ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> > > Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> > > filesystem object.
> > >
> > > In order to ensure that the correct {g,u}id is used map the caller's
> > > fs{g,u}id for creation requests. This doesn't require complex changes.
> > > It suffices to pass in the relevant idmapping recorded in the request
> > > message. If this request message was triggered from an inode operation
> > > that creates filesystem objects it will have passed down the relevant
> > > idmaping. If this is a request message that was triggered from an inode
> > > operation that doens't need to take idmappings into account the initial
> > > idmapping is passed down which is an identity mapping.
> > >
> > > This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
> > > which adds two new fields (owner_{u,g}id) to the request head structure.
> > > So, we need to ensure that MDS supports it otherwise we need to fail
> > > any IO that comes through an idmapped mount because we can't process it
> > > in a proper way. MDS server without such an extension will use caller_{u,g}id
> > > fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
> > > values are unmapped. At the same time we can't map these fields with an
> > > idmapping as it can break UID/GID-based permission checks logic on the
> > > MDS side. This problem was described with a lot of details at [1], [2].
> > >
> > > [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> > > [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
> > >
> > > Cc: Xiubo Li <xiubli@redhat.com>
> > > Cc: Jeff Layton <jlayton@kernel.org>
> > > Cc: Ilya Dryomov <idryomov@gmail.com>
> > > Cc: ceph-devel@vger.kernel.org
> > > Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> > > Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> > > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> > > ---
> > > v7:
> > > - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
> > > ---
> > > fs/ceph/mds_client.c | 20 ++++++++++++++++++++
> > > fs/ceph/mds_client.h | 5 ++++-
> > > include/linux/ceph/ceph_fs.h | 4 +++-
> > > 3 files changed, 27 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > index c641ab046e98..ac095a95f3d0 100644
> > > --- a/fs/ceph/mds_client.c
> > > +++ b/fs/ceph/mds_client.c
> > > @@ -2923,6 +2923,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > >  {
> > >         int mds = session->s_mds;
> > >         struct ceph_mds_client *mdsc = session->s_mdsc;
> > > +       struct ceph_client *cl = mdsc->fsc->client;
> > >         struct ceph_msg *msg;
> > >         struct ceph_mds_request_head_legacy *lhead;
> > >         const char *path1 = NULL;
> > > @@ -3028,6 +3029,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > >         lhead = find_legacy_request_head(msg->front.iov_base,
> > >                                          session->s_con.peer_features);
> > >
> > > +       if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> > > +           !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> > > +               pr_err_ratelimited_client(cl,
> > > +                       "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> > > +                       " is not supported by MDS. Fail request with -EIO.\n");
> > > +
> > > +               ret = -EIO;
> > > +               goto out_err;
> > > +       }
> > > +
> >
> > I think this couldn't fail the mounting operation, right ?
>
> This won't fail mounting. First of all an idmapped mount is always an
> additional mount, you always
> start from doing "normal" mount and only after that you can use this
> mount to create an idmapped one.
> ( example: https://github.com/brauner/mount-idmapped/tree/master )
>
> >
> > IMO we should fail the mounting from the beginning.
>
> Unfortunately, we can't fail mount from the beginning. Procedure of
> the idmapped mounts
> creation is handled not on the filesystem level, but on the VFS level

Correct. It's a generic vfsmount feature.

> (source: https://github.com/torvalds/linux/blob/0a8db05b571ad5b8d5c8774a004c0424260a90bd/fs/namespace.c#L4277
> )
>
> Kernel perform all required checks as:
> - filesystem type has declared to support idmappings
> (fs_type->fs_flags & FS_ALLOW_IDMAP)
> - user who creates idmapped mount should be CAP_SYS_ADMIN in a user
> namespace that owns superblock of the filesystem
> (for cephfs it's always init_user_ns => user should be root on the host)
>
> So I would like to go this way because of the reasons mentioned above:
> - root user is someone who understands what he does.
> - idmapped mounts are never "first" mounts. They are always created
> after "normal" mount.
> - effectively this check makes "normal" mount to work normally and
> fail only requests that comes through an idmapped mounts
> with reasonable error message. Obviously, all read operations will
> work perfectly well only the operations that create new inodes will
> fail.
> Btw, we already have an analogical semantic on the VFS level for users
> who have no UID/GID mapping to the host. Filesystem requests for
> such users will fail with -EOVERFLOW. Here we have something close.

Refusing requests coming from an idmapped mount if the server misses appropriate features is good enough as a first step imho.
And yes, we do have similar logic on the vfs level for unmapped uid/gid.
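
Editorial sketch: the gate discussed above can be modelled in plain userspace C. This is only an illustration of the condition in the quoted hunk, not the kernel code itself. The struct and function names (sketch_session, sketch_request, sketch_check_request, sketch_selftest) are hypothetical; the two booleans stand in for the test_bit() on CEPHFS_FEATURE_HAS_OWNER_UIDGID and for the comparison of req->r_mnt_idmap against &nop_mnt_idmap.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-session feature state. */
struct sketch_session {
	/* models test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features) */
	bool mds_has_owner_uidgid;
};

/* Hypothetical stand-in for the request being encoded. */
struct sketch_request {
	/* models req->r_mnt_idmap != &nop_mnt_idmap */
	bool from_idmapped_mount;
};

/* Returns 0 if the request may be encoded, -EIO if it has to be refused. */
static int sketch_check_request(const struct sketch_session *s,
				const struct sketch_request *req)
{
	if (req->from_idmapped_mount && !s->mds_has_owner_uidgid)
		return -EIO;
	return 0;
}

/* Walks the interesting combinations of mount type and MDS capability. */
void sketch_selftest(void)
{
	struct sketch_session old_mds = { .mds_has_owner_uidgid = false };
	struct sketch_session new_mds = { .mds_has_owner_uidgid = true };
	struct sketch_request plain = { .from_idmapped_mount = false };
	struct sketch_request idmapped = { .from_idmapped_mount = true };

	/* A "normal" mount keeps working against an old MDS. */
	assert(sketch_check_request(&old_mds, &plain) == 0);
	/* Only the idmapped mount is refused when the feature is missing. */
	assert(sketch_check_request(&old_mds, &idmapped) == -EIO);
	/* With the feature present, both kinds of mount pass the gate. */
	assert(sketch_check_request(&new_mds, &plain) == 0);
	assert(sketch_check_request(&new_mds, &idmapped) == 0);
}
```

Calling sketch_selftest() from a small main() exercises all four cases. The point of the model is that the failure is per request, which is why the initial mount is unaffected.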
On Thu, Jul 27, 2023 at 11:01 AM Christian Brauner <brauner@kernel.org> wrote: > > On Thu, Jul 27, 2023 at 08:36:40AM +0200, Aleksandr Mikhalitsyn wrote: > > On Thu, Jul 27, 2023 at 7:30 AM Xiubo Li <xiubli@redhat.com> wrote: > > > > > > > > > On 7/26/23 22:10, Alexander Mikhalitsyn wrote: > > > > Inode operations that create a new filesystem object such as ->mknod, > > > > ->create, ->mkdir() and others don't take a {g,u}id argument explicitly. > > > > Instead the caller's fs{g,u}id is used for the {g,u}id of the new > > > > filesystem object. > > > > > > > > In order to ensure that the correct {g,u}id is used map the caller's > > > > fs{g,u}id for creation requests. This doesn't require complex changes. > > > > It suffices to pass in the relevant idmapping recorded in the request > > > > message. If this request message was triggered from an inode operation > > > > that creates filesystem objects it will have passed down the relevant > > > > idmaping. If this is a request message that was triggered from an inode > > > > operation that doens't need to take idmappings into account the initial > > > > idmapping is passed down which is an identity mapping. > > > > > > > > This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID > > > > which adds two new fields (owner_{u,g}id) to the request head structure. > > > > So, we need to ensure that MDS supports it otherwise we need to fail > > > > any IO that comes through an idmapped mount because we can't process it > > > > in a proper way. MDS server without such an extension will use caller_{u,g}id > > > > fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id > > > > values are unmapped. At the same time we can't map these fields with an > > > > idmapping as it can break UID/GID-based permission checks logic on the > > > > MDS side. This problem was described with a lot of details at [1], [2]. 
> > > > > > > > [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/ > > > > [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/ > > > > > > > > Cc: Xiubo Li <xiubli@redhat.com> > > > > Cc: Jeff Layton <jlayton@kernel.org> > > > > Cc: Ilya Dryomov <idryomov@gmail.com> > > > > Cc: ceph-devel@vger.kernel.org > > > > Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> > > > > Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> > > > > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> > > > > --- > > > > v7: > > > > - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575) > > > > --- > > > > fs/ceph/mds_client.c | 20 ++++++++++++++++++++ > > > > fs/ceph/mds_client.h | 5 ++++- > > > > include/linux/ceph/ceph_fs.h | 4 +++- > > > > 3 files changed, 27 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c > > > > index c641ab046e98..ac095a95f3d0 100644 > > > > --- a/fs/ceph/mds_client.c > > > > +++ b/fs/ceph/mds_client.c > > > > @@ -2923,6 +2923,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, > > > > { > > > > int mds = session->s_mds; > > > > struct ceph_mds_client *mdsc = session->s_mdsc; > > > > + struct ceph_client *cl = mdsc->fsc->client; > > > > struct ceph_msg *msg; > > > > struct ceph_mds_request_head_legacy *lhead; > > > > const char *path1 = NULL; > > > > @@ -3028,6 +3029,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, > > > > lhead = find_legacy_request_head(msg->front.iov_base, > > > > session->s_con.peer_features); > > > > > > > > + if ((req->r_mnt_idmap != &nop_mnt_idmap) && > > > > + !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) { > > > > + pr_err_ratelimited_client(cl, > > > > + "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID" > > > > 
+ " is not supported by MDS. Fail request with -EIO.\n"); > > > > + > > > > + ret = -EIO; > > > > + goto out_err; > > > > + } > > > > + > > > > > > I think this couldn't fail the mounting operation, right ? > > > > This won't fail mounting. First of all an idmapped mount is always an > > additional mount, you always > > start from doing "normal" mount and only after that you can use this > > mount to create an idmapped one. > > ( example: https://github.com/brauner/mount-idmapped/tree/master ) > > > > > > > > IMO we should fail the mounting from the beginning. > > > > Unfortunately, we can't fail mount from the beginning. Procedure of > > the idmapped mounts > > creation is handled not on the filesystem level, but on the VFS level > > Correct. It's a generic vfsmount feature. > > > (source: https://github.com/torvalds/linux/blob/0a8db05b571ad5b8d5c8774a004c0424260a90bd/fs/namespace.c#L4277 > > ) > > > > Kernel perform all required checks as: > > - filesystem type has declared to support idmappings > > (fs_type->fs_flags & FS_ALLOW_IDMAP) > > - user who creates idmapped mount should be CAP_SYS_ADMIN in a user > > namespace that owns superblock of the filesystem > > (for cephfs it's always init_user_ns => user should be root on the host) > > > > So I would like to go this way because of the reasons mentioned above: > > - root user is someone who understands what he does. > > - idmapped mounts are never "first" mounts. They are always created > > after "normal" mount. > > - effectively this check makes "normal" mount to work normally and > > fail only requests that comes through an idmapped mounts > > with reasonable error message. Obviously, all read operations will > > work perfectly well only the operations that create new inodes will > > fail. > > Btw, we already have an analogical semantic on the VFS level for users > > who have no UID/GID mapping to the host. Filesystem requests for > > such users will fail with -EOVERFLOW. Here we have something close. 
>
> Refusing requests coming from an idmapped mount if the server misses
> appropriate features is good enough as a first step imho. And yes, we do
> have similar logic on the vfs level for unmapped uid/gid.

Thanks, Christian!

I wanted to add that alternative here is to modify caller_{u,g}id
fields as it was done in the first approach,
it will break the UID/GID-based permissions model for old MDS versions
(we can put printk_once to inform user about this),
but at the same time it will allow us to support idmapped mounts in
all cases. This support will be not fully ideal for old MDS
and perfectly well for new MDS versions.

Alternatively, we can introduce cephfs mount option like
"idmap_with_old_mds" and if it's enabled then we set caller_{u,g}id
for MDS without CEPHFS_FEATURE_HAS_OWNER_UIDGID, if it's disabled
(default) we fail requests with -EIO. For
new MDS everything goes in the right way.

Kind regards,
Alex
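
Editorial sketch: the mount-option proposal in the message above amounts to a three-way decision per request. The following userspace C model is only one possible reading of it; "idmap_with_old_mds" is the option name floated in the thread, while the enum, the function names and the extra "no mapping needed" outcome for non-idmapped requests are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes of the per-request decision. */
enum sketch_action {
	SKETCH_NO_MAPPING_NEEDED, /* request did not come through an idmapped mount */
	SKETCH_USE_OWNER_FIELDS,  /* new MDS: send mapped owner_{u,g}id, keep caller_{u,g}id */
	SKETCH_MAP_CALLER_IDS,    /* old MDS, option enabled: map caller_{u,g}id instead */
	SKETCH_FAIL_EIO           /* old MDS, option disabled (default): refuse */
};

/*
 * idmapped:             the request came through an idmapped mount
 * mds_has_owner_uidgid: the MDS advertises CEPHFS_FEATURE_HAS_OWNER_UIDGID
 * idmap_with_old_mds:   the proposed (hypothetical) mount option is set
 */
enum sketch_action sketch_pick_action(bool idmapped,
				      bool mds_has_owner_uidgid,
				      bool idmap_with_old_mds)
{
	if (!idmapped)
		return SKETCH_NO_MAPPING_NEEDED;
	if (mds_has_owner_uidgid)
		return SKETCH_USE_OWNER_FIELDS;
	if (idmap_with_old_mds)
		return SKETCH_MAP_CALLER_IDS;
	return SKETCH_FAIL_EIO;
}

/* Translates the decision into the errno-style result seen by the caller. */
int sketch_action_errno(enum sketch_action action)
{
	return action == SKETCH_FAIL_EIO ? -EIO : 0;
}

/* Checks the cases named in the proposal. */
void sketch_option_selftest(void)
{
	/* New MDS: the option does not matter. */
	assert(sketch_pick_action(true, true, false) == SKETCH_USE_OWNER_FIELDS);
	assert(sketch_pick_action(true, true, true) == SKETCH_USE_OWNER_FIELDS);
	/* Old MDS: behaviour depends on the option, failing by default. */
	assert(sketch_pick_action(true, false, true) == SKETCH_MAP_CALLER_IDS);
	assert(sketch_pick_action(true, false, false) == SKETCH_FAIL_EIO);
	assert(sketch_action_errno(SKETCH_FAIL_EIO) == -EIO);
}
```

In this reading the option only changes behaviour in one cell of the table (idmapped mount, old MDS), which matches the trade-off described: the permission model on old MDS versions is weakened only when the administrator explicitly opts in.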
On Thu, Jul 27, 2023 at 5:48 AM Aleksandr Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> wrote: > > On Thu, Jul 27, 2023 at 11:01 AM Christian Brauner <brauner@kernel.org> wrote: > > > > On Thu, Jul 27, 2023 at 08:36:40AM +0200, Aleksandr Mikhalitsyn wrote: > > > On Thu, Jul 27, 2023 at 7:30 AM Xiubo Li <xiubli@redhat.com> wrote: > > > > > > > > > > > > On 7/26/23 22:10, Alexander Mikhalitsyn wrote: > > > > > Inode operations that create a new filesystem object such as ->mknod, > > > > > ->create, ->mkdir() and others don't take a {g,u}id argument explicitly. > > > > > Instead the caller's fs{g,u}id is used for the {g,u}id of the new > > > > > filesystem object. > > > > > > > > > > In order to ensure that the correct {g,u}id is used map the caller's > > > > > fs{g,u}id for creation requests. This doesn't require complex changes. > > > > > It suffices to pass in the relevant idmapping recorded in the request > > > > > message. If this request message was triggered from an inode operation > > > > > that creates filesystem objects it will have passed down the relevant > > > > > idmaping. If this is a request message that was triggered from an inode > > > > > operation that doens't need to take idmappings into account the initial > > > > > idmapping is passed down which is an identity mapping. > > > > > > > > > > This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID > > > > > which adds two new fields (owner_{u,g}id) to the request head structure. > > > > > So, we need to ensure that MDS supports it otherwise we need to fail > > > > > any IO that comes through an idmapped mount because we can't process it > > > > > in a proper way. MDS server without such an extension will use caller_{u,g}id > > > > > fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id > > > > > values are unmapped. 
At the same time we can't map these fields with an > > > > > idmapping as it can break UID/GID-based permission checks logic on the > > > > > MDS side. This problem was described with a lot of details at [1], [2]. > > > > > > > > > > [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/ > > > > > [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/ > > > > > > > > > > Cc: Xiubo Li <xiubli@redhat.com> > > > > > Cc: Jeff Layton <jlayton@kernel.org> > > > > > Cc: Ilya Dryomov <idryomov@gmail.com> > > > > > Cc: ceph-devel@vger.kernel.org > > > > > Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> > > > > > Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> > > > > > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> > > > > > --- > > > > > v7: > > > > > - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575) > > > > > --- > > > > > fs/ceph/mds_client.c | 20 ++++++++++++++++++++ > > > > > fs/ceph/mds_client.h | 5 ++++- > > > > > include/linux/ceph/ceph_fs.h | 4 +++- > > > > > 3 files changed, 27 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c > > > > > index c641ab046e98..ac095a95f3d0 100644 > > > > > --- a/fs/ceph/mds_client.c > > > > > +++ b/fs/ceph/mds_client.c > > > > > @@ -2923,6 +2923,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, > > > > > { > > > > > int mds = session->s_mds; > > > > > struct ceph_mds_client *mdsc = session->s_mdsc; > > > > > + struct ceph_client *cl = mdsc->fsc->client; > > > > > struct ceph_msg *msg; > > > > > struct ceph_mds_request_head_legacy *lhead; > > > > > const char *path1 = NULL; > > > > > @@ -3028,6 +3029,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, > > > > > lhead = find_legacy_request_head(msg->front.iov_base, > > > > > 
session->s_con.peer_features); > > > > > > > > > > + if ((req->r_mnt_idmap != &nop_mnt_idmap) && > > > > > + !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) { > > > > > + pr_err_ratelimited_client(cl, > > > > > + "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID" > > > > > + " is not supported by MDS. Fail request with -EIO.\n"); > > > > > + > > > > > + ret = -EIO; > > > > > + goto out_err; > > > > > + } > > > > > + > > > > > > > > I think this couldn't fail the mounting operation, right ? > > > > > > This won't fail mounting. First of all an idmapped mount is always an > > > additional mount, you always > > > start from doing "normal" mount and only after that you can use this > > > mount to create an idmapped one. > > > ( example: https://github.com/brauner/mount-idmapped/tree/master ) > > > > > > > > > > > IMO we should fail the mounting from the beginning. > > > > > > Unfortunately, we can't fail mount from the beginning. Procedure of > > > the idmapped mounts > > > creation is handled not on the filesystem level, but on the VFS level > > > > Correct. It's a generic vfsmount feature. > > > > > (source: https://github.com/torvalds/linux/blob/0a8db05b571ad5b8d5c8774a004c0424260a90bd/fs/namespace.c#L4277 > > > ) > > > > > > Kernel perform all required checks as: > > > - filesystem type has declared to support idmappings > > > (fs_type->fs_flags & FS_ALLOW_IDMAP) > > > - user who creates idmapped mount should be CAP_SYS_ADMIN in a user > > > namespace that owns superblock of the filesystem > > > (for cephfs it's always init_user_ns => user should be root on the host) > > > > > > So I would like to go this way because of the reasons mentioned above: > > > - root user is someone who understands what he does. > > > - idmapped mounts are never "first" mounts. They are always created > > > after "normal" mount. 
> > > - effectively this check makes "normal" mount to work normally and
> > > fail only requests that comes through an idmapped mounts
> > > with reasonable error message. Obviously, all read operations will
> > > work perfectly well only the operations that create new inodes will
> > > fail.
> > > Btw, we already have an analogical semantic on the VFS level for users
> > > who have no UID/GID mapping to the host. Filesystem requests for
> > > such users will fail with -EOVERFLOW. Here we have something close.
> >
> > Refusing requests coming from an idmapped mount if the server misses
> > appropriate features is good enough as a first step imho. And yes, we do
> > have similar logic on the vfs level for unmapped uid/gid.
>
> Thanks, Christian!
>
> I wanted to add that alternative here is to modify caller_{u,g}id
> fields as it was done in the first approach,
> it will break the UID/GID-based permissions model for old MDS versions
> (we can put printk_once to inform user about this),
> but at the same time it will allow us to support idmapped mounts in
> all cases. This support will be not fully ideal for old MDS
> and perfectly well for new MDS versions.
>
> Alternatively, we can introduce cephfs mount option like
> "idmap_with_old_mds" and if it's enabled then we set caller_{u,g}id
> for MDS without CEPHFS_FEATURE_HAS_OWNER_UIDGID, if it's disabled
> (default) we fail requests with -EIO. For
> new MDS everything goes in the right way.
>
> Kind regards,
> Alex

Hey there,

A very strong +1 on there needing to be some way to make this work
with older Ceph releases.
Ceph Reef isn't out yet and we're in July 2023, so I'd really like not
having to wait until Ceph Squid in mid 2024 to be able to make use of
this!

Some kind of mount option, module option or the like would all be fine for this.

Stéphane
On Thu, Jul 27, 2023 at 4:46 PM Stéphane Graber <stgraber@ubuntu.com> wrote: > > On Thu, Jul 27, 2023 at 5:48 AM Aleksandr Mikhalitsyn > <aleksandr.mikhalitsyn@canonical.com> wrote: > > > > On Thu, Jul 27, 2023 at 11:01 AM Christian Brauner <brauner@kernel.org> wrote: > > > > > > On Thu, Jul 27, 2023 at 08:36:40AM +0200, Aleksandr Mikhalitsyn wrote: > > > > On Thu, Jul 27, 2023 at 7:30 AM Xiubo Li <xiubli@redhat.com> wrote: > > > > > > > > > > > > > > > On 7/26/23 22:10, Alexander Mikhalitsyn wrote: > > > > > > Inode operations that create a new filesystem object such as ->mknod, > > > > > > ->create, ->mkdir() and others don't take a {g,u}id argument explicitly. > > > > > > Instead the caller's fs{g,u}id is used for the {g,u}id of the new > > > > > > filesystem object. > > > > > > > > > > > > In order to ensure that the correct {g,u}id is used map the caller's > > > > > > fs{g,u}id for creation requests. This doesn't require complex changes. > > > > > > It suffices to pass in the relevant idmapping recorded in the request > > > > > > message. If this request message was triggered from an inode operation > > > > > > that creates filesystem objects it will have passed down the relevant > > > > > > idmaping. If this is a request message that was triggered from an inode > > > > > > operation that doens't need to take idmappings into account the initial > > > > > > idmapping is passed down which is an identity mapping. > > > > > > > > > > > > This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID > > > > > > which adds two new fields (owner_{u,g}id) to the request head structure. > > > > > > So, we need to ensure that MDS supports it otherwise we need to fail > > > > > > any IO that comes through an idmapped mount because we can't process it > > > > > > in a proper way. 
MDS server without such an extension will use caller_{u,g}id > > > > > > fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id > > > > > > values are unmapped. At the same time we can't map these fields with an > > > > > > idmapping as it can break UID/GID-based permission checks logic on the > > > > > > MDS side. This problem was described with a lot of details at [1], [2]. > > > > > > > > > > > > [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/ > > > > > > [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/ > > > > > > > > > > > > Cc: Xiubo Li <xiubli@redhat.com> > > > > > > Cc: Jeff Layton <jlayton@kernel.org> > > > > > > Cc: Ilya Dryomov <idryomov@gmail.com> > > > > > > Cc: ceph-devel@vger.kernel.org > > > > > > Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> > > > > > > Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> > > > > > > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> > > > > > > --- > > > > > > v7: > > > > > > - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575) > > > > > > --- > > > > > > fs/ceph/mds_client.c | 20 ++++++++++++++++++++ > > > > > > fs/ceph/mds_client.h | 5 ++++- > > > > > > include/linux/ceph/ceph_fs.h | 4 +++- > > > > > > 3 files changed, 27 insertions(+), 2 deletions(-) > > > > > > > > > > > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c > > > > > > index c641ab046e98..ac095a95f3d0 100644 > > > > > > --- a/fs/ceph/mds_client.c > > > > > > +++ b/fs/ceph/mds_client.c > > > > > > @@ -2923,6 +2923,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, > > > > > > { > > > > > > int mds = session->s_mds; > > > > > > struct ceph_mds_client *mdsc = session->s_mdsc; > > > > > > + struct ceph_client *cl = mdsc->fsc->client; > > > > > > struct ceph_msg *msg; > > > > > > struct 
ceph_mds_request_head_legacy *lhead; > > > > > > const char *path1 = NULL; > > > > > > @@ -3028,6 +3029,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, > > > > > > lhead = find_legacy_request_head(msg->front.iov_base, > > > > > > session->s_con.peer_features); > > > > > > > > > > > > + if ((req->r_mnt_idmap != &nop_mnt_idmap) && > > > > > > + !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) { > > > > > > + pr_err_ratelimited_client(cl, > > > > > > + "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID" > > > > > > + " is not supported by MDS. Fail request with -EIO.\n"); > > > > > > + > > > > > > + ret = -EIO; > > > > > > + goto out_err; > > > > > > + } > > > > > > + > > > > > > > > > > I think this couldn't fail the mounting operation, right ? > > > > > > > > This won't fail mounting. First of all an idmapped mount is always an > > > > additional mount, you always > > > > start from doing "normal" mount and only after that you can use this > > > > mount to create an idmapped one. > > > > ( example: https://github.com/brauner/mount-idmapped/tree/master ) > > > > > > > > > > > > > > IMO we should fail the mounting from the beginning. > > > > > > > > Unfortunately, we can't fail mount from the beginning. Procedure of > > > > the idmapped mounts > > > > creation is handled not on the filesystem level, but on the VFS level > > > > > > Correct. It's a generic vfsmount feature. 
> > > > > > > (source: https://github.com/torvalds/linux/blob/0a8db05b571ad5b8d5c8774a004c0424260a90bd/fs/namespace.c#L4277 > > > > ) > > > > > > > > Kernel perform all required checks as: > > > > - filesystem type has declared to support idmappings > > > > (fs_type->fs_flags & FS_ALLOW_IDMAP) > > > > - user who creates idmapped mount should be CAP_SYS_ADMIN in a user > > > > namespace that owns superblock of the filesystem > > > > (for cephfs it's always init_user_ns => user should be root on the host) > > > > > > > > So I would like to go this way because of the reasons mentioned above: > > > > - root user is someone who understands what he does. > > > > - idmapped mounts are never "first" mounts. They are always created > > > > after "normal" mount. > > > > - effectively this check makes "normal" mount to work normally and > > > > fail only requests that comes through an idmapped mounts > > > > with reasonable error message. Obviously, all read operations will > > > > work perfectly well only the operations that create new inodes will > > > > fail. > > > > Btw, we already have an analogical semantic on the VFS level for users > > > > who have no UID/GID mapping to the host. Filesystem requests for > > > > such users will fail with -EOVERFLOW. Here we have something close. > > > > > > Refusing requests coming from an idmapped mount if the server misses > > > appropriate features is good enough as a first step imho. And yes, we do > > > have similar logic on the vfs level for unmapped uid/gid. > > > > Thanks, Christian! > > > > I wanted to add that alternative here is to modify caller_{u,g}id > > fields as it was done in the first approach, > > it will break the UID/GID-based permissions model for old MDS versions > > (we can put printk_once to inform user about this), > > but at the same time it will allow us to support idmapped mounts in > > all cases. This support will be not fully ideal for old MDS > > and perfectly well for new MDS versions. 
> >
> > Alternatively, we can introduce cephfs mount option like
> > "idmap_with_old_mds" and if it's enabled then we set caller_{u,g}id
> > for MDS without CEPHFS_FEATURE_HAS_OWNER_UIDGID, if it's disabled
> > (default) we fail requests with -EIO. For
> > new MDS everything goes in the right way.
> >
> > Kind regards,
> > Alex
>
> Hey there,
>
> A very strong +1 on there needing to be some way to make this work
> with older Ceph releases.
> Ceph Reef isn't out yet and we're in July 2023, so I'd really like not
> having to wait until Ceph Squid in mid 2024 to be able to make use of
> this!
>
> Some kind of mount option, module option or the like would all be fine for this.

I really like this way. I can implement it really quickly. Let's just agree on this :)
It looks like an ideal solution for everyone.

Kind regards,
Alex

>
> Stéphane
On 7/27/23 22:46, Stéphane Graber wrote: > On Thu, Jul 27, 2023 at 5:48 AM Aleksandr Mikhalitsyn > <aleksandr.mikhalitsyn@canonical.com> wrote: >> On Thu, Jul 27, 2023 at 11:01 AM Christian Brauner <brauner@kernel.org> wrote: >>> On Thu, Jul 27, 2023 at 08:36:40AM +0200, Aleksandr Mikhalitsyn wrote: >>>> On Thu, Jul 27, 2023 at 7:30 AM Xiubo Li <xiubli@redhat.com> wrote: >>>>> >>>>> On 7/26/23 22:10, Alexander Mikhalitsyn wrote: >>>>>> Inode operations that create a new filesystem object such as ->mknod, >>>>>> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly. >>>>>> Instead the caller's fs{g,u}id is used for the {g,u}id of the new >>>>>> filesystem object. >>>>>> >>>>>> In order to ensure that the correct {g,u}id is used map the caller's >>>>>> fs{g,u}id for creation requests. This doesn't require complex changes. >>>>>> It suffices to pass in the relevant idmapping recorded in the request >>>>>> message. If this request message was triggered from an inode operation >>>>>> that creates filesystem objects it will have passed down the relevant >>>>>> idmaping. If this is a request message that was triggered from an inode >>>>>> operation that doens't need to take idmappings into account the initial >>>>>> idmapping is passed down which is an identity mapping. >>>>>> >>>>>> This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID >>>>>> which adds two new fields (owner_{u,g}id) to the request head structure. >>>>>> So, we need to ensure that MDS supports it otherwise we need to fail >>>>>> any IO that comes through an idmapped mount because we can't process it >>>>>> in a proper way. MDS server without such an extension will use caller_{u,g}id >>>>>> fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id >>>>>> values are unmapped. At the same time we can't map these fields with an >>>>>> idmapping as it can break UID/GID-based permission checks logic on the >>>>>> MDS side. 
This problem was described with a lot of details at [1], [2].
>>>>>>
>>>>>> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
>>>>>> [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
>>>>>>
>>>>>> Cc: Xiubo Li <xiubli@redhat.com>
>>>>>> Cc: Jeff Layton <jlayton@kernel.org>
>>>>>> Cc: Ilya Dryomov <idryomov@gmail.com>
>>>>>> Cc: ceph-devel@vger.kernel.org
>>>>>> Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
>>>>>> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
>>>>>> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
>>>>>> ---
>>>>>> v7:
>>>>>> - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
>>>>>> ---
>>>>>> fs/ceph/mds_client.c | 20 ++++++++++++++++++++
>>>>>> fs/ceph/mds_client.h | 5 ++++-
>>>>>> include/linux/ceph/ceph_fs.h | 4 +++-
>>>>>> 3 files changed, 27 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>>>> index c641ab046e98..ac095a95f3d0 100644
>>>>>> --- a/fs/ceph/mds_client.c
>>>>>> +++ b/fs/ceph/mds_client.c
>>>>>> @@ -2923,6 +2923,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>>>  {
>>>>>>  	int mds = session->s_mds;
>>>>>>  	struct ceph_mds_client *mdsc = session->s_mdsc;
>>>>>> +	struct ceph_client *cl = mdsc->fsc->client;
>>>>>>  	struct ceph_msg *msg;
>>>>>>  	struct ceph_mds_request_head_legacy *lhead;
>>>>>>  	const char *path1 = NULL;
>>>>>> @@ -3028,6 +3029,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>>>  	lhead = find_legacy_request_head(msg->front.iov_base,
>>>>>>  					 session->s_con.peer_features);
>>>>>>
>>>>>> +	if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
>>>>>> +	    !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
>>>>>> +		pr_err_ratelimited_client(cl,
>>>>>> +			"idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
>>>>>> +			" is not supported by MDS. Fail request with -EIO.\n");
>>>>>> +
>>>>>> +		ret = -EIO;
>>>>>> +		goto out_err;
>>>>>> +	}
>>>>>> +
>>>>> I think this couldn't fail the mounting operation, right ?
>>>> This won't fail mounting. First of all an idmapped mount is always an
>>>> additional mount, you always start from doing "normal" mount and only
>>>> after that you can use this mount to create an idmapped one.
>>>> ( example: https://github.com/brauner/mount-idmapped/tree/master )
>>>>
>>>>> IMO we should fail the mounting from the beginning.
>>>> Unfortunately, we can't fail mount from the beginning. Procedure of
>>>> the idmapped mounts creation is handled not on the filesystem level,
>>>> but on the VFS level
>>> Correct. It's a generic vfsmount feature.
>>>
>>>> (source: https://github.com/torvalds/linux/blob/0a8db05b571ad5b8d5c8774a004c0424260a90bd/fs/namespace.c#L4277 )
>>>>
>>>> Kernel perform all required checks as:
>>>> - filesystem type has declared to support idmappings
>>>>   (fs_type->fs_flags & FS_ALLOW_IDMAP)
>>>> - user who creates idmapped mount should be CAP_SYS_ADMIN in a user
>>>>   namespace that owns superblock of the filesystem
>>>>   (for cephfs it's always init_user_ns => user should be root on the host)
>>>>
>>>> So I would like to go this way because of the reasons mentioned above:
>>>> - root user is someone who understands what he does.
>>>> - idmapped mounts are never "first" mounts. They are always created
>>>>   after "normal" mount.
>>>> - effectively this check makes "normal" mount to work normally and
>>>>   fail only requests that comes through an idmapped mounts
>>>>   with reasonable error message. Obviously, all read operations will
>>>>   work perfectly well only the operations that create new inodes will
>>>>   fail.
>>>> Btw, we already have an analogical semantic on the VFS level for users
>>>> who have no UID/GID mapping to the host. Filesystem requests for
>>>> such users will fail with -EOVERFLOW. Here we have something close.
>>> Refusing requests coming from an idmapped mount if the server misses
>>> appropriate features is good enough as a first step imho. And yes, we do
>>> have similar logic on the vfs level for unmapped uid/gid.
>> Thanks, Christian!
>>
>> I wanted to add that alternative here is to modify caller_{u,g}id
>> fields as it was done in the first approach,
>> it will break the UID/GID-based permissions model for old MDS versions
>> (we can put printk_once to inform user about this),
>> but at the same time it will allow us to support idmapped mounts in
>> all cases. This support will be not fully ideal for old MDS
>> and perfectly well for new MDS versions.
>>
>> Alternatively, we can introduce cephfs mount option like
>> "idmap_with_old_mds" and if it's enabled then we set caller_{u,g}id
>> for MDS without CEPHFS_FEATURE_HAS_OWNER_UIDGID, if it's disabled
>> (default) we fail requests with -EIO. For
>> new MDS everything goes in the right way.
>>
>> Kind regards,
>> Alex
> Hey there,
>
> A very strong +1 on there needing to be some way to make this work
> with older Ceph releases.
> Ceph Reef isn't out yet and we're in July 2023, so I'd really like not
> having to wait until Ceph Squid in mid 2024 to be able to make use of
> this!

IMO this shouldn't be an issue, because we can backport it to old releases.

Thanks

- Xiubo

>
> Some kind of mount option, module option or the like would all be fine for this.
>
> Stéphane
>
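For readers following the discussion, the three behaviours proposed above for a request arriving through an idmapped mount can be summarised in a small userspace sketch. This is not kernel code and not part of the patch: the enum, the function names and the boolean standing in for the suggested "idmap_with_old_mds" option are all hypothetical, and the option itself is only a proposal in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes for a request that came through an idmapped mount. */
enum toy_idmap_policy {
	TOY_USE_OWNER_FIELDS,	/* MDS has the feature: send owner_{u,g}id */
	TOY_MAP_CALLER_IDS,	/* old MDS, opt-in: map caller_{u,g}id instead */
	TOY_FAIL_REQUEST,	/* old MDS, default: refuse the request */
};

/*
 * mds_has_owner_uidgid models the CEPHFS_FEATURE_HAS_OWNER_UIDGID bit;
 * idmap_with_old_mds models the mount option suggested in the thread
 * (an assumption here, it does not exist in the posted patch).
 */
enum toy_idmap_policy toy_pick_policy(bool mds_has_owner_uidgid,
				      bool idmap_with_old_mds)
{
	if (mds_has_owner_uidgid)
		return TOY_USE_OWNER_FIELDS;
	return idmap_with_old_mds ? TOY_MAP_CALLER_IDS : TOY_FAIL_REQUEST;
}

/* Error code the request would end with under each policy. */
int toy_policy_errno(enum toy_idmap_policy policy)
{
	switch (policy) {
	case TOY_USE_OWNER_FIELDS:
	case TOY_MAP_CALLER_IDS:
		return 0;
	case TOY_FAIL_REQUEST:
		return -EIO;
	}
	assert(!"unknown policy");
	return -EINVAL;
}
```

With the default (option off) and an old MDS, `toy_policy_errno(toy_pick_policy(false, false))` yields `-EIO`, which matches what the posted patch does unconditionally for an MDS without the feature.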
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index c641ab046e98..ac095a95f3d0 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2923,6 +2923,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 {
 	int mds = session->s_mds;
 	struct ceph_mds_client *mdsc = session->s_mdsc;
+	struct ceph_client *cl = mdsc->fsc->client;
 	struct ceph_msg *msg;
 	struct ceph_mds_request_head_legacy *lhead;
 	const char *path1 = NULL;
@@ -3028,6 +3029,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 	lhead = find_legacy_request_head(msg->front.iov_base,
 					 session->s_con.peer_features);
 
+	if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
+	    !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
+		pr_err_ratelimited_client(cl,
+			"idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
+			" is not supported by MDS. Fail request with -EIO.\n");
+
+		ret = -EIO;
+		goto out_err;
+	}
+
 	/*
 	 * The ceph_mds_request_head_legacy didn't contain a version field, and
 	 * one was added when we moved the message version from 3->4.
@@ -3043,10 +3054,19 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 		p = msg->front.iov_base + sizeof(*ohead);
 	} else {
 		struct ceph_mds_request_head *nhead = msg->front.iov_base;
+		kuid_t owner_fsuid;
+		kgid_t owner_fsgid;
 
 		msg->hdr.version = cpu_to_le16(6);
 		nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
 		p = msg->front.iov_base + sizeof(*nhead);
+
+		owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
+					  VFSUIDT_INIT(req->r_cred->fsuid));
+		owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
+					  VFSGIDT_INIT(req->r_cred->fsgid));
+		nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
+		nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
 	}
 
 	end = msg->front.iov_base + msg->front.iov_len;
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index e3bbf3ba8ee8..8f683e8203bd 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -33,8 +33,10 @@ enum ceph_feature_type {
 	CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
 	CEPHFS_FEATURE_OP_GETVXATTR,
 	CEPHFS_FEATURE_32BITS_RETRY_FWD,
+	CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
+	CEPHFS_FEATURE_HAS_OWNER_UIDGID,
 
-	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
+	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
 };
 
 #define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
@@ -49,6 +51,7 @@ enum ceph_feature_type {
 	CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
 	CEPHFS_FEATURE_OP_GETVXATTR, \
 	CEPHFS_FEATURE_32BITS_RETRY_FWD, \
+	CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
 }
 
 /*
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index 5f2301ee88bc..6eb83a51341c 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
 	union ceph_mds_request_args args;
 } __attribute__ ((packed));
 
-#define CEPH_MDS_REQUEST_HEAD_VERSION 2
+#define CEPH_MDS_REQUEST_HEAD_VERSION 3
 
 struct ceph_mds_request_head_old {
 	__le16 version; /* struct version */
@@ -530,6 +530,8 @@ struct ceph_mds_request_head {
 
 	__le32 ext_num_retry; /* new count retry attempts */
 	__le32 ext_num_fwd; /* new count fwd attempts */
+
+	__le32 owner_uid, owner_gid; /* used for OPs which create inodes */
 } __attribute__ ((packed));
 
 /* cap/lease release record */
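The effect of the two mds_client.c hunks can be illustrated with a small userspace model. It is only a sketch under stated assumptions: `toy_idmap`, `toy_from_vfsid`, `toy_request_head`, `toy_fill_request` and the `FEATURE_HAS_OWNER_UIDGID` bit value are hypothetical stand-ins, the idmapping is reduced to a single contiguous range, and the kernel's real helpers (`from_vfsuid()`, `test_bit()`) are not used. The model keeps the caller ID unmapped, as the commit message describes, and fills the owner ID from the mount's idmapping.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_ID ((uint32_t)-1)

/* Hypothetical feature bit; the value is for illustration only. */
#define FEATURE_HAS_OWNER_UIDGID (1u << 0)

/*
 * Hypothetical single-range idmapping: mount-side IDs
 * [mnt_first, mnt_first + count) correspond to filesystem-side IDs
 * [fs_first, fs_first + count). A NULL map means "not idmapped".
 */
struct toy_idmap {
	uint32_t fs_first;
	uint32_t mnt_first;
	uint32_t count;
};

/* Translate a mount-side ID to the ID that should be stored by the server. */
uint32_t toy_from_vfsid(const struct toy_idmap *map, uint32_t id)
{
	if (!map)
		return id;			/* identity mapping */
	if (id < map->mnt_first || id - map->mnt_first >= map->count)
		return INVALID_ID;		/* no mapping for this ID */
	return map->fs_first + (id - map->mnt_first);
}

/* Only the two fields relevant to this discussion. */
struct toy_request_head {
	uint32_t caller_uid;	/* unmapped, used for permission checks */
	uint32_t owner_uid;	/* mapped, used as owner of new inodes */
};

/* Returns 0 on success or -EIO when the server lacks the feature. */
int toy_fill_request(struct toy_request_head *head,
		     const struct toy_idmap *map,
		     uint32_t session_features, uint32_t caller_fsuid)
{
	assert(head != NULL);

	/* Same gate as the patch: idmapped mount plus an old server fails. */
	if (map && !(session_features & FEATURE_HAS_OWNER_UIDGID))
		return -EIO;

	head->caller_uid = caller_fsuid;
	head->owner_uid = toy_from_vfsid(map, caller_fsuid);
	return 0;
}
```

For example, with a map of `{ .fs_first = 0, .mnt_first = 100000, .count = 65536 }`, a caller whose fsuid is 100000 gets `owner_uid` 0 while `caller_uid` stays 100000; without the feature bit the same call returns `-EIO`, and a non-idmapped request (NULL map) is never refused.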