Message ID | 20230803135955.230449-4-aleksandr.mikhalitsyn@canonical.com |
---|---|
State | New |
Headers |
From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> To: xiubli@redhat.com Cc: brauner@kernel.org, stgraber@ubuntu.com, linux-fsdevel@vger.kernel.org, Jeff Layton <jlayton@kernel.org>, Ilya Dryomov <idryomov@gmail.com>, ceph-devel@vger.kernel.org, Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>, linux-kernel@vger.kernel.org Subject: [PATCH v8 03/12] ceph: handle idmapped mounts in create_request_message() Date: Thu, 3 Aug 2023 15:59:46 +0200 Message-Id: <20230803135955.230449-4-aleksandr.mikhalitsyn@canonical.com> In-Reply-To: <20230803135955.230449-1-aleksandr.mikhalitsyn@canonical.com> References: <20230803135955.230449-1-aleksandr.mikhalitsyn@canonical.com> List-ID: <linux-kernel.vger.kernel.org> |
Series | ceph: support idmapped mounts |
Commit Message
Aleksandr Mikhalitsyn
Aug. 3, 2023, 1:59 p.m. UTC
From: Christian Brauner <brauner@kernel.org>

Inode operations that create a new filesystem object, such as ->mknod(),
->create(), ->mkdir() and others, don't take a {g,u}id argument explicitly.
Instead, the caller's fs{g,u}id is used for the {g,u}id of the new
filesystem object.

To ensure that the correct {g,u}id is used, map the caller's fs{g,u}id
for creation requests. This doesn't require complex changes. It suffices
to pass in the relevant idmapping recorded in the request message. If the
request message was triggered from an inode operation that creates
filesystem objects, it will have passed down the relevant idmapping. If
the request message was triggered from an inode operation that doesn't
need to take idmappings into account, the initial idmapping is passed
down, which is an identity mapping.

This change uses a new cephfs protocol extension,
CEPHFS_FEATURE_HAS_OWNER_UIDGID, which adds two new fields
(owner_{u,g}id) to the request head structure. So we need to ensure that
the MDS supports it; otherwise we need to fail any IO that comes through
an idmapped mount, because we can't process it properly. An MDS without
this extension will use the caller_{u,g}id fields to set the owner
UID/GID of a new inode, which is incorrect because the caller_{u,g}id
values are unmapped. At the same time, we can't map these fields with an
idmapping, as that can break the UID/GID-based permission checks on the
MDS side. This problem is described in detail in [1] and [2].

[1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
[2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/

https://github.com/ceph/ceph/pull/52575
https://tracker.ceph.com/issues/62217

Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: ceph-devel@vger.kernel.org
Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
v7:
	- reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
v8:
	- properly handled the case when an old MDS is used with a new kernel client
---
 fs/ceph/mds_client.c         | 46 +++++++++++++++++++++++++++++++++---
 fs/ceph/mds_client.h         |  5 +++-
 include/linux/ceph/ceph_fs.h |  4 +++-
 3 files changed, 50 insertions(+), 5 deletions(-)
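The ID handling described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `model_idmap`, `model_from_vfsid()`, `model_request_head` and `model_fill_ids()` are hypothetical names, the idmapping is reduced to a single contiguous extent, and none of this is kernel or Ceph API. It shows the two points the message makes: the caller IDs stay unmapped, and a request through an idmapped mount fails with -EIO when the MDS lacks the owner_{u,g}id extension.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_INVALID_ID ((uint32_t)-1)

/*
 * Hypothetical single-extent idmapping: IDs [mnt_first, mnt_first + count)
 * as seen through the mount correspond to [fs_first, fs_first + count)
 * on the filesystem side.
 */
struct model_idmap {
	uint32_t fs_first;
	uint32_t mnt_first;
	uint32_t count;
};

/*
 * Simplified stand-in for from_vfsuid()/from_vfsgid(): translate an ID seen
 * through the mount back to the filesystem-side ID. A NULL map models the
 * identity ("nop") idmapping.
 */
static uint32_t model_from_vfsid(const struct model_idmap *map, uint32_t id)
{
	if (!map)
		return id;
	if (id < map->mnt_first || id - map->mnt_first >= map->count)
		return MODEL_INVALID_ID;
	return map->fs_first + (id - map->mnt_first);
}

/* Only the ID fields of the request head are modelled here. */
struct model_request_head {
	uint32_t caller_uid, caller_gid;	/* always unmapped */
	uint32_t owner_uid, owner_gid;		/* mapped, for new inodes */
};

static int model_fill_ids(struct model_request_head *head,
			  const struct model_idmap *map,
			  bool mds_has_owner_uidgid,
			  uint32_t fsuid, uint32_t fsgid)
{
	assert(head != NULL);

	/* Idmapped mount but the MDS cannot take owner IDs: fail the request. */
	if (map && !mds_has_owner_uidgid)
		return -EIO;

	/* Caller IDs are sent as-is so MDS permission checks keep working. */
	head->caller_uid = fsuid;
	head->caller_gid = fsgid;

	/* Owner IDs are only meaningful when the MDS understands them. */
	if (mds_has_owner_uidgid) {
		head->owner_uid = model_from_vfsid(map, fsuid);
		head->owner_gid = model_from_vfsid(map, fsgid);
	}
	return 0;
}
```

With a mapping of `{ .fs_first = 0, .mnt_first = 100000, .count = 65536 }`, a caller with fsuid 101000 would produce caller_uid 101000 and owner_uid 1000 in this model.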
Comments
On 8/3/23 21:59, Alexander Mikhalitsyn wrote:
> From: Christian Brauner <brauner@kernel.org>
>
> Inode operations that create a new filesystem object such as ->mknod,
> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
> filesystem object.
>
> In order to ensure that the correct {g,u}id is used map the caller's
> fs{g,u}id for creation requests. This doesn't require complex changes.
> It suffices to pass in the relevant idmapping recorded in the request
> message. If this request message was triggered from an inode operation
> that creates filesystem objects it will have passed down the relevant
> idmapping. If this is a request message that was triggered from an inode
> operation that doesn't need to take idmappings into account the initial
> idmapping is passed down which is an identity mapping.
>
> This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID
> which adds two new fields (owner_{u,g}id) to the request head structure.
> So, we need to ensure that MDS supports it otherwise we need to fail
> any IO that comes through an idmapped mount because we can't process it
> in a proper way. MDS server without such an extension will use caller_{u,g}id
> fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id
> values are unmapped. At the same time we can't map these fields with an
> idmapping as it can break UID/GID-based permission checks logic on the
> MDS side. This problem was described with a lot of details at [1], [2].
>
> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
> [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
>
> https://github.com/ceph/ceph/pull/52575
> https://tracker.ceph.com/issues/62217
>
> Cc: Xiubo Li <xiubli@redhat.com>
> Cc: Jeff Layton <jlayton@kernel.org>
> Cc: Ilya Dryomov <idryomov@gmail.com>
> Cc: ceph-devel@vger.kernel.org
> Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> ---
> v7:
> 	- reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575)
> v8:
> 	- properly handled case when old MDS used with new kernel client
> ---
>  fs/ceph/mds_client.c         | 46 +++++++++++++++++++++++++++++++++---
>  fs/ceph/mds_client.h         |  5 +++-
>  include/linux/ceph/ceph_fs.h |  4 +++-
>  3 files changed, 50 insertions(+), 5 deletions(-)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index 8829f55103da..7d3106d3b726 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
>  	}
>  }
>
> +static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
> +{
> +	if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
> +		return 1;
> +
> +	if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
> +		return 2;
> +
> +	return CEPH_MDS_REQUEST_HEAD_VERSION;
> +}
> +
>  static struct ceph_mds_request_head_legacy *
>  find_legacy_request_head(void *p, u64 features)
>  {
> @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>  {
>  	int mds = session->s_mds;
>  	struct ceph_mds_client *mdsc = session->s_mdsc;
> +	struct ceph_client *cl = mdsc->fsc->client;
>  	struct ceph_msg *msg;
>  	struct ceph_mds_request_head_legacy *lhead;
>  	const char *path1 = NULL;
> @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>  	void *p, *end;
>  	int ret;
>  	bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
> -	bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
> +	u16 request_head_version = mds_supported_head_version(session);
>
>  	ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
>  			      req->r_parent, req->r_path1, req->r_ino1.ino,
> @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>  	 */
>  	if (legacy)
>  		len = sizeof(struct ceph_mds_request_head_legacy);
> -	else if (old_version)
> +	else if (request_head_version == 1)
>  		len = sizeof(struct ceph_mds_request_head_old);
> +	else if (request_head_version == 2)
> +		len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
>  	else
>  		len = sizeof(struct ceph_mds_request_head);
>

This is not what we are supposed to do. If we do this again and again whenever new members are added, the code will become very complicated to maintain.

Once CEPHFS_FEATURE_32BITS_RETRY_FWD is supported, ceph should decode the head correctly, and if CEPHFS_FEATURE_HAS_OWNER_UIDGID is not supported the decoder should simply skip the extra fields.

Is the MDS side buggy? Why didn't your last version work?

Thanks

- Xiubo

> @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>  	lhead = find_legacy_request_head(msg->front.iov_base,
>  					 session->s_con.peer_features);
>
> +	if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> +	    !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> +		pr_err_ratelimited_client(cl,
> +			"idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> +			" is not supported by MDS. Fail request with -EIO.\n");
> +
> +		ret = -EIO;
> +		goto out_err;
> +	}
> +
>  	/*
>  	 * The ceph_mds_request_head_legacy didn't contain a version field, and
>  	 * one was added when we moved the message version from 3->4.
> @@ -3035,17 +3059,33 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>  	if (legacy) {
>  		msg->hdr.version = cpu_to_le16(3);
>  		p = msg->front.iov_base + sizeof(*lhead);
> -	} else if (old_version) {
> +	} else if (request_head_version == 1) {
>  		struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
>
>  		msg->hdr.version = cpu_to_le16(4);
>  		ohead->version = cpu_to_le16(1);
>  		p = msg->front.iov_base + sizeof(*ohead);
> +	} else if (request_head_version == 2) {
> +		struct ceph_mds_request_head *nhead = msg->front.iov_base;
> +
> +		msg->hdr.version = cpu_to_le16(6);
> +		nhead->version = cpu_to_le16(2);
> +
> +		p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
>  	} else {
>  		struct ceph_mds_request_head *nhead = msg->front.iov_base;
> +		kuid_t owner_fsuid;
> +		kgid_t owner_fsgid;
>
>  		msg->hdr.version = cpu_to_le16(6);
>  		nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> +
> +		owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> +					  VFSUIDT_INIT(req->r_cred->fsuid));
> +		owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> +					  VFSGIDT_INIT(req->r_cred->fsgid));
> +		nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
> +		nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
>  		p = msg->front.iov_base + sizeof(*nhead);
>  	}
>
> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> index e3bbf3ba8ee8..8f683e8203bd 100644
> --- a/fs/ceph/mds_client.h
> +++ b/fs/ceph/mds_client.h
> @@ -33,8 +33,10 @@ enum ceph_feature_type {
>  	CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
>  	CEPHFS_FEATURE_OP_GETVXATTR,
>  	CEPHFS_FEATURE_32BITS_RETRY_FWD,
> +	CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
> +	CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>
> -	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
> +	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>  };
>
>  #define CEPHFS_FEATURES_CLIENT_SUPPORTED {	\
> @@ -49,6 +51,7 @@ enum ceph_feature_type {
>  	CEPHFS_FEATURE_NOTIFY_SESSION_STATE,	\
>  	CEPHFS_FEATURE_OP_GETVXATTR,		\
>  	CEPHFS_FEATURE_32BITS_RETRY_FWD,	\
> +	CEPHFS_FEATURE_HAS_OWNER_UIDGID,	\
>  }
>
>  /*
> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> index 5f2301ee88bc..6eb83a51341c 100644
> --- a/include/linux/ceph/ceph_fs.h
> +++ b/include/linux/ceph/ceph_fs.h
> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
>  	union ceph_mds_request_args args;
>  } __attribute__ ((packed));
>
> -#define CEPH_MDS_REQUEST_HEAD_VERSION 2
> +#define CEPH_MDS_REQUEST_HEAD_VERSION 3
>
>  struct ceph_mds_request_head_old {
>  	__le16 version;                /* struct version */
> @@ -530,6 +530,8 @@ struct ceph_mds_request_head {
>
>  	__le32 ext_num_retry;          /* new count retry attempts */
>  	__le32 ext_num_fwd;            /* new count fwd attempts */
> +
> +	__le32 owner_uid, owner_gid;   /* used for OPs which create inodes */
>  } __attribute__ ((packed));
>
>  /* cap/lease release record */
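The per-version length selection that the review above objects to can be sketched in userspace as follows. This is a minimal model, not the kernel code: `model_request_head` keeps only a few stand-in fields, `model_offsetofend()` mirrors what the kernel's `offsetofend()` computes, and all `model_*` names are hypothetical. Version 1 is approximated as a prefix of the same struct, whereas the patch uses the separate `struct ceph_mds_request_head_old`. The packed attribute is the GCC/Clang spelling that the patch itself uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_HEAD_VERSION 3

/* Offset of the first byte after `member` within `type`. */
#define model_offsetofend(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Reduced, hypothetical stand-in for struct ceph_mds_request_head. */
struct model_request_head {
	uint16_t version;		/* struct version */
	uint64_t oldest_client_tid;	/* stands in for all v1 fields */
	uint32_t ext_num_retry;		/* added in v2 */
	uint32_t ext_num_fwd;		/* added in v2 */
	uint32_t owner_uid, owner_gid;	/* added in v3 */
} __attribute__((packed));

/* Highest head version the peer understands, derived from feature bits. */
static uint16_t model_head_version(bool has_retry_fwd, bool has_owner_uidgid)
{
	if (!has_retry_fwd)
		return 1;
	if (!has_owner_uidgid)
		return 2;
	return MODEL_HEAD_VERSION;
}

/* Number of head bytes to put on the wire for a given version. */
static size_t model_head_len(uint16_t version)
{
	assert(version >= 1 && version <= MODEL_HEAD_VERSION);

	switch (version) {
	case 1:
		return model_offsetofend(struct model_request_head,
					 oldest_client_tid);
	case 2:
		return model_offsetofend(struct model_request_head,
					 ext_num_fwd);
	default:
		return sizeof(struct model_request_head);
	}
}
```

Every new field adds another branch of this kind on the sending side, which is the maintenance cost raised in the review.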
On 8/4/23 10:26, Xiubo Li wrote:
>
> On 8/3/23 21:59, Alexander Mikhalitsyn wrote:
>> @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>  	 */
>>  	if (legacy)
>>  		len = sizeof(struct ceph_mds_request_head_legacy);
>> -	else if (old_version)
>> +	else if (request_head_version == 1)
>>  		len = sizeof(struct ceph_mds_request_head_old);
>> +	else if (request_head_version == 2)
>> +		len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
>>  	else
>>  		len = sizeof(struct ceph_mds_request_head);
>
> This is not what we are supposed to do. If we do this again and again
> whenever new members are added, the code will become very complicated
> to maintain.
>
> Once CEPHFS_FEATURE_32BITS_RETRY_FWD is supported, ceph should decode
> the head correctly, and if CEPHFS_FEATURE_HAS_OWNER_UIDGID is not
> supported the decoder should simply skip the extra fields.
>
> Is the MDS side buggy? Why didn't your last version work?
>

I think the ceph side is buggy. Possibly we should add a new `length` member to `struct ceph_mds_request_head` and just skip the extra bytes when decoding it.

Could you fix it together with your ceph PR?

Thanks

- Xiubo
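The `length` member idea from the reply above can be sketched as a length-prefixed decode that reads the fields it knows and skips whatever follows. This is only an illustration under assumptions: the wire layout (a little-endian u32 length followed by that many head bytes), `model_known_head`, and `model_decode_head()` are hypothetical and do not describe the actual Ceph encoding or the eventual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields that this (older) decoder knows about. */
struct model_known_head {
	uint32_t ext_num_retry;
	uint32_t ext_num_fwd;
};

#define MODEL_KNOWN_LEN 8u	/* bytes occupied by the known fields */

static uint32_t model_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical layout: [u32 length][length bytes of head][rest of message].
 * Returns the number of bytes consumed (prefix plus head), or -1 if the
 * buffer is too short or the head is smaller than the known fields.
 */
static long model_decode_head(const uint8_t *buf, size_t buf_len,
			      struct model_known_head *out)
{
	uint32_t len;

	assert(buf != NULL && out != NULL);

	if (buf_len < 4)
		return -1;

	len = model_get_le32(buf);
	if (len < MODEL_KNOWN_LEN || len > buf_len - 4)
		return -1;

	out->ext_num_retry = model_get_le32(buf + 4);
	out->ext_num_fwd = model_get_le32(buf + 8);

	/* Bytes beyond the known fields belong to newer senders: skip them. */
	return (long)(4 + len);
}
```

Because the decoder advances by the sender's stated length rather than by its own `sizeof`, a sender could append fields such as owner_{u,g}id without per-version length branches on the client.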
On Fri, Aug 4, 2023 at 4:26 AM Xiubo Li <xiubli@redhat.com> wrote: > > > On 8/3/23 21:59, Alexander Mikhalitsyn wrote: > > From: Christian Brauner <brauner@kernel.org> > > > > Inode operations that create a new filesystem object such as ->mknod, > > ->create, ->mkdir() and others don't take a {g,u}id argument explicitly. > > Instead the caller's fs{g,u}id is used for the {g,u}id of the new > > filesystem object. > > > > In order to ensure that the correct {g,u}id is used map the caller's > > fs{g,u}id for creation requests. This doesn't require complex changes. > > It suffices to pass in the relevant idmapping recorded in the request > > message. If this request message was triggered from an inode operation > > that creates filesystem objects it will have passed down the relevant > > idmaping. If this is a request message that was triggered from an inode > > operation that doens't need to take idmappings into account the initial > > idmapping is passed down which is an identity mapping. > > > > This change uses a new cephfs protocol extension CEPHFS_FEATURE_HAS_OWNER_UIDGID > > which adds two new fields (owner_{u,g}id) to the request head structure. > > So, we need to ensure that MDS supports it otherwise we need to fail > > any IO that comes through an idmapped mount because we can't process it > > in a proper way. MDS server without such an extension will use caller_{u,g}id > > fields to set a new inode owner UID/GID which is incorrect because caller_{u,g}id > > values are unmapped. At the same time we can't map these fields with an > > idmapping as it can break UID/GID-based permission checks logic on the > > MDS side. This problem was described with a lot of details at [1], [2]. 
> > > > [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/ > > [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/ > > > > https://github.com/ceph/ceph/pull/52575 > > https://tracker.ceph.com/issues/62217 > > > > Cc: Xiubo Li <xiubli@redhat.com> > > Cc: Jeff Layton <jlayton@kernel.org> > > Cc: Ilya Dryomov <idryomov@gmail.com> > > Cc: ceph-devel@vger.kernel.org > > Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> > > Signed-off-by: Christian Brauner <brauner@kernel.org> > > Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> > > --- > > v7: > > - reworked to use two new fields for owner UID/GID (https://github.com/ceph/ceph/pull/52575) > > v8: > > - properly handled case when old MDS used with new kernel client > > --- > > fs/ceph/mds_client.c | 46 +++++++++++++++++++++++++++++++++--- > > fs/ceph/mds_client.h | 5 +++- > > include/linux/ceph/ceph_fs.h | 4 +++- > > 3 files changed, 50 insertions(+), 5 deletions(-) > > > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c > > index 8829f55103da..7d3106d3b726 100644 > > --- a/fs/ceph/mds_client.c > > +++ b/fs/ceph/mds_client.c > > @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request * > > } > > } > > > > +static inline u16 mds_supported_head_version(struct ceph_mds_session *session) > > +{ > > + if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features)) > > + return 1; > > + > > + if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) > > + return 2; > > + > > + return CEPH_MDS_REQUEST_HEAD_VERSION; > > +} > > + > > static struct ceph_mds_request_head_legacy * > > find_legacy_request_head(void *p, u64 features) > > { > > @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, > > { > > int mds = session->s_mds; > > struct ceph_mds_client *mdsc = session->s_mdsc; > > + 
struct ceph_client *cl = mdsc->fsc->client; > > struct ceph_msg *msg; > > struct ceph_mds_request_head_legacy *lhead; > > const char *path1 = NULL; > > @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, > > void *p, *end; > > int ret; > > bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME); > > - bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features); > > + u16 request_head_version = mds_supported_head_version(session); > > > > ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry, > > req->r_parent, req->r_path1, req->r_ino1.ino, > > @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, > > */ > > if (legacy) > > len = sizeof(struct ceph_mds_request_head_legacy); > > - else if (old_version) > > + else if (request_head_version == 1) > > len = sizeof(struct ceph_mds_request_head_old); > > + else if (request_head_version == 2) > > + len = offsetofend(struct ceph_mds_request_head, ext_num_fwd); > > else > > len = sizeof(struct ceph_mds_request_head); > > > > This is not what we suppose to. If we do this again and again when > adding new members it will make the code very complicated to maintain. > > Once the CEPHFS_FEATURE_32BITS_RETRY_FWD has been supported the ceph > should correctly decode it and if CEPHFS_FEATURE_HAS_OWNER_UIDGID is not > supported the decoder should skip it directly. I thought that too. But it doesn't work. Just try - take kernel client testing branch, and then add a new field to the struct ceph_mds_request_head. Compile and try to mount. 
It will stop working, and on the MDS side you will see something like:

2023-08-03T13:15:40.871+0200 7fe64ef5e640 10 mds.c ms_handle_accept v1:192.168.2.136:0/49354629 con 0x563962206880 session 0x563967054000
2023-08-03T13:15:40.871+0200 7fe650f62640 -1 failed to decode message of type 24 v6: End of buffer [buffer:2]
2023-08-03T13:15:40.871+0200 7fe650f62640 1 dump:
00000000  03 00 01 00 00 00 00 00  00 00 10 00 00 00 00 00  |................|
00000010  00 00 00 00 00 00 01 01  00 00 00 00 00 00 00 00  |................|
00000020  00 00 00 00 00 00 00 00  00 00 01 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000070  00 00 01 01 00 00 00 00  00 00 00 00 00 00 00 01  |................|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 5b 8c cb 64  |............[..d|
00000090  64 78 11 13 01 00 00 00  00 00 00 00 00 00 00 00  |dx..............|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00              |............|
000000ac

As I understand it, the MDS side is not prepared to receive a struct ceph_mds_request_head that is larger than the size it supports.

> > Is the MDS side buggy ? Why you last version didn't work ? > > Thanks > > - Xiubo > > > @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, > > lhead = find_legacy_request_head(msg->front.iov_base, > > session->s_con.peer_features); > > > > + if ((req->r_mnt_idmap != &nop_mnt_idmap) && > > + !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) { > > + pr_err_ratelimited_client(cl, > > + "idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID" > > + " is not supported by MDS. Fail request with -EIO.\n"); > > + > > + ret = -EIO; > > + goto out_err; > > + } > > + > > /* > > * The ceph_mds_request_head_legacy didn't contain a version field, and > > * one was added when we moved the message version from 3->4. 
> > @@ -3035,17 +3059,33 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session, > > if (legacy) { > > msg->hdr.version = cpu_to_le16(3); > > p = msg->front.iov_base + sizeof(*lhead); > > - } else if (old_version) { > > + } else if (request_head_version == 1) { > > struct ceph_mds_request_head_old *ohead = msg->front.iov_base; > > > > msg->hdr.version = cpu_to_le16(4); > > ohead->version = cpu_to_le16(1); > > p = msg->front.iov_base + sizeof(*ohead); > > + } else if (request_head_version == 2) { > > + struct ceph_mds_request_head *nhead = msg->front.iov_base; > > + > > + msg->hdr.version = cpu_to_le16(6); > > + nhead->version = cpu_to_le16(2); > > + > > + p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd); > > } else { > > struct ceph_mds_request_head *nhead = msg->front.iov_base; > > + kuid_t owner_fsuid; > > + kgid_t owner_fsgid; > > > > msg->hdr.version = cpu_to_le16(6); > > nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION); > > + > > + owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns, > > + VFSUIDT_INIT(req->r_cred->fsuid)); > > + owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns, > > + VFSGIDT_INIT(req->r_cred->fsgid)); > > + nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid)); > > + nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid)); > > p = msg->front.iov_base + sizeof(*nhead); > > } > > > > diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h > > index e3bbf3ba8ee8..8f683e8203bd 100644 > > --- a/fs/ceph/mds_client.h > > +++ b/fs/ceph/mds_client.h > > @@ -33,8 +33,10 @@ enum ceph_feature_type { > > CEPHFS_FEATURE_NOTIFY_SESSION_STATE, > > CEPHFS_FEATURE_OP_GETVXATTR, > > CEPHFS_FEATURE_32BITS_RETRY_FWD, > > + CEPHFS_FEATURE_NEW_SNAPREALM_INFO, > > + CEPHFS_FEATURE_HAS_OWNER_UIDGID, > > > > - CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD, > > + CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID, > > }; > > > > #define 
CEPHFS_FEATURES_CLIENT_SUPPORTED { \ > > @@ -49,6 +51,7 @@ enum ceph_feature_type { > > CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \ > > CEPHFS_FEATURE_OP_GETVXATTR, \ > > CEPHFS_FEATURE_32BITS_RETRY_FWD, \ > > + CEPHFS_FEATURE_HAS_OWNER_UIDGID, \ > > } > > > > /* > > diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h > > index 5f2301ee88bc..6eb83a51341c 100644 > > --- a/include/linux/ceph/ceph_fs.h > > +++ b/include/linux/ceph/ceph_fs.h > > @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy { > > union ceph_mds_request_args args; > > } __attribute__ ((packed)); > > > > -#define CEPH_MDS_REQUEST_HEAD_VERSION 2 > > +#define CEPH_MDS_REQUEST_HEAD_VERSION 3 > > > > struct ceph_mds_request_head_old { > > __le16 version; /* struct version */ > > @@ -530,6 +530,8 @@ struct ceph_mds_request_head { > > > > __le32 ext_num_retry; /* new count retry attempts */ > > __le32 ext_num_fwd; /* new count fwd attempts */ > > + > > + __le32 owner_uid, owner_gid; /* used for OPs which create inodes */ > > } __attribute__ ((packed)); > > > > /* cap/lease release record */ >
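The head-size selection in the quoted patch can be sketched in userspace C as follows. This is a simplified illustration, not the kernel code: `struct req_head`, the `FEAT_*` bits, `OFFSETOFEND`, and both helper functions are hypothetical stand-ins, and the struct keeps only enough fields to show how the bytes sent on the wire are truncated to what the MDS has advertised.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for struct ceph_mds_request_head. */
struct req_head {
    uint16_t version;
    uint32_t caller_uid;
    uint32_t caller_gid;     /* last field of the "old" (version 1) head */
    uint32_t ext_num_retry;
    uint32_t ext_num_fwd;    /* last field of a version 2 head */
    uint32_t owner_uid;      /* version 3 only */
    uint32_t owner_gid;      /* version 3 only */
} __attribute__((packed));

/* Userspace equivalent of the kernel's offsetofend() helper. */
#define OFFSETOFEND(type, member) \
    (offsetof(type, member) + sizeof(((type *)0)->member))

/* Assumed feature bits for this sketch; the kernel uses test_bit() on s_features. */
enum {
    FEAT_32BITS_RETRY_FWD = 1u << 0,
    FEAT_HAS_OWNER_UIDGID = 1u << 1,
};

/* Pick the newest head version the peer is known to understand. */
static unsigned supported_head_version(unsigned features)
{
    if (!(features & FEAT_32BITS_RETRY_FWD))
        return 1;
    if (!(features & FEAT_HAS_OWNER_UIDGID))
        return 2;
    return 3;
}

/* Number of head bytes to put on the wire for that version. */
static size_t head_len(unsigned features)
{
    switch (supported_head_version(features)) {
    case 1:
        return OFFSETOFEND(struct req_head, caller_gid);
    case 2:
        return OFFSETOFEND(struct req_head, ext_num_fwd);
    default:
        return sizeof(struct req_head);
    }
}
```

The maintainability concern raised in the review is visible here: every new trailing field needs another feature bit and another case in both helpers.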
On Fri, Aug 4, 2023 at 5:24 AM Xiubo Li <xiubli@redhat.com> wrote: > > > On 8/4/23 10:26, Xiubo Li wrote: > > > > On 8/3/23 21:59, Alexander Mikhalitsyn wrote: > >> From: Christian Brauner <brauner@kernel.org> > >> > >> Inode operations that create a new filesystem object such as ->mknod, > >> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly. > >> Instead the caller's fs{g,u}id is used for the {g,u}id of the new > >> filesystem object. > >> > >> In order to ensure that the correct {g,u}id is used map the caller's > >> fs{g,u}id for creation requests. This doesn't require complex changes. > >> It suffices to pass in the relevant idmapping recorded in the request > >> message. If this request message was triggered from an inode operation > >> that creates filesystem objects it will have passed down the relevant > >> idmaping. If this is a request message that was triggered from an inode > >> operation that doens't need to take idmappings into account the initial > >> idmapping is passed down which is an identity mapping. > >> > >> This change uses a new cephfs protocol extension > >> CEPHFS_FEATURE_HAS_OWNER_UIDGID > >> which adds two new fields (owner_{u,g}id) to the request head structure. > >> So, we need to ensure that MDS supports it otherwise we need to fail > >> any IO that comes through an idmapped mount because we can't process it > >> in a proper way. MDS server without such an extension will use > >> caller_{u,g}id > >> fields to set a new inode owner UID/GID which is incorrect because > >> caller_{u,g}id > >> values are unmapped. At the same time we can't map these fields with an > >> idmapping as it can break UID/GID-based permission checks logic on the > >> MDS side. This problem was described with a lot of details at [1], [2]. 
> >> > >> [1] > >> https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/ > >> [2] > >> https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/ > >> > >> https://github.com/ceph/ceph/pull/52575 > >> https://tracker.ceph.com/issues/62217 > >> > >> Cc: Xiubo Li <xiubli@redhat.com> > >> Cc: Jeff Layton <jlayton@kernel.org> > >> Cc: Ilya Dryomov <idryomov@gmail.com> > >> Cc: ceph-devel@vger.kernel.org > >> Co-Developed-by: Alexander Mikhalitsyn > >> <aleksandr.mikhalitsyn@canonical.com> > >> Signed-off-by: Christian Brauner <brauner@kernel.org> > >> Signed-off-by: Alexander Mikhalitsyn > >> <aleksandr.mikhalitsyn@canonical.com> > >> --- > >> v7: > >> - reworked to use two new fields for owner UID/GID > >> (https://github.com/ceph/ceph/pull/52575) > >> v8: > >> - properly handled case when old MDS used with new kernel client > >> --- > >> fs/ceph/mds_client.c | 46 +++++++++++++++++++++++++++++++++--- > >> fs/ceph/mds_client.h | 5 +++- > >> include/linux/ceph/ceph_fs.h | 4 +++- > >> 3 files changed, 50 insertions(+), 5 deletions(-) > >> > >> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c > >> index 8829f55103da..7d3106d3b726 100644 > >> --- a/fs/ceph/mds_client.c > >> +++ b/fs/ceph/mds_client.c > >> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void > >> **p, const struct ceph_mds_request * > >> } > >> } > >> +static inline u16 mds_supported_head_version(struct > >> ceph_mds_session *session) > >> +{ > >> + if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, > >> &session->s_features)) > >> + return 1; > >> + > >> + if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, > >> &session->s_features)) > >> + return 2; > >> + > >> + return CEPH_MDS_REQUEST_HEAD_VERSION; > >> +} > >> + > >> static struct ceph_mds_request_head_legacy * > >> find_legacy_request_head(void *p, u64 features) > >> { > >> @@ -2923,6 +2934,7 @@ static struct ceph_msg > >> *create_request_message(struct ceph_mds_session *session, > 
>> { > >> int mds = session->s_mds; > >> struct ceph_mds_client *mdsc = session->s_mdsc; > >> + struct ceph_client *cl = mdsc->fsc->client; > >> struct ceph_msg *msg; > >> struct ceph_mds_request_head_legacy *lhead; > >> const char *path1 = NULL; > >> @@ -2936,7 +2948,7 @@ static struct ceph_msg > >> *create_request_message(struct ceph_mds_session *session, > >> void *p, *end; > >> int ret; > >> bool legacy = !(session->s_con.peer_features & > >> CEPH_FEATURE_FS_BTIME); > >> - bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, > >> &session->s_features); > >> + u16 request_head_version = mds_supported_head_version(session); > >> ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry, > >> req->r_parent, req->r_path1, req->r_ino1.ino, > >> @@ -2977,8 +2989,10 @@ static struct ceph_msg > >> *create_request_message(struct ceph_mds_session *session, > >> */ > >> if (legacy) > >> len = sizeof(struct ceph_mds_request_head_legacy); > >> - else if (old_version) > >> + else if (request_head_version == 1) > >> len = sizeof(struct ceph_mds_request_head_old); > >> + else if (request_head_version == 2) > >> + len = offsetofend(struct ceph_mds_request_head, ext_num_fwd); > >> else > >> len = sizeof(struct ceph_mds_request_head); > > > > This is not what we suppose to. If we do this again and again when > > adding new members it will make the code very complicated to maintain. > > > > Once the CEPHFS_FEATURE_32BITS_RETRY_FWD has been supported the ceph > > should correctly decode it and if CEPHFS_FEATURE_HAS_OWNER_UIDGID is > > not supported the decoder should skip it directly. > > > > Is the MDS side buggy ? Why you last version didn't work ? > > > > I think the ceph side is buggy. Possibly we should add one new `length` > member in struct `struct ceph_mds_request_head` and just skip the extra > bytes when decoding it. Hm, I think I found something suspicious. 
In cephfs code we have many places that call the DECODE_FINISH macro, but in our decoder we don't have it. From documentation it follows that DECODE_FINISH purpose is precisely about this problem. What do you think? > > Could you fix it together with your ceph PR ? > > Thanks > > - Xiubo > > > > Thanks > > > > - Xiubo > > > >> @@ -3028,6 +3042,16 @@ static struct ceph_msg > >> *create_request_message(struct ceph_mds_session *session, > >> lhead = find_legacy_request_head(msg->front.iov_base, > >> session->s_con.peer_features); > >> + if ((req->r_mnt_idmap != &nop_mnt_idmap) && > >> + !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, > >> &session->s_features)) { > >> + pr_err_ratelimited_client(cl, > >> + "idmapped mount is used and > >> CEPHFS_FEATURE_HAS_OWNER_UIDGID" > >> + " is not supported by MDS. Fail request with -EIO.\n"); > >> + > >> + ret = -EIO; > >> + goto out_err; > >> + } > >> + > >> /* > >> * The ceph_mds_request_head_legacy didn't contain a version > >> field, and > >> * one was added when we moved the message version from 3->4. 
> >> @@ -3035,17 +3059,33 @@ static struct ceph_msg > >> *create_request_message(struct ceph_mds_session *session, > >> if (legacy) { > >> msg->hdr.version = cpu_to_le16(3); > >> p = msg->front.iov_base + sizeof(*lhead); > >> - } else if (old_version) { > >> + } else if (request_head_version == 1) { > >> struct ceph_mds_request_head_old *ohead = msg->front.iov_base; > >> msg->hdr.version = cpu_to_le16(4); > >> ohead->version = cpu_to_le16(1); > >> p = msg->front.iov_base + sizeof(*ohead); > >> + } else if (request_head_version == 2) { > >> + struct ceph_mds_request_head *nhead = msg->front.iov_base; > >> + > >> + msg->hdr.version = cpu_to_le16(6); > >> + nhead->version = cpu_to_le16(2); > >> + > >> + p = msg->front.iov_base + offsetofend(struct > >> ceph_mds_request_head, ext_num_fwd); > >> } else { > >> struct ceph_mds_request_head *nhead = msg->front.iov_base; > >> + kuid_t owner_fsuid; > >> + kgid_t owner_fsgid; > >> msg->hdr.version = cpu_to_le16(6); > >> nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION); > >> + > >> + owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns, > >> + VFSUIDT_INIT(req->r_cred->fsuid)); > >> + owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns, > >> + VFSGIDT_INIT(req->r_cred->fsgid)); > >> + nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, > >> owner_fsuid)); > >> + nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, > >> owner_fsgid)); > >> p = msg->front.iov_base + sizeof(*nhead); > >> } > >> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h > >> index e3bbf3ba8ee8..8f683e8203bd 100644 > >> --- a/fs/ceph/mds_client.h > >> +++ b/fs/ceph/mds_client.h > >> @@ -33,8 +33,10 @@ enum ceph_feature_type { > >> CEPHFS_FEATURE_NOTIFY_SESSION_STATE, > >> CEPHFS_FEATURE_OP_GETVXATTR, > >> CEPHFS_FEATURE_32BITS_RETRY_FWD, > >> + CEPHFS_FEATURE_NEW_SNAPREALM_INFO, > >> + CEPHFS_FEATURE_HAS_OWNER_UIDGID, > >> - CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD, > >> + CEPHFS_FEATURE_MAX = 
CEPHFS_FEATURE_HAS_OWNER_UIDGID, > >> }; > >> #define CEPHFS_FEATURES_CLIENT_SUPPORTED { \ > >> @@ -49,6 +51,7 @@ enum ceph_feature_type { > >> CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \ > >> CEPHFS_FEATURE_OP_GETVXATTR, \ > >> CEPHFS_FEATURE_32BITS_RETRY_FWD, \ > >> + CEPHFS_FEATURE_HAS_OWNER_UIDGID, \ > >> } > >> /* > >> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h > >> index 5f2301ee88bc..6eb83a51341c 100644 > >> --- a/include/linux/ceph/ceph_fs.h > >> +++ b/include/linux/ceph/ceph_fs.h > >> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy { > >> union ceph_mds_request_args args; > >> } __attribute__ ((packed)); > >> -#define CEPH_MDS_REQUEST_HEAD_VERSION 2 > >> +#define CEPH_MDS_REQUEST_HEAD_VERSION 3 > >> struct ceph_mds_request_head_old { > >> __le16 version; /* struct version */ > >> @@ -530,6 +530,8 @@ struct ceph_mds_request_head { > >> __le32 ext_num_retry; /* new count retry attempts */ > >> __le32 ext_num_fwd; /* new count fwd attempts */ > >> + > >> + __le32 owner_uid, owner_gid; /* used for OPs which create > >> inodes */ > >> } __attribute__ ((packed)); > >> /* cap/lease release record */ >
On Fri, Aug 4, 2023 at 8:35 AM Aleksandr Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> wrote:
>
> On Fri, Aug 4, 2023 at 5:24 AM Xiubo Li <xiubli@redhat.com> wrote:
> >
> > I think the ceph side is buggy. Possibly we should add one new `length`
> > member in struct `struct ceph_mds_request_head` and just skip the extra
> > bytes when decoding it.
>
> Hm, I think I found something suspicious. In cephfs code we have many
> places that
> call the DECODE_FINISH macro, but in our decoder we don't have it.
>
> From documentation it follows that DECODE_FINISH purpose is precisely
> about this problem.
>
> What do you think?

Upd: this thing also changes on-wire format and adds field to store length.
But this will be a massive and incompatible protocol change. I don't think that
we want to do this in the scope of this task.
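The `length`-field idea discussed above can be illustrated with a minimal sketch of a length-framed head. It is only an illustration under stated assumptions: `struct head_fields`, `get_le32`, and `decode_framed_head` are hypothetical names, the framing (a little-endian u32 length followed by the body) is made up for the example, and none of this is Ceph's actual DECODE_START/DECODE_FINISH code. It shows why such a change alters the wire format: the sender must emit the length, and only then can an older decoder skip trailing fields it does not know.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields an "old" decoder knows about (hypothetical subset). */
struct head_fields {
    uint32_t ext_num_retry;
    uint32_t ext_num_fwd;
};

/* Read a little-endian u32 from a byte buffer. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Decode a head framed as: u32 body_len, then body_len bytes of body.
 * Known fields are read from the start of the body; any bytes after them
 * (for example owner_uid/owner_gid from a newer sender) are skipped.
 * Returns the number of bytes consumed, or 0 if the buffer is malformed.
 */
static size_t decode_framed_head(const uint8_t *buf, size_t avail,
                                 struct head_fields *out)
{
    uint32_t body_len;

    if (avail < 4)
        return 0;
    body_len = get_le32(buf);
    /* Body must hold the two known fields and fit in the buffer. */
    if (body_len < 8 || body_len > avail - 4)
        return 0;

    out->ext_num_retry = get_le32(buf + 4);
    out->ext_num_fwd = get_le32(buf + 8);

    /* Jump to the end of the framed body, ignoring the unknown tail. */
    return 4 + (size_t)body_len;
}
```

With a 16-byte body, the decoder reads the first 8 bytes and reports 20 bytes consumed, so whatever follows the head is parsed from the right offset. Without the length prefix, the decoder cannot tell where the head ends, which matches the "End of buffer" failure shown earlier in the thread.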
On Fri, Aug 4, 2023 at 8:43 AM Aleksandr Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> wrote: > > On Fri, Aug 4, 2023 at 8:35 AM Aleksandr Mikhalitsyn > <aleksandr.mikhalitsyn@canonical.com> wrote: > > > > On Fri, Aug 4, 2023 at 5:24 AM Xiubo Li <xiubli@redhat.com> wrote: > > > > > > > > > On 8/4/23 10:26, Xiubo Li wrote: > > > > > > > > On 8/3/23 21:59, Alexander Mikhalitsyn wrote: > > > >> From: Christian Brauner <brauner@kernel.org> > > > >> > > > >> Inode operations that create a new filesystem object such as ->mknod, > > > >> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly. > > > >> Instead the caller's fs{g,u}id is used for the {g,u}id of the new > > > >> filesystem object. > > > >> > > > >> In order to ensure that the correct {g,u}id is used map the caller's > > > >> fs{g,u}id for creation requests. This doesn't require complex changes. > > > >> It suffices to pass in the relevant idmapping recorded in the request > > > >> message. If this request message was triggered from an inode operation > > > >> that creates filesystem objects it will have passed down the relevant > > > >> idmaping. If this is a request message that was triggered from an inode > > > >> operation that doens't need to take idmappings into account the initial > > > >> idmapping is passed down which is an identity mapping. > > > >> > > > >> This change uses a new cephfs protocol extension > > > >> CEPHFS_FEATURE_HAS_OWNER_UIDGID > > > >> which adds two new fields (owner_{u,g}id) to the request head structure. > > > >> So, we need to ensure that MDS supports it otherwise we need to fail > > > >> any IO that comes through an idmapped mount because we can't process it > > > >> in a proper way. MDS server without such an extension will use > > > >> caller_{u,g}id > > > >> fields to set a new inode owner UID/GID which is incorrect because > > > >> caller_{u,g}id > > > >> values are unmapped. 
At the same time we can't map these fields with an > > > >> idmapping as it can break UID/GID-based permission checks logic on the > > > >> MDS side. This problem was described with a lot of details at [1], [2]. > > > >> > > > >> [1] > > > >> https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/ > > > >> [2] > > > >> https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/ > > > >> > > > >> https://github.com/ceph/ceph/pull/52575 > > > >> https://tracker.ceph.com/issues/62217 > > > >> > > > >> Cc: Xiubo Li <xiubli@redhat.com> > > > >> Cc: Jeff Layton <jlayton@kernel.org> > > > >> Cc: Ilya Dryomov <idryomov@gmail.com> > > > >> Cc: ceph-devel@vger.kernel.org > > > >> Co-Developed-by: Alexander Mikhalitsyn > > > >> <aleksandr.mikhalitsyn@canonical.com> > > > >> Signed-off-by: Christian Brauner <brauner@kernel.org> > > > >> Signed-off-by: Alexander Mikhalitsyn > > > >> <aleksandr.mikhalitsyn@canonical.com> > > > >> --- > > > >> v7: > > > >> - reworked to use two new fields for owner UID/GID > > > >> (https://github.com/ceph/ceph/pull/52575) > > > >> v8: > > > >> - properly handled case when old MDS used with new kernel client > > > >> --- > > > >> fs/ceph/mds_client.c | 46 +++++++++++++++++++++++++++++++++--- > > > >> fs/ceph/mds_client.h | 5 +++- > > > >> include/linux/ceph/ceph_fs.h | 4 +++- > > > >> 3 files changed, 50 insertions(+), 5 deletions(-) > > > >> > > > >> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c > > > >> index 8829f55103da..7d3106d3b726 100644 > > > >> --- a/fs/ceph/mds_client.c > > > >> +++ b/fs/ceph/mds_client.c > > > >> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void > > > >> **p, const struct ceph_mds_request * > > > >> } > > > >> } > > > >> +static inline u16 mds_supported_head_version(struct > > > >> ceph_mds_session *session) > > > >> +{ > > > >> + if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, > > > >> &session->s_features)) > > > >> + return 1; > > > >> + 
> > > >> +	if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
> > > >> +		return 2;
> > > >> +
> > > >> +	return CEPH_MDS_REQUEST_HEAD_VERSION;
> > > >> +}
> > > >> +
> > > >>  static struct ceph_mds_request_head_legacy *
> > > >>  find_legacy_request_head(void *p, u64 features)
> > > >>  {
> > > >> @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > > >>  {
> > > >>  	int mds = session->s_mds;
> > > >>  	struct ceph_mds_client *mdsc = session->s_mdsc;
> > > >> +	struct ceph_client *cl = mdsc->fsc->client;
> > > >>  	struct ceph_msg *msg;
> > > >>  	struct ceph_mds_request_head_legacy *lhead;
> > > >>  	const char *path1 = NULL;
> > > >> @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > > >>  	void *p, *end;
> > > >>  	int ret;
> > > >>  	bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
> > > >> -	bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
> > > >> +	u16 request_head_version = mds_supported_head_version(session);
> > > >>  	ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
> > > >>  			      req->r_parent, req->r_path1, req->r_ino1.ino,
> > > >> @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > > >>  	 */
> > > >>  	if (legacy)
> > > >>  		len = sizeof(struct ceph_mds_request_head_legacy);
> > > >> -	else if (old_version)
> > > >> +	else if (request_head_version == 1)
> > > >>  		len = sizeof(struct ceph_mds_request_head_old);
> > > >> +	else if (request_head_version == 2)
> > > >> +		len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> > > >>  	else
> > > >>  		len = sizeof(struct ceph_mds_request_head);
> > > >
> > > > This is not what we are supposed to do. If we do this again and again when
> > > > adding new members, it will make the code very complicated to maintain.
> > > >
> > > > Once CEPHFS_FEATURE_32BITS_RETRY_FWD is supported, ceph should decode it
> > > > correctly, and if CEPHFS_FEATURE_HAS_OWNER_UIDGID is not supported the
> > > > decoder should skip it directly.
> > > >
> > > > Is the MDS side buggy? Why didn't your last version work?
> > > >
> > >
> > > I think the ceph side is buggy. Possibly we should add one new `length`
> > > member to `struct ceph_mds_request_head` and just skip the extra
> > > bytes when decoding it.
> >
> > Hm, I think I found something suspicious. In the cephfs code we have many
> > places that call the DECODE_FINISH macro, but in our decoder we don't have it.
> >
> > According to the documentation, the purpose of DECODE_FINISH is precisely
> > to handle this problem.
> >
> > What do you think?
>
> Upd: this also changes the on-wire format and adds a field to store the length.
> But that would be a massive and incompatible protocol change. I don't think
> we want to do this within the scope of this task.

https://github.com/ceph/ceph/pull/52575#issuecomment-1665141641

>
> >
> > >
> > > Could you fix it together with your ceph PR?
> > >
> > > Thanks
> > >
> > > - Xiubo
> > >
> > >
> > > > Thanks
> > > >
> > > > - Xiubo
> > > >
> > > >> @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > > >>  	lhead = find_legacy_request_head(msg->front.iov_base,
> > > >>  					 session->s_con.peer_features);
> > > >> +	if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
> > > >> +	    !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
> > > >> +		pr_err_ratelimited_client(cl,
> > > >> +			"idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
> > > >> +			" is not supported by MDS. Fail request with -EIO.\n");
> > > >> +
> > > >> +		ret = -EIO;
> > > >> +		goto out_err;
> > > >> +	}
> > > >> +
> > > >>  	/*
> > > >>  	 * The ceph_mds_request_head_legacy didn't contain a version field, and
> > > >>  	 * one was added when we moved the message version from 3->4.
> > > >> @@ -3035,17 +3059,33 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
> > > >>  	if (legacy) {
> > > >>  		msg->hdr.version = cpu_to_le16(3);
> > > >>  		p = msg->front.iov_base + sizeof(*lhead);
> > > >> -	} else if (old_version) {
> > > >> +	} else if (request_head_version == 1) {
> > > >>  		struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
> > > >>  		msg->hdr.version = cpu_to_le16(4);
> > > >>  		ohead->version = cpu_to_le16(1);
> > > >>  		p = msg->front.iov_base + sizeof(*ohead);
> > > >> +	} else if (request_head_version == 2) {
> > > >> +		struct ceph_mds_request_head *nhead = msg->front.iov_base;
> > > >> +
> > > >> +		msg->hdr.version = cpu_to_le16(6);
> > > >> +		nhead->version = cpu_to_le16(2);
> > > >> +
> > > >> +		p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
> > > >>  	} else {
> > > >>  		struct ceph_mds_request_head *nhead = msg->front.iov_base;
> > > >> +		kuid_t owner_fsuid;
> > > >> +		kgid_t owner_fsgid;
> > > >>  		msg->hdr.version = cpu_to_le16(6);
> > > >>  		nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
> > > >> +
> > > >> +		owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
> > > >> +					  VFSUIDT_INIT(req->r_cred->fsuid));
> > > >> +		owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
> > > >> +					  VFSGIDT_INIT(req->r_cred->fsgid));
> > > >> +		nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
> > > >> +		nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
> > > >>  		p = msg->front.iov_base + sizeof(*nhead);
> > > >>  	}
> > > >> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> > > >> index e3bbf3ba8ee8..8f683e8203bd 100644
> > > >> --- a/fs/ceph/mds_client.h
> > > >> +++ b/fs/ceph/mds_client.h
> > > >> @@ -33,8 +33,10 @@ enum ceph_feature_type {
> > > >>  	CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
> > > >>  	CEPHFS_FEATURE_OP_GETVXATTR,
> > > >>  	CEPHFS_FEATURE_32BITS_RETRY_FWD,
> > > >> +	CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
> > > >> +	CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> > > >> -	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
> > > >> +	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
> > > >>  };
> > > >>  #define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
> > > >> @@ -49,6 +51,7 @@ enum ceph_feature_type {
> > > >>  	CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
> > > >>  	CEPHFS_FEATURE_OP_GETVXATTR, \
> > > >>  	CEPHFS_FEATURE_32BITS_RETRY_FWD, \
> > > >> +	CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
> > > >>  }
> > > >>  /*
> > > >> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
> > > >> index 5f2301ee88bc..6eb83a51341c 100644
> > > >> --- a/include/linux/ceph/ceph_fs.h
> > > >> +++ b/include/linux/ceph/ceph_fs.h
> > > >> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
> > > >>  	union ceph_mds_request_args args;
> > > >>  } __attribute__ ((packed));
> > > >> -#define CEPH_MDS_REQUEST_HEAD_VERSION 2
> > > >> +#define CEPH_MDS_REQUEST_HEAD_VERSION 3
> > > >>  struct ceph_mds_request_head_old {
> > > >>  	__le16 version; /* struct version */
> > > >> @@ -530,6 +530,8 @@ struct ceph_mds_request_head {
> > > >>  	__le32 ext_num_retry; /* new count retry attempts */
> > > >>  	__le32 ext_num_fwd; /* new count fwd attempts */
> > > >> +
> > > >> +	__le32 owner_uid, owner_gid; /* used for OPs which create inodes */
> > > >>  } __attribute__ ((packed));
> > > >>  /* cap/lease release record */
> > >
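
The length-field idea raised in this exchange can be illustrated with a small userspace sketch. This is not Ceph's encoding and does not reproduce the DECODE_START/DECODE_FINISH macros; the wire layout (u16 version, u32 payload length, payload) and every `demo_*` name are assumptions made up for illustration. It only shows why a stored length lets an older decoder step over trailing fields it does not know.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire layout: u16 version, u32 payload length, payload. */
#define DEMO_HDR_LEN 6u

/* Fields known to the "old" decoder (hypothetical, not Ceph's layout). */
struct demo_head_v2 {
	uint32_t num_retry;
	uint32_t num_fwd;
};

static void demo_put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

static void demo_put_le32(uint8_t *p, uint32_t v)
{
	for (int i = 0; i < 4; i++)
		p[i] = (uint8_t)((v >> (8 * i)) & 0xff);
}

static uint32_t demo_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* "New" encoder: writes two extra fields after the ones v2 knows about. */
static size_t demo_encode_v3(uint8_t *buf, uint32_t retry, uint32_t fwd,
			     uint32_t owner_uid, uint32_t owner_gid)
{
	demo_put_le16(buf, 3);
	demo_put_le32(buf + 2, 16);		/* payload length in bytes */
	demo_put_le32(buf + 6, retry);
	demo_put_le32(buf + 10, fwd);
	demo_put_le32(buf + 14, owner_uid);
	demo_put_le32(buf + 18, owner_gid);
	return DEMO_HDR_LEN + 16;
}

/*
 * "Old" decoder: reads the fields it understands, then uses the stored
 * length to skip whatever follows. Returns bytes consumed, 0 if malformed.
 */
static size_t demo_decode_v2(const uint8_t *buf, size_t avail,
			     struct demo_head_v2 *out)
{
	uint32_t len;

	if (avail < DEMO_HDR_LEN)
		return 0;
	len = demo_get_le32(buf + 2);
	if (len < 8 || len > avail - DEMO_HDR_LEN)
		return 0;
	out->num_retry = demo_get_le32(buf + 6);
	out->num_fwd = demo_get_le32(buf + 10);
	return DEMO_HDR_LEN + len;	/* skips the unknown owner fields */
}
```

The decoder's return value is the offset of whatever follows the head, so the next item in the message is found correctly whether or not the trailing fields were understood. The cost noted in the thread still applies: adding the length changes the on-wire format.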
On 8/4/23 14:35, Aleksandr Mikhalitsyn wrote:
> On Fri, Aug 4, 2023 at 5:24 AM Xiubo Li <xiubli@redhat.com> wrote:
>>
>> On 8/4/23 10:26, Xiubo Li wrote:
>>> On 8/3/23 21:59, Alexander Mikhalitsyn wrote:
>>>> From: Christian Brauner <brauner@kernel.org>
>>>>
>>>> Inode operations that create a new filesystem object such as ->mknod,
>>>> ->create, ->mkdir() and others don't take a {g,u}id argument explicitly.
>>>> Instead the caller's fs{g,u}id is used for the {g,u}id of the new
>>>> filesystem object.
>>>>
>>>> In order to ensure that the correct {g,u}id is used, map the caller's
>>>> fs{g,u}id for creation requests. This doesn't require complex changes.
>>>> It suffices to pass in the relevant idmapping recorded in the request
>>>> message. If this request message was triggered from an inode operation
>>>> that creates filesystem objects it will have passed down the relevant
>>>> idmapping. If this is a request message that was triggered from an inode
>>>> operation that doesn't need to take idmappings into account the initial
>>>> idmapping is passed down which is an identity mapping.
>>>>
>>>> This change uses a new cephfs protocol extension, CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>>>> which adds two new fields (owner_{u,g}id) to the request head structure.
>>>> So, we need to ensure that the MDS supports it; otherwise we need to fail
>>>> any IO that comes through an idmapped mount because we can't process it
>>>> in a proper way. An MDS without such an extension will use the caller_{u,g}id
>>>> fields to set a new inode owner UID/GID, which is incorrect because the
>>>> caller_{u,g}id values are unmapped. At the same time we can't map these
>>>> fields with an idmapping as it can break the UID/GID-based permission
>>>> check logic on the MDS side. This problem was described in detail in [1], [2].
>>>>
>>>> [1] https://lore.kernel.org/lkml/CAEivzxfw1fHO2TFA4dx3u23ZKK6Q+EThfzuibrhA3RKM=ZOYLg@mail.gmail.com/
>>>> [2] https://lore.kernel.org/all/20220104140414.155198-3-brauner@kernel.org/
>>>>
>>>> https://github.com/ceph/ceph/pull/52575
>>>> https://tracker.ceph.com/issues/62217
>>>>
>>>> Cc: Xiubo Li <xiubli@redhat.com>
>>>> Cc: Jeff Layton <jlayton@kernel.org>
>>>> Cc: Ilya Dryomov <idryomov@gmail.com>
>>>> Cc: ceph-devel@vger.kernel.org
>>>> Co-Developed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
>>>> Signed-off-by: Christian Brauner <brauner@kernel.org>
>>>> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
>>>> ---
>>>> v7:
>>>> - reworked to use two new fields for owner UID/GID
>>>>   (https://github.com/ceph/ceph/pull/52575)
>>>> v8:
>>>> - properly handled case when old MDS used with new kernel client
>>>> ---
>>>>  fs/ceph/mds_client.c         | 46 +++++++++++++++++++++++++++++++++---
>>>>  fs/ceph/mds_client.h         |  5 +++-
>>>>  include/linux/ceph/ceph_fs.h |  4 +++-
>>>>  3 files changed, 50 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
>>>> index 8829f55103da..7d3106d3b726 100644
>>>> --- a/fs/ceph/mds_client.c
>>>> +++ b/fs/ceph/mds_client.c
>>>> @@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
>>>>  	}
>>>>  }
>>>> +static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
>>>> +{
>>>> +	if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
>>>> +		return 1;
>>>> +
>>>> +	if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
>>>> +		return 2;
>>>> +
>>>> +	return CEPH_MDS_REQUEST_HEAD_VERSION;
>>>> +}
>>>> +
>>>>  static struct ceph_mds_request_head_legacy *
>>>>  find_legacy_request_head(void *p, u64 features)
>>>>  {
>>>> @@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>  {
>>>>  	int mds = session->s_mds;
>>>>  	struct ceph_mds_client *mdsc = session->s_mdsc;
>>>> +	struct ceph_client *cl = mdsc->fsc->client;
>>>>  	struct ceph_msg *msg;
>>>>  	struct ceph_mds_request_head_legacy *lhead;
>>>>  	const char *path1 = NULL;
>>>> @@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>  	void *p, *end;
>>>>  	int ret;
>>>>  	bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
>>>> -	bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
>>>> +	u16 request_head_version = mds_supported_head_version(session);
>>>>  	ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
>>>>  			      req->r_parent, req->r_path1, req->r_ino1.ino,
>>>> @@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>  	 */
>>>>  	if (legacy)
>>>>  		len = sizeof(struct ceph_mds_request_head_legacy);
>>>> -	else if (old_version)
>>>> +	else if (request_head_version == 1)
>>>>  		len = sizeof(struct ceph_mds_request_head_old);
>>>> +	else if (request_head_version == 2)
>>>> +		len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
>>>>  	else
>>>>  		len = sizeof(struct ceph_mds_request_head);
>>> This is not what we are supposed to do. If we do this again and again when
>>> adding new members, it will make the code very complicated to maintain.
>>>
>>> Once CEPHFS_FEATURE_32BITS_RETRY_FWD is supported, ceph should decode it
>>> correctly, and if CEPHFS_FEATURE_HAS_OWNER_UIDGID is not supported the
>>> decoder should skip it directly.
>>>
>>> Is the MDS side buggy? Why didn't your last version work?
>>>
>> I think the ceph side is buggy. Possibly we should add one new `length`
>> member to `struct ceph_mds_request_head` and just skip the extra
>> bytes when decoding it.
> Hm, I think I found something suspicious. In the cephfs code we have many
> places that call the DECODE_FINISH macro, but in our decoder we don't have it.
>
> According to the documentation, the purpose of DECODE_FINISH is precisely
> to handle this problem.
>
> What do you think?

Yeah, correct. We also need to do it like this.

Thanks

- Xiubo

>> Could you fix it together with your ceph PR?
>>
>> Thanks
>>
>> - Xiubo
>>
>>
>>> Thanks
>>>
>>> - Xiubo
>>>
>>>> @@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>  	lhead = find_legacy_request_head(msg->front.iov_base,
>>>>  					 session->s_con.peer_features);
>>>> +	if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
>>>> +	    !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
>>>> +		pr_err_ratelimited_client(cl,
>>>> +			"idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
>>>> +			" is not supported by MDS. Fail request with -EIO.\n");
>>>> +
>>>> +		ret = -EIO;
>>>> +		goto out_err;
>>>> +	}
>>>> +
>>>>  	/*
>>>>  	 * The ceph_mds_request_head_legacy didn't contain a version field, and
>>>>  	 * one was added when we moved the message version from 3->4.
>>>> @@ -3035,17 +3059,33 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
>>>>  	if (legacy) {
>>>>  		msg->hdr.version = cpu_to_le16(3);
>>>>  		p = msg->front.iov_base + sizeof(*lhead);
>>>> -	} else if (old_version) {
>>>> +	} else if (request_head_version == 1) {
>>>>  		struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
>>>>  		msg->hdr.version = cpu_to_le16(4);
>>>>  		ohead->version = cpu_to_le16(1);
>>>>  		p = msg->front.iov_base + sizeof(*ohead);
>>>> +	} else if (request_head_version == 2) {
>>>> +		struct ceph_mds_request_head *nhead = msg->front.iov_base;
>>>> +
>>>> +		msg->hdr.version = cpu_to_le16(6);
>>>> +		nhead->version = cpu_to_le16(2);
>>>> +
>>>> +		p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
>>>>  	} else {
>>>>  		struct ceph_mds_request_head *nhead = msg->front.iov_base;
>>>> +		kuid_t owner_fsuid;
>>>> +		kgid_t owner_fsgid;
>>>>  		msg->hdr.version = cpu_to_le16(6);
>>>>  		nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
>>>> +
>>>> +		owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
>>>> +					  VFSUIDT_INIT(req->r_cred->fsuid));
>>>> +		owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
>>>> +					  VFSGIDT_INIT(req->r_cred->fsgid));
>>>> +		nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
>>>> +		nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
>>>>  		p = msg->front.iov_base + sizeof(*nhead);
>>>>  	}
>>>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
>>>> index e3bbf3ba8ee8..8f683e8203bd 100644
>>>> --- a/fs/ceph/mds_client.h
>>>> +++ b/fs/ceph/mds_client.h
>>>> @@ -33,8 +33,10 @@ enum ceph_feature_type {
>>>>  	CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
>>>>  	CEPHFS_FEATURE_OP_GETVXATTR,
>>>>  	CEPHFS_FEATURE_32BITS_RETRY_FWD,
>>>> +	CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
>>>> +	CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>>>> -	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
>>>> +	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
>>>>  };
>>>>  #define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
>>>> @@ -49,6 +51,7 @@ enum ceph_feature_type {
>>>>  	CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
>>>>  	CEPHFS_FEATURE_OP_GETVXATTR, \
>>>>  	CEPHFS_FEATURE_32BITS_RETRY_FWD, \
>>>> +	CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
>>>>  }
>>>>  /*
>>>> diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
>>>> index 5f2301ee88bc..6eb83a51341c 100644
>>>> --- a/include/linux/ceph/ceph_fs.h
>>>> +++ b/include/linux/ceph/ceph_fs.h
>>>> @@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
>>>>  	union ceph_mds_request_args args;
>>>>  } __attribute__ ((packed));
>>>> -#define CEPH_MDS_REQUEST_HEAD_VERSION 2
>>>> +#define CEPH_MDS_REQUEST_HEAD_VERSION 3
>>>>  struct ceph_mds_request_head_old {
>>>>  	__le16 version; /* struct version */
>>>> @@ -530,6 +530,8 @@ struct ceph_mds_request_head {
>>>>  	__le32 ext_num_retry; /* new count retry attempts */
>>>>  	__le32 ext_num_fwd; /* new count fwd attempts */
>>>> +
>>>> +	__le32 owner_uid, owner_gid; /* used for OPs which create inodes */
>>>>  } __attribute__ ((packed));
>>>>  /* cap/lease release record */
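
The rule stated in the commit message, that a request coming through an idmapped mount must fail when the MDS lacks the owner UID/GID extension, reduces to a small predicate. The following is a userspace sketch only: the bit numbers, the plain `unsigned long` feature mask, and all `demo_*` names are assumptions for illustration and do not match the kernel's `test_bit()` or the real feature enum values.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical feature bit numbers; the real enum values differ. */
enum demo_feature {
	DEMO_FEATURE_32BITS_RETRY_FWD = 0,
	DEMO_FEATURE_HAS_OWNER_UIDGID = 1,
};

/* Simplified stand-in for the kernel's test_bit() on a single word. */
static bool demo_test_bit(unsigned int nr, unsigned long features)
{
	return (features >> nr) & 1UL;
}

/*
 * Returns 0 when the request may be built, -EIO when the mount is
 * idmapped but the peer cannot carry the mapped owner UID/GID.
 */
static int demo_check_idmapped_request(bool idmapped, unsigned long mds_features)
{
	if (idmapped &&
	    !demo_test_bit(DEMO_FEATURE_HAS_OWNER_UIDGID, mds_features))
		return -EIO;
	return 0;
}
```

Requests on non-idmapped mounts are unaffected by the check, so an old MDS keeps working for them; only the combination that would produce wrongly owned inodes is refused.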
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 8829f55103da..7d3106d3b726 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2902,6 +2902,17 @@ static void encode_mclientrequest_tail(void **p, const struct ceph_mds_request *
 	}
 }
 
+static inline u16 mds_supported_head_version(struct ceph_mds_session *session)
+{
+	if (!test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features))
+		return 1;
+
+	if (!test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features))
+		return 2;
+
+	return CEPH_MDS_REQUEST_HEAD_VERSION;
+}
+
 static struct ceph_mds_request_head_legacy *
 find_legacy_request_head(void *p, u64 features)
 {
@@ -2923,6 +2934,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 {
 	int mds = session->s_mds;
 	struct ceph_mds_client *mdsc = session->s_mdsc;
+	struct ceph_client *cl = mdsc->fsc->client;
 	struct ceph_msg *msg;
 	struct ceph_mds_request_head_legacy *lhead;
 	const char *path1 = NULL;
@@ -2936,7 +2948,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 	void *p, *end;
 	int ret;
 	bool legacy = !(session->s_con.peer_features & CEPH_FEATURE_FS_BTIME);
-	bool old_version = !test_bit(CEPHFS_FEATURE_32BITS_RETRY_FWD, &session->s_features);
+	u16 request_head_version = mds_supported_head_version(session);
 
 	ret = set_request_path_attr(mdsc, req->r_inode, req->r_dentry,
 			      req->r_parent, req->r_path1, req->r_ino1.ino,
@@ -2977,8 +2989,10 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 	 */
 	if (legacy)
 		len = sizeof(struct ceph_mds_request_head_legacy);
-	else if (old_version)
+	else if (request_head_version == 1)
 		len = sizeof(struct ceph_mds_request_head_old);
+	else if (request_head_version == 2)
+		len = offsetofend(struct ceph_mds_request_head, ext_num_fwd);
 	else
 		len = sizeof(struct ceph_mds_request_head);
 
@@ -3028,6 +3042,16 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 	lhead = find_legacy_request_head(msg->front.iov_base,
 					 session->s_con.peer_features);
 
+	if ((req->r_mnt_idmap != &nop_mnt_idmap) &&
+	    !test_bit(CEPHFS_FEATURE_HAS_OWNER_UIDGID, &session->s_features)) {
+		pr_err_ratelimited_client(cl,
+			"idmapped mount is used and CEPHFS_FEATURE_HAS_OWNER_UIDGID"
+			" is not supported by MDS. Fail request with -EIO.\n");
+
+		ret = -EIO;
+		goto out_err;
+	}
+
 	/*
 	 * The ceph_mds_request_head_legacy didn't contain a version field, and
 	 * one was added when we moved the message version from 3->4.
@@ -3035,17 +3059,33 @@ static struct ceph_msg *create_request_message(struct ceph_mds_session *session,
 	if (legacy) {
 		msg->hdr.version = cpu_to_le16(3);
 		p = msg->front.iov_base + sizeof(*lhead);
-	} else if (old_version) {
+	} else if (request_head_version == 1) {
 		struct ceph_mds_request_head_old *ohead = msg->front.iov_base;
 
 		msg->hdr.version = cpu_to_le16(4);
 		ohead->version = cpu_to_le16(1);
 		p = msg->front.iov_base + sizeof(*ohead);
+	} else if (request_head_version == 2) {
+		struct ceph_mds_request_head *nhead = msg->front.iov_base;
+
+		msg->hdr.version = cpu_to_le16(6);
+		nhead->version = cpu_to_le16(2);
+
+		p = msg->front.iov_base + offsetofend(struct ceph_mds_request_head, ext_num_fwd);
 	} else {
 		struct ceph_mds_request_head *nhead = msg->front.iov_base;
+		kuid_t owner_fsuid;
+		kgid_t owner_fsgid;
 
 		msg->hdr.version = cpu_to_le16(6);
 		nhead->version = cpu_to_le16(CEPH_MDS_REQUEST_HEAD_VERSION);
+
+		owner_fsuid = from_vfsuid(req->r_mnt_idmap, &init_user_ns,
+					  VFSUIDT_INIT(req->r_cred->fsuid));
+		owner_fsgid = from_vfsgid(req->r_mnt_idmap, &init_user_ns,
+					  VFSGIDT_INIT(req->r_cred->fsgid));
+		nhead->owner_uid = cpu_to_le32(from_kuid(&init_user_ns, owner_fsuid));
+		nhead->owner_gid = cpu_to_le32(from_kgid(&init_user_ns, owner_fsgid));
 		p = msg->front.iov_base + sizeof(*nhead);
 	}
 
diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
index e3bbf3ba8ee8..8f683e8203bd 100644
--- a/fs/ceph/mds_client.h
+++ b/fs/ceph/mds_client.h
@@ -33,8 +33,10 @@ enum ceph_feature_type {
 	CEPHFS_FEATURE_NOTIFY_SESSION_STATE,
 	CEPHFS_FEATURE_OP_GETVXATTR,
 	CEPHFS_FEATURE_32BITS_RETRY_FWD,
+	CEPHFS_FEATURE_NEW_SNAPREALM_INFO,
+	CEPHFS_FEATURE_HAS_OWNER_UIDGID,
 
-	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_32BITS_RETRY_FWD,
+	CEPHFS_FEATURE_MAX = CEPHFS_FEATURE_HAS_OWNER_UIDGID,
 };
 
 #define CEPHFS_FEATURES_CLIENT_SUPPORTED { \
@@ -49,6 +51,7 @@ enum ceph_feature_type {
 	CEPHFS_FEATURE_NOTIFY_SESSION_STATE, \
 	CEPHFS_FEATURE_OP_GETVXATTR, \
 	CEPHFS_FEATURE_32BITS_RETRY_FWD, \
+	CEPHFS_FEATURE_HAS_OWNER_UIDGID, \
 }
 
 /*
diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h
index 5f2301ee88bc..6eb83a51341c 100644
--- a/include/linux/ceph/ceph_fs.h
+++ b/include/linux/ceph/ceph_fs.h
@@ -499,7 +499,7 @@ struct ceph_mds_request_head_legacy {
 	union ceph_mds_request_args args;
 } __attribute__ ((packed));
 
-#define CEPH_MDS_REQUEST_HEAD_VERSION 2
+#define CEPH_MDS_REQUEST_HEAD_VERSION 3
 
 struct ceph_mds_request_head_old {
 	__le16 version; /* struct version */
@@ -530,6 +530,8 @@ struct ceph_mds_request_head {
 
 	__le32 ext_num_retry; /* new count retry attempts */
 	__le32 ext_num_fwd; /* new count fwd attempts */
+
+	__le32 owner_uid, owner_gid; /* used for OPs which create inodes */
 } __attribute__ ((packed));
 
 /* cap/lease release record */
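
The version selection and the truncated head length used by the patch can be tried out in userspace. The sketch below is a simplified model, not the kernel code: `struct demo_request_head` contains only a few of the real fields, `demo_offsetofend()` is a local reimplementation of the kernel's `offsetofend()` helper, the feature bits are passed as plain booleans, and version 1 (which uses a separate struct in the patch) is not modelled in the length helper. `__attribute__((packed))` assumes GCC or Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's offsetofend(): offset just past a member. */
#define demo_offsetofend(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Reduced, hypothetical model of the request head; most fields omitted. */
struct demo_request_head {
	uint16_t version;
	uint32_t ext_num_retry;
	uint32_t ext_num_fwd;
	uint32_t owner_uid;	/* only sent with head version 3 */
	uint32_t owner_gid;	/* only sent with head version 3 */
} __attribute__((packed));

enum { DEMO_HEAD_VERSION = 3 };

/* Mirrors the shape of mds_supported_head_version() with boolean inputs. */
static unsigned int demo_head_version(bool has_retry_fwd, bool has_owner_uidgid)
{
	if (!has_retry_fwd)
		return 1;
	if (!has_owner_uidgid)
		return 2;
	return DEMO_HEAD_VERSION;
}

/* Bytes of the head to put on the wire for versions 2 and 3. */
static size_t demo_head_len(unsigned int version)
{
	if (version == 2)
		return demo_offsetofend(struct demo_request_head, ext_num_fwd);
	return sizeof(struct demo_request_head);
}
```

With this packed layout a version 2 head stops right after `ext_num_fwd`, while a version 3 head is the whole struct. Each future field would need another branch of this kind, which is the maintenance concern raised in the review.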