Message ID | 20231109190844.2044940-1-agruenba@redhat.com |
---|---|
State | New |
Headers |
From: Andreas Gruenbacher <agruenba@redhat.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>, Alexander Viro <viro@zeniv.linux.org.uk>, Christian Brauner <brauner@kernel.org>, Abhi Das <adas@redhat.com>, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] fs: RESOLVE_CACHED final path component fix
Date: Thu, 9 Nov 2023 20:08:44 +0100
Message-ID: <20231109190844.2044940-1-agruenba@redhat.com>
List-ID: <linux-kernel.vger.kernel.org> |
Series |
fs: RESOLVE_CACHED final path component fix
|
|
Commit Message
Andreas Gruenbacher
Nov. 9, 2023, 7:08 p.m. UTC
Jens,
since your commit 99668f618062, applications can request cached lookups
with the RESOLVE_CACHED openat2() flag. When adding support for that in
gfs2, we found that this causes the ->permission inode operation to be
called with the MAY_NOT_BLOCK flag set for directories along the path,
which is good, but the ->permission check on the final path component is
missing that flag. The filesystem will then sleep when it needs to read
in the ACL, for example.
This doesn't look like the intended RESOLVE_CACHED behavior.
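For context, a userspace caller of this feature might look roughly like the sketch below. It is only an illustration: `try_cached_open()` and `open_with_fallback()` are hypothetical helper names, the `struct open_how_compat` layout is a local mirror of the UAPI `struct open_how`, and the syscall number and `RESOLVE_CACHED` value are defined locally (437 is the openat2 number on most architectures) rather than taken from the installed headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local fallbacks so the sketch does not depend on the header version. */
#ifndef SYS_openat2
#define SYS_openat2 437
#endif
#ifndef RESOLVE_CACHED
#define RESOLVE_CACHED 0x20
#endif

/* Local mirror of the UAPI struct open_how: three 64-bit fields. */
struct open_how_compat {
	uint64_t flags;
	uint64_t mode;
	uint64_t resolve;
};

/*
 * Hypothetical helper: attempt an open that must be satisfied from cache.
 * Returns a file descriptor, or a negative errno value on failure
 * (-EAGAIN when the kernel could not complete the lookup from cache).
 */
static int try_cached_open(const char *path)
{
	struct open_how_compat how = {
		.flags = O_RDONLY | O_CLOEXEC,
		.resolve = RESOLVE_CACHED,
	};
	long fd = syscall(SYS_openat2, AT_FDCWD, path, &how, sizeof(how));

	return fd < 0 ? -errno : (int)fd;
}

/*
 * Hypothetical helper: try the cached open first, then retry with a
 * regular open() from a context that is allowed to block. For simplicity
 * every failure of the first attempt is retried, not only -EAGAIN.
 */
static int open_with_fallback(const char *path)
{
	int fd = try_cached_open(path);

	if (fd >= 0)
		return fd;
	fd = open(path, O_RDONLY | O_CLOEXEC);
	return fd < 0 ? -errno : fd;
}
```

An application would normally issue the first attempt from a latency-sensitive context and hand the fallback to a worker thread; both are collapsed into one function here to keep the sketch short.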
The file permission checks in path_openat() happen as follows:
(1) link_path_walk() -> may_lookup() -> inode_permission() is called for
each but the final path component. If the LOOKUP_RCU nameidata flag is
set, may_lookup() passes the MAY_NOT_BLOCK flag on to
inode_permission(), which passes it on to the permission inode
operation.
(2) do_open() -> may_open() -> inode_permission() is called for the
final path component. The MAY_* flags passed to inode_permission() are
computed by build_open_flags(), outside of do_open(), and passed down
from there. The MAY_NOT_BLOCK flag doesn't get set.
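The two call paths above can be modeled in plain userspace C to make the asymmetry visible. This is a toy model, not kernel code: every `model_*` function and `struct perm_log` are invented for illustration, the `LOOKUP_RCU` value is arbitrary, and the real may_open() does more than forward a mask.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values for the model only; the kernel defines its own. */
#define MAY_EXEC      0x01
#define MAY_READ      0x04
#define MAY_OPEN      0x20
#define MAY_NOT_BLOCK 0x80
#define LOOKUP_RCU    0x40

#define MAX_CALLS 16

/* Records the mask passed to each modeled ->permission() call. */
struct perm_log {
	int masks[MAX_CALLS];
	size_t count;
};

/* Stand-in for inode_permission() -> ->permission(): log the mask. */
static void model_inode_permission(struct perm_log *log, int mask)
{
	if (log->count < MAX_CALLS)
		log->masks[log->count++] = mask;
}

/* Path (1): may_lookup() adds MAY_NOT_BLOCK while the walk is in RCU mode. */
static void model_may_lookup(struct perm_log *log, unsigned int lookup_flags)
{
	int mask = MAY_EXEC;

	if (lookup_flags & LOOKUP_RCU)
		mask |= MAY_NOT_BLOCK;
	model_inode_permission(log, mask);
}

/* Path (2): may_open() forwards the mask precomputed by build_open_flags(). */
static void model_may_open(struct perm_log *log, int acc_mode)
{
	model_inode_permission(log, acc_mode);
}

/* Check `dirs` directory components, then the final component. */
static void model_path_openat(struct perm_log *log, size_t dirs,
			      unsigned int lookup_flags, int acc_mode)
{
	for (size_t i = 0; i < dirs; i++)
		model_may_lookup(log, lookup_flags);
	model_may_open(log, acc_mode);
}
```

Running the model for a path with two directory components in RCU mode logs `MAY_NOT_BLOCK` for the first two checks but not for the last one, which is the behavior the commit message describes.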
I think we can fix this in build_open_flags(), by setting the
MAY_NOT_BLOCK flag when a RESOLVE_CACHED lookup is requested, right
where RESOLVE_CACHED is mapped to LOOKUP_CACHED as well.
Fixes: 99668f618062 ("fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Comments
On Thu, Nov 09, 2023 at 08:08:44PM +0100, Andreas Gruenbacher wrote:
> [...]
> I think we can fix this in build_open_flags(), by setting the
> MAY_NOT_BLOCK flag when a RESOLVE_CACHED lookup is requested, right
> where RESOLVE_CACHED is mapped to LOOKUP_CACHED as well.

No.  This will expose ->permission() instances to previously impossible
cases of MAY_NOT_BLOCK lookups, and we already have enough trouble
in that area.  See RCU pathwalk patches I posted last cycle; I'm
planning to rebase what still needs to be rebased and feed the
fixes into mainline, but that won't happen until the end of this
week *AND* ->permission()-related part of code audit will need
to be repeated and extended.

Until then - no, with the side of fuck, no.
On Thu, Nov 09, 2023 at 10:00:18PM +0000, Al Viro wrote:
> On Thu, Nov 09, 2023 at 08:08:44PM +0100, Andreas Gruenbacher wrote:
> > [...]
> > I think we can fix this in build_open_flags(), by setting the
> > MAY_NOT_BLOCK flag when a RESOLVE_CACHED lookup is requested, right
> > where RESOLVE_CACHED is mapped to LOOKUP_CACHED as well.
>
> No.  This will expose ->permission() instances to previously impossible
> cases of MAY_NOT_BLOCK lookups, and we already have enough trouble
> in that area.  See RCU pathwalk patches I posted last cycle; I'm
> planning to rebase what still needs to be rebased and feed the
> fixes into mainline, but that won't happen until the end of this
> week *AND* ->permission()-related part of code audit will need
> to be repeated and extended.
>
> Until then - no, with the side of fuck, no.

Note that it's not just "->permission() might get called in RCU mode
with combination of flags it never had seen before"; it actually would
be called with MAY_NOT_BLOCK and without rcu_read_lock() held.

Which means that it can't make the usual assumptions about the objects
not getting freed under it in such mode.  Sure, the inode itself is pinned
in your new case, but anything that goes

	if !MAY_NOT_BLOCK
		grab a spinlock
		grab a reference to X from inode
		drop spinlock
		use X, it's pinned down
		drop a reference to X
	else
		READ_ONCE the reference to X
		use X, its freeing is RCU-delayed

would get screwed.  And that would need to be audited for all instances of
->permission() in the tree, along with all weird shit they might be pulling
off.
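The pattern sketched in that pseudocode could look roughly like this in a userspace model. Everything here is an assumption made for illustration: `struct model_x`, `struct model_inode`, `model_permission()` and `MODEL_UNSAFE` are invented names, an `atomic_flag` spin loop stands in for the inode spinlock, and the `in_read_section` parameter stands in for "the caller holds rcu_read_lock()". A real ->permission() instance gets no such parameter, which is the point of the objection.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAY_NOT_BLOCK 0x80
#define MODEL_UNSAFE  (-9999)	/* model-only marker, not a real errno */

/* Hypothetical object "X" hanging off the inode, e.g. a cached ACL. */
struct model_x {
	atomic_int refcount;
	int allow;		/* nonzero: access permitted */
};

struct model_inode {
	atomic_flag lock;		/* stands in for the inode spinlock */
	_Atomic(struct model_x *) x;	/* assumed to be freed with a delay */
};

static int model_permission(struct model_inode *inode, int mask,
			    bool in_read_section)
{
	struct model_x *x;
	int ret;

	if (!(mask & MAY_NOT_BLOCK)) {
		/* Blocking mode: pin X with a reference taken under the lock. */
		while (atomic_flag_test_and_set(&inode->lock))
			;
		x = atomic_load(&inode->x);
		atomic_fetch_add(&x->refcount, 1);
		atomic_flag_clear(&inode->lock);

		ret = x->allow ? 0 : -EACCES;	/* X is pinned while used */
		atomic_fetch_sub(&x->refcount, 1);
		return ret;
	}

	/*
	 * Non-blocking mode: X is used without taking a reference. That is
	 * only safe while a read-side section delays its freeing. The model
	 * reports the unsafe combination; real code would have no way to
	 * notice and would risk a use-after-free.
	 */
	if (!in_read_section)
		return MODEL_UNSAFE;
	x = atomic_load(&inode->x);
	return x->allow ? 0 : -EACCES;
}
```

In this model, the proposed patch corresponds to calling `model_permission()` with `MAY_NOT_BLOCK` set and `in_read_section` false, the one combination existing callers never produce.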
On Thu, Nov 9, 2023 at 23:00, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Nov 09, 2023 at 08:08:44PM +0100, Andreas Gruenbacher wrote:
> > [...]
> > I think we can fix this in build_open_flags(), by setting the
> > MAY_NOT_BLOCK flag when a RESOLVE_CACHED lookup is requested, right
> > where RESOLVE_CACHED is mapped to LOOKUP_CACHED as well.
>
> No.  This will expose ->permission() instances to previously impossible
> cases of MAY_NOT_BLOCK lookups, and we already have enough trouble
> in that area.

True, lockdep wouldn't be happy.

> See RCU pathwalk patches I posted last cycle;

Do you have a pointer? Thanks.

> I'm
> planning to rebase what still needs to be rebased and feed the
> fixes into mainline, but that won't happen until the end of this
> week *AND* ->permission()-related part of code audit will need
> to be repeated and extended.
>
> Until then - no, with the side of fuck, no.
On Thu, Nov 09, 2023 at 11:12:32PM +0100, Andreas Grünbacher wrote:
> On Thu, Nov 9, 2023 at 23:00, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > [...]
> > No.  This will expose ->permission() instances to previously impossible
> > cases of MAY_NOT_BLOCK lookups, and we already have enough trouble
> > in that area.
>
> True, lockdep wouldn't be happy.
>
> > See RCU pathwalk patches I posted last cycle;
>
> Do you have a pointer? Thanks.

Thread starting with Message-ID: <20231002022815.GQ800259@ZenIV>
I don't remember if I posted the audit notes into it; I'll get around
to resurrecting that stuff this weekend, when the mainline settles down
enough to bother with that.
Hi Al,

Did you get a chance to look into the RCU pathwalk stuff a bit more?
Any ideas on how to allow may_open() to indicate to inode_permission()
that it's part of a RESOLVE_CACHED lookup?

Cheers!
--Abhi

On Thu, Nov 9, 2023 at 4:23 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> [...]
> Thread starting with Message-ID: <20231002022815.GQ800259@ZenIV>
> I don't remember if I posted the audit notes into it; I'll get around
> to resurrecting that stuff this weekend, when the mainline settles down
> enough to bother with that.
diff --git a/fs/open.c b/fs/open.c
index 98f6601fbac6..61311c9845bd 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1340,6 +1340,7 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
 		if (flags & (O_TRUNC | O_CREAT | __O_TMPFILE))
 			return -EAGAIN;
 		lookup_flags |= LOOKUP_CACHED;
+		op->acc_mode |= MAY_NOT_BLOCK;
 	}
 
 	op->lookup_flags = lookup_flags;