Message ID | 20221031175256.2813280-1-jannh@google.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp2456293wru; Mon, 31 Oct 2022 10:56:33 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4iR9bvpm7FBCtXfBMJRK3jX+MO3pnQxt5VKEJBaF1KhJ7SabVGYhB+5EjgsmACYv9yJ7ig X-Received: by 2002:a17:906:844b:b0:7a9:f67e:a5d4 with SMTP id e11-20020a170906844b00b007a9f67ea5d4mr13899843ejy.136.1667238992935; Mon, 31 Oct 2022 10:56:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667238992; cv=none; d=google.com; s=arc-20160816; b=JwS0Nh+tkzMwBVUfHOvJ4xjovobxg3UdVaRKdBW0wSz36ZjKMXlAfcf7N0p0EUqPBi 8Bb0qU4iiS4ehx2Gqiyu5t6Q0m07T1H9JbV+zHdpLUO/cWDSxlrmBJigZ3hh8Ilm7mB8 cTYJZTYt8hIkqjWSVwulMFsSz2IsiKc1jyZybSgxtAWLbX3oWCqR86yvt/BOrBUWwZ7e ZuGKcu2kgywgQY7h4xFXytVgCpN+mmgcMNCHzER7i10pPvQE653bLXej15orAKkck0jx VoYlqcGorbo7l4XOd2S2WbN2EkhhGTxOFOEdQOACicQZomZJgmpY8+C+XER3SxPHKtcY b6Bg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=U7UbF4nZPug8lSmlrO1tsRcqhQZwCNdlA+12Y/ynwzs=; b=bK1XWLvDqik1JNFbmghGctD5fYDd0I7voK7W1Q5pMSQFtRLvqZCP3Lnq4XSDvKCMg4 jTOCVAsdrly6ig6c7GqROh7QkD7O/8ds8u0lKFKJn2fLRoY35/Jr76OAuL+hSLyaMo6y KcBHF1nP9ICkfhFXxsB0aq/ZCMJzEGEkH5CQsnh15/ZChJJV0oqZdYwRZBSnqpCoZlwQ LNS6gDBIz5BqYPmbYuyakiS053sk5dWBeu5kSq4jLEjd21DLPc/wUDOiEyXpG/6/UEGH q4Bj0CNvUU2LFbMQnC59DCD5Joa1XkqW9SHJlPxL8AeBXDHvKVrebSVrpE7JVLO+c2wc noQQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=Y097of5r; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id m11-20020aa7c2cb000000b00461a3438b24si6660869edp.182.2022.10.31.10.56.08; Mon, 31 Oct 2022 10:56:32 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=Y097of5r; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230099AbiJaRxN (ORCPT <rfc822;kartikey406@gmail.com> + 99 others); Mon, 31 Oct 2022 13:53:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57468 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231302AbiJaRxF (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 31 Oct 2022 13:53:05 -0400 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AD17913D60 for <linux-kernel@vger.kernel.org>; Mon, 31 Oct 2022 10:53:03 -0700 (PDT) Received: by mail-wr1-x42a.google.com with SMTP id z14so17031842wrn.7 for <linux-kernel@vger.kernel.org>; Mon, 31 Oct 2022 10:53:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=U7UbF4nZPug8lSmlrO1tsRcqhQZwCNdlA+12Y/ynwzs=; b=Y097of5rV70g0ZvYp+B8s8y1ErB3xlFNDaYW6ykJJYNatkuqJlr+z+CyX0O3OdoplY 3mu+Og8NL2wgCY/UedkxZg/63j/icDlf6ELFCdMtov2T/caOd8nStl63B1f6Ng8tcU1M 5hgV7wZQwyNOFnJIJGLc7IfcKpyE6ArcZAM/dMQojkvPcXsLX6wzBQ4wxV0vVsqUBgQo nlt47HP/JLDJZw1BxKGWt8/cguUb1ck9UcwvtLChSK4ybEORV7tsGUXsaKD8Z+PJmzIw A5XIJIsaFJUKt+TXFuyjzrL8cPOIZQL4DK4C9/b8ysCbhMYRzIA+6uJ2wpervRsmuhoV ljgw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=U7UbF4nZPug8lSmlrO1tsRcqhQZwCNdlA+12Y/ynwzs=; b=k24gl1V8bmboiqveqEeYBnZidfI7ZLPqQoUfCJgTVrS2DxyODSQuqoOPRMn2kpFRJY nzgc4rgD+J7+O6WAYjpGtJCS/8C2LWkZOJ+jJMV4aoEHkTVefnSHnmAOG5AvUHMRuw2r lR/J2naSgGocpRMpPAfh4X43WhR84Xg9J5NfOI6fBPf9rmOpyNeIeCXhO7VE/IZN4AvM mOd9bxEGr5blg8vSC1QK2II/Ww9VP6Ucik5IXPPwIGohsfuG8OPm+zMlRW4p4mFV+nrq YYJJU+30Sf1v9d8DW84TdCW4Rcmv8KP+NLo4EkLli2Q8GO6E2g4OYLg/4iy+PLviQDGc f4Kg== X-Gm-Message-State: ACrzQf0HTzcmTFVW3+CDjmU0DzxZjbtrTO6yJT3L5HaoCirCseUvnlox S+alLm6uZ77e145RgJTlY8UV8g== X-Received: by 2002:a05:6000:1841:b0:236:70dc:1a6f with SMTP id c1-20020a056000184100b0023670dc1a6fmr9096259wri.464.1667238782105; Mon, 31 Oct 2022 10:53:02 -0700 (PDT) Received: from localhost ([2a00:79e0:9d:4:f03a:db2e:7a5c:b47c]) by smtp.gmail.com with ESMTPSA id w12-20020a5d404c000000b002365254ea42sm7937270wrp.1.2022.10.31.10.53.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 31 Oct 2022 10:53:01 -0700 (PDT) From: Jann Horn <jannh@google.com> To: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org>, Miklos Szeredi <mszeredi@redhat.com>, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Will Deacon <will@kernel.org> Subject: [PATCH v2] fs: use acquire ordering in __fget_light() Date: Mon, 31 Oct 2022 18:52:56 +0100 Message-Id: <20221031175256.2813280-1-jannh@google.com> X-Mailer: git-send-email 2.38.1.273.g43a17bfeac-goog MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748226794609760978?= X-GMAIL-MSGID: =?utf-8?q?1748226794609760978?= |
Series |
[v2] fs: use acquire ordering in __fget_light()
|
|
Commit Message
Jann Horn
Oct. 31, 2022, 5:52 p.m. UTC
We must prevent the CPU from reordering the files->count read with the
FD table access like this, on architectures where read-read reordering is
possible:
files_lookup_fd_raw()
close_fd()
put_files_struct()
atomic_read(&files->count)
I would like to mark this for stable, but the stable rules explicitly say
"no theoretical races", and given that the FD table pointer and
files->count are explicitly stored in the same cacheline, this sort of
reordering seems quite unlikely in practice...
Signed-off-by: Jann Horn <jannh@google.com>
---
fs/file.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
base-commit: 30a0b95b1335e12efef89dd78518ed3e4a71a763
Comments
On Mon, Oct 31, 2022 at 06:52:56PM +0100, Jann Horn wrote: > We must prevent the CPU from reordering the files->count read with the > FD table access like this, on architectures where read-read reordering is > possible: > > files_lookup_fd_raw() > close_fd() > put_files_struct() > atomic_read(&files->count) > > I would like to mark this for stable, but the stable rules explicitly say > "no theoretical races", and given that the FD table pointer and > files->count are explicitly stored in the same cacheline, this sort of > reordering seems quite unlikely in practice... Looks sane, but looking at the definition of atomic_read_acquire... ouch. static __always_inline int atomic_read_acquire(const atomic_t *v) { instrument_atomic_read(v, sizeof(*v)); return arch_atomic_read_acquire(v); } OK... ; git grep -n -w arch_atomic_read_acquire include/linux/atomic/atomic-arch-fallback.h:220:#ifndef arch_atomic_read_acquire include/linux/atomic/atomic-arch-fallback.h:222:arch_atomic_read_acquire(const atomic_t *v) include/linux/atomic/atomic-arch-fallback.h:235:#define arch_atomic_read_acquire arch_atomic_read_acquire include/linux/atomic/atomic-instrumented.h:35: return arch_atomic_read_acquire(v); include/linux/atomic/atomic-long.h:529: return arch_atomic_read_acquire(v); No arch-specific instances, so... static __always_inline int arch_atomic_read_acquire(const atomic_t *v) { int ret; if (__native_word(atomic_t)) { ret = smp_load_acquire(&(v)->counter); } else { ret = arch_atomic_read(v); __atomic_acquire_fence(); } return ret; } OK, but when would that test not be true? We have unconditional typedef struct { int counter; } atomic_t; and #define __native_word(t) \ (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \ sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long)) Do we really have any architectures where a structure with one int field does *not* have a size that would satisfy that check? Is it future-proofing for masturbation sake, or am I missing something real here?
On Mon, Oct 31, 2022 at 7:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote: [...] > No arch-specific instances, so... > static __always_inline int > arch_atomic_read_acquire(const atomic_t *v) > { > int ret; > > if (__native_word(atomic_t)) { > ret = smp_load_acquire(&(v)->counter); > } else { > ret = arch_atomic_read(v); > __atomic_acquire_fence(); > } > > return ret; > } [...] > Do we really have any architectures where a structure with one > int field does *not* have a size that would satisfy that check? > > Is it future-proofing for masturbation sake, or am I missing something > real here? include/linux/atomic/atomic-arch-fallback.h has a comment at the top that says: // Generated by scripts/atomic/gen-atomic-fallback.sh // DO NOT MODIFY THIS FILE DIRECTLY
On Mon, Oct 31, 2022 at 11:08 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > Looks sane, but looking at the definition of atomic_read_acquire... ouch. The compiler should sort all that out and the mess shouldn't affect any code generation. But I also wouldn't mind somebody fixing things up, because I do agree that checking whether 'atomic_t' is a native word size is kind of pointless and probably just makes our build times unnecessarily longer. Linus
On Mon, Oct 31, 2022 at 07:13:30PM +0100, Jann Horn wrote: > On Mon, Oct 31, 2022 at 7:08 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > [...] > > No arch-specific instances, so... > > static __always_inline int > > arch_atomic_read_acquire(const atomic_t *v) > > { > > int ret; > > > > if (__native_word(atomic_t)) { > > ret = smp_load_acquire(&(v)->counter); > > } else { > > ret = arch_atomic_read(v); > > __atomic_acquire_fence(); > > } > > > > return ret; > > } > [...] > > Do we really have any architectures where a structure with one > > int field does *not* have a size that would satisfy that check? > > > > Is it future-proofing for masturbation sake, or am I missing something > > real here? > > include/linux/atomic/atomic-arch-fallback.h has a comment at the top that says: > > // Generated by scripts/atomic/gen-atomic-fallback.sh > // DO NOT MODIFY THIS FILE DIRECTLY Hmm... Apparently, the source is shared for atomic and atomic64, and the check is intended for atomic64 counterpart of that thing on 32bit boxen. Might make sense to put a comment in there... The question about architectures with non-default implementations still stands, though. Anyway, it's unrelated to the patch itself. I'm fine with it in the current form. Will apply for the next merge window, unless Linus wants it in right now.
On Mon, Oct 31, 2022 at 11:48 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > Anyway, it's unrelated to the patch itself. I'm fine with it in the current > form. Will apply for the next merge window, unless Linus wants it in right > now. It doesn't strike me as hugely critical, so I'm fine with it being put in any random pile of "fixes to be applied" as long as it doesn't get lost entirely. But if y ou have a "fixes" branch that may end up coming to me before this release is over, that's not the wrong place either. I would tend to agree with Jann that the re-ordering doesn't look very likely because it's the same cacheline, so even an aggressively out-of-order core really doesn't seem to be very likely to trigger any issues. So you have a really unlikely situation to begin with, and even less reason for it then triggering the re-ordering. The "original situation is unlikely" can probably be made quite likely with an active attack, but that active attacker would then also also have to rely on "that re-ordering looks sketchy", and actively hunt for hardware where it can happen. And said hardware may simply not exist, even if the race is certainly theoretically possible on any weakly ordered CPU. Linus
On Mon, Oct 31, 2022 at 12:07:36PM -0700, Linus Torvalds wrote: > On Mon, Oct 31, 2022 at 11:48 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Anyway, it's unrelated to the patch itself. I'm fine with it in the current > > form. Will apply for the next merge window, unless Linus wants it in right > > now. > > It doesn't strike me as hugely critical, so I'm fine with it being put > in any random pile of "fixes to be applied" as long as it doesn't get > lost entirely. But if y ou have a "fixes" branch that may end up > coming to me before this release is over, that's not the wrong place > either. Applied to #fixes and pushed out...
diff --git a/fs/file.c b/fs/file.c index 5f9c802a5d8d3..c942c89ca4cda 100644 --- a/fs/file.c +++ b/fs/file.c @@ -1003,7 +1003,16 @@ static unsigned long __fget_light(unsigned int fd, fmode_t mask) struct files_struct *files = current->files; struct file *file; - if (atomic_read(&files->count) == 1) { + /* + * If another thread is concurrently calling close_fd() followed + * by put_files_struct(), we must not observe the old table + * entry combined with the new refcount - otherwise we could + * return a file that is concurrently being freed. + * + * atomic_read_acquire() pairs with atomic_dec_and_test() in + * put_files_struct(). + */ + if (atomic_read_acquire(&files->count) == 1) { file = files_lookup_fd_raw(files, fd); if (!file || unlikely(file->f_mode & mask)) return 0;