Message ID | 166606036215.13363.1288735296954908554.stgit@donald.themaw.net |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4ac7:0:0:0:0:0 with SMTP id y7csp1746459wrs; Mon, 17 Oct 2022 19:50:50 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5Jq1ohFx+aLZvJY6x9NnVp+/OEDfweyt/p6sB256RNMv0hPeptXcN+CeT1gXFzguFfOD/J X-Received: by 2002:a05:6402:70c:b0:459:7b65:fead with SMTP id w12-20020a056402070c00b004597b65feadmr609110edx.209.1666061449988; Mon, 17 Oct 2022 19:50:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666061449; cv=none; d=google.com; s=arc-20160816; b=kDlZCWjU4feOKMsVvbDW5TbyNcFDWP/Y+I+CE9HoN/8TkSq0Asj+jMyydjm6Y5mMi2 qEpqxT4RPV445U8ImRx6wJjdsre2FOZxOEmZ/zUQRfpRwWi0UfQx+QLkGONTFyFpgjEn WxJa70E+/QHJuLdwSrB56jrwji6ADY7Prfvs3qNeDuXg0caQyXJhN7TxRJLCX/pJoRQQ SgdoQcYxs2IZ+9ESXbe6UWtWVuPaisfRZRZn6DLw0FVDZ4sECub/BNTCvXM5btkDBBNM boqt4Q56kWuhBZW/YJgUfMiP81/P4NkcTYNz+t56ayZzSE8Pk9m2nCEhtQH1YqpoSU4b Qskg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:cc:to:from :subject; bh=WeBLvlw43Q04SBQ4j9FqRARqidIAevq9fsF4RadZmh0=; b=sEg6AJV11CqokD9YL+T0ATZDK+nWUq6ncqfQ7eE1tR1/VH7rWgp7tqbgNC+V20/Jiw 8G3D3jJjAkJQ1bcsuc+jdcGxCO+EDCaTSmXcx/IQ30/vX+GS6C5oDKSK1o8xNTU+HaTx 4pdN4w4ZLCQSgeiXmN6zWsyU0op6VxnMuyaNJviW8xF3koTUM48CXpxC9Ba+ywzyVibp o25pY2c5tdW8mL6qx6b/z3PvYUGQEjd+plRsQMTFjJI61VP4NNfzJdfLl0nIv8Eqhqqu fItMZqjX3x2NALkJwASExFFmA7PgrTKWi41cLt77MYJ6yr/48eosTyf21TbTUTuQot+w 69YQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z72-20020a509e4e000000b004596db363c4si10975935ede.264.2022.10.17.19.50.25; Mon, 17 Oct 2022 19:50:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230077AbiJRCcv (ORCPT <rfc822;carlos.wei.hk@gmail.com> + 99 others); Mon, 17 Oct 2022 22:32:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40276 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229822AbiJRCcs (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 17 Oct 2022 22:32:48 -0400 Received: from smtp01.aussiebb.com.au (smtp01.aussiebb.com.au [121.200.0.92]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7630C92F48; Mon, 17 Oct 2022 19:32:47 -0700 (PDT) Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp01.aussiebb.com.au (Postfix) with ESMTP id B9A01100626; Tue, 18 Oct 2022 13:32:44 +1100 (AEDT) X-Virus-Scanned: Debian amavisd-new at smtp01.aussiebb.com.au Received: from smtp01.aussiebb.com.au ([127.0.0.1]) by localhost (smtp01.aussiebb.com.au [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 6T_nZIWt-tY1; Tue, 18 Oct 2022 13:32:44 +1100 (AEDT) Received: by smtp01.aussiebb.com.au (Postfix, from userid 116) id A5E6810061D; Tue, 18 Oct 2022 13:32:44 +1100 (AEDT) X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_HELO_NONE, SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 Received: from donald.themaw.net (180-150-90-198.b4965a.per.nbn.aussiebb.net [180.150.90.198]) by smtp01.aussiebb.com.au (Postfix) with ESMTP id E5B0B1005F4; Tue, 18 Oct 2022 13:32:42 +1100 (AEDT) Subject: [PATCH 1/2] kernfs: dont take i_lock on inode attr read From: Ian Kent <raven@themaw.net> To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Tejun Heo <tj@kernel.org> Cc: Minchan Kim <minchan@kernel.org>, Eric Sandeen <sandeen@sandeen.net>, Al Viro <viro@ZenIV.linux.org.uk>, Rick Lindsley <ricklind@linux.vnet.ibm.com>, David Howells <dhowells@redhat.com>, Miklos Szeredi <miklos@szeredi.hu>, Carlos Maiolino <cmaiolino@redhat.com>, linux-fsdevel <linux-fsdevel@vger.kernel.org>, Kernel Mailing List <linux-kernel@vger.kernel.org> Date: Tue, 18 Oct 2022 10:32:42 +0800 Message-ID: <166606036215.13363.1288735296954908554.stgit@donald.themaw.net> In-Reply-To: <166606025456.13363.3829702374064563472.stgit@donald.themaw.net> References: <166606025456.13363.3829702374064563472.stgit@donald.themaw.net> User-Agent: StGit/1.4 MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1746992051125035778?= X-GMAIL-MSGID: =?utf-8?q?1746992051125035778?= |
Series |
kernfs: remove i_lock usage that isn't needed
|
|
Commit Message
Ian Kent
Oct. 18, 2022, 2:32 a.m. UTC
The kernfs write lock is held when the kernfs node inode attributes
are updated. Therefore, when either kernfs_iop_getattr() or
kernfs_iop_permission() are called the kernfs node inode attributes
won't change.
Consequently concurrent kernfs_refresh_inode() calls always copy the
same values from the kernfs node.
So there's no need to take the inode i_lock to get consistent values
for generic_fillattr() and generic_permission(), the kernfs read lock
is sufficient.
Signed-off-by: Ian Kent <raven@themaw.net>
---
fs/kernfs/inode.c | 4 ----
1 file changed, 4 deletions(-)
Comments
On Tue, 18 Oct 2022 at 04:32, Ian Kent <raven@themaw.net> wrote: > > The kernfs write lock is held when the kernfs node inode attributes > are updated. Therefore, when either kernfs_iop_getattr() or > kernfs_iop_permission() are called the kernfs node inode attributes > won't change. > > Consequently concurrent kernfs_refresh_inode() calls always copy the > same values from the kernfs node. > > So there's no need to take the inode i_lock to get consistent values > for generic_fillattr() and generic_permission(), the kernfs read lock > is sufficient. > > Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent wrote: > The kernfs write lock is held when the kernfs node inode attributes > are updated. Therefore, when either kernfs_iop_getattr() or > kernfs_iop_permission() are called the kernfs node inode attributes > won't change. > > Consequently concurrent kernfs_refresh_inode() calls always copy the > same values from the kernfs node. > > So there's no need to take the inode i_lock to get consistent values > for generic_fillattr() and generic_permission(), the kernfs read lock > is sufficient. > > Signed-off-by: Ian Kent <raven@themaw.net> Acked-by: Tejun Heo <tj@kernel.org> Thanks.
On 2022-10-31 12:30, Tejun Heo wrote: > On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent wrote: > > The kernfs write lock is held when the kernfs node inode attributes > > are updated. Therefore, when either kernfs_iop_getattr() or > > kernfs_iop_permission() are called the kernfs node inode attributes > > won't change. > > > > Consequently concurrent kernfs_refresh_inode() calls always copy the > > same values from the kernfs node. > > > > So there's no need to take the inode i_lock to get consistent values > > for generic_fillattr() and generic_permission(), the kernfs read lock > > is sufficient. > > > > Signed-off-by: Ian Kent <raven@themaw.net> > > Acked-by: Tejun Heo <tj@kernel.org> Hi, Building an allmodconfig arm64 kernel on yesterdays next-20221220 and booting that in qemu I see the following "BUG: KCSAN: data-race in set_nlink / set_nlink". ================================================================== [ 1540.388669][ T123] BUG: KCSAN: data-race in set_nlink / set_nlink [ 1540.392779][ T123] [ 1540.394302][ T123] write to 0xffff00000adcc3e4 of 4 bytes by task 126 on cpu 0: [ 1540.398828][ T123] set_nlink (/home/anders/src/kernel/next/fs/inode.c:371) [ 1540.401609][ T123] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:181) [ 1540.404925][ T123] kernfs_iop_getattr (/home/anders/src/kernel/next/fs/kernfs/inode.c:194) [ 1540.408088][ T123] vfs_getattr_nosec (/home/anders/src/kernel/next/fs/stat.c:133) [ 1540.411334][ T123] vfs_statx (/home/anders/src/kernel/next/fs/stat.c:170) [ 1540.414224][ T123] vfs_fstatat (/home/anders/src/kernel/next/fs/stat.c:276) [ 1540.417166][ T123] __do_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:446) [ 1540.420539][ T123] __arm64_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:440 /home/anders/src/kernel/next/fs/stat.c:440) [ 1540.424003][ T123] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:142) [ 1540.427648][ T123] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:197) [ 1540.430378][ T123] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:142 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:638) [ 1540.433053][ T123] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:656) [ 1540.436421][ T123] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:584) [ 1540.439303][ T123] [ 1540.440828][ T123] 1 lock held by systemd-udevd/126: [ 1540.444055][ T123] #0: ffff00000609b960 (&root->kernfs_rwsem){++++}-{3:3}, at: kernfs_iop_getattr (/home/anders/src/kernel/next/fs/kernfs/inode.c:193) [ 1540.450699][ T123] irq event stamp: 185034 [ 1540.453373][ T123] hardirqs last enabled at (185034): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/mm/page_alloc.c:5302) [ 1540.460087][ T123] hardirqs last disabled at (185033): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:103 (discriminator 4)) [ 1540.466686][ T123] softirqs last enabled at (185001): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1780) [ 1540.473310][ T123] softirqs last disabled at (184999): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1773) [ 1540.479872][ T123] [ 1540.481406][ T123] read to 0xffff00000adcc3e4 of 4 bytes by task 123 on cpu 0: [ 1540.485893][ T123] set_nlink (/home/anders/src/kernel/next/fs/inode.c:368) [ 1540.488663][ T123] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:181) [ 1540.491961][ T123] kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:290) [ 1540.495260][ T123] inode_permission (/home/anders/src/kernel/next/fs/namei.c:458 /home/anders/src/kernel/next/fs/namei.c:525) [ 1540.498450][ T123] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1715 /home/anders/src/kernel/next/fs/namei.c:2262) [ 1540.501552][ T123] path_lookupat (/home/anders/src/kernel/next/fs/namei.c:2473 (discriminator 2)) [ 1540.504592][ T123] filename_lookup (/home/anders/src/kernel/next/fs/namei.c:2503) [ 1540.507740][ T123] user_path_at_empty (/home/anders/src/kernel/next/fs/namei.c:2876) [ 1540.511010][ T123] do_readlinkat (/home/anders/src/kernel/next/fs/stat.c:477) [ 1540.514097][ T123] __arm64_sys_readlinkat (/home/anders/src/kernel/next/fs/stat.c:504 /home/anders/src/kernel/next/fs/stat.c:501 /home/anders/src/kernel/next/fs/stat.c:501) [ 1540.517598][ T123] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:142) [ 1540.521319][ T123] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:197) [ 1540.524125][ T123] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:142 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:638) [ 1540.526795][ T123] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:656) [ 1540.530224][ T123] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:584) [ 1540.533176][ T123] [ 1540.534710][ T123] 1 lock held by systemd-udevd/123: [ 1540.537977][ T123] #0: ffff00000609b960 (&root->kernfs_rwsem){++++}-{3:3}, at: kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) [ 1540.544892][ T123] irq event stamp: 216564 [ 1540.547603][ T123] hardirqs last enabled at (216564): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/mm/page_alloc.c:5302) [ 1540.554368][ T123] hardirqs last disabled at (216563): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:103 (discriminator 4)) [ 1540.561107][ T123] softirqs last enabled at (216533): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1780) [ 1540.567833][ T123] softirqs last disabled at (216531): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1773) [ 1540.574496][ T123] [ 1540.576050][ T123] Reported by Kernel Concurrency Sanitizer on: [ 1540.587925][ T123] Hardware name: linux,dummy-virt (DT) [ 1540.591282][ T123] ================================================================== Reverting this patch I don't see the data race anymore. Cheers, Anders
On 21/12/22 21:34, Anders Roxell wrote: > On 2022-10-31 12:30, Tejun Heo wrote: >> On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent wrote: >>> The kernfs write lock is held when the kernfs node inode attributes >>> are updated. Therefore, when either kernfs_iop_getattr() or >>> kernfs_iop_permission() are called the kernfs node inode attributes >>> won't change. >>> >>> Consequently concurrent kernfs_refresh_inode() calls always copy the >>> same values from the kernfs node. >>> >>> So there's no need to take the inode i_lock to get consistent values >>> for generic_fillattr() and generic_permission(), the kernfs read lock >>> is sufficient. >>> >>> Signed-off-by: Ian Kent <raven@themaw.net> >> Acked-by: Tejun Heo <tj@kernel.org> > Hi, > > Building an allmodconfig arm64 kernel on yesterdays next-20221220 and > booting that in qemu I see the following "BUG: KCSAN: data-race in > set_nlink / set_nlink". I'll check if I missed any places where set_link() could be called where the link count could be different. If there aren't any the question will then be can writing the same value to this location in multiple concurrent threads corrupt it? Ian > > > ================================================================== > [ 1540.388669][ T123] BUG: KCSAN: data-race in set_nlink / set_nlink > [ 1540.392779][ T123] > [ 1540.394302][ T123] write to 0xffff00000adcc3e4 of 4 bytes by task 126 on cpu 0: > [ 1540.398828][ T123] set_nlink (/home/anders/src/kernel/next/fs/inode.c:371) > [ 1540.401609][ T123] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:181) > [ 1540.404925][ T123] kernfs_iop_getattr (/home/anders/src/kernel/next/fs/kernfs/inode.c:194) > [ 1540.408088][ T123] vfs_getattr_nosec (/home/anders/src/kernel/next/fs/stat.c:133) > [ 1540.411334][ T123] vfs_statx (/home/anders/src/kernel/next/fs/stat.c:170) > [ 1540.414224][ T123] vfs_fstatat (/home/anders/src/kernel/next/fs/stat.c:276) > [ 1540.417166][ T123] __do_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:446) > [ 1540.420539][ T123] __arm64_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:440 /home/anders/src/kernel/next/fs/stat.c:440) > [ 1540.424003][ T123] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:142) > [ 1540.427648][ T123] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:197) > [ 1540.430378][ T123] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:142 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:638) > [ 1540.433053][ T123] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:656) > [ 1540.436421][ T123] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:584) > [ 1540.439303][ T123] > [ 1540.440828][ T123] 1 lock held by systemd-udevd/126: > [ 1540.444055][ T123] #0: ffff00000609b960 (&root->kernfs_rwsem){++++}-{3:3}, at: kernfs_iop_getattr (/home/anders/src/kernel/next/fs/kernfs/inode.c:193) > [ 1540.450699][ T123] irq event stamp: 185034 > [ 1540.453373][ T123] hardirqs last enabled at (185034): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/mm/page_alloc.c:5302) > [ 1540.460087][ T123] hardirqs last disabled at (185033): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:103 (discriminator 4)) > [ 1540.466686][ T123] softirqs last enabled at (185001): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1780) > [ 1540.473310][ T123] softirqs last disabled at (184999): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1773) > [ 1540.479872][ T123] > [ 1540.481406][ T123] read to 0xffff00000adcc3e4 of 4 bytes by task 123 on cpu 0: > [ 1540.485893][ T123] set_nlink (/home/anders/src/kernel/next/fs/inode.c:368) > [ 1540.488663][ T123] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:181) > [ 1540.491961][ T123] kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:290) > [ 1540.495260][ T123] inode_permission (/home/anders/src/kernel/next/fs/namei.c:458 /home/anders/src/kernel/next/fs/namei.c:525) > [ 1540.498450][ T123] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1715 /home/anders/src/kernel/next/fs/namei.c:2262) > [ 1540.501552][ T123] path_lookupat (/home/anders/src/kernel/next/fs/namei.c:2473 (discriminator 2)) > [ 1540.504592][ T123] filename_lookup (/home/anders/src/kernel/next/fs/namei.c:2503) > [ 1540.507740][ T123] user_path_at_empty (/home/anders/src/kernel/next/fs/namei.c:2876) > [ 1540.511010][ T123] do_readlinkat (/home/anders/src/kernel/next/fs/stat.c:477) > [ 1540.514097][ T123] __arm64_sys_readlinkat (/home/anders/src/kernel/next/fs/stat.c:504 /home/anders/src/kernel/next/fs/stat.c:501 /home/anders/src/kernel/next/fs/stat.c:501) > [ 1540.517598][ T123] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:142) > [ 1540.521319][ T123] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:197) > [ 1540.524125][ T123] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:142 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:638) > [ 1540.526795][ T123] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:656) > [ 1540.530224][ T123] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:584) > [ 1540.533176][ T123] > [ 1540.534710][ T123] 1 lock held by systemd-udevd/123: > [ 1540.537977][ T123] #0: ffff00000609b960 (&root->kernfs_rwsem){++++}-{3:3}, at: kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) > [ 1540.544892][ T123] irq event stamp: 216564 > [ 1540.547603][ T123] hardirqs last enabled at (216564): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/mm/page_alloc.c:5302) > [ 1540.554368][ T123] hardirqs last disabled at (216563): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:103 (discriminator 4)) > [ 1540.561107][ T123] softirqs last enabled at (216533): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1780) > [ 1540.567833][ T123] softirqs last disabled at (216531): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1773) > [ 1540.574496][ T123] > [ 1540.576050][ T123] Reported by Kernel Concurrency Sanitizer on: > [ 1540.587925][ T123] Hardware name: linux,dummy-virt (DT) > [ 1540.591282][ T123] ================================================================== > > > Reverting this patch I don't see the data race anymore. > > Cheers, > Anders
On Fri, Dec 23, 2022, at 00:11, Ian Kent wrote: > On 21/12/22 21:34, Anders Roxell wrote: >> On 2022-10-31 12:30, Tejun Heo wrote: >>> On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent wrote: >>>> The kernfs write lock is held when the kernfs node inode attributes >>>> are updated. Therefore, when either kernfs_iop_getattr() or >>>> kernfs_iop_permission() are called the kernfs node inode attributes >>>> won't change. >>>> >>>> Consequently concurrent kernfs_refresh_inode() calls always copy the >>>> same values from the kernfs node. >>>> >>>> So there's no need to take the inode i_lock to get consistent values >>>> for generic_fillattr() and generic_permission(), the kernfs read lock >>>> is sufficient. >>>> >>>> Signed-off-by: Ian Kent <raven@themaw.net> >>> Acked-by: Tejun Heo <tj@kernel.org> >> Hi, >> >> Building an allmodconfig arm64 kernel on yesterdays next-20221220 and >> booting that in qemu I see the following "BUG: KCSAN: data-race in >> set_nlink / set_nlink". > > > I'll check if I missed any places where set_link() could be > called where the link count could be different. > > > If there aren't any the question will then be can writing the > same value to this location in multiple concurrent threads > corrupt it? I think the race that is getting reported for set_nlink() is about this bit getting called simulatenously on multiple CPUs with only the read lock held for the inode: /* Yes, some filesystems do change nlink from zero to one */ if (inode->i_nlink == 0) atomic_long_dec(&inode->i_sb->s_remove_count); inode->__i_nlink = nlink; Since i_nlink and __i_nlink refer to the same memory location, the 'inode->i_nlink == 0' check can be true for all of them before the nonzero nlink value gets set, and this results in s_remove_count being decremented more than once. Arnd
On 29/12/22 17:20, Arnd Bergmann wrote: > On Fri, Dec 23, 2022, at 00:11, Ian Kent wrote: >> On 21/12/22 21:34, Anders Roxell wrote: >>> On 2022-10-31 12:30, Tejun Heo wrote: >>>> On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent wrote: >>>>> The kernfs write lock is held when the kernfs node inode attributes >>>>> are updated. Therefore, when either kernfs_iop_getattr() or >>>>> kernfs_iop_permission() are called the kernfs node inode attributes >>>>> won't change. >>>>> >>>>> Consequently concurrent kernfs_refresh_inode() calls always copy the >>>>> same values from the kernfs node. >>>>> >>>>> So there's no need to take the inode i_lock to get consistent values >>>>> for generic_fillattr() and generic_permission(), the kernfs read lock >>>>> is sufficient. >>>>> >>>>> Signed-off-by: Ian Kent <raven@themaw.net> >>>> Acked-by: Tejun Heo <tj@kernel.org> >>> Hi, >>> >>> Building an allmodconfig arm64 kernel on yesterdays next-20221220 and >>> booting that in qemu I see the following "BUG: KCSAN: data-race in >>> set_nlink / set_nlink". >> >> I'll check if I missed any places where set_link() could be >> called where the link count could be different. >> >> >> If there aren't any the question will then be can writing the >> same value to this location in multiple concurrent threads >> corrupt it? > I think the race that is getting reported for set_nlink() > is about this bit getting called simulatenously on multiple > CPUs with only the read lock held for the inode: > > /* Yes, some filesystems do change nlink from zero to one */ > if (inode->i_nlink == 0) > atomic_long_dec(&inode->i_sb->s_remove_count); > inode->__i_nlink = nlink; > > Since i_nlink and __i_nlink refer to the same memory location, > the 'inode->i_nlink == 0' check can be true for all of them > before the nonzero nlink value gets set, and this results in > s_remove_count being decremented more than once. Thanks for the comment Arnd. I'll certainly have a close look at that. It will be a little while though, given the season, ;). Ian
On 29/12/22 21:07, Ian Kent wrote: > > On 29/12/22 17:20, Arnd Bergmann wrote: >> On Fri, Dec 23, 2022, at 00:11, Ian Kent wrote: >>> On 21/12/22 21:34, Anders Roxell wrote: >>>> On 2022-10-31 12:30, Tejun Heo wrote: >>>>> On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent wrote: >>>>>> The kernfs write lock is held when the kernfs node inode attributes >>>>>> are updated. Therefore, when either kernfs_iop_getattr() or >>>>>> kernfs_iop_permission() are called the kernfs node inode attributes >>>>>> won't change. >>>>>> >>>>>> Consequently concurrent kernfs_refresh_inode() calls always copy the >>>>>> same values from the kernfs node. >>>>>> >>>>>> So there's no need to take the inode i_lock to get consistent values >>>>>> for generic_fillattr() and generic_permission(), the kernfs read >>>>>> lock >>>>>> is sufficient. >>>>>> >>>>>> Signed-off-by: Ian Kent <raven@themaw.net> >>>>> Acked-by: Tejun Heo <tj@kernel.org> >>>> Hi, >>>> >>>> Building an allmodconfig arm64 kernel on yesterdays next-20221220 and >>>> booting that in qemu I see the following "BUG: KCSAN: data-race in >>>> set_nlink / set_nlink". >>> >>> I'll check if I missed any places where set_link() could be >>> called where the link count could be different. >>> >>> >>> If there aren't any the question will then be can writing the >>> same value to this location in multiple concurrent threads >>> corrupt it? >> I think the race that is getting reported for set_nlink() >> is about this bit getting called simulatenously on multiple >> CPUs with only the read lock held for the inode: >> >> /* Yes, some filesystems do change nlink from zero to one */ >> if (inode->i_nlink == 0) >> atomic_long_dec(&inode->i_sb->s_remove_count); >> inode->__i_nlink = nlink; >> >> Since i_nlink and __i_nlink refer to the same memory location, >> the 'inode->i_nlink == 0' check can be true for all of them >> before the nonzero nlink value gets set, and this results in >> s_remove_count being decremented more than once. > > > Thanks for the comment Arnd. Hello all, I've been looking at this and after consulting Miklos and his pointing out that it looks like a false positive the urgency dropped off a bit. So apologies for taking so long to report back. Anyway it needs some description of conclusions reached so far. I'm still looking around but in short, kernfs will set directories to <# of directory entries> + 2 unconditionally for directories. I can't yet find any other places where i_nlink is set or changed and if there are none then i_nlink will never be set to zero so the race should not occur. Consequently my claim is this is a real false positive. There are the file system operations that may be passed at mount time but given the way kernfs sets i_nlink it pretty much dictates those operations (if there were any that modify it and there don't appear to be any) leave it alone. So it just doesn't make sense for users of kernfs to fiddle with i_nlink ... Ian
On 2023-01-23 11:11, Ian Kent wrote: > > On 29/12/22 21:07, Ian Kent wrote: > > > > On 29/12/22 17:20, Arnd Bergmann wrote: > > > On Fri, Dec 23, 2022, at 00:11, Ian Kent wrote: > > > > On 21/12/22 21:34, Anders Roxell wrote: > > > > > On 2022-10-31 12:30, Tejun Heo wrote: > > > > > > On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent wrote: > > > > > > > The kernfs write lock is held when the kernfs node inode attributes > > > > > > > are updated. Therefore, when either kernfs_iop_getattr() or > > > > > > > kernfs_iop_permission() are called the kernfs node inode attributes > > > > > > > won't change. > > > > > > > > > > > > > > Consequently concurrent kernfs_refresh_inode() calls always copy the > > > > > > > same values from the kernfs node. > > > > > > > > > > > > > > So there's no need to take the inode i_lock to get consistent values > > > > > > > for generic_fillattr() and generic_permission(), the > > > > > > > kernfs read lock > > > > > > > is sufficient. > > > > > > > > > > > > > > Signed-off-by: Ian Kent <raven@themaw.net> > > > > > > Acked-by: Tejun Heo <tj@kernel.org> > > > > > Hi, > > > > > > > > > > Building an allmodconfig arm64 kernel on yesterdays next-20221220 and > > > > > booting that in qemu I see the following "BUG: KCSAN: data-race in > > > > > set_nlink / set_nlink". > > > > > > > > I'll check if I missed any places where set_link() could be > > > > called where the link count could be different. > > > > > > > > > > > > If there aren't any the question will then be can writing the > > > > same value to this location in multiple concurrent threads > > > > corrupt it? > > > I think the race that is getting reported for set_nlink() > > > is about this bit getting called simulatenously on multiple > > > CPUs with only the read lock held for the inode: > > > > > > /* Yes, some filesystems do change nlink from zero to one */ > > > if (inode->i_nlink == 0) > > > atomic_long_dec(&inode->i_sb->s_remove_count); > > > inode->__i_nlink = nlink; > > > > > > Since i_nlink and __i_nlink refer to the same memory location, > > > the 'inode->i_nlink == 0' check can be true for all of them > > > before the nonzero nlink value gets set, and this results in > > > s_remove_count being decremented more than once. > > > > > > Thanks for the comment Arnd. > > > Hello all, > > > I've been looking at this and after consulting Miklos and his pointing > > out that it looks like a false positive the urgency dropped off a bit. So > > apologies for taking so long to report back. > > > Anyway it needs some description of conclusions reached so far. > > > I'm still looking around but in short, kernfs will set directories to <# of > > directory entries> + 2 unconditionally for directories. I can't yet find > > any other places where i_nlink is set or changed and if there are none > > then i_nlink will never be set to zero so the race should not occur. > > > Consequently my claim is this is a real false positive. > > > There are the file system operations that may be passed at mount time > > but given the way kernfs sets i_nlink it pretty much dictates those > operations > > (if there were any that modify it and there don't appear to be any) leave it > > alone. > > > So it just doesn't make sense for users of kernfs to fiddle with i_nlink ... On todays next tag, next-20230718 this KCSAN BUG poped up again. When I built an allmodconfig arm64 kernel and booted it in QEMU. Full log can be found http://ix.io/4AUd [ 1694.987789][ T137] BUG: KCSAN: data-race in inode_permission / kernfs_refresh_inode [ 1694.992912][ T137] [ 1694.994532][ T137] write to 0xffff00000bab6070 of 2 bytes by task 104 on cpu 0: [ 1694.999269][ T137] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:171) [ 1695.002707][ T137] kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) [ 1695.006148][ T137] inode_permission (/home/anders/src/kernel/next/fs/namei.c:461 /home/anders/src/kernel/next/fs/namei.c:528) [ 1695.009420][ T137] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1720 /home/anders/src/kernel/next/fs/namei.c:2267) [ 1695.012643][ T137] path_lookupat (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2)) [ 1695.015781][ T137] filename_lookup (/home/anders/src/kernel/next/fs/namei.c:2508) [ 1695.019059][ T137] vfs_statx (/home/anders/src/kernel/next/fs/stat.c:238) [ 1695.022024][ T137] vfs_fstatat (/home/anders/src/kernel/next/fs/stat.c:276) [ 1695.025067][ T137] __do_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:446) [ 1695.028497][ T137] __arm64_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:440 /home/anders/src/kernel/next/fs/stat.c:440) [ 1695.032080][ T137] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) [ 1695.035916][ T137] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) [ 1695.038796][ T137] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) [ 1695.041468][ T137] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) [ 1695.044889][ T137] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) [ 1695.047904][ T137] [ 1695.049511][ T137] 1 lock held by systemd-udevd/104: [ 1695.052837][ T137] #0: ffff000006681e08 (&root->kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:288) [ 1695.060241][ T137] irq event stamp: 82902 [ 1695.063006][ T137] hardirqs last enabled at (82901): _raw_spin_unlock_irqrestore (/home/anders/src/kernel/next/arch/arm64/include/asm/alternative-macros.h:250 /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:27 /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:140 /home/anders/src/kernel/next/include/linux/spinlock_api_smp.h:151 /home/anders/src/kernel/next/kernel/locking/spinlock.c:194) [ 1695.069673][ T137] hardirqs last disabled at (82902): el1_interrupt (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488) [ 1695.075474][ T137] softirqs last enabled at (82792): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) [ 1695.082319][ T137] softirqs last disabled at (82790): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) [ 1695.089049][ T137] [ 1695.090659][ T137] read to 0xffff00000bab6070 of 2 bytes by task 137 on cpu 0: [ 1695.095374][ T137] inode_permission (/home/anders/src/kernel/next/fs/namei.c:532) [ 1695.098655][ T137] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1720 /home/anders/src/kernel/next/fs/namei.c:2267) [ 1695.101857][ T137] path_openat (/home/anders/src/kernel/next/fs/namei.c:3789 (discriminator 2)) [ 1695.104885][ T137] do_filp_open (/home/anders/src/kernel/next/fs/namei.c:3820) [ 1695.108006][ T137] do_sys_openat2 (/home/anders/src/kernel/next/fs/open.c:1418) [ 1695.111290][ T137] __arm64_sys_openat (/home/anders/src/kernel/next/fs/open.c:1433) [ 1695.114825][ T137] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) [ 1695.118662][ T137] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) [ 1695.121555][ T137] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) [ 1695.124207][ T137] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) [ 1695.127590][ T137] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) [ 1695.130641][ T137] [ 1695.132241][ T137] no locks held by systemd-udevd/137. [ 1695.135618][ T137] irq event stamp: 3246 [ 1695.138519][ T137] hardirqs last enabled at (3245): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:105) [ 1695.145825][ T137] hardirqs last disabled at (3246): el1_interrupt (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488) [ 1695.151942][ T137] softirqs last enabled at (3208): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) [ 1695.158950][ T137] softirqs last disabled at (3206): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) [ 1695.166036][ T137] [ 1695.167621][ T137] Reported by Kernel Concurrency Sanitizer on: [ 1695.179990][ T137] Hardware name: linux,dummy-virt (DT) [ 1695.183687][ T137] ================================================================== [...] [ 1738.053819][ T104] BUG: KCSAN: data-race in set_nlink / set_nlink [ 1738.058223][ T104] [ 1738.059865][ T104] read to 0xffff00000bab6918 of 4 bytes by task 108 on cpu 0: [ 1738.064916][ T104] set_nlink (/home/anders/src/kernel/next/fs/inode.c:369) [ 1738.067845][ T104] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:180) [ 1738.071607][ T104] kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) [ 1738.075467][ T104] inode_permission (/home/anders/src/kernel/next/fs/namei.c:461 /home/anders/src/kernel/next/fs/namei.c:528) [ 1738.078868][ T104] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1720 /home/anders/src/kernel/next/fs/namei.c:2267) [ 1738.082270][ T104] path_lookupat (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2)) [ 1738.085488][ T104] filename_lookup (/home/anders/src/kernel/next/fs/namei.c:2508) [ 1738.089101][ T104] user_path_at_empty (/home/anders/src/kernel/next/fs/namei.c:2907) [ 1738.092469][ T104] do_readlinkat (/home/anders/src/kernel/next/fs/stat.c:477) [ 1738.095970][ T104] __arm64_sys_readlinkat (/home/anders/src/kernel/next/fs/stat.c:504 /home/anders/src/kernel/next/fs/stat.c:501 /home/anders/src/kernel/next/fs/stat.c:501) [ 1738.099529][ T104] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) [ 1738.103696][ T104] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) [ 1738.106560][ T104] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) [ 1738.109613][ T104] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) [ 1738.113035][ T104] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) [ 1738.116346][ T104] [ 1738.117924][ T104] 1 lock held by systemd-udevd/108: [ 1738.121580][ T104] #0: ffff000006681e08 (&root->kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:288) [ 1738.129355][ T104] irq event stamp: 31000 [ 1738.132088][ T104] hardirqs last enabled at (31000): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:105) [ 1738.139417][ T104] hardirqs last disabled at (30999): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:104) [ 1738.146781][ T104] softirqs last enabled at (30973): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) [ 1738.153891][ T104] softirqs last disabled at (30971): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) [ 1738.161012][ T104] [ 1738.162663][ T104] write to 0xffff00000bab6918 of 4 bytes by task 104 on cpu 0: [ 1738.167730][ T104] set_nlink (/home/anders/src/kernel/next/fs/inode.c:372) [ 1738.170559][ T104] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:180) [ 1738.174355][ T104] kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) [ 1738.177829][ T104] inode_permission (/home/anders/src/kernel/next/fs/namei.c:461 /home/anders/src/kernel/next/fs/namei.c:528) [ 1738.181403][ T104] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1720 /home/anders/src/kernel/next/fs/namei.c:2267) [ 1738.184738][ T104] path_lookupat (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2)) [ 1738.188268][ T104] filename_lookup (/home/anders/src/kernel/next/fs/namei.c:2508) [ 1738.191865][ T104] vfs_statx (/home/anders/src/kernel/next/fs/stat.c:238) [ 1738.196236][ T104] vfs_fstatat (/home/anders/src/kernel/next/fs/stat.c:276) [ 1738.200120][ T104] __do_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:446) [ 1738.204095][ T104] __arm64_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:440 /home/anders/src/kernel/next/fs/stat.c:440) [ 1738.207676][ T104] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) [ 1738.211820][ T104] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) [ 1738.214815][ T104] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) [ 1738.217709][ T104] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) [ 1738.221239][ T104] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) [ 1738.224502][ T104] [ 1738.226090][ T104] 1 lock held by systemd-udevd/104: [ 1738.229747][ T104] #0: ffff000006681e08 (&root->kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:288) [ 1738.237504][ T104] irq event stamp: 108353 [ 1738.240262][ T104] hardirqs last enabled at (108353): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:105) [ 1738.247443][ T104] hardirqs last disabled at (108352): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:104) [ 1738.254510][ T104] softirqs last enabled at (108326): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) [ 1738.262187][ T104] softirqs last disabled at (108324): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) [ 1738.270239][ T104] [ 1738.272140][ T104] Reported by Kernel Concurrency Sanitizer on: [ 1738.285185][ T104] Hardware name: linux,dummy-virt (DT) [ 1738.288703][ T104] ================================================================== Cheers, Anders
On 19/7/23 03:00, Anders Roxell wrote: > On 2023-01-23 11:11, Ian Kent wrote: >> On 29/12/22 21:07, Ian Kent wrote: >>> On 29/12/22 17:20, Arnd Bergmann wrote: >>>> On Fri, Dec 23, 2022, at 00:11, Ian Kent wrote: >>>>> On 21/12/22 21:34, Anders Roxell wrote: >>>>>> On 2022-10-31 12:30, Tejun Heo wrote: >>>>>>> On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent wrote: >>>>>>>> The kernfs write lock is held when the kernfs node inode attributes >>>>>>>> are updated. Therefore, when either kernfs_iop_getattr() or >>>>>>>> kernfs_iop_permission() are called the kernfs node inode attributes >>>>>>>> won't change. >>>>>>>> >>>>>>>> Consequently concurrent kernfs_refresh_inode() calls always copy the >>>>>>>> same values from the kernfs node. >>>>>>>> >>>>>>>> So there's no need to take the inode i_lock to get consistent values >>>>>>>> for generic_fillattr() and generic_permission(), the >>>>>>>> kernfs read lock >>>>>>>> is sufficient. >>>>>>>> >>>>>>>> Signed-off-by: Ian Kent <raven@themaw.net> >>>>>>> Acked-by: Tejun Heo <tj@kernel.org> >>>>>> Hi, >>>>>> >>>>>> Building an allmodconfig arm64 kernel on yesterdays next-20221220 and >>>>>> booting that in qemu I see the following "BUG: KCSAN: data-race in >>>>>> set_nlink / set_nlink". >>>>> I'll check if I missed any places where set_link() could be >>>>> called where the link count could be different. >>>>> >>>>> >>>>> If there aren't any the question will then be can writing the >>>>> same value to this location in multiple concurrent threads >>>>> corrupt it? >>>> I think the race that is getting reported for set_nlink() >>>> is about this bit getting called simulatenously on multiple >>>> CPUs with only the read lock held for the inode: >>>> >>>> /* Yes, some filesystems do change nlink from zero to one */ >>>> if (inode->i_nlink == 0) >>>> atomic_long_dec(&inode->i_sb->s_remove_count); >>>> inode->__i_nlink = nlink; >>>> >>>> Since i_nlink and __i_nlink refer to the same memory location, >>>> the 'inode->i_nlink == 0' check can be true for all of them >>>> before the nonzero nlink value gets set, and this results in >>>> s_remove_count being decremented more than once. >>> >>> Thanks for the comment Arnd. >> >> Hello all, >> >> >> I've been looking at this and after consulting Miklos and his pointing >> >> out that it looks like a false positive the urgency dropped off a bit. So >> >> apologies for taking so long to report back. >> >> >> Anyway it needs some description of conclusions reached so far. >> >> >> I'm still looking around but in short, kernfs will set directories to <# of >> >> directory entries> + 2 unconditionally for directories. I can't yet find >> >> any other places where i_nlink is set or changed and if there are none >> >> then i_nlink will never be set to zero so the race should not occur. >> >> >> Consequently my claim is this is a real false positive. >> >> >> There are the file system operations that may be passed at mount time >> >> but given the way kernfs sets i_nlink it pretty much dictates those >> operations >> >> (if there were any that modify it and there don't appear to be any) leave it >> >> alone. >> >> >> So it just doesn't make sense for users of kernfs to fiddle with i_nlink ... > On todays next tag, next-20230718 this KCSAN BUG poped up again. When I > built an allmodconfig arm64 kernel and booted it in QEMU. Full log can > be found http://ix.io/4AUd > > [ 1694.987789][ T137] BUG: KCSAN: data-race in inode_permission / kernfs_refresh_inode > [ 1694.992912][ T137] > [ 1694.994532][ T137] write to 0xffff00000bab6070 of 2 bytes by task 104 on cpu 0: > [ 1694.999269][ T137] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:171) > [ 1695.002707][ T137] kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) > [ 1695.006148][ T137] inode_permission (/home/anders/src/kernel/next/fs/namei.c:461 /home/anders/src/kernel/next/fs/namei.c:528) > [ 1695.009420][ T137] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1720 /home/anders/src/kernel/next/fs/namei.c:2267) > [ 1695.012643][ T137] path_lookupat (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2)) > [ 1695.015781][ T137] filename_lookup (/home/anders/src/kernel/next/fs/namei.c:2508) > [ 1695.019059][ T137] vfs_statx (/home/anders/src/kernel/next/fs/stat.c:238) > [ 1695.022024][ T137] vfs_fstatat (/home/anders/src/kernel/next/fs/stat.c:276) > [ 1695.025067][ T137] __do_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:446) > [ 1695.028497][ T137] __arm64_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:440 /home/anders/src/kernel/next/fs/stat.c:440) > [ 1695.032080][ T137] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) > [ 1695.035916][ T137] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) > [ 1695.038796][ T137] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) > [ 1695.041468][ T137] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) > [ 1695.044889][ T137] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) > [ 1695.047904][ T137] > [ 1695.049511][ T137] 1 lock held by systemd-udevd/104: > [ 1695.052837][ T137] #0: ffff000006681e08 (&root->kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:288) > [ 1695.060241][ T137] irq event stamp: 82902 > [ 1695.063006][ T137] hardirqs last enabled at (82901): _raw_spin_unlock_irqrestore (/home/anders/src/kernel/next/arch/arm64/include/asm/alternative-macros.h:250 /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:27 /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:140 /home/anders/src/kernel/next/include/linux/spinlock_api_smp.h:151 /home/anders/src/kernel/next/kernel/locking/spinlock.c:194) > [ 1695.069673][ T137] hardirqs last disabled at (82902): el1_interrupt (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488) > [ 1695.075474][ T137] softirqs last enabled at (82792): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) > [ 1695.082319][ T137] softirqs last disabled at (82790): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) > [ 1695.089049][ T137] > [ 1695.090659][ T137] read to 0xffff00000bab6070 of 2 bytes by task 137 on cpu 0: > [ 1695.095374][ T137] inode_permission (/home/anders/src/kernel/next/fs/namei.c:532) > [ 1695.098655][ T137] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1720 /home/anders/src/kernel/next/fs/namei.c:2267) > [ 1695.101857][ T137] path_openat (/home/anders/src/kernel/next/fs/namei.c:3789 (discriminator 2)) > [ 1695.104885][ T137] do_filp_open (/home/anders/src/kernel/next/fs/namei.c:3820) > [ 1695.108006][ T137] do_sys_openat2 (/home/anders/src/kernel/next/fs/open.c:1418) > [ 1695.111290][ T137] __arm64_sys_openat (/home/anders/src/kernel/next/fs/open.c:1433) > [ 1695.114825][ T137] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) > [ 1695.118662][ T137] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) > [ 1695.121555][ T137] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) > [ 1695.124207][ T137] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) > [ 1695.127590][ T137] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) > [ 1695.130641][ T137] > [ 1695.132241][ T137] no locks held by systemd-udevd/137. > [ 1695.135618][ T137] irq event stamp: 3246 > [ 1695.138519][ T137] hardirqs last enabled at (3245): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:105) > [ 1695.145825][ T137] hardirqs last disabled at (3246): el1_interrupt (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488) > [ 1695.151942][ T137] softirqs last enabled at (3208): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) > [ 1695.158950][ T137] softirqs last disabled at (3206): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) > [ 1695.166036][ T137] > [ 1695.167621][ T137] Reported by Kernel Concurrency Sanitizer on: > [ 1695.179990][ T137] Hardware name: linux,dummy-virt (DT) > [ 1695.183687][ T137] ================================================================== This one is different to the original. I can't see where the problem is here, can someone help me out please. I don't see any shared data values used by the call devcgroup_inode_permission(inode, mask) in fs/namei.c:inode_permission() that might be a problem during the read except possibly inode->i_mode. I'll check on that ... maybe something's been missed when kernfs_rwsem was changed to a separate lock (kernfs_iattr_rwsem). > > [...] > > [ 1738.053819][ T104] BUG: KCSAN: data-race in set_nlink / set_nlink > [ 1738.058223][ T104] > [ 1738.059865][ T104] read to 0xffff00000bab6918 of 4 bytes by task 108 on cpu 0: > [ 1738.064916][ T104] set_nlink (/home/anders/src/kernel/next/fs/inode.c:369) > [ 1738.067845][ T104] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:180) > [ 1738.071607][ T104] kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) > [ 1738.075467][ T104] inode_permission (/home/anders/src/kernel/next/fs/namei.c:461 /home/anders/src/kernel/next/fs/namei.c:528) > [ 1738.078868][ T104] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1720 /home/anders/src/kernel/next/fs/namei.c:2267) > [ 1738.082270][ T104] path_lookupat (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2)) > [ 1738.085488][ T104] filename_lookup (/home/anders/src/kernel/next/fs/namei.c:2508) > [ 1738.089101][ T104] user_path_at_empty (/home/anders/src/kernel/next/fs/namei.c:2907) > [ 1738.092469][ T104] do_readlinkat (/home/anders/src/kernel/next/fs/stat.c:477) > [ 1738.095970][ T104] __arm64_sys_readlinkat (/home/anders/src/kernel/next/fs/stat.c:504 /home/anders/src/kernel/next/fs/stat.c:501 /home/anders/src/kernel/next/fs/stat.c:501) > [ 1738.099529][ T104] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) > [ 1738.103696][ T104] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) > [ 1738.106560][ T104] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) > [ 1738.109613][ T104] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) > [ 1738.113035][ T104] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) > [ 1738.116346][ T104] > [ 1738.117924][ T104] 1 lock held by systemd-udevd/108: > [ 1738.121580][ T104] #0: ffff000006681e08 (&root->kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:288) > [ 1738.129355][ T104] irq event stamp: 31000 > [ 1738.132088][ T104] hardirqs last enabled at (31000): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:105) > [ 1738.139417][ T104] hardirqs last disabled at (30999): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:104) > [ 1738.146781][ T104] softirqs last enabled at (30973): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) > [ 1738.153891][ T104] softirqs last disabled at (30971): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) > [ 1738.161012][ T104] > [ 1738.162663][ T104] write to 0xffff00000bab6918 of 4 bytes by task 104 on cpu 0: > [ 1738.167730][ T104] set_nlink (/home/anders/src/kernel/next/fs/inode.c:372) > [ 1738.170559][ T104] kernfs_refresh_inode (/home/anders/src/kernel/next/fs/kernfs/inode.c:180) > [ 1738.174355][ T104] kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) > [ 1738.177829][ T104] inode_permission (/home/anders/src/kernel/next/fs/namei.c:461 /home/anders/src/kernel/next/fs/namei.c:528) > [ 1738.181403][ T104] link_path_walk (/home/anders/src/kernel/next/fs/namei.c:1720 /home/anders/src/kernel/next/fs/namei.c:2267) > [ 1738.184738][ T104] path_lookupat (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2)) > [ 1738.188268][ T104] filename_lookup (/home/anders/src/kernel/next/fs/namei.c:2508) > [ 1738.191865][ T104] vfs_statx (/home/anders/src/kernel/next/fs/stat.c:238) > [ 1738.196236][ T104] vfs_fstatat (/home/anders/src/kernel/next/fs/stat.c:276) > [ 1738.200120][ T104] __do_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:446) > [ 1738.204095][ T104] __arm64_sys_newfstatat (/home/anders/src/kernel/next/fs/stat.c:440 /home/anders/src/kernel/next/fs/stat.c:440) > [ 1738.207676][ T104] el0_svc_common.constprop.0 (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) > [ 1738.211820][ T104] do_el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) > [ 1738.214815][ T104] el0_svc (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) > [ 1738.217709][ T104] el0t_64_sync_handler (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) > [ 1738.221239][ T104] el0t_64_sync (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) > [ 1738.224502][ T104] > [ 1738.226090][ T104] 1 lock held by systemd-udevd/104: > [ 1738.229747][ T104] #0: ffff000006681e08 (&root->kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission (/home/anders/src/kernel/next/fs/kernfs/inode.c:288) > [ 1738.237504][ T104] irq event stamp: 108353 > [ 1738.240262][ T104] hardirqs last enabled at (108353): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:105) > [ 1738.247443][ T104] hardirqs last disabled at (108352): seqcount_lockdep_reader_access (/home/anders/src/kernel/next/include/linux/seqlock.h:104) > [ 1738.254510][ T104] softirqs last enabled at (108326): fpsimd_restore_current_state (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) > [ 1738.262187][ T104] softirqs last disabled at (108324): fpsimd_restore_current_state (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) > [ 1738.270239][ T104] > [ 1738.272140][ T104] Reported by Kernel Concurrency Sanitizer on: > [ 1738.285185][ T104] Hardware name: linux,dummy-virt (DT) > [ 1738.288703][ T104] ================================================================== This looks just like the original except a different lock is being used now. The link count can't be less than two if set_nlink() is called. Maybe I am missing something but the directory count is changed only while holding the root->kernfs_iattr_rwsem so the value used by set_nlink() will not change during concurrent calls to refresh_inode(). Still looks like a false positive, I'll check the write operations again just to be sure. Ian
On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote: > On 19/7/23 03:00, Anders Roxell wrote: > > On 2023-01-23 11:11, Ian Kent wrote: > > > On 29/12/22 21:07, Ian Kent wrote: > > > > On 29/12/22 17:20, Arnd Bergmann wrote: > > > > > On Fri, Dec 23, 2022, at 00:11, Ian Kent wrote: > > > > > > On 21/12/22 21:34, Anders Roxell wrote: > > > > > > > On 2022-10-31 12:30, Tejun Heo wrote: > > > > > > > > On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent > > > > > > > > wrote: > > > > > > > > > The kernfs write lock is held when the kernfs node > > > > > > > > > inode attributes > > > > > > > > > are updated. Therefore, when either > > > > > > > > > kernfs_iop_getattr() or > > > > > > > > > kernfs_iop_permission() are called the kernfs node > > > > > > > > > inode attributes > > > > > > > > > won't change. > > > > > > > > > > > > > > > > > > Consequently concurrent kernfs_refresh_inode() calls > > > > > > > > > always copy the > > > > > > > > > same values from the kernfs node. > > > > > > > > > > > > > > > > > > So there's no need to take the inode i_lock to get > > > > > > > > > consistent values > > > > > > > > > for generic_fillattr() and generic_permission(), the > > > > > > > > > kernfs read lock > > > > > > > > > is sufficient. > > > > > > > > > > > > > > > > > > Signed-off-by: Ian Kent <raven@themaw.net> > > > > > > > > Acked-by: Tejun Heo <tj@kernel.org> > > > > > > > Hi, > > > > > > > > > > > > > > Building an allmodconfig arm64 kernel on yesterdays next- > > > > > > > 20221220 and > > > > > > > booting that in qemu I see the following "BUG: KCSAN: > > > > > > > data-race in > > > > > > > set_nlink / set_nlink". > > > > > > I'll check if I missed any places where set_link() could be > > > > > > called where the link count could be different. > > > > > > > > > > > > > > > > > > If there aren't any the question will then be can writing > > > > > > the > > > > > > same value to this location in multiple concurrent threads > > > > > > corrupt it? > > > > > I think the race that is getting reported for set_nlink() > > > > > is about this bit getting called simulatenously on multiple > > > > > CPUs with only the read lock held for the inode: > > > > > > > > > > /* Yes, some filesystems do change nlink from zero to > > > > > one */ > > > > > if (inode->i_nlink == 0) > > > > > atomic_long_dec(&inode->i_sb->s_remove_count); > > > > > inode->__i_nlink = nlink; > > > > > > > > > > Since i_nlink and __i_nlink refer to the same memory > > > > > location, > > > > > the 'inode->i_nlink == 0' check can be true for all of them > > > > > before the nonzero nlink value gets set, and this results in > > > > > s_remove_count being decremented more than once. > > > > > > > > Thanks for the comment Arnd. > > > > > > Hello all, > > > > > > > > > I've been looking at this and after consulting Miklos and his > > > pointing > > > > > > out that it looks like a false positive the urgency dropped off a > > > bit. So > > > > > > apologies for taking so long to report back. > > > > > > > > > Anyway it needs some description of conclusions reached so far. > > > > > > > > > I'm still looking around but in short, kernfs will set > > > directories to <# of > > > > > > directory entries> + 2 unconditionally for directories. I can't > > > yet find > > > > > > any other places where i_nlink is set or changed and if there are > > > none > > > > > > then i_nlink will never be set to zero so the race should not > > > occur. > > > > > > > > > Consequently my claim is this is a real false positive. > > > > > > > > > There are the file system operations that may be passed at mount > > > time > > > > > > but given the way kernfs sets i_nlink it pretty much dictates > > > those > > > operations > > > > > > (if there were any that modify it and there don't appear to be > > > any) leave it > > > > > > alone. > > > > > > > > > So it just doesn't make sense for users of kernfs to fiddle with > > > i_nlink ... > > On todays next tag, next-20230718 this KCSAN BUG poped up again. > > When I > > built an allmodconfig arm64 kernel and booted it in QEMU. Full log > > can > > be found http://ix.io/4AUd > > > > [ 1694.987789][ T137] BUG: KCSAN: data-race in inode_permission / > > kernfs_refresh_inode > > [ 1694.992912][ T137] > > [ 1694.994532][ T137] write to 0xffff00000bab6070 of 2 bytes by > > task 104 on cpu 0: > > [ 1694.999269][ T137] kernfs_refresh_inode > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:171) > > [ 1695.002707][ T137] kernfs_iop_permission > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) > > [ 1695.006148][ T137] inode_permission > > (/home/anders/src/kernel/next/fs/namei.c:461 > > /home/anders/src/kernel/next/fs/namei.c:528) > > [ 1695.009420][ T137] link_path_walk > > (/home/anders/src/kernel/next/fs/namei.c:1720 > > /home/anders/src/kernel/next/fs/namei.c:2267) > > [ 1695.012643][ T137] path_lookupat > > (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2)) > > [ 1695.015781][ T137] filename_lookup > > (/home/anders/src/kernel/next/fs/namei.c:2508) > > [ 1695.019059][ T137] vfs_statx > > (/home/anders/src/kernel/next/fs/stat.c:238) > > [ 1695.022024][ T137] vfs_fstatat > > (/home/anders/src/kernel/next/fs/stat.c:276) > > [ 1695.025067][ T137] __do_sys_newfstatat > > (/home/anders/src/kernel/next/fs/stat.c:446) > > [ 1695.028497][ T137] __arm64_sys_newfstatat > > (/home/anders/src/kernel/next/fs/stat.c:440 > > /home/anders/src/kernel/next/fs/stat.c:440) > > [ 1695.032080][ T137] el0_svc_common.constprop.0 > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) > > [ 1695.035916][ T137] do_el0_svc > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) > > [ 1695.038796][ T137] el0_svc > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) > > [ 1695.041468][ T137] el0t_64_sync_handler > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) > > [ 1695.044889][ T137] el0t_64_sync > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) > > [ 1695.047904][ T137] > > [ 1695.049511][ T137] 1 lock held by systemd-udevd/104: > > [ 1695.052837][ T137] #0: ffff000006681e08 (&root- > > >kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:288) > > [ 1695.060241][ T137] irq event stamp: 82902 > > [ 1695.063006][ T137] hardirqs last enabled at (82901): > > _raw_spin_unlock_irqrestore > > (/home/anders/src/kernel/next/arch/arm64/include/asm/alternative- > > macros.h:250 > > /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:27 > > /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:140 > > /home/anders/src/kernel/next/include/linux/spinlock_api_smp.h:151 > > /home/anders/src/kernel/next/kernel/locking/spinlock.c:194) > > [ 1695.069673][ T137] hardirqs last disabled at (82902): > > el1_interrupt > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472 > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488) > > [ 1695.075474][ T137] softirqs last enabled at (82792): > > fpsimd_restore_current_state > > (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) > > [ 1695.082319][ T137] softirqs last disabled at (82790): > > fpsimd_restore_current_state > > (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) > > [ 1695.089049][ T137] > > [ 1695.090659][ T137] read to 0xffff00000bab6070 of 2 bytes by > > task 137 on cpu 0: > > [ 1695.095374][ T137] inode_permission > > (/home/anders/src/kernel/next/fs/namei.c:532) > > [ 1695.098655][ T137] link_path_walk > > (/home/anders/src/kernel/next/fs/namei.c:1720 > > /home/anders/src/kernel/next/fs/namei.c:2267) > > [ 1695.101857][ T137] path_openat > > (/home/anders/src/kernel/next/fs/namei.c:3789 (discriminator 2)) > > [ 1695.104885][ T137] do_filp_open > > (/home/anders/src/kernel/next/fs/namei.c:3820) > > [ 1695.108006][ T137] do_sys_openat2 > > (/home/anders/src/kernel/next/fs/open.c:1418) > > [ 1695.111290][ T137] __arm64_sys_openat > > (/home/anders/src/kernel/next/fs/open.c:1433) > > [ 1695.114825][ T137] el0_svc_common.constprop.0 > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) > > [ 1695.118662][ T137] do_el0_svc > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) > > [ 1695.121555][ T137] el0_svc > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) > > [ 1695.124207][ T137] el0t_64_sync_handler > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) > > [ 1695.127590][ T137] el0t_64_sync > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) > > [ 1695.130641][ T137] > > [ 1695.132241][ T137] no locks held by systemd-udevd/137. > > [ 1695.135618][ T137] irq event stamp: 3246 > > [ 1695.138519][ T137] hardirqs last enabled at (3245): > > seqcount_lockdep_reader_access > > (/home/anders/src/kernel/next/include/linux/seqlock.h:105) > > [ 1695.145825][ T137] hardirqs last disabled at (3246): > > el1_interrupt > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472 > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488) > > [ 1695.151942][ T137] softirqs last enabled at (3208): > > fpsimd_restore_current_state > > (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) > > [ 1695.158950][ T137] softirqs last disabled at (3206): > > fpsimd_restore_current_state > > (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) > > [ 1695.166036][ T137] > > [ 1695.167621][ T137] Reported by Kernel Concurrency Sanitizer on: > > [ 1695.179990][ T137] Hardware name: linux,dummy-virt (DT) > > [ 1695.183687][ T137] > > ================================================================== > > > This one is different to the original. > > > I can't see where the problem is here, can someone help me out > > please. > > > I don't see any shared data values used by the call > > devcgroup_inode_permission(inode, mask) in > fs/namei.c:inode_permission() > > that might be a problem during the read except possibly inode- > >i_mode. > > > I'll check on that ... maybe something's been missed when > kernfs_rwsem > > was changed to a separate lock (kernfs_iattr_rwsem). > > > > > > [...] > > > > [ 1738.053819][ T104] BUG: KCSAN: data-race in set_nlink / > > set_nlink > > [ 1738.058223][ T104] > > [ 1738.059865][ T104] read to 0xffff00000bab6918 of 4 bytes by > > task 108 on cpu 0: > > [ 1738.064916][ T104] set_nlink > > (/home/anders/src/kernel/next/fs/inode.c:369) > > [ 1738.067845][ T104] kernfs_refresh_inode > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:180) > > [ 1738.071607][ T104] kernfs_iop_permission > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) > > [ 1738.075467][ T104] inode_permission > > (/home/anders/src/kernel/next/fs/namei.c:461 > > /home/anders/src/kernel/next/fs/namei.c:528) > > [ 1738.078868][ T104] link_path_walk > > (/home/anders/src/kernel/next/fs/namei.c:1720 > > /home/anders/src/kernel/next/fs/namei.c:2267) > > [ 1738.082270][ T104] path_lookupat > > (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2)) > > [ 1738.085488][ T104] filename_lookup > > (/home/anders/src/kernel/next/fs/namei.c:2508) > > [ 1738.089101][ T104] user_path_at_empty > > (/home/anders/src/kernel/next/fs/namei.c:2907) > > [ 1738.092469][ T104] do_readlinkat > > (/home/anders/src/kernel/next/fs/stat.c:477) > > [ 1738.095970][ T104] __arm64_sys_readlinkat > > (/home/anders/src/kernel/next/fs/stat.c:504 > > /home/anders/src/kernel/next/fs/stat.c:501 > > /home/anders/src/kernel/next/fs/stat.c:501) > > [ 1738.099529][ T104] el0_svc_common.constprop.0 > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) > > [ 1738.103696][ T104] do_el0_svc > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) > > [ 1738.106560][ T104] el0_svc > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) > > [ 1738.109613][ T104] el0t_64_sync_handler > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) > > [ 1738.113035][ T104] el0t_64_sync > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) > > [ 1738.116346][ T104] > > [ 1738.117924][ T104] 1 lock held by systemd-udevd/108: > > [ 1738.121580][ T104] #0: ffff000006681e08 (&root- > > >kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:288) > > [ 1738.129355][ T104] irq event stamp: 31000 > > [ 1738.132088][ T104] hardirqs last enabled at (31000): > > seqcount_lockdep_reader_access > > (/home/anders/src/kernel/next/include/linux/seqlock.h:105) > > [ 1738.139417][ T104] hardirqs last disabled at (30999): > > seqcount_lockdep_reader_access > > (/home/anders/src/kernel/next/include/linux/seqlock.h:104) > > [ 1738.146781][ T104] softirqs last enabled at (30973): > > fpsimd_restore_current_state > > (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) > > [ 1738.153891][ T104] softirqs last disabled at (30971): > > fpsimd_restore_current_state > > (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) > > [ 1738.161012][ T104] > > [ 1738.162663][ T104] write to 0xffff00000bab6918 of 4 bytes by > > task 104 on cpu 0: > > [ 1738.167730][ T104] set_nlink > > (/home/anders/src/kernel/next/fs/inode.c:372) > > [ 1738.170559][ T104] kernfs_refresh_inode > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:180) > > [ 1738.174355][ T104] kernfs_iop_permission > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) > > [ 1738.177829][ T104] inode_permission > > (/home/anders/src/kernel/next/fs/namei.c:461 > > /home/anders/src/kernel/next/fs/namei.c:528) > > [ 1738.181403][ T104] link_path_walk > > (/home/anders/src/kernel/next/fs/namei.c:1720 > > /home/anders/src/kernel/next/fs/namei.c:2267) > > [ 1738.184738][ T104] path_lookupat > > (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2)) > > [ 1738.188268][ T104] filename_lookup > > (/home/anders/src/kernel/next/fs/namei.c:2508) > > [ 1738.191865][ T104] vfs_statx > > (/home/anders/src/kernel/next/fs/stat.c:238) > > [ 1738.196236][ T104] vfs_fstatat > > (/home/anders/src/kernel/next/fs/stat.c:276) > > [ 1738.200120][ T104] __do_sys_newfstatat > > (/home/anders/src/kernel/next/fs/stat.c:446) > > [ 1738.204095][ T104] __arm64_sys_newfstatat > > (/home/anders/src/kernel/next/fs/stat.c:440 > > /home/anders/src/kernel/next/fs/stat.c:440) > > [ 1738.207676][ T104] el0_svc_common.constprop.0 > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) > > [ 1738.211820][ T104] do_el0_svc > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) > > [ 1738.214815][ T104] el0_svc > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) > > [ 1738.217709][ T104] el0t_64_sync_handler > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) > > [ 1738.221239][ T104] el0t_64_sync > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) > > [ 1738.224502][ T104] > > [ 1738.226090][ T104] 1 lock held by systemd-udevd/104: > > [ 1738.229747][ T104] #0: ffff000006681e08 (&root- > > >kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:288) > > [ 1738.237504][ T104] irq event stamp: 108353 > > [ 1738.240262][ T104] hardirqs last enabled at (108353): > > seqcount_lockdep_reader_access > > (/home/anders/src/kernel/next/include/linux/seqlock.h:105) > > [ 1738.247443][ T104] hardirqs last disabled at (108352): > > seqcount_lockdep_reader_access > > (/home/anders/src/kernel/next/include/linux/seqlock.h:104) > > [ 1738.254510][ T104] softirqs last enabled at (108326): > > fpsimd_restore_current_state > > (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) > > [ 1738.262187][ T104] softirqs last disabled at (108324): > > fpsimd_restore_current_state > > (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) > > [ 1738.270239][ T104] > > [ 1738.272140][ T104] Reported by Kernel Concurrency Sanitizer on: > > [ 1738.285185][ T104] Hardware name: linux,dummy-virt (DT) > > [ 1738.288703][ T104] > > ================================================================== > > This looks just like the original except a different lock is being > > used now. > > > The link count can't be less than two if set_nlink() is called. > > > Maybe I am missing something but the directory count is changed only > > while holding the root->kernfs_iattr_rwsem so the value used by > > set_nlink() will not change during concurrent calls to > refresh_inode(). > > Still looks like a false positive, I'll check the write operations > again just to be sure. I do see a problem with recent changes. I'll send this off to Greg after I've done some testing (primarily just compile and function). Here's a patch which describes what I found. Comments are, of course, welcome, ;) kernfs: fix missing kernfs_iattr_rwsem locking From: Ian Kent <raven@themaw.net> When the kernfs_iattr_rwsem was introduced a case was missed. The update of the kernfs directory node child count was also protected by the kernfs_rwsem and needs to be included in the change so that the child count (and so the inode n_link attribute) does not change while holding the rwsem for read. Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode attributes) Signed-off-by: Ian Kent <raven@themaw.net> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Eric Sandeen <sandeen@sandeen.net> --- fs/kernfs/dir.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index 45b6919903e6..6e84bb69602e 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node *kn) rb_insert_color(&kn->rb, &kn->parent->dir.children); /* successfully added, account subdir number */ + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); if (kernfs_type(kn) == KERNFS_DIR) kn->parent->dir.subdirs++; kernfs_inc_rev(kn->parent); + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem); return 0; } @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct kernfs_node *kn) if (RB_EMPTY_NODE(&kn->rb)) return false; + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); if (kernfs_type(kn) == KERNFS_DIR) kn->parent->dir.subdirs--; kernfs_inc_rev(kn->parent); + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem); rb_erase(&kn->rb, &kn->parent->dir.children); RB_CLEAR_NODE(&kn->rb);
On Thu, 20 Jul 2023 at 04:03, Ian Kent <raven@themaw.net> wrote: > > On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote: > > On 19/7/23 03:00, Anders Roxell wrote: > > > On 2023-01-23 11:11, Ian Kent wrote: > > > > On 29/12/22 21:07, Ian Kent wrote: > > > > > On 29/12/22 17:20, Arnd Bergmann wrote: > > > > > > On Fri, Dec 23, 2022, at 00:11, Ian Kent wrote: > > > > > > > On 21/12/22 21:34, Anders Roxell wrote: > > > > > > > > On 2022-10-31 12:30, Tejun Heo wrote: > > > > > > > > > On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent > > > > > > > > > wrote: > > > > > > > > > > The kernfs write lock is held when the kernfs node > > > > > > > > > > inode attributes > > > > > > > > > > are updated. Therefore, when either > > > > > > > > > > kernfs_iop_getattr() or > > > > > > > > > > kernfs_iop_permission() are called the kernfs node > > > > > > > > > > inode attributes > > > > > > > > > > won't change. > > > > > > > > > > > > > > > > > > > > Consequently concurrent kernfs_refresh_inode() calls > > > > > > > > > > always copy the > > > > > > > > > > same values from the kernfs node. > > > > > > > > > > > > > > > > > > > > So there's no need to take the inode i_lock to get > > > > > > > > > > consistent values > > > > > > > > > > for generic_fillattr() and generic_permission(), the > > > > > > > > > > kernfs read lock > > > > > > > > > > is sufficient. > > > > > > > > > > > > > > > > > > > > Signed-off-by: Ian Kent <raven@themaw.net> > > > > > > > > > Acked-by: Tejun Heo <tj@kernel.org> > > > > > > > > Hi, > > > > > > > > > > > > > > > > Building an allmodconfig arm64 kernel on yesterdays next- > > > > > > > > 20221220 and > > > > > > > > booting that in qemu I see the following "BUG: KCSAN: > > > > > > > > data-race in > > > > > > > > set_nlink / set_nlink". > > > > > > > I'll check if I missed any places where set_link() could be > > > > > > > called where the link count could be different. > > > > > > > > > > > > > > > > > > > > > If there aren't any the question will then be can writing > > > > > > > the > > > > > > > same value to this location in multiple concurrent threads > > > > > > > corrupt it? > > > > > > I think the race that is getting reported for set_nlink() > > > > > > is about this bit getting called simulatenously on multiple > > > > > > CPUs with only the read lock held for the inode: > > > > > > > > > > > > /* Yes, some filesystems do change nlink from zero to > > > > > > one */ > > > > > > if (inode->i_nlink == 0) > > > > > > atomic_long_dec(&inode->i_sb->s_remove_count); > > > > > > inode->__i_nlink = nlink; > > > > > > > > > > > > Since i_nlink and __i_nlink refer to the same memory > > > > > > location, > > > > > > the 'inode->i_nlink == 0' check can be true for all of them > > > > > > before the nonzero nlink value gets set, and this results in > > > > > > s_remove_count being decremented more than once. > > > > > > > > > > Thanks for the comment Arnd. > > > > > > > > Hello all, > > > > > > > > > > > > I've been looking at this and after consulting Miklos and his > > > > pointing > > > > > > > > out that it looks like a false positive the urgency dropped off a > > > > bit. So > > > > > > > > apologies for taking so long to report back. > > > > > > > > > > > > Anyway it needs some description of conclusions reached so far. > > > > > > > > > > > > I'm still looking around but in short, kernfs will set > > > > directories to <# of > > > > > > > > directory entries> + 2 unconditionally for directories. I can't > > > > yet find > > > > > > > > any other places where i_nlink is set or changed and if there are > > > > none > > > > > > > > then i_nlink will never be set to zero so the race should not > > > > occur. > > > > > > > > > > > > Consequently my claim is this is a real false positive. > > > > > > > > > > > > There are the file system operations that may be passed at mount > > > > time > > > > > > > > but given the way kernfs sets i_nlink it pretty much dictates > > > > those > > > > operations > > > > > > > > (if there were any that modify it and there don't appear to be > > > > any) leave it > > > > > > > > alone. > > > > > > > > > > > > So it just doesn't make sense for users of kernfs to fiddle with > > > > i_nlink ... > > > On todays next tag, next-20230718 this KCSAN BUG poped up again. > > > When I > > > built an allmodconfig arm64 kernel and booted it in QEMU. Full log > > > can > > > be found http://ix.io/4AUd > > > > > > [ 1694.987789][ T137] BUG: KCSAN: data-race in inode_permission / > > > kernfs_refresh_inode > > > [ 1694.992912][ T137] > > > [ 1694.994532][ T137] write to 0xffff00000bab6070 of 2 bytes by > > > task 104 on cpu 0: > > > [ 1694.999269][ T137] kernfs_refresh_inode > > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:171) > > > [ 1695.002707][ T137] kernfs_iop_permission > > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) > > > [ 1695.006148][ T137] inode_permission > > > (/home/anders/src/kernel/next/fs/namei.c:461 > > > /home/anders/src/kernel/next/fs/namei.c:528) > > > [ 1695.009420][ T137] link_path_walk > > > (/home/anders/src/kernel/next/fs/namei.c:1720 > > > /home/anders/src/kernel/next/fs/namei.c:2267) > > > [ 1695.012643][ T137] path_lookupat > > > (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2)) > > > [ 1695.015781][ T137] filename_lookup > > > (/home/anders/src/kernel/next/fs/namei.c:2508) > > > [ 1695.019059][ T137] vfs_statx > > > (/home/anders/src/kernel/next/fs/stat.c:238) > > > [ 1695.022024][ T137] vfs_fstatat > > > (/home/anders/src/kernel/next/fs/stat.c:276) > > > [ 1695.025067][ T137] __do_sys_newfstatat > > > (/home/anders/src/kernel/next/fs/stat.c:446) > > > [ 1695.028497][ T137] __arm64_sys_newfstatat > > > (/home/anders/src/kernel/next/fs/stat.c:440 > > > /home/anders/src/kernel/next/fs/stat.c:440) > > > [ 1695.032080][ T137] el0_svc_common.constprop.0 > > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 > > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 > > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) > > > [ 1695.035916][ T137] do_el0_svc > > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) > > > [ 1695.038796][ T137] el0_svc > > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 > > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 > > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) > > > [ 1695.041468][ T137] el0t_64_sync_handler > > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) > > > [ 1695.044889][ T137] el0t_64_sync > > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) > > > [ 1695.047904][ T137] > > > [ 1695.049511][ T137] 1 lock held by systemd-udevd/104: > > > [ 1695.052837][ T137] #0: ffff000006681e08 (&root- > > > >kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission > > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:288) > > > [ 1695.060241][ T137] irq event stamp: 82902 > > > [ 1695.063006][ T137] hardirqs last enabled at (82901): > > > _raw_spin_unlock_irqrestore > > > (/home/anders/src/kernel/next/arch/arm64/include/asm/alternative- > > > macros.h:250 > > > /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:27 > > > /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:140 > > > /home/anders/src/kernel/next/include/linux/spinlock_api_smp.h:151 > > > /home/anders/src/kernel/next/kernel/locking/spinlock.c:194) > > > [ 1695.069673][ T137] hardirqs last disabled at (82902): > > > el1_interrupt > > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472 > > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488) > > > [ 1695.075474][ T137] softirqs last enabled at (82792): > > > fpsimd_restore_current_state > > > (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 > > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) > > > [ 1695.082319][ T137] softirqs last disabled at (82790): > > > fpsimd_restore_current_state > > > (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 > > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 > > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) > > > [ 1695.089049][ T137] > > > [ 1695.090659][ T137] read to 0xffff00000bab6070 of 2 bytes by > > > task 137 on cpu 0: > > > [ 1695.095374][ T137] inode_permission > > > (/home/anders/src/kernel/next/fs/namei.c:532) > > > [ 1695.098655][ T137] link_path_walk > > > (/home/anders/src/kernel/next/fs/namei.c:1720 > > > /home/anders/src/kernel/next/fs/namei.c:2267) > > > [ 1695.101857][ T137] path_openat > > > (/home/anders/src/kernel/next/fs/namei.c:3789 (discriminator 2)) > > > [ 1695.104885][ T137] do_filp_open > > > (/home/anders/src/kernel/next/fs/namei.c:3820) > > > [ 1695.108006][ T137] do_sys_openat2 > > > (/home/anders/src/kernel/next/fs/open.c:1418) > > > [ 1695.111290][ T137] __arm64_sys_openat > > > (/home/anders/src/kernel/next/fs/open.c:1433) > > > [ 1695.114825][ T137] el0_svc_common.constprop.0 > > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 > > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 > > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) > > > [ 1695.118662][ T137] do_el0_svc > > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) > > > [ 1695.121555][ T137] el0_svc > > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 > > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 > > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) > > > [ 1695.124207][ T137] el0t_64_sync_handler > > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) > > > [ 1695.127590][ T137] el0t_64_sync > > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) > > > [ 1695.130641][ T137] > > > [ 1695.132241][ T137] no locks held by systemd-udevd/137. > > > [ 1695.135618][ T137] irq event stamp: 3246 > > > [ 1695.138519][ T137] hardirqs last enabled at (3245): > > > seqcount_lockdep_reader_access > > > (/home/anders/src/kernel/next/include/linux/seqlock.h:105) > > > [ 1695.145825][ T137] hardirqs last disabled at (3246): > > > el1_interrupt > > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472 > > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488) > > > [ 1695.151942][ T137] softirqs last enabled at (3208): > > > fpsimd_restore_current_state > > > (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 > > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) > > > [ 1695.158950][ T137] softirqs last disabled at (3206): > > > fpsimd_restore_current_state > > > (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 > > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 > > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) > > > [ 1695.166036][ T137] > > > [ 1695.167621][ T137] Reported by Kernel Concurrency Sanitizer on: > > > [ 1695.179990][ T137] Hardware name: linux,dummy-virt (DT) > > > [ 1695.183687][ T137] > > > ================================================================== > > > > > > This one is different to the original. > > > > > > I can't see where the problem is here, can someone help me out > > > > please. > > > > > > I don't see any shared data values used by the call > > > > devcgroup_inode_permission(inode, mask) in > > fs/namei.c:inode_permission() > > > > that might be a problem during the read except possibly inode- > > >i_mode. > > > > > > I'll check on that ... maybe something's been missed when > > kernfs_rwsem > > > > was changed to a separate lock (kernfs_iattr_rwsem). > > > > > > > > > > [...] > > > > > > [ 1738.053819][ T104] BUG: KCSAN: data-race in set_nlink / > > > set_nlink > > > [ 1738.058223][ T104] > > > [ 1738.059865][ T104] read to 0xffff00000bab6918 of 4 bytes by > > > task 108 on cpu 0: > > > [ 1738.064916][ T104] set_nlink > > > (/home/anders/src/kernel/next/fs/inode.c:369) > > > [ 1738.067845][ T104] kernfs_refresh_inode > > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:180) > > > [ 1738.071607][ T104] kernfs_iop_permission > > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) > > > [ 1738.075467][ T104] inode_permission > > > (/home/anders/src/kernel/next/fs/namei.c:461 > > > /home/anders/src/kernel/next/fs/namei.c:528) > > > [ 1738.078868][ T104] link_path_walk > > > (/home/anders/src/kernel/next/fs/namei.c:1720 > > > /home/anders/src/kernel/next/fs/namei.c:2267) > > > [ 1738.082270][ T104] path_lookupat > > > (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2)) > > > [ 1738.085488][ T104] filename_lookup > > > (/home/anders/src/kernel/next/fs/namei.c:2508) > > > [ 1738.089101][ T104] user_path_at_empty > > > (/home/anders/src/kernel/next/fs/namei.c:2907) > > > [ 1738.092469][ T104] do_readlinkat > > > (/home/anders/src/kernel/next/fs/stat.c:477) > > > [ 1738.095970][ T104] __arm64_sys_readlinkat > > > (/home/anders/src/kernel/next/fs/stat.c:504 > > > /home/anders/src/kernel/next/fs/stat.c:501 > > > /home/anders/src/kernel/next/fs/stat.c:501) > > > [ 1738.099529][ T104] el0_svc_common.constprop.0 > > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 > > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 > > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) > > > [ 1738.103696][ T104] do_el0_svc > > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) > > > [ 1738.106560][ T104] el0_svc > > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 > > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 > > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) > > > [ 1738.109613][ T104] el0t_64_sync_handler > > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) > > > [ 1738.113035][ T104] el0t_64_sync > > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) > > > [ 1738.116346][ T104] > > > [ 1738.117924][ T104] 1 lock held by systemd-udevd/108: > > > [ 1738.121580][ T104] #0: ffff000006681e08 (&root- > > > >kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission > > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:288) > > > [ 1738.129355][ T104] irq event stamp: 31000 > > > [ 1738.132088][ T104] hardirqs last enabled at (31000): > > > seqcount_lockdep_reader_access > > > (/home/anders/src/kernel/next/include/linux/seqlock.h:105) > > > [ 1738.139417][ T104] hardirqs last disabled at (30999): > > > seqcount_lockdep_reader_access > > > (/home/anders/src/kernel/next/include/linux/seqlock.h:104) > > > [ 1738.146781][ T104] softirqs last enabled at (30973): > > > fpsimd_restore_current_state > > > (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 > > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) > > > [ 1738.153891][ T104] softirqs last disabled at (30971): > > > fpsimd_restore_current_state > > > (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 > > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 > > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) > > > [ 1738.161012][ T104] > > > [ 1738.162663][ T104] write to 0xffff00000bab6918 of 4 bytes by > > > task 104 on cpu 0: > > > [ 1738.167730][ T104] set_nlink > > > (/home/anders/src/kernel/next/fs/inode.c:372) > > > [ 1738.170559][ T104] kernfs_refresh_inode > > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:180) > > > [ 1738.174355][ T104] kernfs_iop_permission > > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) > > > [ 1738.177829][ T104] inode_permission > > > (/home/anders/src/kernel/next/fs/namei.c:461 > > > /home/anders/src/kernel/next/fs/namei.c:528) > > > [ 1738.181403][ T104] link_path_walk > > > (/home/anders/src/kernel/next/fs/namei.c:1720 > > > /home/anders/src/kernel/next/fs/namei.c:2267) > > > [ 1738.184738][ T104] path_lookupat > > > (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2)) > > > [ 1738.188268][ T104] filename_lookup > > > (/home/anders/src/kernel/next/fs/namei.c:2508) > > > [ 1738.191865][ T104] vfs_statx > > > (/home/anders/src/kernel/next/fs/stat.c:238) > > > [ 1738.196236][ T104] vfs_fstatat > > > (/home/anders/src/kernel/next/fs/stat.c:276) > > > [ 1738.200120][ T104] __do_sys_newfstatat > > > (/home/anders/src/kernel/next/fs/stat.c:446) > > > [ 1738.204095][ T104] __arm64_sys_newfstatat > > > (/home/anders/src/kernel/next/fs/stat.c:440 > > > /home/anders/src/kernel/next/fs/stat.c:440) > > > [ 1738.207676][ T104] el0_svc_common.constprop.0 > > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 > > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 > > > /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) > > > [ 1738.211820][ T104] do_el0_svc > > > (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) > > > [ 1738.214815][ T104] el0_svc > > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 > > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 > > > /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) > > > [ 1738.217709][ T104] el0t_64_sync_handler > > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) > > > [ 1738.221239][ T104] el0t_64_sync > > > (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) > > > [ 1738.224502][ T104] > > > [ 1738.226090][ T104] 1 lock held by systemd-udevd/104: > > > [ 1738.229747][ T104] #0: ffff000006681e08 (&root- > > > >kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission > > > (/home/anders/src/kernel/next/fs/kernfs/inode.c:288) > > > [ 1738.237504][ T104] irq event stamp: 108353 > > > [ 1738.240262][ T104] hardirqs last enabled at (108353): > > > seqcount_lockdep_reader_access > > > (/home/anders/src/kernel/next/include/linux/seqlock.h:105) > > > [ 1738.247443][ T104] hardirqs last disabled at (108352): > > > seqcount_lockdep_reader_access > > > (/home/anders/src/kernel/next/include/linux/seqlock.h:104) > > > [ 1738.254510][ T104] softirqs last enabled at (108326): > > > fpsimd_restore_current_state > > > (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 > > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) > > > [ 1738.262187][ T104] softirqs last disabled at (108324): > > > fpsimd_restore_current_state > > > (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 > > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 > > > /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) > > > [ 1738.270239][ T104] > > > [ 1738.272140][ T104] Reported by Kernel Concurrency Sanitizer on: > > > [ 1738.285185][ T104] Hardware name: linux,dummy-virt (DT) > > > [ 1738.288703][ T104] > > > ================================================================== > > > > This looks just like the original except a different lock is being > > > > used now. > > > > > > The link count can't be less than two if set_nlink() is called. > > > > > > Maybe I am missing something but the directory count is changed only > > > > while holding the root->kernfs_iattr_rwsem so the value used by > > > > set_nlink() will not change during concurrent calls to > > refresh_inode(). > > > > Still looks like a false positive, I'll check the write operations > > again just to be sure. > > I do see a problem with recent changes. > > I'll send this off to Greg after I've done some testing (primarily just > compile and function). > > Here's a patch which describes what I found. > > Comments are, of course, welcome, ;) > > kernfs: fix missing kernfs_iattr_rwsem locking > > From: Ian Kent <raven@themaw.net> > > When the kernfs_iattr_rwsem was introduced a case was missed. > > The update of the kernfs directory node child count was also protected > by the kernfs_rwsem and needs to be included in the change so that the > child count (and so the inode n_link attribute) does not change while > holding the rwsem for read. > > Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode > attributes) > > Signed-off-by: Ian Kent <raven@themaw.net> > Cc: Anders Roxell <anders.roxell@linaro.org> > Cc: Imran Khan <imran.f.khan@oracle.com> > Cc: Arnd Bergmann <arnd@arndb.de> > Cc: Minchan Kim <minchan@kernel.org> > Cc: Eric Sandeen <sandeen@sandeen.net> Looks good. Acked-by: Miklos Szeredi <mszeredi@redhat.com>
On 20/7/23 10:03, Ian Kent wrote: > On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote: >> On 19/7/23 03:00, Anders Roxell wrote: >>> On 2023-01-23 11:11, Ian Kent wrote: >>>> On 29/12/22 21:07, Ian Kent wrote: >>>>> On 29/12/22 17:20, Arnd Bergmann wrote: >>>>>> On Fri, Dec 23, 2022, at 00:11, Ian Kent wrote: >>>>>>> On 21/12/22 21:34, Anders Roxell wrote: >>>>>>>> On 2022-10-31 12:30, Tejun Heo wrote: >>>>>>>>> On Tue, Oct 18, 2022 at 10:32:42AM +0800, Ian Kent >>>>>>>>> wrote: >>>>>>>>>> The kernfs write lock is held when the kernfs node >>>>>>>>>> inode attributes >>>>>>>>>> are updated. Therefore, when either >>>>>>>>>> kernfs_iop_getattr() or >>>>>>>>>> kernfs_iop_permission() are called the kernfs node >>>>>>>>>> inode attributes >>>>>>>>>> won't change. >>>>>>>>>> >>>>>>>>>> Consequently concurrent kernfs_refresh_inode() calls >>>>>>>>>> always copy the >>>>>>>>>> same values from the kernfs node. >>>>>>>>>> >>>>>>>>>> So there's no need to take the inode i_lock to get >>>>>>>>>> consistent values >>>>>>>>>> for generic_fillattr() and generic_permission(), the >>>>>>>>>> kernfs read lock >>>>>>>>>> is sufficient. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Ian Kent <raven@themaw.net> >>>>>>>>> Acked-by: Tejun Heo <tj@kernel.org> >>>>>>>> Hi, >>>>>>>> >>>>>>>> Building an allmodconfig arm64 kernel on yesterdays next- >>>>>>>> 20221220 and >>>>>>>> booting that in qemu I see the following "BUG: KCSAN: >>>>>>>> data-race in >>>>>>>> set_nlink / set_nlink". >>>>>>> I'll check if I missed any places where set_link() could be >>>>>>> called where the link count could be different. >>>>>>> >>>>>>> >>>>>>> If there aren't any the question will then be can writing >>>>>>> the >>>>>>> same value to this location in multiple concurrent threads >>>>>>> corrupt it? >>>>>> I think the race that is getting reported for set_nlink() >>>>>> is about this bit getting called simulatenously on multiple >>>>>> CPUs with only the read lock held for the inode: >>>>>> >>>>>> /* Yes, some filesystems do change nlink from zero to >>>>>> one */ >>>>>> if (inode->i_nlink == 0) >>>>>> atomic_long_dec(&inode->i_sb->s_remove_count); >>>>>> inode->__i_nlink = nlink; >>>>>> >>>>>> Since i_nlink and __i_nlink refer to the same memory >>>>>> location, >>>>>> the 'inode->i_nlink == 0' check can be true for all of them >>>>>> before the nonzero nlink value gets set, and this results in >>>>>> s_remove_count being decremented more than once. >>>>> Thanks for the comment Arnd. >>>> Hello all, >>>> >>>> >>>> I've been looking at this and after consulting Miklos and his >>>> pointing >>>> >>>> out that it looks like a false positive the urgency dropped off a >>>> bit. So >>>> >>>> apologies for taking so long to report back. >>>> >>>> >>>> Anyway it needs some description of conclusions reached so far. >>>> >>>> >>>> I'm still looking around but in short, kernfs will set >>>> directories to <# of >>>> >>>> directory entries> + 2 unconditionally for directories. I can't >>>> yet find >>>> >>>> any other places where i_nlink is set or changed and if there are >>>> none >>>> >>>> then i_nlink will never be set to zero so the race should not >>>> occur. >>>> >>>> >>>> Consequently my claim is this is a real false positive. >>>> >>>> >>>> There are the file system operations that may be passed at mount >>>> time >>>> >>>> but given the way kernfs sets i_nlink it pretty much dictates >>>> those >>>> operations >>>> >>>> (if there were any that modify it and there don't appear to be >>>> any) leave it >>>> >>>> alone. >>>> >>>> >>>> So it just doesn't make sense for users of kernfs to fiddle with >>>> i_nlink ... >>> On todays next tag, next-20230718 this KCSAN BUG poped up again. >>> When I >>> built an allmodconfig arm64 kernel and booted it in QEMU. Full log >>> can >>> be found http://ix.io/4AUd >>> >>> [ 1694.987789][ T137] BUG: KCSAN: data-race in inode_permission / >>> kernfs_refresh_inode >>> [ 1694.992912][ T137] >>> [ 1694.994532][ T137] write to 0xffff00000bab6070 of 2 bytes by >>> task 104 on cpu 0: >>> [ 1694.999269][ T137] kernfs_refresh_inode >>> (/home/anders/src/kernel/next/fs/kernfs/inode.c:171) >>> [ 1695.002707][ T137] kernfs_iop_permission >>> (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) >>> [ 1695.006148][ T137] inode_permission >>> (/home/anders/src/kernel/next/fs/namei.c:461 >>> /home/anders/src/kernel/next/fs/namei.c:528) >>> [ 1695.009420][ T137] link_path_walk >>> (/home/anders/src/kernel/next/fs/namei.c:1720 >>> /home/anders/src/kernel/next/fs/namei.c:2267) >>> [ 1695.012643][ T137] path_lookupat >>> (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2)) >>> [ 1695.015781][ T137] filename_lookup >>> (/home/anders/src/kernel/next/fs/namei.c:2508) >>> [ 1695.019059][ T137] vfs_statx >>> (/home/anders/src/kernel/next/fs/stat.c:238) >>> [ 1695.022024][ T137] vfs_fstatat >>> (/home/anders/src/kernel/next/fs/stat.c:276) >>> [ 1695.025067][ T137] __do_sys_newfstatat >>> (/home/anders/src/kernel/next/fs/stat.c:446) >>> [ 1695.028497][ T137] __arm64_sys_newfstatat >>> (/home/anders/src/kernel/next/fs/stat.c:440 >>> /home/anders/src/kernel/next/fs/stat.c:440) >>> [ 1695.032080][ T137] el0_svc_common.constprop.0 >>> (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 >>> /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 >>> /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) >>> [ 1695.035916][ T137] do_el0_svc >>> (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) >>> [ 1695.038796][ T137] el0_svc >>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 >>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 >>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) >>> [ 1695.041468][ T137] el0t_64_sync_handler >>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) >>> [ 1695.044889][ T137] el0t_64_sync >>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) >>> [ 1695.047904][ T137] >>> [ 1695.049511][ T137] 1 lock held by systemd-udevd/104: >>> [ 1695.052837][ T137] #0: ffff000006681e08 (&root- >>>> kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission >>> (/home/anders/src/kernel/next/fs/kernfs/inode.c:288) >>> [ 1695.060241][ T137] irq event stamp: 82902 >>> [ 1695.063006][ T137] hardirqs last enabled at (82901): >>> _raw_spin_unlock_irqrestore >>> (/home/anders/src/kernel/next/arch/arm64/include/asm/alternative- >>> macros.h:250 >>> /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:27 >>> /home/anders/src/kernel/next/arch/arm64/include/asm/irqflags.h:140 >>> /home/anders/src/kernel/next/include/linux/spinlock_api_smp.h:151 >>> /home/anders/src/kernel/next/kernel/locking/spinlock.c:194) >>> [ 1695.069673][ T137] hardirqs last disabled at (82902): >>> el1_interrupt >>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472 >>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488) >>> [ 1695.075474][ T137] softirqs last enabled at (82792): >>> fpsimd_restore_current_state >>> (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 >>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) >>> [ 1695.082319][ T137] softirqs last disabled at (82790): >>> fpsimd_restore_current_state >>> (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 >>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 >>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) >>> [ 1695.089049][ T137] >>> [ 1695.090659][ T137] read to 0xffff00000bab6070 of 2 bytes by >>> task 137 on cpu 0: >>> [ 1695.095374][ T137] inode_permission >>> (/home/anders/src/kernel/next/fs/namei.c:532) >>> [ 1695.098655][ T137] link_path_walk >>> (/home/anders/src/kernel/next/fs/namei.c:1720 >>> /home/anders/src/kernel/next/fs/namei.c:2267) >>> [ 1695.101857][ T137] path_openat >>> (/home/anders/src/kernel/next/fs/namei.c:3789 (discriminator 2)) >>> [ 1695.104885][ T137] do_filp_open >>> (/home/anders/src/kernel/next/fs/namei.c:3820) >>> [ 1695.108006][ T137] do_sys_openat2 >>> (/home/anders/src/kernel/next/fs/open.c:1418) >>> [ 1695.111290][ T137] __arm64_sys_openat >>> (/home/anders/src/kernel/next/fs/open.c:1433) >>> [ 1695.114825][ T137] el0_svc_common.constprop.0 >>> (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 >>> /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 >>> /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) >>> [ 1695.118662][ T137] do_el0_svc >>> (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) >>> [ 1695.121555][ T137] el0_svc >>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 >>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 >>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) >>> [ 1695.124207][ T137] el0t_64_sync_handler >>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) >>> [ 1695.127590][ T137] el0t_64_sync >>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) >>> [ 1695.130641][ T137] >>> [ 1695.132241][ T137] no locks held by systemd-udevd/137. >>> [ 1695.135618][ T137] irq event stamp: 3246 >>> [ 1695.138519][ T137] hardirqs last enabled at (3245): >>> seqcount_lockdep_reader_access >>> (/home/anders/src/kernel/next/include/linux/seqlock.h:105) >>> [ 1695.145825][ T137] hardirqs last disabled at (3246): >>> el1_interrupt >>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:472 >>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:488) >>> [ 1695.151942][ T137] softirqs last enabled at (3208): >>> fpsimd_restore_current_state >>> (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 >>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) >>> [ 1695.158950][ T137] softirqs last disabled at (3206): >>> fpsimd_restore_current_state >>> (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 >>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 >>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) >>> [ 1695.166036][ T137] >>> [ 1695.167621][ T137] Reported by Kernel Concurrency Sanitizer on: >>> [ 1695.179990][ T137] Hardware name: linux,dummy-virt (DT) >>> [ 1695.183687][ T137] >>> ================================================================== >> >> This one is different to the original. >> >> >> I can't see where the problem is here, can someone help me out >> >> please. >> >> >> I don't see any shared data values used by the call >> >> devcgroup_inode_permission(inode, mask) in >> fs/namei.c:inode_permission() >> >> that might be a problem during the read except possibly inode- >>> i_mode. >> >> I'll check on that ... maybe something's been missed when >> kernfs_rwsem >> >> was changed to a separate lock (kernfs_iattr_rwsem). >> >> >>> [...] >>> >>> [ 1738.053819][ T104] BUG: KCSAN: data-race in set_nlink / >>> set_nlink >>> [ 1738.058223][ T104] >>> [ 1738.059865][ T104] read to 0xffff00000bab6918 of 4 bytes by >>> task 108 on cpu 0: >>> [ 1738.064916][ T104] set_nlink >>> (/home/anders/src/kernel/next/fs/inode.c:369) >>> [ 1738.067845][ T104] kernfs_refresh_inode >>> (/home/anders/src/kernel/next/fs/kernfs/inode.c:180) >>> [ 1738.071607][ T104] kernfs_iop_permission >>> (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) >>> [ 1738.075467][ T104] inode_permission >>> (/home/anders/src/kernel/next/fs/namei.c:461 >>> /home/anders/src/kernel/next/fs/namei.c:528) >>> [ 1738.078868][ T104] link_path_walk >>> (/home/anders/src/kernel/next/fs/namei.c:1720 >>> /home/anders/src/kernel/next/fs/namei.c:2267) >>> [ 1738.082270][ T104] path_lookupat >>> (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2)) >>> [ 1738.085488][ T104] filename_lookup >>> (/home/anders/src/kernel/next/fs/namei.c:2508) >>> [ 1738.089101][ T104] user_path_at_empty >>> (/home/anders/src/kernel/next/fs/namei.c:2907) >>> [ 1738.092469][ T104] do_readlinkat >>> (/home/anders/src/kernel/next/fs/stat.c:477) >>> [ 1738.095970][ T104] __arm64_sys_readlinkat >>> (/home/anders/src/kernel/next/fs/stat.c:504 >>> /home/anders/src/kernel/next/fs/stat.c:501 >>> /home/anders/src/kernel/next/fs/stat.c:501) >>> [ 1738.099529][ T104] el0_svc_common.constprop.0 >>> (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 >>> /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 >>> /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) >>> [ 1738.103696][ T104] do_el0_svc >>> (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) >>> [ 1738.106560][ T104] el0_svc >>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 >>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 >>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) >>> [ 1738.109613][ T104] el0t_64_sync_handler >>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) >>> [ 1738.113035][ T104] el0t_64_sync >>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) >>> [ 1738.116346][ T104] >>> [ 1738.117924][ T104] 1 lock held by systemd-udevd/108: >>> [ 1738.121580][ T104] #0: ffff000006681e08 (&root- >>>> kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission >>> (/home/anders/src/kernel/next/fs/kernfs/inode.c:288) >>> [ 1738.129355][ T104] irq event stamp: 31000 >>> [ 1738.132088][ T104] hardirqs last enabled at (31000): >>> seqcount_lockdep_reader_access >>> (/home/anders/src/kernel/next/include/linux/seqlock.h:105) >>> [ 1738.139417][ T104] hardirqs last disabled at (30999): >>> seqcount_lockdep_reader_access >>> (/home/anders/src/kernel/next/include/linux/seqlock.h:104) >>> [ 1738.146781][ T104] softirqs last enabled at (30973): >>> fpsimd_restore_current_state >>> (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 >>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) >>> [ 1738.153891][ T104] softirqs last disabled at (30971): >>> fpsimd_restore_current_state >>> (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 >>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 >>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) >>> [ 1738.161012][ T104] >>> [ 1738.162663][ T104] write to 0xffff00000bab6918 of 4 bytes by >>> task 104 on cpu 0: >>> [ 1738.167730][ T104] set_nlink >>> (/home/anders/src/kernel/next/fs/inode.c:372) >>> [ 1738.170559][ T104] kernfs_refresh_inode >>> (/home/anders/src/kernel/next/fs/kernfs/inode.c:180) >>> [ 1738.174355][ T104] kernfs_iop_permission >>> (/home/anders/src/kernel/next/fs/kernfs/inode.c:289) >>> [ 1738.177829][ T104] inode_permission >>> (/home/anders/src/kernel/next/fs/namei.c:461 >>> /home/anders/src/kernel/next/fs/namei.c:528) >>> [ 1738.181403][ T104] link_path_walk >>> (/home/anders/src/kernel/next/fs/namei.c:1720 >>> /home/anders/src/kernel/next/fs/namei.c:2267) >>> [ 1738.184738][ T104] path_lookupat >>> (/home/anders/src/kernel/next/fs/namei.c:2478 (discriminator 2)) >>> [ 1738.188268][ T104] filename_lookup >>> (/home/anders/src/kernel/next/fs/namei.c:2508) >>> [ 1738.191865][ T104] vfs_statx >>> (/home/anders/src/kernel/next/fs/stat.c:238) >>> [ 1738.196236][ T104] vfs_fstatat >>> (/home/anders/src/kernel/next/fs/stat.c:276) >>> [ 1738.200120][ T104] __do_sys_newfstatat >>> (/home/anders/src/kernel/next/fs/stat.c:446) >>> [ 1738.204095][ T104] __arm64_sys_newfstatat >>> (/home/anders/src/kernel/next/fs/stat.c:440 >>> /home/anders/src/kernel/next/fs/stat.c:440) >>> [ 1738.207676][ T104] el0_svc_common.constprop.0 >>> (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:38 >>> /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:52 >>> /home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:139) >>> [ 1738.211820][ T104] do_el0_svc >>> (/home/anders/src/kernel/next/arch/arm64/kernel/syscall.c:188) >>> [ 1738.214815][ T104] el0_svc >>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:133 >>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:144 >>> /home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:648) >>> [ 1738.217709][ T104] el0t_64_sync_handler >>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry-common.c:666) >>> [ 1738.221239][ T104] el0t_64_sync >>> (/home/anders/src/kernel/next/arch/arm64/kernel/entry.S:591) >>> [ 1738.224502][ T104] >>> [ 1738.226090][ T104] 1 lock held by systemd-udevd/104: >>> [ 1738.229747][ T104] #0: ffff000006681e08 (&root- >>>> kernfs_iattr_rwsem){++++}-{3:3}, at: kernfs_iop_permission >>> (/home/anders/src/kernel/next/fs/kernfs/inode.c:288) >>> [ 1738.237504][ T104] irq event stamp: 108353 >>> [ 1738.240262][ T104] hardirqs last enabled at (108353): >>> seqcount_lockdep_reader_access >>> (/home/anders/src/kernel/next/include/linux/seqlock.h:105) >>> [ 1738.247443][ T104] hardirqs last disabled at (108352): >>> seqcount_lockdep_reader_access >>> (/home/anders/src/kernel/next/include/linux/seqlock.h:104) >>> [ 1738.254510][ T104] softirqs last enabled at (108326): >>> fpsimd_restore_current_state >>> (/home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:264 >>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1791) >>> [ 1738.262187][ T104] softirqs last disabled at (108324): >>> fpsimd_restore_current_state >>> (/home/anders/src/kernel/next/include/linux/bottom_half.h:20 >>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:242 >>> /home/anders/src/kernel/next/arch/arm64/kernel/fpsimd.c:1784) >>> [ 1738.270239][ T104] >>> [ 1738.272140][ T104] Reported by Kernel Concurrency Sanitizer on: >>> [ 1738.285185][ T104] Hardware name: linux,dummy-virt (DT) >>> [ 1738.288703][ T104] >>> ================================================================== >> This looks just like the original except a different lock is being >> >> used now. >> >> >> The link count can't be less than two if set_nlink() is called. >> >> >> Maybe I am missing something but the directory count is changed only >> >> while holding the root->kernfs_iattr_rwsem so the value used by >> >> set_nlink() will not change during concurrent calls to >> refresh_inode(). >> >> Still looks like a false positive, I'll check the write operations >> again just to be sure. > I do see a problem with recent changes. > > I'll send this off to Greg after I've done some testing (primarily just > compile and function). > > Here's a patch which describes what I found. > > Comments are, of course, welcome, ;) Anders I was hoping you would check if/what lockdep trace you get with this patch. Imran, I was hoping you would comment on my change as it relates to the kernfs_iattr_rwsem changes. Ian > > kernfs: fix missing kernfs_iattr_rwsem locking > > From: Ian Kent <raven@themaw.net> > > When the kernfs_iattr_rwsem was introduced a case was missed. > > The update of the kernfs directory node child count was also protected > by the kernfs_rwsem and needs to be included in the change so that the > child count (and so the inode n_link attribute) does not change while > holding the rwsem for read. > > Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode > attributes) > > Signed-off-by: Ian Kent <raven@themaw.net> > Cc: Anders Roxell <anders.roxell@linaro.org> > Cc: Imran Khan <imran.f.khan@oracle.com> > Cc: Arnd Bergmann <arnd@arndb.de> > Cc: Minchan Kim <minchan@kernel.org> > Cc: Eric Sandeen <sandeen@sandeen.net> > --- > fs/kernfs/dir.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c > index 45b6919903e6..6e84bb69602e 100644 > --- a/fs/kernfs/dir.c > +++ b/fs/kernfs/dir.c > @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node > *kn) > rb_insert_color(&kn->rb, &kn->parent->dir.children); > > /* successfully added, account subdir number */ > + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); > if (kernfs_type(kn) == KERNFS_DIR) > kn->parent->dir.subdirs++; > kernfs_inc_rev(kn->parent); > + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem); > > return 0; > } > @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct > kernfs_node *kn) > if (RB_EMPTY_NODE(&kn->rb)) > return false; > > + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); > if (kernfs_type(kn) == KERNFS_DIR) > kn->parent->dir.subdirs--; > kernfs_inc_rev(kn->parent); > + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem); > > rb_erase(&kn->rb, &kn->parent->dir.children); > RB_CLEAR_NODE(&kn->rb); >
Hello Ian, Sorry for late reply. I was about to reply this week. On 27/7/2023 10:38 am, Ian Kent wrote: > On 20/7/23 10:03, Ian Kent wrote: >> On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote: [...] >> I do see a problem with recent changes. >> >> I'll send this off to Greg after I've done some testing (primarily just >> compile and function). >> >> Here's a patch which describes what I found. >> >> Comments are, of course, welcome, ;) > > Anders I was hoping you would check if/what lockdep trace > > you get with this patch. > > > Imran, I was hoping you would comment on my change as it > > relates to the kernfs_iattr_rwsem changes. > > > Ian > >> >> kernfs: fix missing kernfs_iattr_rwsem locking >> >> From: Ian Kent <raven@themaw.net> >> >> When the kernfs_iattr_rwsem was introduced a case was missed. >> >> The update of the kernfs directory node child count was also protected >> by the kernfs_rwsem and needs to be included in the change so that the >> child count (and so the inode n_link attribute) does not change while >> holding the rwsem for read. >> kernfs direcytory node's child count changes in kernfs_(un)link_sibling and these are getting invoked while adding (kernfs_add_one), removing(__kernfs_remove) or moving (kernfs_rename_ns)a node. Each of these operations proceed under kernfs_rwsem and I see each invocation of kernfs_link/unlink_sibling during the above mentioned operations is happening under kernfs_rwsem. So the child count should still be protected by kernfs_rwsem and we should not need to acquire kernfs_iattr_rwsem in kernfs_link/unlink_sibling. Kindly let me know your thoughts. I would still like to see new lockdep traces with this change. Thanks, Imran >> Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode >> attributes) >> >> Signed-off-by: Ian Kent <raven@themaw.net> >> Cc: Anders Roxell <anders.roxell@linaro.org> >> Cc: Imran Khan <imran.f.khan@oracle.com> >> Cc: Arnd Bergmann <arnd@arndb.de> >> Cc: Minchan Kim <minchan@kernel.org> >> Cc: Eric Sandeen <sandeen@sandeen.net> >> --- >> fs/kernfs/dir.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c >> index 45b6919903e6..6e84bb69602e 100644 >> --- a/fs/kernfs/dir.c >> +++ b/fs/kernfs/dir.c >> @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node >> *kn) >> rb_insert_color(&kn->rb, &kn->parent->dir.children); >> /* successfully added, account subdir number */ >> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >> if (kernfs_type(kn) == KERNFS_DIR) >> kn->parent->dir.subdirs++; >> kernfs_inc_rev(kn->parent); >> + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >> return 0; >> } >> @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct >> kernfs_node *kn) >> if (RB_EMPTY_NODE(&kn->rb)) >> return false; >> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >> if (kernfs_type(kn) == KERNFS_DIR) >> kn->parent->dir.subdirs--; >> kernfs_inc_rev(kn->parent); >> + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >> rb_erase(&kn->rb, &kn->parent->dir.children); >> RB_CLEAR_NODE(&kn->rb); >>
Hello again Ian, I take back my previous comment :). On 27/7/2023 2:30 pm, Imran Khan wrote: > Hello Ian, > Sorry for late reply. I was about to reply this week. > > On 27/7/2023 10:38 am, Ian Kent wrote: >> On 20/7/23 10:03, Ian Kent wrote: >>> On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote: > > [...] >>> I do see a problem with recent changes. >>> >>> I'll send this off to Greg after I've done some testing (primarily just >>> compile and function). >>> >>> Here's a patch which describes what I found. >>> >>> Comments are, of course, welcome, ;) >> >> Anders I was hoping you would check if/what lockdep trace >> >> you get with this patch. >> >> >> Imran, I was hoping you would comment on my change as it >> >> relates to the kernfs_iattr_rwsem changes. >> >> >> Ian >> >>> >>> kernfs: fix missing kernfs_iattr_rwsem locking >>> >>> From: Ian Kent <raven@themaw.net> >>> >>> When the kernfs_iattr_rwsem was introduced a case was missed. >>> >>> The update of the kernfs directory node child count was also protected >>> by the kernfs_rwsem and needs to be included in the change so that the >>> child count (and so the inode n_link attribute) does not change while >>> holding the rwsem for read. >>> > > kernfs direcytory node's child count changes in kernfs_(un)link_sibling and > these are getting invoked while adding (kernfs_add_one), > removing(__kernfs_remove) or moving (kernfs_rename_ns)a node. Each of these > operations proceed under kernfs_rwsem and I see each invocation of > kernfs_link/unlink_sibling during the above mentioned operations is happening > under kernfs_rwsem. > So the child count should still be protected by kernfs_rwsem and we should not > need to acquire kernfs_iattr_rwsem in kernfs_link/unlink_sibling. > kernfs_refresh_inode can still race against kernfs_link/unlink_siblings. So your change looks good to me. My tests are not showing any issues either. ( I tested on 4.14 and 5.4 kernels as well). Fee free to add my RB. Reviewed-by: Imran Khan <imran.f.khan@oracle.com> > Kindly let me know your thoughts. I would still like to see new lockdep traces > with this change. > > Thanks, > Imran > >>> Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode >>> attributes) >>> >>> Signed-off-by: Ian Kent <raven@themaw.net> >>> Cc: Anders Roxell <anders.roxell@linaro.org> >>> Cc: Imran Khan <imran.f.khan@oracle.com> >>> Cc: Arnd Bergmann <arnd@arndb.de> >>> Cc: Minchan Kim <minchan@kernel.org> >>> Cc: Eric Sandeen <sandeen@sandeen.net> >>> --- >>> fs/kernfs/dir.c | 4 ++++ >>> 1 file changed, 4 insertions(+) >>> >>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c >>> index 45b6919903e6..6e84bb69602e 100644 >>> --- a/fs/kernfs/dir.c >>> +++ b/fs/kernfs/dir.c >>> @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node >>> *kn) >>> rb_insert_color(&kn->rb, &kn->parent->dir.children); >>> /* successfully added, account subdir number */ >>> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>> if (kernfs_type(kn) == KERNFS_DIR) >>> kn->parent->dir.subdirs++; >>> kernfs_inc_rev(kn->parent); >>> + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>> return 0; >>> } >>> @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct >>> kernfs_node *kn) >>> if (RB_EMPTY_NODE(&kn->rb)) >>> return false; >>> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>> if (kernfs_type(kn) == KERNFS_DIR) >>> kn->parent->dir.subdirs--; >>> kernfs_inc_rev(kn->parent); >>> + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>> rb_erase(&kn->rb, &kn->parent->dir.children); >>> RB_CLEAR_NODE(&kn->rb); >>>
On 27/7/23 12:30, Imran Khan wrote: > Hello Ian, > Sorry for late reply. I was about to reply this week. > > On 27/7/2023 10:38 am, Ian Kent wrote: >> On 20/7/23 10:03, Ian Kent wrote: >>> On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote: > [...] >>> I do see a problem with recent changes. >>> >>> I'll send this off to Greg after I've done some testing (primarily just >>> compile and function). >>> >>> Here's a patch which describes what I found. >>> >>> Comments are, of course, welcome, ;) >> Anders I was hoping you would check if/what lockdep trace >> >> you get with this patch. >> >> >> Imran, I was hoping you would comment on my change as it >> >> relates to the kernfs_iattr_rwsem changes. >> >> >> Ian >> >>> kernfs: fix missing kernfs_iattr_rwsem locking >>> >>> From: Ian Kent <raven@themaw.net> >>> >>> When the kernfs_iattr_rwsem was introduced a case was missed. >>> >>> The update of the kernfs directory node child count was also protected >>> by the kernfs_rwsem and needs to be included in the change so that the >>> child count (and so the inode n_link attribute) does not change while >>> holding the rwsem for read. >>> > kernfs direcytory node's child count changes in kernfs_(un)link_sibling and > these are getting invoked while adding (kernfs_add_one), > removing(__kernfs_remove) or moving (kernfs_rename_ns)a node. Each of these > operations proceed under kernfs_rwsem and I see each invocation of > kernfs_link/unlink_sibling during the above mentioned operations is happening > under kernfs_rwsem. > So the child count should still be protected by kernfs_rwsem and we should not > need to acquire kernfs_iattr_rwsem in kernfs_link/unlink_sibling. Yes, that's exactly what I intended (assuming you mean write lock in those cases) when I did it so now I wonder what I saw that lead to my patch, I'll need to look again ... > > Kindly let me know your thoughts. I would still like to see new lockdep traces > with this change. Indeed, I hope Anders can find time to get the trace. Ian > > Thanks, > Imran > >>> Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode >>> attributes) >>> >>> Signed-off-by: Ian Kent <raven@themaw.net> >>> Cc: Anders Roxell <anders.roxell@linaro.org> >>> Cc: Imran Khan <imran.f.khan@oracle.com> >>> Cc: Arnd Bergmann <arnd@arndb.de> >>> Cc: Minchan Kim <minchan@kernel.org> >>> Cc: Eric Sandeen <sandeen@sandeen.net> >>> --- >>> fs/kernfs/dir.c | 4 ++++ >>> 1 file changed, 4 insertions(+) >>> >>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c >>> index 45b6919903e6..6e84bb69602e 100644 >>> --- a/fs/kernfs/dir.c >>> +++ b/fs/kernfs/dir.c >>> @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node >>> *kn) >>> rb_insert_color(&kn->rb, &kn->parent->dir.children); >>> /* successfully added, account subdir number */ >>> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>> if (kernfs_type(kn) == KERNFS_DIR) >>> kn->parent->dir.subdirs++; >>> kernfs_inc_rev(kn->parent); >>> + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>> return 0; >>> } >>> @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct >>> kernfs_node *kn) >>> if (RB_EMPTY_NODE(&kn->rb)) >>> return false; >>> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>> if (kernfs_type(kn) == KERNFS_DIR) >>> kn->parent->dir.subdirs--; >>> kernfs_inc_rev(kn->parent); >>> + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>> rb_erase(&kn->rb, &kn->parent->dir.children); >>> RB_CLEAR_NODE(&kn->rb); >>>
On 28/7/23 08:00, Ian Kent wrote: > On 27/7/23 12:30, Imran Khan wrote: >> Hello Ian, >> Sorry for late reply. I was about to reply this week. >> >> On 27/7/2023 10:38 am, Ian Kent wrote: >>> On 20/7/23 10:03, Ian Kent wrote: >>>> On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote: >> [...] >>>> I do see a problem with recent changes. >>>> >>>> I'll send this off to Greg after I've done some testing (primarily >>>> just >>>> compile and function). >>>> >>>> Here's a patch which describes what I found. >>>> >>>> Comments are, of course, welcome, ;) >>> Anders I was hoping you would check if/what lockdep trace >>> >>> you get with this patch. >>> >>> >>> Imran, I was hoping you would comment on my change as it >>> >>> relates to the kernfs_iattr_rwsem changes. >>> >>> >>> Ian >>> >>>> kernfs: fix missing kernfs_iattr_rwsem locking >>>> >>>> From: Ian Kent <raven@themaw.net> >>>> >>>> When the kernfs_iattr_rwsem was introduced a case was missed. >>>> >>>> The update of the kernfs directory node child count was also protected >>>> by the kernfs_rwsem and needs to be included in the change so that the >>>> child count (and so the inode n_link attribute) does not change while >>>> holding the rwsem for read. >>>> >> kernfs direcytory node's child count changes in >> kernfs_(un)link_sibling and >> these are getting invoked while adding (kernfs_add_one), >> removing(__kernfs_remove) or moving (kernfs_rename_ns)a node. Each of >> these >> operations proceed under kernfs_rwsem and I see each invocation of >> kernfs_link/unlink_sibling during the above mentioned operations is >> happening >> under kernfs_rwsem. >> So the child count should still be protected by kernfs_rwsem and we >> should not >> need to acquire kernfs_iattr_rwsem in kernfs_link/unlink_sibling. > > Yes, that's exactly what I intended (assuming you mean write lock in > those cases) > > when I did it so now I wonder what I saw that lead to my patch, I'll > need to look > > again ... Ahh, I see why I thought this ... It's the hunk: @@ -285,10 +285,10 @@ int kernfs_iop_permission(struct mnt_idmap *idmap, kn = inode->i_private; root = kernfs_root(kn); - down_read(&root->kernfs_rwsem); + down_read(&root->kernfs_iattr_rwsem); kernfs_refresh_inode(kn, inode); ret = generic_permission(&nop_mnt_idmap, inode, mask); - up_read(&root->kernfs_rwsem); + up_read(&root->kernfs_iattr_rwsem); return ret; } which takes away the kernfs_rwsem and introduces the possibility of the change. It may be more instructive to add back taking the read lock of kernfs_rwsem in .permission() than altering the sibling link and unlink functions, I mean I even caught myself on it. Ian > > >> >> Kindly let me know your thoughts. I would still like to see new >> lockdep traces >> with this change. > > Indeed, I hope Anders can find time to get the trace. > > > Ian > >> >> Thanks, >> Imran >> >>>> Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode >>>> attributes) >>>> >>>> Signed-off-by: Ian Kent <raven@themaw.net> >>>> Cc: Anders Roxell <anders.roxell@linaro.org> >>>> Cc: Imran Khan <imran.f.khan@oracle.com> >>>> Cc: Arnd Bergmann <arnd@arndb.de> >>>> Cc: Minchan Kim <minchan@kernel.org> >>>> Cc: Eric Sandeen <sandeen@sandeen.net> >>>> --- >>>> fs/kernfs/dir.c | 4 ++++ >>>> 1 file changed, 4 insertions(+) >>>> >>>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c >>>> index 45b6919903e6..6e84bb69602e 100644 >>>> --- a/fs/kernfs/dir.c >>>> +++ b/fs/kernfs/dir.c >>>> @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node >>>> *kn) >>>> rb_insert_color(&kn->rb, &kn->parent->dir.children); >>>> /* successfully added, account subdir number */ >>>> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>>> if (kernfs_type(kn) == KERNFS_DIR) >>>> kn->parent->dir.subdirs++; >>>> kernfs_inc_rev(kn->parent); >>>> + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>>> return 0; >>>> } >>>> @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct >>>> kernfs_node *kn) >>>> if (RB_EMPTY_NODE(&kn->rb)) >>>> return false; >>>> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>>> if (kernfs_type(kn) == KERNFS_DIR) >>>> kn->parent->dir.subdirs--; >>>> kernfs_inc_rev(kn->parent); >>>> + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>>> rb_erase(&kn->rb, &kn->parent->dir.children); >>>> RB_CLEAR_NODE(&kn->rb); >>>>
Hello Ian, On 28/7/2023 10:16 am, Ian Kent wrote: > On 28/7/23 08:00, Ian Kent wrote: >> On 27/7/23 12:30, Imran Khan wrote: >>> Hello Ian, >>> Sorry for late reply. I was about to reply this week. >>> >>> On 27/7/2023 10:38 am, Ian Kent wrote: >>>> On 20/7/23 10:03, Ian Kent wrote: >>>>> On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote: >>> [...] >>>>> I do see a problem with recent changes. >>>>> >>>>> I'll send this off to Greg after I've done some testing (primarily just >>>>> compile and function). >>>>> >>>>> Here's a patch which describes what I found. >>>>> >>>>> Comments are, of course, welcome, ;) >>>> Anders I was hoping you would check if/what lockdep trace >>>> >>>> you get with this patch. >>>> >>>> >>>> Imran, I was hoping you would comment on my change as it >>>> >>>> relates to the kernfs_iattr_rwsem changes. >>>> >>>> >>>> Ian >>>> >>>>> kernfs: fix missing kernfs_iattr_rwsem locking >>>>> >>>>> From: Ian Kent <raven@themaw.net> >>>>> >>>>> When the kernfs_iattr_rwsem was introduced a case was missed. >>>>> >>>>> The update of the kernfs directory node child count was also protected >>>>> by the kernfs_rwsem and needs to be included in the change so that the >>>>> child count (and so the inode n_link attribute) does not change while >>>>> holding the rwsem for read. >>>>> >>> kernfs direcytory node's child count changes in kernfs_(un)link_sibling and >>> these are getting invoked while adding (kernfs_add_one), >>> removing(__kernfs_remove) or moving (kernfs_rename_ns)a node. Each of these >>> operations proceed under kernfs_rwsem and I see each invocation of >>> kernfs_link/unlink_sibling during the above mentioned operations is happening >>> under kernfs_rwsem. >>> So the child count should still be protected by kernfs_rwsem and we should not >>> need to acquire kernfs_iattr_rwsem in kernfs_link/unlink_sibling. >> >> Yes, that's exactly what I intended (assuming you mean write lock in those cases) >> >> when I did it so now I wonder what I saw that lead to my patch, I'll need to look >> >> again ... > > Ahh, I see why I thought this ... > > It's the hunk: > > @@ -285,10 +285,10 @@ int kernfs_iop_permission(struct mnt_idmap *idmap, > kn = inode->i_private; > root = kernfs_root(kn); > > - down_read(&root->kernfs_rwsem); > + down_read(&root->kernfs_iattr_rwsem); > kernfs_refresh_inode(kn, inode); > ret = generic_permission(&nop_mnt_idmap, inode, mask); > - up_read(&root->kernfs_rwsem); > + up_read(&root->kernfs_iattr_rwsem); > > return ret; > } > > which takes away the kernfs_rwsem and introduces the possibility of > > the change. It may be more instructive to add back taking the read > > lock of kernfs_rwsem in .permission() than altering the sibling link > > and unlink functions, I mean I even caught myself on it. > Yes this was the block I referred to in my second comment [1]. However adding back read lock of kernfs_rwsem in .permission() will again introduce the bottleneck that I mentioned at [2]. So I think taking taking the locks in link/unlink is the better option. I understand having one lock to synchronize everything makes it easier debug/development wise but sometimes (such as the case mentioned in [2]), it is not optimum performance wise. Thoughts ? Thanks, Imran [1]: https://lore.kernel.org/all/8b0a1619-1e39-fc3a-1226-f3b167e64646@oracle.com/ [2]: https://lore.kernel.org/all/20230302043203.1695051-2-imran.f.khan@oracle.com/ > > Ian > >> >> >>> >>> Kindly let me know your thoughts. I would still like to see new lockdep traces >>> with this change. >> >> Indeed, I hope Anders can find time to get the trace. >> >> >> Ian >> >>> >>> Thanks, >>> Imran >>> >>>>> Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode >>>>> attributes) >>>>> >>>>> Signed-off-by: Ian Kent <raven@themaw.net> >>>>> Cc: Anders Roxell <anders.roxell@linaro.org> >>>>> Cc: Imran Khan <imran.f.khan@oracle.com> >>>>> Cc: Arnd Bergmann <arnd@arndb.de> >>>>> Cc: Minchan Kim <minchan@kernel.org> >>>>> Cc: Eric Sandeen <sandeen@sandeen.net> >>>>> --- >>>>> fs/kernfs/dir.c | 4 ++++ >>>>> 1 file changed, 4 insertions(+) >>>>> >>>>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c >>>>> index 45b6919903e6..6e84bb69602e 100644 >>>>> --- a/fs/kernfs/dir.c >>>>> +++ b/fs/kernfs/dir.c >>>>> @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node >>>>> *kn) >>>>> rb_insert_color(&kn->rb, &kn->parent->dir.children); >>>>> /* successfully added, account subdir number */ >>>>> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>>>> if (kernfs_type(kn) == KERNFS_DIR) >>>>> kn->parent->dir.subdirs++; >>>>> kernfs_inc_rev(kn->parent); >>>>> + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>>>> return 0; >>>>> } >>>>> @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct >>>>> kernfs_node *kn) >>>>> if (RB_EMPTY_NODE(&kn->rb)) >>>>> return false; >>>>> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>>>> if (kernfs_type(kn) == KERNFS_DIR) >>>>> kn->parent->dir.subdirs--; >>>>> kernfs_inc_rev(kn->parent); >>>>> + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>>>> rb_erase(&kn->rb, &kn->parent->dir.children); >>>>> RB_CLEAR_NODE(&kn->rb); >>>>>
On 28/7/23 09:06, Imran Khan wrote: > Hello Ian, > > On 28/7/2023 10:16 am, Ian Kent wrote: >> On 28/7/23 08:00, Ian Kent wrote: >>> On 27/7/23 12:30, Imran Khan wrote: >>>> Hello Ian, >>>> Sorry for late reply. I was about to reply this week. >>>> >>>> On 27/7/2023 10:38 am, Ian Kent wrote: >>>>> On 20/7/23 10:03, Ian Kent wrote: >>>>>> On Wed, 2023-07-19 at 12:23 +0800, Ian Kent wrote: >>>> [...] >>>>>> I do see a problem with recent changes. >>>>>> >>>>>> I'll send this off to Greg after I've done some testing (primarily just >>>>>> compile and function). >>>>>> >>>>>> Here's a patch which describes what I found. >>>>>> >>>>>> Comments are, of course, welcome, ;) >>>>> Anders I was hoping you would check if/what lockdep trace >>>>> >>>>> you get with this patch. >>>>> >>>>> >>>>> Imran, I was hoping you would comment on my change as it >>>>> >>>>> relates to the kernfs_iattr_rwsem changes. >>>>> >>>>> >>>>> Ian >>>>> >>>>>> kernfs: fix missing kernfs_iattr_rwsem locking >>>>>> >>>>>> From: Ian Kent <raven@themaw.net> >>>>>> >>>>>> When the kernfs_iattr_rwsem was introduced a case was missed. >>>>>> >>>>>> The update of the kernfs directory node child count was also protected >>>>>> by the kernfs_rwsem and needs to be included in the change so that the >>>>>> child count (and so the inode n_link attribute) does not change while >>>>>> holding the rwsem for read. >>>>>> >>>> kernfs direcytory node's child count changes in kernfs_(un)link_sibling and >>>> these are getting invoked while adding (kernfs_add_one), >>>> removing(__kernfs_remove) or moving (kernfs_rename_ns)a node. Each of these >>>> operations proceed under kernfs_rwsem and I see each invocation of >>>> kernfs_link/unlink_sibling during the above mentioned operations is happening >>>> under kernfs_rwsem. >>>> So the child count should still be protected by kernfs_rwsem and we should not >>>> need to acquire kernfs_iattr_rwsem in kernfs_link/unlink_sibling. >>> Yes, that's exactly what I intended (assuming you mean write lock in those cases) >>> >>> when I did it so now I wonder what I saw that lead to my patch, I'll need to look >>> >>> again ... >> Ahh, I see why I thought this ... >> >> It's the hunk: >> >> @@ -285,10 +285,10 @@ int kernfs_iop_permission(struct mnt_idmap *idmap, >> kn = inode->i_private; >> root = kernfs_root(kn); >> >> - down_read(&root->kernfs_rwsem); >> + down_read(&root->kernfs_iattr_rwsem); >> kernfs_refresh_inode(kn, inode); >> ret = generic_permission(&nop_mnt_idmap, inode, mask); >> - up_read(&root->kernfs_rwsem); >> + up_read(&root->kernfs_iattr_rwsem); >> >> return ret; >> } >> >> which takes away the kernfs_rwsem and introduces the possibility of >> >> the change. It may be more instructive to add back taking the read >> >> lock of kernfs_rwsem in .permission() than altering the sibling link >> >> and unlink functions, I mean I even caught myself on it. >> > Yes this was the block I referred to in my second comment [1]. However adding > back read lock of kernfs_rwsem in .permission() will again introduce the > bottleneck that I mentioned at [2]. So I think taking taking the locks in > link/unlink is the better option. Yes, sorry, I always fall into the trap of not reading through all my mail before commenting, oops! The fact that .permission() is called so much more than the create/remove functions also occurred to be me too just after I posted my comment (and is probably why I originally did it that way). I'll forward the patch for merge but would really like to see the lockdep trace so I'll wait a little while ... > I understand having one lock to synchronize everything makes it easier > debug/development wise but sometimes (such as the case mentioned in [2]), it is > not optimum performance wise. Indeed, the performance improvement work that has been happening over quite some time now is very good. I had seen some opportunities around the file open path long ago but hadn't got to doing anything there as you have, the work looks good to me, thanks for doing it. Ian > Thoughts ? > > Thanks, > Imran > > [1]: https://lore.kernel.org/all/8b0a1619-1e39-fc3a-1226-f3b167e64646@oracle.com/ > [2]: https://lore.kernel.org/all/20230302043203.1695051-2-imran.f.khan@oracle.com/ >> Ian >> >>> >>>> Kindly let me know your thoughts. I would still like to see new lockdep traces >>>> with this change. >>> Indeed, I hope Anders can find time to get the trace. >>> >>> >>> Ian >>> >>>> Thanks, >>>> Imran >>>> >>>>>> Fixes: 9caf696142 (kernfs: Introduce separate rwsem to protect inode >>>>>> attributes) >>>>>> >>>>>> Signed-off-by: Ian Kent <raven@themaw.net> >>>>>> Cc: Anders Roxell <anders.roxell@linaro.org> >>>>>> Cc: Imran Khan <imran.f.khan@oracle.com> >>>>>> Cc: Arnd Bergmann <arnd@arndb.de> >>>>>> Cc: Minchan Kim <minchan@kernel.org> >>>>>> Cc: Eric Sandeen <sandeen@sandeen.net> >>>>>> --- >>>>>> fs/kernfs/dir.c | 4 ++++ >>>>>> 1 file changed, 4 insertions(+) >>>>>> >>>>>> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c >>>>>> index 45b6919903e6..6e84bb69602e 100644 >>>>>> --- a/fs/kernfs/dir.c >>>>>> +++ b/fs/kernfs/dir.c >>>>>> @@ -383,9 +383,11 @@ static int kernfs_link_sibling(struct kernfs_node >>>>>> *kn) >>>>>> rb_insert_color(&kn->rb, &kn->parent->dir.children); >>>>>> /* successfully added, account subdir number */ >>>>>> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>>>>> if (kernfs_type(kn) == KERNFS_DIR) >>>>>> kn->parent->dir.subdirs++; >>>>>> kernfs_inc_rev(kn->parent); >>>>>> + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>>>>> return 0; >>>>>> } >>>>>> @@ -408,9 +410,11 @@ static bool kernfs_unlink_sibling(struct >>>>>> kernfs_node *kn) >>>>>> if (RB_EMPTY_NODE(&kn->rb)) >>>>>> return false; >>>>>> + down_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>>>>> if (kernfs_type(kn) == KERNFS_DIR) >>>>>> kn->parent->dir.subdirs--; >>>>>> kernfs_inc_rev(kn->parent); >>>>>> + up_write(&kernfs_root(kn)->kernfs_iattr_rwsem); >>>>>> rb_erase(&kn->rb, &kn->parent->dir.children); >>>>>> RB_CLEAR_NODE(&kn->rb); >>>>>>
diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c index 3d783d80f5da..74f3453f4639 100644 --- a/fs/kernfs/inode.c +++ b/fs/kernfs/inode.c @@ -190,10 +190,8 @@ int kernfs_iop_getattr(struct user_namespace *mnt_userns, struct kernfs_root *root = kernfs_root(kn); down_read(&root->kernfs_rwsem); - spin_lock(&inode->i_lock); kernfs_refresh_inode(kn, inode); generic_fillattr(&init_user_ns, inode, stat); - spin_unlock(&inode->i_lock); up_read(&root->kernfs_rwsem); return 0; @@ -288,10 +286,8 @@ int kernfs_iop_permission(struct user_namespace *mnt_userns, root = kernfs_root(kn); down_read(&root->kernfs_rwsem); - spin_lock(&inode->i_lock); kernfs_refresh_inode(kn, inode); ret = generic_permission(&init_user_ns, inode, mask); - spin_unlock(&inode->i_lock); up_read(&root->kernfs_rwsem); return ret;