Message ID | 20230119211455.498968-2-echanude@redhat.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp555097wrn; Thu, 19 Jan 2023 13:33:16 -0800 (PST) X-Google-Smtp-Source: AMrXdXvZBbzol6ckmmYdcXlPZpViMFWm0ugQDf8QsByu+NA0t0+XdavrUlLTa2oR3KR5t7Ehw8dM X-Received: by 2002:a05:6a21:33aa:b0:b9:6208:44e9 with SMTP id yy42-20020a056a2133aa00b000b9620844e9mr750456pzb.7.1674163995759; Thu, 19 Jan 2023 13:33:15 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674163995; cv=none; d=google.com; s=arc-20160816; b=NDA6nZlQfjgQEIooSdOY48NSMIBgBo2h1MOgyn7yEJsd1ttvRaVKXN/qGDNs3iMvLp X11Uj1JkYMto0+s5Nc3QN9kyFIOyP1hAjgTPilvPaMG0ywhQeF5fpxatgYe/c6/057P5 5PYvD3Sjp1T2dVNog4U+EjtuuCeRZ0VXmPCkhMVspVx9xPN5C5sktoI0MjCrBn5N14OF Coe/DAiQJIyupTdfrgT1ZXYNnDLGCUq+Nw+Ga9iaWTVk8WJwBkfP5Sd1YYrgE8lowjEB 3+TOme6G45BYZRwX0CxdkTysVgdoCqsBfuHjLIIZzg0ij2VKr/YWtKlY6OPcS6ydEef/ wyKg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=kMg8gNQDfp59Z8c/Laiz+GjWJb2Q7J01BdsOGckaZ2o=; b=L2vCOKddmooyUSQ9CeT4KGcsJsqzRlDojxjeo+42QHsc3McPNM6mwV8xenl/RDU7ay GRZSwb80yv9GEqrgkKRvcHnrm2ZjOuCNUs07sxCUVlF+M3XgwAemFi4RCNoEQuYUpNFh M9G3dzuXT1WfzdmFb36K0SW14DXbra43Xvex8tbHd9D+XtVFzYO1YiuxjbeumlWU8wX6 3u5NR23sR+aHu9SKPtq0SEDKVhu/27OCgpcDXSxqPW0HgEui4QhPaCQz68kR9Wn804vS vo7+Pftx7RXOHgLfC4c8mkCvoS1CDrt6FKogCpBV5pBHziLrk7VOzuaOWfSZGotpiTbe svcg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="UpC/KpUD"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q8-20020a63cc48000000b0046b1dabcb34si40095313pgi.765.2023.01.19.13.33.03; Thu, 19 Jan 2023 13:33:15 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="UpC/KpUD"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230342AbjASVY5 (ORCPT <rfc822;pavtiger@gmail.com> + 99 others); Thu, 19 Jan 2023 16:24:57 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38066 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229654AbjASVWl (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 19 Jan 2023 16:22:41 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 63A8C9F3AE for <linux-kernel@vger.kernel.org>; Thu, 19 Jan 2023 13:15:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1674162931; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kMg8gNQDfp59Z8c/Laiz+GjWJb2Q7J01BdsOGckaZ2o=; b=UpC/KpUDpEMYCS5+V9Fr8jl0r/Xo7A/XnUk+plLkz8t+bk1JRgj6RuI42vM5E0zdTw3ATt 3Qjw3/vNjdCukcjHR0mf7gnoDgAq6wsXOcgzGK++8Fda0ZNm/FGW6FMbfO/2bLInAAsnHw OuEmCshJZvfDpyCD1FWhLhx6ueQX4ZQ= Received: from mail-qt1-f198.google.com (mail-qt1-f198.google.com [209.85.160.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-508-kv9d8IbAOKS5z4NiyJvJDQ-1; Thu, 19 Jan 2023 16:15:30 -0500 X-MC-Unique: kv9d8IbAOKS5z4NiyJvJDQ-1 Received: by mail-qt1-f198.google.com with SMTP id o16-20020ac841d0000000b003b689580725so1525796qtm.22 for <linux-kernel@vger.kernel.org>; Thu, 19 Jan 2023 13:15:30 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=kMg8gNQDfp59Z8c/Laiz+GjWJb2Q7J01BdsOGckaZ2o=; b=G0qBtmJI/ixVVsQmV2sv9dj1Ag3x0VY72XQKvaFgMx8hqXrnqifSSb1b30eQc41LPs N2VTj18Gcgq6MkA6NS3kiEMVGYX/KMrBeXPRfxDromPuvS0Ojg2JW2/pXFlDjJX6pFV4 kQgNmnkYZdM/6OWGKzkT+d3DZ0n2CvHqXkqNO50BcZ/n494Z65Y4ob7uoPtZnIXZJLuO 0u4nPfHkesGbtLPPKwZDzq4WtRWB9uJwkGL4uhsD41beSZsWK1GeFkhcLTmGUIDPectE 1D5IkozLgorZg+d8TlTHY69pDS8PIP5IG8l3O+/zTuwHSHMvC2fbLbDNM1/DcJ4E17Zz 5F4Q== X-Gm-Message-State: AFqh2krCHjiGjn+r22eoLmhNGhRnVw2R+8DljUPY6HopHc6R9MipjFAv rpA9TSJiVOMOMPk39VjzX2WHRSQDzKyEJSonUYoo3RGT91yctP5IrEUH5fSPrBGH4U2UZCmMMkT pDMvLFzUsEpJ3F47Jhfpgvl9u X-Received: by 2002:ac8:45ce:0:b0:3b6:2e75:c9bd with SMTP id e14-20020ac845ce000000b003b62e75c9bdmr14633632qto.1.1674162929638; Thu, 19 Jan 2023 13:15:29 -0800 (PST) X-Received: by 2002:ac8:45ce:0:b0:3b6:2e75:c9bd with SMTP id e14-20020ac845ce000000b003b62e75c9bdmr14633616qto.1.1674162929441; Thu, 19 Jan 2023 13:15:29 -0800 (PST) Received: from localhost (pool-71-184-142-128.bstnma.fios.verizon.net. [71.184.142.128]) by smtp.gmail.com with ESMTPSA id b24-20020ac86798000000b0039cc944ebdasm19443699qtp.54.2023.01.19.13.15.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 19 Jan 2023 13:15:29 -0800 (PST) From: Eric Chanudet <echanude@redhat.com> To: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org, Alexander Larsson <alexl@redhat.com>, Andrew Halaney <ahalaney@redhat.com>, Eric Chanudet <echanude@redhat.com> Subject: [RFC PATCH RESEND 1/1] fs/namespace: defer free_mount from namespace_unlock Date: Thu, 19 Jan 2023 16:14:55 -0500 Message-Id: <20230119211455.498968-2-echanude@redhat.com> X-Mailer: git-send-email 2.39.0 In-Reply-To: <20230119211455.498968-1-echanude@redhat.com> References: <20230119211455.498968-1-echanude@redhat.com> MIME-Version: 1.0 Content-type: text/plain Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755486370383353604?= X-GMAIL-MSGID: =?utf-8?q?1755488186165265005?= |
Series |
fs/namespace: defer free_mount from namespace_unlock
|
|
Commit Message
Eric Chanudet
Jan. 19, 2023, 9:14 p.m. UTC
From: Alexander Larsson <alexl@redhat.com> Use call_rcu to defer releasing the umount'ed or detached filesystem when calling namepsace_unlock(). Calling synchronize_rcu_expedited() has a significant cost on RT kernel that default to rcupdate.rcu_normal_after_boot=1. For example, on a 6.2-rt1 kernel: perf stat -r 10 --null --pre 'mount -t tmpfs tmpfs mnt' -- umount mnt 0.07464 +- 0.00396 seconds time elapsed ( +- 5.31% ) With this change applied: perf stat -r 10 --null --pre 'mount -t tmpfs tmpfs mnt' -- umount mnt 0.00162604 +- 0.00000637 seconds time elapsed ( +- 0.39% ) Waiting for the grace period before completing the syscall does not seem mandatory. The struct mount umount'ed are queued up for release in a separate list and no longer accessible to following syscalls. Signed-off-by: Alexander Larsson <alexl@redhat.com> Signed-off-by: Eric Chanudet <echanude@redhat.com> --- fs/namespace.c | 42 +++++++++++++++++++++++++++++++++++------- 1 file changed, 35 insertions(+), 7 deletions(-)
Comments
On Thu, Jan 19, 2023 at 04:14:55PM -0500, Eric Chanudet wrote: > From: Alexander Larsson <alexl@redhat.com> > > Use call_rcu to defer releasing the umount'ed or detached filesystem > when calling namepsace_unlock(). > > Calling synchronize_rcu_expedited() has a significant cost on RT kernel > that default to rcupdate.rcu_normal_after_boot=1. > > For example, on a 6.2-rt1 kernel: > perf stat -r 10 --null --pre 'mount -t tmpfs tmpfs mnt' -- umount mnt > 0.07464 +- 0.00396 seconds time elapsed ( +- 5.31% ) > > With this change applied: > perf stat -r 10 --null --pre 'mount -t tmpfs tmpfs mnt' -- umount mnt > 0.00162604 +- 0.00000637 seconds time elapsed ( +- 0.39% ) > > Waiting for the grace period before completing the syscall does not seem > mandatory. The struct mount umount'ed are queued up for release in a > separate list and no longer accessible to following syscalls. Again, NAK. If a filesystem is expected to be shut down by umount(2), userland expects it to have been already shut down by the time the syscall returns. It's not just visibility in namespace; it's "can I pull the disk out?". Or "can the shutdown get to taking the network down?", for that matter.
On Thu, Jan 19, 2023 at 11:09 PM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Thu, Jan 19, 2023 at 04:14:55PM -0500, Eric Chanudet wrote: > > From: Alexander Larsson <alexl@redhat.com> > > > > Use call_rcu to defer releasing the umount'ed or detached filesystem > > when calling namepsace_unlock(). > > > > Calling synchronize_rcu_expedited() has a significant cost on RT kernel > > that default to rcupdate.rcu_normal_after_boot=1. > > > > For example, on a 6.2-rt1 kernel: > > perf stat -r 10 --null --pre 'mount -t tmpfs tmpfs mnt' -- umount mnt > > 0.07464 +- 0.00396 seconds time elapsed ( +- 5.31% ) > > > > With this change applied: > > perf stat -r 10 --null --pre 'mount -t tmpfs tmpfs mnt' -- umount mnt > > 0.00162604 +- 0.00000637 seconds time elapsed ( +- 0.39% ) > > > > Waiting for the grace period before completing the syscall does not seem > > mandatory. The struct mount umount'ed are queued up for release in a > > separate list and no longer accessible to following syscalls. > > Again, NAK. If a filesystem is expected to be shut down by umount(2), > userland expects it to have been already shut down by the time the > syscall returns. > > It's not just visibility in namespace; it's "can I pull the disk out?". > Or "can the shutdown get to taking the network down?", for that matter. In the usecase we're worrying about, all the unmounts are lazy (i.e. MNT_DETACH). What about delaying the destroy in that case? That seems in line with the expected behaviour of lazy shutdown. I.e. you can't rely on it to be settled anyway.
Greeting, FYI, we noticed WARNING:inconsistent_lock_state due to commit (built with gcc-11): commit: fcd28ffda89717f5d68fff9d3260b57f02d93eda ("[RFC PATCH RESEND 1/1] fs/namespace: defer free_mount from namespace_unlock") url: https://github.com/intel-lab-lkp/linux/commits/Eric-Chanudet/fs-namespace-defer-free_mount-from-namespace_unlock/20230120-052658 base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git d368967cb1039b5c4cccb62b5a4b9468c50cd143 patch link: https://lore.kernel.org/all/20230119211455.498968-2-echanude@redhat.com/ patch subject: [RFC PATCH RESEND 1/1] fs/namespace: defer free_mount from namespace_unlock in testcase: boot on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): [ 13.499613][ C0] WARNING: inconsistent lock state [ 13.500185][ C0] 6.2.0-rc4-00078-gfcd28ffda897 #1 Not tainted [ 13.500852][ C0] -------------------------------- [ 13.501426][ C0] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 13.502160][ C0] systemd/1 [HC0[0]:SC1[1]:HE1:SE0] takes: [ 13.502810][ C0] ffffffff84414790 (mount_lock){+.?.}-{2:2}, at: mntput_no_expire (include/linux/seqlock.h:469 include/linux/seqlock.h:515 include/linux/seqlock.h:541 include/linux/seqlock.h:887 fs/namespace.c:125 fs/namespace.c:1337) [ 13.503774][ C0] {SOFTIRQ-ON-W} state was registered at: [ 13.504400][ C0] __lock_acquire (kernel/locking/lockdep.c:5009) [ 13.504961][ C0] lock_acquire (kernel/locking/lockdep.c:466 kernel/locking/lockdep.c:5670 kernel/locking/lockdep.c:5633) [ 13.505496][ C0] _raw_spin_lock (include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154) [ 13.506033][ C0] vfs_create_mount (include/linux/seqlock.h:469 include/linux/seqlock.h:515 include/linux/seqlock.h:541 include/linux/seqlock.h:887 fs/namespace.c:125 fs/namespace.c:1145) [ 13.506594][ C0] vfs_kern_mount (fs/namespace.c:1201) [ 13.507214][ C0] kern_mount (fs/namespace.c:4590) [ 13.507716][ C0] shmem_init (mm/shmem.c:4060) [ 13.508225][ C0] mnt_init (fs/namespace.c:4574) [ 13.508723][ C0] vfs_caches_init (fs/dcache.c:3354) [ 13.509273][ C0] start_kernel (init/main.c:1131) [ 13.509812][ C0] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358) [ 13.510468][ C0] irq event stamp: 1059406 [ 13.510999][ C0] hardirqs last enabled at (1059406): kasan_quarantine_put (arch/x86/include/asm/irqflags.h:45 (discriminator 1) arch/x86/include/asm/irqflags.h:80 (discriminator 1) arch/x86/include/asm/irqflags.h:138 (discriminator 1) mm/kasan/quarantine.c:242 (discriminator 1)) [ 13.512073][ C0] hardirqs last disabled at (1059405): kasan_quarantine_put (mm/kasan/quarantine.c:217 (discriminator 1)) [ 13.513123][ C0] softirqs last enabled at (1058838): __do_softirq (arch/x86/include/asm/preempt.h:27 kernel/softirq.c:415 kernel/softirq.c:600) [ 13.514123][ C0] softirqs last disabled at (1059387): __irq_exit_rcu (kernel/softirq.c:445 kernel/softirq.c:650) [ 13.515161][ C0] [ 13.515161][ C0] other info that might help us debug this: [ 13.516033][ C0] Possible unsafe locking scenario: [ 13.516033][ C0] [ 13.516857][ C0] CPU0 [ 13.517282][ C0] ---- [ 13.517700][ C0] lock(mount_lock); [ 13.518184][ C0] <Interrupt> [ 13.518612][ C0] lock(mount_lock); [ 13.519118][ C0] [ 13.519118][ C0] *** DEADLOCK *** [ 13.519118][ C0] [ 13.520048][ C0] 3 locks held by systemd/1: [ 13.520581][ C0] #0: ffffffff84925200 (rcu_read_lock){....}-{1:2}, at: __is_insn_slot_addr (kernel/kprobes.c:297) [ 13.521514][ C0] #1: ffffffff849250e0 (rcu_callback){....}-{0:0}, at: rcu_do_batch (include/linux/rcupdate.h:325 kernel/rcu/tree.c:2241) [ 13.522372][ C0] #2: ffffffff84925200 (rcu_read_lock){....}-{1:2}, at: mntput_no_expire (fs/namespace.c:1318) [ 13.523384][ C0] [ 13.523384][ C0] stack backtrace: [ 13.524062][ C0] CPU: 0 PID: 1 Comm: systemd Not tainted 6.2.0-rc4-00078-gfcd28ffda897 #1 [ 13.524985][ C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014 [ 13.526078][ C0] Call Trace: [ 13.526492][ C0] <IRQ> [ 13.526882][ C0] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) [ 13.527408][ C0] mark_lock_irq.cold (kernel/locking/lockdep.c:4206 kernel/locking/lockdep.c:3975 kernel/locking/lockdep.c:4178) [ 13.527961][ C0] ? lock_chain_count (kernel/locking/lockdep.c:4169) [ 13.528512][ C0] ? filter_irq_stacks (kernel/stacktrace.c:114) [ 13.529078][ C0] ? save_trace (kernel/locking/lockdep.c:585) [ 13.529597][ C0] ? stack_trace_save (kernel/stacktrace.c:123) [ 13.530151][ C0] mark_lock+0x4c9/0xac0 [ 13.530704][ C0] ? mark_lock_irq (kernel/locking/lockdep.c:4593) [ 13.531246][ C0] ? kasan_save_stack (mm/kasan/common.c:47) [ 13.531804][ C0] ? kasan_save_stack (mm/kasan/common.c:46) [ 13.532359][ C0] ? kasan_set_track (mm/kasan/common.c:52) [ 13.532901][ C0] ? kasan_save_free_info (mm/kasan/generic.c:520) [ 13.533485][ C0] mark_usage (kernel/locking/lockdep.c:4529) [ 13.533981][ C0] __lock_acquire (kernel/locking/lockdep.c:5009) [ 13.534531][ C0] lock_acquire (kernel/locking/lockdep.c:466 kernel/locking/lockdep.c:5670 kernel/locking/lockdep.c:5633) [ 13.535060][ C0] ? mntput_no_expire (include/linux/seqlock.h:469 include/linux/seqlock.h:515 include/linux/seqlock.h:541 include/linux/seqlock.h:887 fs/namespace.c:125 fs/namespace.c:1337) [ 13.535631][ C0] ? rcu_read_unlock (include/linux/rcupdate.h:793 (discriminator 5)) [ 13.536176][ C0] ? rcu_read_unlock (include/linux/rcupdate.h:793 (discriminator 5)) [ 13.536718][ C0] ? lock_downgrade (kernel/locking/lockdep.c:5320) [ 13.537261][ C0] _raw_spin_lock (include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154) [ 13.537780][ C0] ? mntput_no_expire (include/linux/seqlock.h:469 include/linux/seqlock.h:515 include/linux/seqlock.h:541 include/linux/seqlock.h:887 fs/namespace.c:125 fs/namespace.c:1337) [ 13.538339][ C0] mntput_no_expire (include/linux/seqlock.h:469 include/linux/seqlock.h:515 include/linux/seqlock.h:541 include/linux/seqlock.h:887 fs/namespace.c:125 fs/namespace.c:1337) [ 13.538901][ C0] ? rcu_read_unlock (include/linux/rcupdate.h:793 (discriminator 5)) [ 13.539451][ C0] ? lock_downgrade (kernel/locking/lockdep.c:5320) [ 13.539999][ C0] ? slab_free_freelist_hook (mm/slub.c:1807) [ 13.540609][ C0] ? mnt_get_count (fs/namespace.c:1318) [ 13.541158][ C0] ? lock_is_held_type (kernel/locking/lockdep.c:5409 kernel/locking/lockdep.c:5711) [ 13.541731][ C0] delayed_mount_release (fs/namespace.c:1607) [ 13.542305][ C0] rcu_do_batch (include/linux/rcupdate.h:330 kernel/rcu/tree.c:2248) [ 13.543383][ C0] ? rcu_implicit_dynticks_qs (kernel/rcu/tree.c:2185) [ 13.544036][ C0] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:50 (discriminator 22)) [ 13.544610][ C0] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:45 arch/x86/include/asm/irqflags.h:80 arch/x86/include/asm/irqflags.h:138 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194) [ 13.545254][ C0] ? rcu_report_qs_rdp (kernel/rcu/tree.c:2050) [ 13.545826][ C0] rcu_core (kernel/rcu/tree.c:2508) [ 13.546315][ C0] __do_softirq (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/irq.h:142 kernel/softirq.c:572) [ 13.547696][ C0] __irq_exit_rcu (kernel/softirq.c:445 kernel/softirq.c:650) [ 13.548241][ C0] irq_exit_rcu (kernel/softirq.c:664) [ 13.548741][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1107 (discriminator 14)) [ 13.549370][ C0] </IRQ> [ 13.549757][ C0] <TASK> [ 13.550132][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:649) [ 13.550793][ C0] RIP: 0010:lock_acquire (kernel/locking/lockdep.c:5636) [ 13.551387][ C0] Code: ff ff 48 83 c4 20 65 0f c1 05 af 26 d0 7e 83 f8 01 0f 85 ae 02 00 00 48 83 7c 24 08 00 74 01 fb 48 b8 00 00 00 00 00 fc ff df <48> 01 c3 48 c7 03 00 00 00 00 48 c7 43 08 00 00 00 00 48 8b 84 24 All code ======== 0: ff (bad) 1: ff 48 83 decl -0x7d(%rax) 4: c4 (bad) 5: 20 65 0f and %ah,0xf(%rbp) 8: c1 05 af 26 d0 7e 83 roll $0x83,0x7ed026af(%rip) # 0x7ed026be f: f8 clc 10: 01 0f add %ecx,(%rdi) 12: 85 ae 02 00 00 48 test %ebp,0x48000002(%rsi) 18: 83 7c 24 08 00 cmpl $0x0,0x8(%rsp) 1d: 74 01 je 0x20 1f: fb sti 20: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax 27: fc ff df 2a:* 48 01 c3 add %rax,%rbx <-- trapping instruction 2d: 48 c7 03 00 00 00 00 movq $0x0,(%rbx) 34: 48 c7 43 08 00 00 00 movq $0x0,0x8(%rbx) 3b: 00 3c: 48 rex.W 3d: 8b .byte 0x8b 3e: 84 .byte 0x84 3f: 24 .byte 0x24 Code starting with the faulting instruction =========================================== 0: 48 01 c3 add %rax,%rbx 3: 48 c7 03 00 00 00 00 movq $0x0,(%rbx) a: 48 c7 43 08 00 00 00 movq $0x0,0x8(%rbx) 11: 00 12: 48 rex.W 13: 8b .byte 0x8b 14: 84 .byte 0x84 15: 24 .byte 0x24 If you fix the issue, kindly add following tag | Reported-by: kernel test robot <yujie.liu@intel.com> | Link: https://lore.kernel.org/oe-lkp/202301301039.8bdb3db5-yujie.liu@intel.com To reproduce: # build kernel cd linux cp config-6.2.0-rc4-00078-gfcd28ffda897 .config make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install cd <mod-install-dir> find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email # if come across any failure that blocks the test, # please remove ~/.lkp and /lkp dir to run from a clean state.
diff --git a/fs/namespace.c b/fs/namespace.c index ab467ee58341..11d219a6e83c 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -44,6 +44,11 @@ static unsigned int m_hash_shift __read_mostly; static unsigned int mp_hash_mask __read_mostly; static unsigned int mp_hash_shift __read_mostly; +struct mount_delayed_release { + struct rcu_head rcu; + struct hlist_head release_list; +}; + static __initdata unsigned long mhash_entries; static int __init set_mhash_entries(char *str) { @@ -1582,11 +1587,31 @@ int may_umount(struct vfsmount *mnt) EXPORT_SYMBOL(may_umount); -static void namespace_unlock(void) +static void free_mounts(struct hlist_head *mount_list) { - struct hlist_head head; struct hlist_node *p; struct mount *m; + + hlist_for_each_entry_safe(m, p, mount_list, mnt_umount) { + hlist_del(&m->mnt_umount); + mntput(&m->mnt); + } +} + +static void delayed_mount_release(struct rcu_head *head) +{ + struct mount_delayed_release *drelease = + container_of(head, struct mount_delayed_release, rcu); + + free_mounts(&drelease->release_list); + kfree(drelease); +} + +static void namespace_unlock(void) +{ + struct hlist_head head; + struct mount_delayed_release *drelease; + LIST_HEAD(list); hlist_move_list(&unmounted, &head); @@ -1599,12 +1624,15 @@ static void namespace_unlock(void) if (likely(hlist_empty(&head))) return; - synchronize_rcu_expedited(); - - hlist_for_each_entry_safe(m, p, &head, mnt_umount) { - hlist_del(&m->mnt_umount); - mntput(&m->mnt); + drelease = kmalloc(sizeof(*drelease), GFP_KERNEL); + if (unlikely(!drelease)) { + synchronize_rcu_expedited(); + free_mounts(&head); + return; } + + hlist_move_list(&head, &drelease->release_list); + call_rcu(&drelease->rcu, delayed_mount_release); } static inline void namespace_lock(void)