Message ID | 20231108064641.65209-1-haifeng.xu@shopee.com |
---|---|
State | New |
Headers | From: Haifeng Xu <haifeng.xu@shopee.com> To: j.vosburgh@gmail.com Cc: andy@greyhouse.net, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH] boning: use a read-write lock in bonding_show_bonds() Date: Wed, 8 Nov 2023 06:46:41 +0000 Message-Id: <20231108064641.65209-1-haifeng.xu@shopee.com> |
Series |
boning: use a read-write lock in bonding_show_bonds()
Commit Message
Haifeng Xu
Nov. 8, 2023, 6:46 a.m. UTC
call stack:
......
PID: 210933 TASK: ffff92424e5ec080 CPU: 13 COMMAND: "kworker/u96:2"
[ffffa7a8e96bbac0] __schedule at ffffffffb0719898
[ffffa7a8e96bbb48] schedule at ffffffffb0719e9e
[ffffa7a8e96bbb68] rwsem_down_write_slowpath at ffffffffafb3167a
[ffffa7a8e96bbc00] down_write at ffffffffb071bfc1
[ffffa7a8e96bbc18] kernfs_remove_by_name_ns at ffffffffafe3593e
[ffffa7a8e96bbc48] sysfs_unmerge_group at ffffffffafe38922
[ffffa7a8e96bbc68] dpm_sysfs_remove at ffffffffb021c96a
[ffffa7a8e96bbc80] device_del at ffffffffb0209af8
[ffffa7a8e96bbcd0] netdev_unregister_kobject at ffffffffb04a6b0e
[ffffa7a8e96bbcf8] unregister_netdevice_many at ffffffffb046d3d9
[ffffa7a8e96bbd60] default_device_exit_batch at ffffffffb046d8d1
[ffffa7a8e96bbdd0] ops_exit_list at ffffffffb045e21d
[ffffa7a8e96bbe00] cleanup_net at ffffffffb045ea46
[ffffa7a8e96bbe60] process_one_work at ffffffffafad94bb
[ffffa7a8e96bbeb0] worker_thread at ffffffffafad96ad
[ffffa7a8e96bbf10] kthread at ffffffffafae132a
[ffffa7a8e96bbf50] ret_from_fork at ffffffffafa04b92
290858 PID: 278176 TASK: ffff925deb39a040 CPU: 32 COMMAND: "node-exporter"
[ffffa7a8d14dbb80] __schedule at ffffffffb0719898
[ffffa7a8d14dbc08] schedule at ffffffffb0719e9e
[ffffa7a8d14dbc28] schedule_preempt_disabled at ffffffffb071a24e
[ffffa7a8d14dbc38] __mutex_lock at ffffffffb071af28
[ffffa7a8d14dbcb8] __mutex_lock_slowpath at ffffffffb071b1a3
[ffffa7a8d14dbcc8] mutex_lock at ffffffffb071b1e2
[ffffa7a8d14dbce0] rtnl_lock at ffffffffb047f4b5
[ffffa7a8d14dbcf0] bonding_show_bonds at ffffffffc079b1a1 [bonding]
[ffffa7a8d14dbd20] class_attr_show at ffffffffb02117ce
[ffffa7a8d14dbd30] sysfs_kf_seq_show at ffffffffafe37ba1
[ffffa7a8d14dbd50] kernfs_seq_show at ffffffffafe35c07
[ffffa7a8d14dbd60] seq_read_iter at ffffffffafd9fce0
[ffffa7a8d14dbdc0] kernfs_fop_read_iter at ffffffffafe36a10
[ffffa7a8d14dbe00] new_sync_read at ffffffffafd6de23
[ffffa7a8d14dbe90] vfs_read at ffffffffafd6e64e
[ffffa7a8d14dbed0] ksys_read at ffffffffafd70977
[ffffa7a8d14dbf10] __x64_sys_read at ffffffffafd70a0a
[ffffa7a8d14dbf20] do_syscall_64 at ffffffffb070bf1c
[ffffa7a8d14dbf50] entry_SYSCALL_64_after_hwframe at ffffffffb080007c
......
Problem description:
Thread 210933 holds the rtnl_mutex and tries to acquire the kernfs_rwsem,
but there are many readers holding the kernfs_rwsem, so it has to sleep
for a long time waiting for the readers to release the lock. Thread 278176
and any other threads which call bonding_show_bonds() also have to wait,
because they try to acquire the rtnl_mutex.
bonding_show_bonds() uses the rtnl_mutex to protect the bond_list traversal.
However, additions to and deletions from bond_list are only performed in
bond_init()/bond_uninit(), so we can introduce a separate read-write lock
to synchronize bond list mutation.
What are the benefits of this change?
1) Threads which call bonding_show_bonds() only wait when the
registration or unregistration of a bond device happens.
2) There are many other users of the rtnl_mutex, so bonding_show_bonds()
no longer competes with them.
In short, this change reduces lock contention on the rtnl_mutex.
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
---
drivers/net/bonding/bond_main.c | 4 ++++
drivers/net/bonding/bond_sysfs.c | 6 ++++--
include/net/bonding.h | 3 +++
3 files changed, 11 insertions(+), 2 deletions(-)
Comments
On Wed, Nov 8, 2023 at 7:47 AM Haifeng Xu <haifeng.xu@shopee.com> wrote:
>
> call stack:

These stacks should either be removed from the changelog, or moved _after_ the description of the problem. These are normal looking call stacks, you are not fixing a crash or deadlock.

> [... call stacks and problem description snipped ...]
>
> they try to accuire the rtnl_mutex.

acquire

> However, the addition and deletion of bond_list are only performed in
> bond_init()/bond_uninit(), so we can intoduce a separate read-write lock

introduce

> In a word, this change reduces the lock contention of rtnl_mutex.

This looks good to me, but please note:

1) This is net-next material, please resend next week, because net-next is currently closed during the merge window.

2) Using a spell checker would point out a few typos (including in the title "boning" -> "bonding").

Thanks.
On 2023/11/8 22:19, Eric Dumazet wrote:
> [... quoted review snipped ...]
>
> This looks good to me, but please note:
>
> 1) This is net-next material, please resend next week, because
> net-next is currently closed during the merge window.
>
> 2) Using a spell checker would point few typos (including in the title
> "boning" -> "bonding")
>
> Thanks.

Thanks for your review, I'll send a new patch next week.
s/boning/bonding/ ?
On Wed, 8 Nov 2023 06:46:41 +0000 Haifeng Xu <haifeng.xu@shopee.com> wrote:

> [... call stacks and problem description snipped ...]
>
> drivers/net/bonding/bond_main.c | 4 ++++
> drivers/net/bonding/bond_sysfs.c | 6 ++++--
> include/net/bonding.h | 3 +++
> 3 files changed, 11 insertions(+), 2 deletions(-)

Reader-writer locks are slower than spin locks and should be discouraged. Would it be possible to use RCU here instead?
On Thu, Nov 9, 2023 at 6:55 PM Stephen Hemminger <stephen@networkplumber.org> wrote:

> Reader-writer locks are slower than spin locks and should be discouraged.
> Would it be possible to use RCU here instead?

I doubt there is a case for repeatedly reading this file? This 'lock' is only for slow paths, it doesn't really matter what it is.
On 2023/11/9 23:47, Jiri Pirko wrote:
>
> s/boning/bonding/
> ?

Yes, Eric has already pointed out the problem in his last email. Thanks.
On 2023/11/10 01:55, Stephen Hemminger wrote:
> [... quoted patch and call stacks snipped ...]
>
> Reader-writer locks are slower than spin locks and should be discouraged.
> Would it be possible to use RCU here instead?

In most cases, there are many threads which want to iterate over the bond_list; the registration or unregistration of a bond device rarely happens.
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 51d47eda1c87..ac4773d19beb 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5951,7 +5951,9 @@ static void bond_uninit(struct net_device *bond_dev)
 
 	bond_set_slave_arr(bond, NULL, NULL);
 
+	write_lock(&bonding_dev_lock);
 	list_del(&bond->bond_list);
+	write_unlock(&bonding_dev_lock);
 
 	bond_debug_unregister(bond);
 }
@@ -6364,7 +6366,9 @@ static int bond_init(struct net_device *bond_dev)
 	spin_lock_init(&bond->stats_lock);
 	netdev_lockdep_set_classes(bond_dev);
 
+	write_lock(&bonding_dev_lock);
 	list_add_tail(&bond->bond_list, &bn->dev_list);
+	write_unlock(&bonding_dev_lock);
 
 	bond_prepare_sysfs_group(bond);
 
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 2805135a7205..e107c1d7a6bf 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -28,6 +28,8 @@
 
 #define to_bond(cd)	((struct bonding *)(netdev_priv(to_net_dev(cd))))
 
+DEFINE_RWLOCK(bonding_dev_lock);
+
 /* "show" function for the bond_masters attribute.
  * The class parameter is ignored.
  */
@@ -40,7 +42,7 @@ static ssize_t bonding_show_bonds(const struct class *cls,
 	int res = 0;
 	struct bonding *bond;
 
-	rtnl_lock();
+	read_lock(&bonding_dev_lock);
 
 	list_for_each_entry(bond, &bn->dev_list, bond_list) {
 		if (res > (PAGE_SIZE - IFNAMSIZ)) {
@@ -55,7 +57,7 @@ static ssize_t bonding_show_bonds(const struct class *cls,
 	if (res)
 		buf[res-1] = '\n'; /* eat the leftover space */
 
-	rtnl_unlock();
+	read_unlock(&bonding_dev_lock);
 
 	return res;
 }
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 5b8b1b644a2d..584ba4b5b8df 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -777,6 +777,9 @@ extern struct rtnl_link_ops bond_link_ops;
 /* exported from bond_sysfs_slave.c */
 extern const struct sysfs_ops slave_sysfs_ops;
 
+/* exported from bond_sysfs.c */
+extern rwlock_t bonding_dev_lock;
+
 /* exported from bond_3ad.c */
 extern const u8 lacpdu_mcast_addr[];