From patchwork Thu Mar 2 04:32:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Imran Khan
X-Patchwork-Id: 63203
From: Imran Khan
To: tj@kernel.org, gregkh@linuxfoundation.org, viro@zeniv.linux.org.uk
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, joe.jin@oracle.com
Subject: [PATCH 1/3] kernfs: Introduce separate rwsem to protect inode attributes.
Date: Thu, 2 Mar 2023 15:32:01 +1100
Message-Id: <20230302043203.1695051-2-imran.f.khan@oracle.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230302043203.1695051-1-imran.f.khan@oracle.com>
References: <20230302043203.1695051-1-imran.f.khan@oracle.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

Right now a global per-fs rwsem (kernfs_rwsem) synchronizes multiple
kernfs operations. On a large system with a few hundred CPUs and a few
hundred applications simultaneously trying to access sysfs, this results
in multiple sys_open(s) contending on kernfs_rwsem via
kernfs_iop_permission and kernfs_dop_revalidate.

For example, on a system with 384 cores, if I run 200 instances of an
application which is mostly executing the following loop:

for (int loop = 0; loop < 100; loop++) {
	for (int port_num = 1; port_num < 2; port_num++) {
		for (int gid_index = 0; gid_index < 254; gid_index++) {
			char ret_buf[64], ret_buf_lo[64];
			char gid_file_path[1024];
			int ret_len;
			int ret_fd;
			ssize_t ret_rd;
			ub4 i, saved_errno;

			memset(ret_buf, 0, sizeof(ret_buf));
			memset(gid_file_path, 0, sizeof(gid_file_path));

			ret_len = snprintf(gid_file_path, sizeof(gid_file_path),
					   "/sys/class/infiniband/%s/ports/%d/gids/%d",
					   dev_name, port_num, gid_index);

			ret_fd = open(gid_file_path, O_RDONLY | O_CLOEXEC);
			if (ret_fd < 0) {
				printf("Failed to open %s\n", gid_file_path);
				continue;
			}

			/* Read the GID */
			ret_rd = read(ret_fd, ret_buf, 40);
			if (ret_rd == -1) {
				/* save errno before another call can change it */
				saved_errno = errno;
				printf("Failed to read from file %s, errno: %u\n",
				       gid_file_path, saved_errno);
				close(ret_fd);	/* do not leak the fd on error */
				continue;
			}

			close(ret_fd);
		}
	}
}

I see contention around kernfs_rwsem as follows:

path_openat
|
|----link_path_walk.part.0.constprop.0
|      |
|      |--49.92%--inode_permission
|      |          |
|      |           --48.69%--kernfs_iop_permission
|      |                     |
|      |                     |--18.16%--down_read
|      |                     |
|      |                     |--15.38%--up_read
|      |                     |
|      |                      --14.58%--_raw_spin_lock
|      |                                |
|      |                                 -----
|      |
|      |--29.08%--walk_component
|      |          |
|      |           --29.02%--lookup_fast
|      |                     |
|      |                     |--24.26%--kernfs_dop_revalidate
|      |                     |          |
|      |                     |          |--14.97%--down_read
|      |                     |          |
|      |                     |           --9.01%--up_read
|      |                     |
|      |                      --4.74%--__d_lookup
|      |                                |
|      |                                 --4.64%--_raw_spin_lock
|      |                                           |
|      |                                            ----

Having a separate per-fs rwsem to protect kernfs inode attributes avoids
the above-mentioned contention and results in better performance, as can
be seen below:

path_openat
|
|----link_path_walk.part.0.constprop.0
|      |
|      |
|      |--27.06%--inode_permission
|      |          |
|      |           --25.84%--kernfs_iop_permission
|      |                     |
|      |                     |--9.29%--up_read
|      |                     |
|      |                     |--8.19%--down_read
|      |                     |
|      |                      --7.89%--_raw_spin_lock
|      |                                |
|      |                                 ----
|      |
|      |--22.42%--walk_component
|      |          |
|      |           --22.36%--lookup_fast
|      |                     |
|      |                     |--16.07%--__d_lookup
|      |                     |          |
|      |                     |           --16.01%--_raw_spin_lock
|      |                     |                     |
|      |                     |                      ----
|      |                     |
|      |                      --6.28%--kernfs_dop_revalidate
|      |                                |
|      |                                |--3.76%--down_read
|      |                                |
|      |                                 --2.26%--up_read

As can be seen from the above data, the overhead due to both
kernfs_iop_permission and kernfs_dop_revalidate has gone down, and this
also reduces the overall run time of the earlier-mentioned loop.

Signed-off-by: Imran Khan
---
 fs/kernfs/dir.c             |  7 +++++++
 fs/kernfs/inode.c           | 16 ++++++++--------
 fs/kernfs/kernfs-internal.h |  1 +
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index ef00b5fe8ceea..953b2717c60e6 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -770,12 +770,15 @@ int kernfs_add_one(struct kernfs_node *kn)
 		goto out_unlock;

 	/* Update timestamps on the parent */
+	down_write(&root->kernfs_iattr_rwsem);
+
 	ps_iattr = parent->iattr;
 	if (ps_iattr) {
 		ktime_get_real_ts64(&ps_iattr->ia_ctime);
 		ps_iattr->ia_mtime = ps_iattr->ia_ctime;
 	}

+	up_write(&root->kernfs_iattr_rwsem);
 	up_write(&root->kernfs_rwsem);

 	/*
@@ -940,6 +943,7 @@ struct kernfs_root *kernfs_create_root(struct kernfs_syscall_ops *scops,

 	idr_init(&root->ino_idr);
 	init_rwsem(&root->kernfs_rwsem);
+	init_rwsem(&root->kernfs_iattr_rwsem);
 	INIT_LIST_HEAD(&root->supers);

 	/*
@@ -1462,11 +1466,14 @@ static void __kernfs_remove(struct kernfs_node *kn)
 				pos->parent ? pos->parent->iattr : NULL;

 			/* update timestamps on the parent */
+			down_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
+
 			if (ps_iattr) {
 				ktime_get_real_ts64(&ps_iattr->ia_ctime);
 				ps_iattr->ia_mtime = ps_iattr->ia_ctime;
 			}

+			up_write(&kernfs_root(kn)->kernfs_iattr_rwsem);
 			kernfs_put(pos);
 		}

diff --git a/fs/kernfs/inode.c b/fs/kernfs/inode.c
index 30494dcb0df34..b22b74d1a1150 100644
--- a/fs/kernfs/inode.c
+++ b/fs/kernfs/inode.c
@@ -101,9 +101,9 @@ int kernfs_setattr(struct kernfs_node *kn, const struct iattr *iattr)
 	int ret;
 	struct kernfs_root *root = kernfs_root(kn);

-	down_write(&root->kernfs_rwsem);
+	down_write(&root->kernfs_iattr_rwsem);
 	ret = __kernfs_setattr(kn, iattr);
-	up_write(&root->kernfs_rwsem);
+	up_write(&root->kernfs_iattr_rwsem);
 	return ret;
 }

@@ -119,7 +119,7 @@ int kernfs_iop_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 		return -EINVAL;

 	root = kernfs_root(kn);
-	down_write(&root->kernfs_rwsem);
+	down_write(&root->kernfs_iattr_rwsem);
 	error = setattr_prepare(&nop_mnt_idmap, dentry, iattr);
 	if (error)
 		goto out;
@@ -132,7 +132,7 @@ int kernfs_iop_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 	setattr_copy(&nop_mnt_idmap, inode, iattr);

 out:
-	up_write(&root->kernfs_rwsem);
+	up_write(&root->kernfs_iattr_rwsem);
 	return error;
 }

@@ -189,10 +189,10 @@ int kernfs_iop_getattr(struct mnt_idmap *idmap,
 	struct kernfs_node *kn = inode->i_private;
 	struct kernfs_root *root = kernfs_root(kn);

-	down_read(&root->kernfs_rwsem);
+	down_read(&root->kernfs_iattr_rwsem);
 	kernfs_refresh_inode(kn, inode);
 	generic_fillattr(&nop_mnt_idmap, inode, stat);
-	up_read(&root->kernfs_rwsem);
+	up_read(&root->kernfs_iattr_rwsem);

 	return 0;
 }
@@ -285,10 +285,10 @@ int kernfs_iop_permission(struct mnt_idmap *idmap,
 	kn = inode->i_private;
 	root = kernfs_root(kn);

-	down_read(&root->kernfs_rwsem);
+	down_read(&root->kernfs_iattr_rwsem);
 	kernfs_refresh_inode(kn, inode);
 	ret = generic_permission(&nop_mnt_idmap, inode, mask);
-	up_read(&root->kernfs_rwsem);
+	up_read(&root->kernfs_iattr_rwsem);

 	return ret;
 }
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index 236c3a6113f1e..3297093c920de 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -47,6 +47,7 @@ struct kernfs_root {

 	wait_queue_head_t	deactivate_waitq;
 	struct rw_semaphore	kernfs_rwsem;
+	struct rw_semaphore	kernfs_iattr_rwsem;
 };

 /* +1 to avoid triggering overflow warning when negating it */

From patchwork Thu Mar 2 04:32:02 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Imran Khan
X-Patchwork-Id: 63204
From: Imran Khan
To: tj@kernel.org, gregkh@linuxfoundation.org, viro@zeniv.linux.org.uk
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, joe.jin@oracle.com
Subject: [PATCH 2/3] kernfs: Use a per-fs rwsem to protect per-fs list of kernfs_super_info.
Date: Thu, 2 Mar 2023 15:32:02 +1100
Message-Id: <20230302043203.1695051-3-imran.f.khan@oracle.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230302043203.1695051-1-imran.f.khan@oracle.com>
References: <20230302043203.1695051-1-imran.f.khan@oracle.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

Right now the per-fs kernfs_rwsem protects the list of kernfs_super_info
instances for a kernfs_root. Since kernfs_rwsem is used to synchronize
several other operations across kernfs, and since most of these
operations don't impact kernfs_super_info, we can use a separate per-fs
rwsem to synchronize access to the list of kernfs_super_info. This helps
in reducing contention around kernfs_rwsem and also allows operations
that change or access the list of kernfs_super_info to proceed without
contending for kernfs_rwsem.

Signed-off-by: Imran Khan
---
 fs/kernfs/dir.c             | 1 +
 fs/kernfs/file.c            | 2 ++
 fs/kernfs/kernfs-internal.h | 1 +
 fs/kernfs/mount.c           | 8 ++++----
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 953b2717c60e6..2cdb8516e5287 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -944,6 +944,7 @@ struct kernfs_root *kernfs_create_root(struct kernfs_syscall_ops *scops,
 	idr_init(&root->ino_idr);
 	init_rwsem(&root->kernfs_rwsem);
 	init_rwsem(&root->kernfs_iattr_rwsem);
+	init_rwsem(&root->kernfs_supers_rwsem);
 	INIT_LIST_HEAD(&root->supers);

 	/*
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index e4a50e4ff0d23..b84cf0cd4bd44 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -924,6 +924,7 @@ static void kernfs_notify_workfn(struct work_struct *work)

 	/* kick fsnotify */
 	down_write(&root->kernfs_rwsem);
+	down_write(&root->kernfs_supers_rwsem);
 	list_for_each_entry(info, &kernfs_root(kn)->supers, node) {
 		struct kernfs_node *parent;
 		struct inode *p_inode = NULL;
@@ -960,6 +961,7 @@ static void kernfs_notify_workfn(struct work_struct *work)
 		iput(inode);
 	}

+	up_write(&root->kernfs_supers_rwsem);
 	up_write(&root->kernfs_rwsem);
 	kernfs_put(kn);
 	goto repeat;
diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h
index 3297093c920de..a9b854cdfdb5f 100644
--- a/fs/kernfs/kernfs-internal.h
+++ b/fs/kernfs/kernfs-internal.h
@@ -48,6 +48,7 @@ struct kernfs_root {
 	wait_queue_head_t	deactivate_waitq;
 	struct rw_semaphore	kernfs_rwsem;
 	struct rw_semaphore	kernfs_iattr_rwsem;
+	struct rw_semaphore	kernfs_supers_rwsem;
 };

 /* +1 to avoid triggering overflow warning when negating it */
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index e08e8d9998070..d49606accb07b 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -351,9 +351,9 @@ int kernfs_get_tree(struct fs_context *fc)
 		}
 		sb->s_flags |= SB_ACTIVE;

-		down_write(&root->kernfs_rwsem);
+		down_write(&root->kernfs_supers_rwsem);
 		list_add(&info->node, &info->root->supers);
-		up_write(&root->kernfs_rwsem);
+		up_write(&root->kernfs_supers_rwsem);
 	}

 	fc->root = dget(sb->s_root);
@@ -380,9 +380,9 @@ void kernfs_kill_sb(struct super_block *sb)
 	struct kernfs_super_info *info = kernfs_info(sb);
 	struct kernfs_root *root = info->root;

-	down_write(&root->kernfs_rwsem);
+	down_write(&root->kernfs_supers_rwsem);
 	list_del(&info->node);
-	up_write(&root->kernfs_rwsem);
+	up_write(&root->kernfs_supers_rwsem);

 	/*
 	 * Remove the superblock from fs_supers/s_instances

From patchwork Thu Mar 2 04:32:03 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Imran Khan
X-Patchwork-Id: 63206
From: Imran Khan
To: tj@kernel.org, gregkh@linuxfoundation.org, viro@zeniv.linux.org.uk
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, joe.jin@oracle.com
Subject: [PATCH 3/3] kernfs: change kernfs_rename_lock into a read-write lock.
Date: Thu, 2 Mar 2023 15:32:03 +1100
Message-Id: <20230302043203.1695051-4-imran.f.khan@oracle.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230302043203.1695051-1-imran.f.khan@oracle.com>
References: <20230302043203.1695051-1-imran.f.khan@oracle.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

kernfs_rename_lock protects a node's ->parent and thus the kernfs
topology, so it can be used in cases that rely on a stable kernfs
topology. Change it to a read-write lock for better scalability.

Suggested-by: Al Viro
Signed-off-by: Imran Khan
Reviewed-by: Matthew Wilcox (Oracle)
---
 fs/kernfs/dir.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 2cdb8516e5287..06e27b36216fe 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -17,7 +17,7 @@

 #include "kernfs-internal.h"

-static DEFINE_SPINLOCK(kernfs_rename_lock);	/* kn->parent and ->name */
+static DEFINE_RWLOCK(kernfs_rename_lock);	/* kn->parent and ->name */
 /*
  * Don't use rename_lock to piggy back on pr_cont_buf. We don't want to
  * call pr_cont() while holding rename_lock. Because sometimes pr_cont()
@@ -196,9 +196,9 @@ int kernfs_name(struct kernfs_node *kn, char *buf, size_t buflen)
 	unsigned long flags;
 	int ret;

-	spin_lock_irqsave(&kernfs_rename_lock, flags);
+	read_lock_irqsave(&kernfs_rename_lock, flags);
 	ret = kernfs_name_locked(kn, buf, buflen);
-	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
+	read_unlock_irqrestore(&kernfs_rename_lock, flags);
 	return ret;
 }

@@ -224,9 +224,9 @@ int kernfs_path_from_node(struct kernfs_node *to, struct kernfs_node *from,
 	unsigned long flags;
 	int ret;

-	spin_lock_irqsave(&kernfs_rename_lock, flags);
+	read_lock_irqsave(&kernfs_rename_lock, flags);
 	ret = kernfs_path_from_node_locked(to, from, buf, buflen);
-	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
+	read_unlock_irqrestore(&kernfs_rename_lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(kernfs_path_from_node);
@@ -294,10 +294,10 @@ struct kernfs_node *kernfs_get_parent(struct kernfs_node *kn)
 	struct kernfs_node *parent;
 	unsigned long flags;

-	spin_lock_irqsave(&kernfs_rename_lock, flags);
+	read_lock_irqsave(&kernfs_rename_lock, flags);
 	parent = kn->parent;
 	kernfs_get(parent);
-	spin_unlock_irqrestore(&kernfs_rename_lock, flags);
+	read_unlock_irqrestore(&kernfs_rename_lock, flags);

 	return parent;
 }
@@ -1731,7 +1731,7 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
 	kernfs_get(new_parent);

 	/* rename_lock protects ->parent and ->name accessors */
-	spin_lock_irq(&kernfs_rename_lock);
+	write_lock_irq(&kernfs_rename_lock);

 	old_parent = kn->parent;
 	kn->parent = new_parent;
@@ -1742,7 +1742,7 @@ int kernfs_rename_ns(struct kernfs_node *kn, struct kernfs_node *new_parent,
 		kn->name = new_name;
 	}

-	spin_unlock_irq(&kernfs_rename_lock);
+	write_unlock_irq(&kernfs_rename_lock);

 	kn->hash = kernfs_name_hash(kn->name, kn->ns);
 	kernfs_link_sibling(kn);