From patchwork Wed Dec 6 06:05:30 2023
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 174314
From: Dave Chinner <david@fromorbit.com>
To: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org, linux-cachefs@redhat.com,
    dhowells@redhat.com, gfs2@lists.linux.dev, dm-devel@lists.linux.dev,
    linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH 01/11] lib/dlock-list: Distributed and lock-protected lists
Date: Wed, 6 Dec 2023 17:05:30 +1100
Message-ID: <20231206060629.2827226-2-david@fromorbit.com>
In-Reply-To: <20231206060629.2827226-1-david@fromorbit.com>
References: <20231206060629.2827226-1-david@fromorbit.com>

From: Waiman Long

Linked lists are used everywhere in the Linux kernel. However, if many
threads try to add or delete entries on the same linked list, the list
can become a performance bottleneck.

This patch introduces a new list API that provides a set of distributed
lists (one per CPU), each protected by its own spinlock. To the
callers, however, the set of lists acts like a single consolidated
list. This allows list entry insertion and deletion operations to
happen in parallel instead of being serialized on a global list and
lock.

List entry insertion is strictly per-CPU. List deletion, however, can
happen on a CPU other than the one that did the insertion, so a lock is
still needed to protect each list. Because of that, there may still be
a small amount of contention during deletion.

A new header file, include/linux/dlock-list.h, is added with the
associated dlock_list_head and dlock_list_node structures. The
following functions are provided to manage the per-CPU lists:

 1. int alloc_dlock_list_heads(struct dlock_list_heads *dlist)
 2. void free_dlock_list_heads(struct dlock_list_heads *dlist)
 3. void dlock_lists_add(struct dlock_list_node *node,
                         struct dlock_list_heads *dlist)
 4. void dlock_lists_del(struct dlock_list_node *node)

Iteration over all the list entries within a dlock list array is done
by calling either the dlist_for_each_entry() or
dlist_for_each_entry_safe() macro. They correspond to the
list_for_each_entry() and list_for_each_entry_safe() macros
respectively. The iteration state is kept in a dlock_list_iter
structure that is passed to the iteration macros.
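For illustration only, a minimal usage sketch of the API described
above (not part of the patch; the dlock-list names follow the header
added below, while everything prefixed foo_ is hypothetical):

#include <linux/dlock-list.h>

struct foo {
        int                     value;
        struct dlock_list_node  node;
};

static struct dlock_list_heads foo_list;

static int __init foo_init(void)
{
        /* Allocates one lock-protected list head per possible CPU. */
        return alloc_dlock_list_heads(&foo_list);   /* -ENOMEM on failure */
}

static void foo_track(struct foo *f)
{
        init_dlock_list_node(&f->node);
        dlock_lists_add(&f->node, &foo_list);       /* this CPU's sublist */
}

static void foo_untrack(struct foo *f)
{
        dlock_lists_del(&f->node);                  /* may run on any CPU */
}

static int foo_sum_all(void)
{
        struct foo *f;
        int sum = 0;
        DEFINE_DLOCK_LIST_ITER(iter, &foo_list);

        /* Walks every per-CPU sublist; the iterator takes and drops
         * each sublist's spinlock as it goes. */
        dlist_for_each_entry(f, &iter, node)
                sum += f->value;
        return sum;
}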
Signed-off-by: Waiman Long
Reviewed-by: Jan Kara
---
 include/linux/dlock-list.h | 242 +++++++++++++++++++++++++++++++++++++
 lib/Makefile               |   2 +-
 lib/dlock-list.c           | 234 +++++++++++++++++++++++++++++++++++
 3 files changed, 477 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/dlock-list.h
 create mode 100644 lib/dlock-list.c

diff --git a/include/linux/dlock-list.h b/include/linux/dlock-list.h
new file mode 100644
index 000000000000..327cb9edc7e3
--- /dev/null
+++ b/include/linux/dlock-list.h
@@ -0,0 +1,242 @@
+/*
+ * Distributed and locked list
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * (C) Copyright 2016 Hewlett-Packard Enterprise Development LP
+ * (C) Copyright 2017-2018 Red Hat, Inc.
+ *
+ * Authors: Waiman Long
+ */
+#ifndef __LINUX_DLOCK_LIST_H
+#define __LINUX_DLOCK_LIST_H
+
+#include <linux/spinlock.h>
+#include <linux/list.h>
+
+/*
+ * include/linux/dlock-list.h
+ *
+ * The dlock_list_head structure contains the spinlock. It is cacheline
+ * aligned to reduce contention among different CPUs. Each dlock_list_node
+ * structure contains a pointer to its head entry instead.
+ */
+struct dlock_list_head {
+        struct list_head list;
+        spinlock_t lock;
+} ____cacheline_aligned_in_smp;
+
+struct dlock_list_heads {
+        struct dlock_list_head *heads;
+};
+
+/*
+ * dlock list node data structure
+ */
+struct dlock_list_node {
+        struct list_head list;
+        struct dlock_list_head *head;
+};
+
+/*
+ * dlock list iteration state
+ *
+ * This is an opaque data structure that may change. Users of this structure
+ * should not access the structure members directly other than using the
+ * helper functions and macros provided in this header file.
+ */
+struct dlock_list_iter {
+        int index;
+        struct dlock_list_head *head, *entry;
+};
+
+#define DLOCK_LIST_ITER_INIT(dlist)             \
+        {                                       \
+                .index = -1,                    \
+                .head = (dlist)->heads,         \
+        }
+
+#define DEFINE_DLOCK_LIST_ITER(s, heads)        \
+        struct dlock_list_iter s = DLOCK_LIST_ITER_INIT(heads)
+
+static inline void init_dlock_list_iter(struct dlock_list_iter *iter,
+                                        struct dlock_list_heads *heads)
+{
+        *iter = (struct dlock_list_iter)DLOCK_LIST_ITER_INIT(heads);
+}
+
+#define DLOCK_LIST_NODE_INIT(name)              \
+        {                                       \
+                .list = LIST_HEAD_INIT(name)    \
+        }
+
+static inline void init_dlock_list_node(struct dlock_list_node *node)
+{
+        *node = (struct dlock_list_node)DLOCK_LIST_NODE_INIT(node->list);
+}
+
+/**
+ * dlock_list_unlock - unlock the spinlock that protects the current list
+ * @iter: Pointer to the dlock list iterator structure
+ */
+static inline void dlock_list_unlock(struct dlock_list_iter *iter)
+{
+        spin_unlock(&iter->entry->lock);
+}
+
+/**
+ * dlock_list_relock - lock the spinlock that protects the current list
+ * @iter: Pointer to the dlock list iterator structure
+ */
+static inline void dlock_list_relock(struct dlock_list_iter *iter)
+{
+        spin_lock(&iter->entry->lock);
+}
+
+/*
+ * Allocation and freeing of dlock list
+ */
+extern int __alloc_dlock_list_heads(struct dlock_list_heads *dlist,
+                                    struct lock_class_key *key);
+extern void free_dlock_list_heads(struct dlock_list_heads *dlist);
+
+/**
+ * alloc_dlock_list_heads - Initialize and allocate the list of head entries.
+ * @dlist : Pointer to the dlock_list_heads structure to be initialized
+ * Return : 0 if successful, -ENOMEM on memory allocation error
+ */
+#define alloc_dlock_list_heads(dlist)                           \
+({                                                              \
+        static struct lock_class_key _key;                      \
+        __alloc_dlock_list_heads(dlist, &_key);                 \
+})
+
+/*
+ * Check if a dlock list is empty or not.
+ */
+extern bool dlock_lists_empty(struct dlock_list_heads *dlist);
+
+/*
+ * The dlock list addition and deletion functions here are not irq-safe.
+ * Special irq-safe variants will have to be added if we need them.
+ */
+extern void dlock_lists_add(struct dlock_list_node *node,
+                            struct dlock_list_heads *dlist);
+extern void dlock_lists_del(struct dlock_list_node *node);
+
+/*
+ * Find the first entry of the next available list.
+ */
+extern struct dlock_list_node *
+__dlock_list_next_list(struct dlock_list_iter *iter);
+
+/**
+ * __dlock_list_next_entry - Iterate to the next entry of the dlock list
+ * @curr : Pointer to the current dlock_list_node structure
+ * @iter : Pointer to the dlock list iterator structure
+ * Return: Pointer to the next entry or NULL if all the entries are iterated
+ *
+ * The iterator has to be properly initialized before calling this function.
+ */
+static inline struct dlock_list_node *
+__dlock_list_next_entry(struct dlock_list_node *curr,
+                        struct dlock_list_iter *iter)
+{
+        /*
+         * Find next entry
+         */
+        if (curr)
+                curr = list_next_entry(curr, list);
+
+        if (!curr || (&curr->list == &iter->entry->list)) {
+                /*
+                 * The current list has been exhausted, try the next available
+                 * list.
+                 */
+                curr = __dlock_list_next_list(iter);
+        }
+
+        return curr;    /* Continue the iteration */
+}
+
+/**
+ * _dlock_list_next_list_entry - get first element from next list in iterator
+ * @iter  : The dlock list iterator.
+ * @pos   : A variable of the struct that is embedded in.
+ * @member: The name of the dlock_list_node within the struct.
+ * Return : Pointer to first entry or NULL if all the lists are iterated.
+ */
+#define _dlock_list_next_list_entry(iter, pos, member)                  \
+        ({                                                              \
+                struct dlock_list_node *_n;                             \
+                _n = __dlock_list_next_entry(NULL, iter);               \
+                _n ? list_entry(_n, typeof(*pos), member) : NULL;       \
+        })
+
+/**
+ * _dlock_list_next_entry - iterate to the next entry of the list
+ * @pos   : The type * to cursor
+ * @iter  : The dlock list iterator.
+ * @member: The name of the dlock_list_node within the struct.
+ * Return : Pointer to the next entry or NULL if all the entries are iterated.
+ *
+ * Note that pos can't be NULL.
+ */
+#define _dlock_list_next_entry(pos, iter, member)                       \
+        ({                                                              \
+                struct dlock_list_node *_n;                             \
+                _n = __dlock_list_next_entry(&(pos)->member, iter);     \
+                _n ? list_entry(_n, typeof(*(pos)), member) : NULL;     \
+        })
+
+/**
+ * dlist_for_each_entry - iterate over the dlock list
+ * @pos   : Type * to use as a loop cursor
+ * @iter  : The dlock list iterator
+ * @member: The name of the dlock_list_node within the struct
+ *
+ * This iteration macro isn't safe with respect to list entry removal, but
+ * it can correctly iterate newly added entries right after the current one.
+ * This iteration function is designed to be used in a while loop.
+ */
+#define dlist_for_each_entry(pos, iter, member)                         \
+        for (pos = _dlock_list_next_list_entry(iter, pos, member);      \
+             pos != NULL;                                               \
+             pos = _dlock_list_next_entry(pos, iter, member))
+
+/**
+ * dlist_for_each_entry_safe - iterate over the dlock list & safe over removal
+ * @pos   : Type * to use as a loop cursor
+ * @n     : Another type * to use as temporary storage
+ * @iter  : The dlock list iterator
+ * @member: The name of the dlock_list_node within the struct
+ *
+ * This iteration macro is safe with respect to list entry removal.
+ * However, it cannot correctly iterate newly added entries right after the
+ * current one.
+ *
+ * The call to __dlock_list_next_list() is deferred until the next entry
+ * is being iterated to avoid a use-after-unlock problem.
+ */
+#define dlist_for_each_entry_safe(pos, n, iter, member)                 \
+        for (pos = NULL;                                                \
+            ({                                                          \
+                if (!pos ||                                             \
+                   (&(pos)->member.list == &(iter)->entry->list))       \
+                        pos = _dlock_list_next_list_entry(iter, pos,    \
+                                                          member);      \
+                if (pos)                                                \
+                        n = list_next_entry(pos, member.list);          \
+                pos;                                                    \
+            });                                                         \
+            pos = n)
+
+#endif /* __LINUX_DLOCK_LIST_H */
diff --git a/lib/Makefile b/lib/Makefile
index 6b09731d8e61..73d84b569f1e 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -48,7 +48,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
         bsearch.o find_bit.o llist.o lwq.o memweight.o kfifo.o \
         percpu-refcount.o rhashtable.o base64.o \
         once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \
-        generic-radix-tree.o bitmap-str.o
+        generic-radix-tree.o bitmap-str.o dlock-list.o
 obj-$(CONFIG_STRING_SELFTEST) += test_string.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/dlock-list.c b/lib/dlock-list.c
new file mode 100644
index 000000000000..f64ea4cc5e79
--- /dev/null
+++ b/lib/dlock-list.c
@@ -0,0 +1,234 @@
+/*
+ * Distributed and locked list
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * (C) Copyright 2016 Hewlett-Packard Enterprise Development LP
+ * (C) Copyright 2017-2018 Red Hat, Inc.
+ *
+ * Authors: Waiman Long
+ */
+#include <linux/dlock-list.h>
+#include <linux/lockdep.h>
+#include <linux/percpu.h>
+#include <linux/slab.h>
+
+/*
+ * The distributed and locked list is a distributed set of lists each of
+ * which is protected by its own spinlock, but acts like a single
+ * consolidated list to the callers. For scaling purposes, the number of
+ * lists used is equal to the number of possible CPUs in the system to
+ * minimize contention.
+ *
+ * However, it is possible that individual CPU numbers may be equal to
+ * or greater than the number of possible CPUs when there are holes in
+ * the CPU number list. As a result, we need to map the CPU number to a
+ * list index.
+ */
+static DEFINE_PER_CPU_READ_MOSTLY(int, cpu2idx);
+
+/*
+ * Initialize the cpu2idx mapping table.
+ *
+ * It is possible that a dlock-list can be allocated before cpu2idx is
+ * initialized. In this case, all the CPUs are mapped to the first entry
+ * before initialization.
+ */
+static int __init cpu2idx_init(void)
+{
+        int idx, cpu;
+
+        idx = 0;
+        for_each_possible_cpu(cpu)
+                per_cpu(cpu2idx, cpu) = idx++;
+        return 0;
+}
+postcore_initcall(cpu2idx_init);
+
+/**
+ * __alloc_dlock_list_heads - Initialize and allocate the list of head entries
+ * @dlist: Pointer to the dlock_list_heads structure to be initialized
+ * @key  : The lock class key to be used for lockdep
+ * Return: 0 if successful, -ENOMEM on memory allocation error
+ *
+ * This function does not allocate the dlock_list_heads structure itself. The
+ * callers will have to do their own memory allocation, if necessary. However,
+ * this allows embedding the dlock_list_heads structure directly into other
+ * structures.
+ *
+ * Dynamically allocated locks need to have their own special lock class
+ * to avoid lockdep warnings.
+ */
+int __alloc_dlock_list_heads(struct dlock_list_heads *dlist,
+                             struct lock_class_key *key)
+{
+        int idx;
+
+        dlist->heads = kcalloc(nr_cpu_ids, sizeof(struct dlock_list_head),
+                               GFP_KERNEL);
+
+        if (!dlist->heads)
+                return -ENOMEM;
+
+        for (idx = 0; idx < nr_cpu_ids; idx++) {
+                struct dlock_list_head *head = &dlist->heads[idx];
+
+                INIT_LIST_HEAD(&head->list);
+                head->lock = __SPIN_LOCK_UNLOCKED(&head->lock);
+                lockdep_set_class(&head->lock, key);
+        }
+        return 0;
+}
+EXPORT_SYMBOL(__alloc_dlock_list_heads);
+
+/**
+ * free_dlock_list_heads - Free all the head entries of the dlock list
+ * @dlist: Pointer to the dlock_list_heads structure to be freed
+ *
+ * This function doesn't free the dlock_list_heads structure itself. So
+ * the caller will have to do it, if necessary.
+ */
+void free_dlock_list_heads(struct dlock_list_heads *dlist)
+{
+        kfree(dlist->heads);
+        dlist->heads = NULL;
+}
+EXPORT_SYMBOL(free_dlock_list_heads);
+
+/**
+ * dlock_lists_empty - Check if all the dlock lists are empty
+ * @dlist: Pointer to the dlock_list_heads structure
+ * Return: true if all the lists are empty, false otherwise.
+ *
+ * This can be a pretty expensive function call. If this function is required
+ * in a performance critical path, we may have to maintain a global count
+ * of the list entries in the global dlock_list_heads structure instead.
+ */
+bool dlock_lists_empty(struct dlock_list_heads *dlist)
+{
+        int idx;
+
+        for (idx = 0; idx < nr_cpu_ids; idx++)
+                if (!list_empty(&dlist->heads[idx].list))
+                        return false;
+        return true;
+}
+EXPORT_SYMBOL(dlock_lists_empty);
+
+/**
+ * dlock_lists_add - Add a node to the given dlock list
+ * @node : The node to be added
+ * @dlist: The dlock list where the node is to be added
+ *
+ * List selection is based on the CPU being used when the dlock_lists_add()
+ * function is called. However, deletion may be done by a different CPU.
+ */
+void dlock_lists_add(struct dlock_list_node *node,
+                     struct dlock_list_heads *dlist)
+{
+        struct dlock_list_head *head = &dlist->heads[this_cpu_read(cpu2idx)];
+
+        /*
+         * There is no need to disable preemption
+         */
+        spin_lock(&head->lock);
+        WRITE_ONCE(node->head, head);
+        list_add(&node->list, &head->list);
+        spin_unlock(&head->lock);
+}
+EXPORT_SYMBOL(dlock_lists_add);
+
+/**
+ * dlock_lists_del - Delete a node from a dlock list
+ * @node : The node to be deleted
+ *
+ * We need to check the head pointer again after taking the lock to guard
+ * against concurrent deletion of the same node. If the head pointer changes
+ * (becomes NULL or points to a different head), we assume that the deletion
+ * was done elsewhere. A warning will be printed if this happens as it is
+ * likely to be a bug.
+ */
+void dlock_lists_del(struct dlock_list_node *node)
+{
+        struct dlock_list_head *head;
+        bool retry;
+
+        do {
+                head = READ_ONCE(node->head);
+                if (WARN_ONCE(!head, "%s: node 0x%lx has no associated head\n",
+                              __func__, (unsigned long)node))
+                        return;
+
+                spin_lock(&head->lock);
+                if (likely(head == READ_ONCE(node->head))) {
+                        list_del_init(&node->list);
+                        WRITE_ONCE(node->head, NULL);
+                        retry = false;
+                } else {
+                        /*
+                         * The head pointer has somehow changed. Retry if it
+                         * is not NULL. Otherwise, just ignore the delete
+                         * operation.
+                         */
+                        retry = (READ_ONCE(node->head) != NULL);
+                }
+                spin_unlock(&head->lock);
+        } while (retry);
+}
+EXPORT_SYMBOL(dlock_lists_del);
+
+/**
+ * __dlock_list_next_list - Find the first entry of the next available list
+ * @iter : Pointer to the dlock list iterator structure
+ * Return: Pointer to the first entry, or NULL if all the lists are exhausted
+ *
+ * The information about the next available list will be put into the
+ * iterator.
+ */
+struct dlock_list_node *__dlock_list_next_list(struct dlock_list_iter *iter)
+{
+        struct dlock_list_node *next;
+        struct dlock_list_head *head;
+
+restart:
+        if (iter->entry) {
+                spin_unlock(&iter->entry->lock);
+                iter->entry = NULL;
+        }
+
+next_list:
+        /*
+         * Try next list
+         */
+        if (++iter->index >= nr_cpu_ids)
+                return NULL;    /* All the entries iterated */
+
+        if (list_empty(&iter->head[iter->index].list))
+                goto next_list;
+
+        head = iter->entry = &iter->head[iter->index];
+        spin_lock(&head->lock);
+        /*
+         * There is a slight chance that the list may become empty just
+         * before the lock is acquired. So an additional check is
+         * needed to make sure that a valid node will be returned.
+ */
+        if (list_empty(&head->list))
+                goto restart;
+
+        next = list_entry(head->list.next, struct dlock_list_node,
+                          list);
+        WARN_ON_ONCE(next->head != head);
+
+        return next;
+}
+EXPORT_SYMBOL(__dlock_list_next_list);

From patchwork Wed Dec 6 06:05:31 2023
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 174311
From: Dave Chinner <david@fromorbit.com>
To: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org, linux-cachefs@redhat.com,
    dhowells@redhat.com, gfs2@lists.linux.dev, dm-devel@lists.linux.dev,
    linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH 02/11] vfs: Remove unnecessary list_for_each_entry_safe() variants
Date: Wed, 6 Dec 2023 17:05:31 +1100
Message-ID: <20231206060629.2827226-3-david@fromorbit.com>
In-Reply-To: <20231206060629.2827226-1-david@fromorbit.com>
References: <20231206060629.2827226-1-david@fromorbit.com>

From: Jan Kara

evict_inodes() and invalidate_inodes() use list_for_each_entry_safe()
to iterate the sb->s_inodes list. However, since we use the i_lru list
entry for our local temporary list of inodes to destroy, the inode is
guaranteed to stay in the sb->s_inodes list while we hold
sb->s_inode_list_lock. So there is no real need for the safe iteration
variant and we can use list_for_each_entry() just fine.
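For illustration, the general rule being applied here, as a generic
sketch rather than the kernel code: list_for_each_entry_safe() is only
needed when the loop body may unlink the cursor entry from the list
being walked. Batching entries onto a local list via a *different*
list_head field leaves the walked list intact (struct item and its
fields are hypothetical):

#include <linux/list.h>

struct item {
        struct list_head all;   /* the list being iterated */
        struct list_head lru;   /* reused for the local dispose list */
};

static void collect_all(struct list_head *all_list)
{
        struct item *item;
        LIST_HEAD(dispose);

        /* No _safe variant needed: 'item' stays linked on 'all';
         * only its unused ->lru field is added to the local list. */
        list_for_each_entry(item, all_list, all)
                list_add(&item->lru, &dispose);

        /* ... dispose of everything on the local list afterwards ... */
}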
Signed-off-by: Jan Kara
Signed-off-by: Waiman Long
Acked-by: Al Viro
Reviewed-by: Kent Overstreet
---
 fs/inode.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index f238d987dec9..17c50a75514f 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -712,12 +712,12 @@ static void dispose_list(struct list_head *head)
  */
 void evict_inodes(struct super_block *sb)
 {
-        struct inode *inode, *next;
+        struct inode *inode;
         LIST_HEAD(dispose);
 
 again:
         spin_lock(&sb->s_inode_list_lock);
-        list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
+        list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
                 if (atomic_read(&inode->i_count))
                         continue;
 
@@ -758,12 +758,12 @@ EXPORT_SYMBOL_GPL(evict_inodes);
  */
 void invalidate_inodes(struct super_block *sb)
 {
-        struct inode *inode, *next;
+        struct inode *inode;
         LIST_HEAD(dispose);
 
 again:
         spin_lock(&sb->s_inode_list_lock);
-        list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
+        list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
                 spin_lock(&inode->i_lock);
                 if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
                         spin_unlock(&inode->i_lock);

From patchwork Wed Dec 6 06:05:32 2023
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 174316
From: Dave Chinner <david@fromorbit.com>
To: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org, linux-cachefs@redhat.com,
    dhowells@redhat.com, gfs2@lists.linux.dev, dm-devel@lists.linux.dev,
    linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH 03/11] vfs: Use dlock list for superblock's inode list
Date: Wed, 6 Dec 2023 17:05:32 +1100
Message-ID: <20231206060629.2827226-4-david@fromorbit.com>
In-Reply-To: <20231206060629.2827226-1-david@fromorbit.com>
References: <20231206060629.2827226-1-david@fromorbit.com>

From: Waiman Long

[dchinner: original commit message preserved]

When many threads are trying to add inodes to or delete inodes from a
superblock's s_inodes list, spinlock contention on the list can become
a performance bottleneck.

This patch changes the s_inodes field to a dlock list, which is a
distributed set of lists with per-list spinlocks. As a result, the
following superblock inode list (sb->s_inodes) iteration functions in
the VFS are also modified:

 1. iterate_bdevs()
 2. drop_pagecache_sb()
 3. evict_inodes()
 4. invalidate_inodes()
 5. fsnotify_unmount_inodes()
 6. add_dquot_ref()
 7. remove_dquot_ref()

With an exit microbenchmark that creates a large number of threads,
attaches many inodes to them via procfs, and then exits, the runtimes
of that microbenchmark with various numbers of threads before and
after the patch on a 4-socket Intel E7-8867 v3 system (64 cores, 128
threads) on a 4.19-rc3 based kernel were as follows:

  # of threads  Elapsed/Sys Time     Elapsed/Sys Time    Speedup
                Unpatched Kernel     Patched Kernel
  ------------  ----------------     ----------------    -------
      1000      59.17s/123m09.8s     18.90s/24m44.5s       3.13
      1200      73.20s/151m24.1s     27.54s/50m05.3s       2.66
      1400      102.04s/212m00.9s    36.75s/68m26.7s       2.78
      1600      131.13s/272m52.4s    50.16s/94m23.7s       2.61

[dchinner: forward port, add new inode list traversals, etc]
[dchinner: scalability results on current TOT XFS]

With a 400k inodes per thread concurrent directory traversal workload,
scalability improves at >=16 threads on 6.7-rc4 on XFS. We only test
XFS here as it is the only filesystem that demonstrates sufficient
internal scalability for the superblock inode list to be a scalability
bottleneck.
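Before the numbers: for illustration, the conversion pattern applied
to each of the traversal functions listed above looks roughly like the
following sketch (schematic loop bodies; the dlock-list names are from
patch 01):

        /* Before: one global lock serializes the whole traversal. */
        spin_lock(&sb->s_inode_list_lock);
        list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
                /* ... process inode ... */
        }
        spin_unlock(&sb->s_inode_list_lock);

        /* After: the iterator locks one per-CPU sublist at a time,
         * and that lock can be dropped and retaken around work that
         * may block or reschedule. */
        DEFINE_DLOCK_LIST_ITER(iter, &sb->s_inodes);

        dlist_for_each_entry(inode, &iter, i_sb_list) {
                /* ... process inode ... */
                dlock_list_unlock(&iter);  /* e.g. around iput()/cond_resched() */
                /* ... blocking work ... */
                dlock_list_relock(&iter);
        }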
The table contains test runtime in seconds; perfect scalability would
be demonstrated by the runtime staying constant as the thread count
goes up.

  Threads   6.4-rc7   patched
  -------   -------   -------
     2       11.673    11.158
     4        9.665     9.444
     8       10.622     9.275
    16       12.148     9.508
    32       20.518    10.308

Unpatched kernel profile at 32 threads:

  - 95.45% vfs_fstatat
     - 95.00% vfs_statx
        - 91.00% filename_lookup
           - 90.90% path_lookupat
              - 90.40% walk_component
                 - 89.05% lookup_slow
                    - 88.95% __lookup_slow
                       - 86.38% xfs_vn_lookup
                          - 84.05% xfs_lookup
                             - 78.82% xfs_iget
                                - 72.58% xfs_setup_inode
                                   - 72.54% inode_sb_list_add
                                      - 71.12% _raw_spin_lock
                                         - 71.09% do_raw_spin_lock
                                            - 68.85% __pv_queued_spin_lock_slowpath

Patched kernel profile at 32 threads - the biggest single point of
contention is now the dentry cache LRU via dput():

  - 21.59% 0.25% [kernel] [k] dput
     - 21.34% dput
        - 19.93% retain_dentry
           - d_lru_add
              - 19.82% list_lru_add
                 - 14.62% _raw_spin_lock
                    - 14.47% do_raw_spin_lock
                         10.89% __pv_queued_spin_lock_slowpath
                   1.78% __list_add_valid_or_report
                 - 0.81% _raw_spin_unlock
                    - do_raw_spin_unlock
                         0.77% __raw_callee_save___pv_queued_spin_unlock
        - 0.79% _raw_spin_unlock
           - 0.78% do_raw_spin_unlock
                0.67% __raw_callee_save___pv_queued_spin_unlock

Signed-off-by: Waiman Long
Signed-off-by: Dave Chinner
---
 block/bdev.c           | 24 ++++++++----------------
 fs/drop_caches.c       |  9 ++++-----
 fs/gfs2/ops_fstype.c   | 21 +++++++++++----------
 fs/inode.c             | 37 ++++++++++++++++---------------------
 fs/notify/fsnotify.c   | 12 ++++++------
 fs/quota/dquot.c       | 22 ++++++----------------
 fs/super.c             | 13 +++++++------
 include/linux/fs.h     |  8 ++++----
 security/landlock/fs.c | 25 ++++++-------------------
 9 files changed, 68 insertions(+), 103 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 750aec178b6a..07135fd6fda4 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -437,11 +437,11 @@ long nr_blockdev_pages(void)
 {
         struct inode *inode;
         long ret = 0;
+        DEFINE_DLOCK_LIST_ITER(iter, &blockdev_superblock->s_inodes);
 
-        spin_lock(&blockdev_superblock->s_inode_list_lock);
-        list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list)
+        dlist_for_each_entry(inode, &iter, i_sb_list) {
                 ret += inode->i_mapping->nrpages;
-        spin_unlock(&blockdev_superblock->s_inode_list_lock);
+        }
 
         return ret;
 }
@@ -1032,9 +1032,9 @@ EXPORT_SYMBOL_GPL(bdev_mark_dead);
 void sync_bdevs(bool wait)
 {
         struct inode *inode, *old_inode = NULL;
+        DEFINE_DLOCK_LIST_ITER(iter, &blockdev_superblock->s_inodes);
 
-        spin_lock(&blockdev_superblock->s_inode_list_lock);
-        list_for_each_entry(inode, &blockdev_superblock->s_inodes, i_sb_list) {
+        dlist_for_each_entry(inode, &iter, i_sb_list) {
                 struct address_space *mapping = inode->i_mapping;
                 struct block_device *bdev;
 
@@ -1046,15 +1046,8 @@ void sync_bdevs(bool wait)
                 }
                 __iget(inode);
                 spin_unlock(&inode->i_lock);
-                spin_unlock(&blockdev_superblock->s_inode_list_lock);
-                /*
-                 * We hold a reference to 'inode' so it couldn't have been
-                 * removed from s_inodes list while we dropped the
-                 * s_inode_list_lock  We cannot iput the inode now as we can
-                 * be holding the last reference and we cannot iput it under
-                 * s_inode_list_lock. So we keep the reference and iput it
-                 * later.
-                 */
+                dlock_list_unlock(&iter);
+
                 iput(old_inode);
                 old_inode = inode;
                 bdev = I_BDEV(inode);
@@ -1075,9 +1068,8 @@ void sync_bdevs(bool wait)
                 }
                 mutex_unlock(&bdev->bd_disk->open_mutex);
 
-                spin_lock(&blockdev_superblock->s_inode_list_lock);
+                dlock_list_relock(&iter);
         }
-        spin_unlock(&blockdev_superblock->s_inode_list_lock);
         iput(old_inode);
 }
 
diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index b9575957a7c2..3596d0a7c0da 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -19,9 +19,9 @@ int sysctl_drop_caches;
 static void drop_pagecache_sb(struct super_block *sb, void *unused)
 {
         struct inode *inode, *toput_inode = NULL;
+        DEFINE_DLOCK_LIST_ITER(iter, &sb->s_inodes);
 
-        spin_lock(&sb->s_inode_list_lock);
-        list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+        dlist_for_each_entry(inode, &iter, i_sb_list) {
                 spin_lock(&inode->i_lock);
                 /*
                  * We must skip inodes in unusual state. We may also skip
@@ -35,16 +35,15 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
                 }
                 __iget(inode);
                 spin_unlock(&inode->i_lock);
-                spin_unlock(&sb->s_inode_list_lock);
+                dlock_list_unlock(&iter);
 
                 invalidate_mapping_pages(inode->i_mapping, 0, -1);
                 iput(toput_inode);
                 toput_inode = inode;
 
                 cond_resched();
-                spin_lock(&sb->s_inode_list_lock);
+                dlock_list_relock(&iter);
         }
-        spin_unlock(&sb->s_inode_list_lock);
         iput(toput_inode);
 }
 
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index b108c5d26839..1105710482e7 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -1738,22 +1738,24 @@ static int gfs2_meta_init_fs_context(struct fs_context *fc)
  * attempt will time out. Since inodes are evicted sequentially, this can add
  * up quickly.
  *
- * Function evict_inodes() tries to keep the s_inode_list_lock list locked over
- * a long time, which prevents other inodes from being evicted concurrently.
- * This precludes the cooperative behavior we are looking for. This special
- * version of evict_inodes() avoids that.
- *
  * Modeled after drop_pagecache_sb().
+ *
+ * XXX(dgc): this is particularly awful. With the dlist for inodes, concurrent
+ * access to the inode list can occur and evict_inodes() will drop the per-cpu
+ * list lock if the CPU needs rescheduling. Hence if this exists just because
+ * evict_inodes() holds the s_inode_list_lock for long periods preventing
+ * concurrent inode eviction work from being done, this can probably go away
+ * entirely now.
  */
 static void gfs2_evict_inodes(struct super_block *sb)
 {
         struct inode *inode, *toput_inode = NULL;
         struct gfs2_sbd *sdp = sb->s_fs_info;
+        DEFINE_DLOCK_LIST_ITER(iter, &sb->s_inodes);
 
         set_bit(SDF_EVICTING, &sdp->sd_flags);
 
-        spin_lock(&sb->s_inode_list_lock);
-        list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+        dlist_for_each_entry(inode, &iter, i_sb_list) {
                 spin_lock(&inode->i_lock);
                 if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) &&
                     !need_resched()) {
@@ -1762,15 +1764,14 @@ static void gfs2_evict_inodes(struct super_block *sb)
                 }
                 atomic_inc(&inode->i_count);
                 spin_unlock(&inode->i_lock);
-                spin_unlock(&sb->s_inode_list_lock);
+                dlock_list_unlock(&iter);
 
                 iput(toput_inode);
                 toput_inode = inode;
 
                 cond_resched();
-                spin_lock(&sb->s_inode_list_lock);
+                dlock_list_relock(&iter);
         }
-        spin_unlock(&sb->s_inode_list_lock);
         iput(toput_inode);
 }
 
diff --git a/fs/inode.c b/fs/inode.c
index 17c50a75514f..3426691fa305 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -30,7 +30,7 @@
  *   inode->i_state, inode->i_hash, __iget(), inode->i_io_list
  * Inode LRU list locks protect:
  *   inode->i_sb->s_inode_lru, inode->i_lru
- * inode->i_sb->s_inode_list_lock protects:
+ * inode->i_sb->s_inodes->head->lock protects:
  *   inode->i_sb->s_inodes, inode->i_sb_list
  * bdi->wb.list_lock protects:
  *   bdi->wb.b_{dirty,io,more_io,dirty_time}, inode->i_io_list
@@ -39,7 +39,7 @@
  *
  * Lock ordering:
  *
- * inode->i_sb->s_inode_list_lock
+ * inode->i_sb->s_inodes->head->lock
  *   inode->i_lock
  *     Inode LRU list locks
  *
@@ -47,7 +47,7 @@
  *   inode->i_lock
  *
  * inode_hash_lock
- *   inode->i_sb->s_inode_list_lock
+ *   inode->i_sb->s_inodes->head->lock
  *   inode->i_lock
  *
  * iunique_lock
@@ -423,7 +423,7 @@ void inode_init_once(struct inode *inode)
         INIT_LIST_HEAD(&inode->i_io_list);
         INIT_LIST_HEAD(&inode->i_wb_list);
         INIT_LIST_HEAD(&inode->i_lru);
-        INIT_LIST_HEAD(&inode->i_sb_list);
+        init_dlock_list_node(&inode->i_sb_list);
         __address_space_init_once(&inode->i_data);
         i_size_ordered_init(inode);
 }
@@ -492,19 +492,14 @@ static void inode_lru_list_del(struct inode *inode)
  */
 void inode_sb_list_add(struct inode *inode)
 {
-        spin_lock(&inode->i_sb->s_inode_list_lock);
-        list_add(&inode->i_sb_list, &inode->i_sb->s_inodes);
-        spin_unlock(&inode->i_sb->s_inode_list_lock);
+        dlock_lists_add(&inode->i_sb_list, &inode->i_sb->s_inodes);
 }
 EXPORT_SYMBOL_GPL(inode_sb_list_add);
 
 static inline void inode_sb_list_del(struct inode *inode)
 {
-        if (!list_empty(&inode->i_sb_list)) {
-                spin_lock(&inode->i_sb->s_inode_list_lock);
-                list_del_init(&inode->i_sb_list);
-                spin_unlock(&inode->i_sb->s_inode_list_lock);
-        }
+        if (!list_empty(&inode->i_sb_list.list))
+                dlock_lists_del(&inode->i_sb_list);
 }
 
 static unsigned long hash(struct super_block *sb, unsigned long hashval)
@@ -713,11 +708,12 @@ static void dispose_list(struct list_head *head)
 void evict_inodes(struct super_block *sb)
 {
         struct inode *inode;
+        struct dlock_list_iter iter;
         LIST_HEAD(dispose);
 
 again:
-        spin_lock(&sb->s_inode_list_lock);
-        list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+        init_dlock_list_iter(&iter, &sb->s_inodes);
+        dlist_for_each_entry(inode, &iter, i_sb_list) {
                 if (atomic_read(&inode->i_count))
                         continue;
 
@@ -738,13 +734,12 @@ void evict_inodes(struct super_block *sb)
                  * bit so we don't livelock.
                  */
                 if (need_resched()) {
-                        spin_unlock(&sb->s_inode_list_lock);
+                        dlock_list_unlock(&iter);
                         cond_resched();
                         dispose_list(&dispose);
                         goto again;
                 }
         }
-        spin_unlock(&sb->s_inode_list_lock);
 
         dispose_list(&dispose);
 }
@@ -759,11 +754,12 @@ EXPORT_SYMBOL_GPL(evict_inodes);
 void invalidate_inodes(struct super_block *sb)
 {
         struct inode *inode;
+        struct dlock_list_iter iter;
         LIST_HEAD(dispose);
 
 again:
-        spin_lock(&sb->s_inode_list_lock);
-        list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+        init_dlock_list_iter(&iter, &sb->s_inodes);
+        dlist_for_each_entry(inode, &iter, i_sb_list) {
                 spin_lock(&inode->i_lock);
                 if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
                         spin_unlock(&inode->i_lock);
@@ -779,13 +775,12 @@ void invalidate_inodes(struct super_block *sb)
                 spin_unlock(&inode->i_lock);
                 list_add(&inode->i_lru, &dispose);
                 if (need_resched()) {
-                        spin_unlock(&sb->s_inode_list_lock);
+                        dlock_list_unlock(&iter);
                         cond_resched();
                         dispose_list(&dispose);
                         goto again;
                 }
         }
-        spin_unlock(&sb->s_inode_list_lock);
 
         dispose_list(&dispose);
 }
@@ -1232,7 +1227,7 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
          * Add inode to the sb list if it's not already. It has I_NEW at this
          * point, so it should be safe to test i_sb_list locklessly.
          */
-        if (list_empty(&inode->i_sb_list))
+        if (list_empty(&inode->i_sb_list.list))
                 inode_sb_list_add(inode);
 unlock:
         spin_unlock(&inode_hash_lock);
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 7974e91ffe13..15e3769e76f5 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -33,14 +33,15 @@ void __fsnotify_vfsmount_delete(struct vfsmount *mnt)
  * @sb: superblock being unmounted.
  *
  * Called during unmount with no locks held, so needs to be safe against
- * concurrent modifiers. We temporarily drop sb->s_inode_list_lock and CAN block.
+ * concurrent modifiers. We temporarily drop sb->s_inodes list lock and CAN
+ * block.
  */
 static void fsnotify_unmount_inodes(struct super_block *sb)
 {
         struct inode *inode, *iput_inode = NULL;
+        DEFINE_DLOCK_LIST_ITER(iter, &sb->s_inodes);
 
-        spin_lock(&sb->s_inode_list_lock);
-        list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+        dlist_for_each_entry(inode, &iter, i_sb_list) {
                 /*
                  * We cannot __iget() an inode in state I_FREEING,
                  * I_WILL_FREE, or I_NEW which is fine because by that point
@@ -68,7 +69,7 @@ static void fsnotify_unmount_inodes(struct super_block *sb)
 
                 __iget(inode);
                 spin_unlock(&inode->i_lock);
-                spin_unlock(&sb->s_inode_list_lock);
+                dlock_list_unlock(&iter);
 
                 iput(iput_inode);
 
@@ -80,9 +81,8 @@ static void fsnotify_unmount_inodes(struct super_block *sb)
                 iput_inode = inode;
 
                 cond_resched();
-                spin_lock(&sb->s_inode_list_lock);
+                dlock_list_relock(&iter);
         }
-        spin_unlock(&sb->s_inode_list_lock);
 
         iput(iput_inode);
 }
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 58b5de081b57..e873dcbe6feb 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -1024,13 +1024,13 @@ static int dqinit_needed(struct inode *inode, int type)
 static int add_dquot_ref(struct super_block *sb, int type)
 {
         struct inode *inode, *old_inode = NULL;
+        DEFINE_DLOCK_LIST_ITER(iter, &sb->s_inodes);
 #ifdef CONFIG_QUOTA_DEBUG
         int reserved = 0;
 #endif
         int err = 0;
 
-        spin_lock(&sb->s_inode_list_lock);
-        list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+        dlist_for_each_entry(inode, &iter, i_sb_list) {
                 spin_lock(&inode->i_lock);
                 if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
                     !atomic_read(&inode->i_writecount) ||
@@ -1040,7 +1040,7 @@ static int add_dquot_ref(struct super_block *sb, int type)
                 }
                 __iget(inode);
                 spin_unlock(&inode->i_lock);
-                spin_unlock(&sb->s_inode_list_lock);
+                dlock_list_unlock(&iter);
 
 #ifdef CONFIG_QUOTA_DEBUG
                 if (unlikely(inode_get_rsv_space(inode) > 0))
@@ -1053,19 +1053,10 @@ static int add_dquot_ref(struct super_block *sb, int type)
                         goto out;
                 }
 
-                /*
-                 * We hold a reference to 'inode' so it couldn't have been
-                 * removed from s_inodes list while we dropped the
-                 * s_inode_list_lock. We cannot iput the inode now as we can be
-                 * holding the last reference and we cannot iput it under
-                 * s_inode_list_lock. So we keep the reference and iput it
-                 * later.
-                 */
                 old_inode = inode;
                 cond_resched();
-                spin_lock(&sb->s_inode_list_lock);
+                dlock_list_relock(&iter);
         }
-        spin_unlock(&sb->s_inode_list_lock);
         iput(old_inode);
 out:
 #ifdef CONFIG_QUOTA_DEBUG
@@ -1081,12 +1072,12 @@ static int add_dquot_ref(struct super_block *sb, int type)
 static void remove_dquot_ref(struct super_block *sb, int type)
 {
         struct inode *inode;
+        DEFINE_DLOCK_LIST_ITER(iter, &sb->s_inodes);
 #ifdef CONFIG_QUOTA_DEBUG
         int reserved = 0;
 #endif
 
-        spin_lock(&sb->s_inode_list_lock);
-        list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+        dlist_for_each_entry(inode, &iter, i_sb_list) {
                 /*
                  * We have to scan also I_NEW inodes because they can already
                  * have quota pointer initialized. Luckily, we need to touch
@@ -1108,7 +1099,6 @@ static void remove_dquot_ref(struct super_block *sb, int type)
                 }
                 spin_unlock(&dq_data_lock);
         }
-        spin_unlock(&sb->s_inode_list_lock);
 #ifdef CONFIG_QUOTA_DEBUG
         if (reserved) {
                 printk(KERN_WARNING "VFS (%s): Writes happened after quota"
diff --git a/fs/super.c b/fs/super.c
index 076392396e72..61c19e3f06d8 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -303,6 +303,7 @@ static void destroy_unused_super(struct super_block *s)
         super_unlock_excl(s);
         list_lru_destroy(&s->s_dentry_lru);
         list_lru_destroy(&s->s_inode_lru);
+        free_dlock_list_heads(&s->s_inodes);
         security_sb_free(s);
         put_user_ns(s->s_user_ns);
         kfree(s->s_subtype);
@@ -367,8 +368,6 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
         INIT_HLIST_NODE(&s->s_instances);
         INIT_HLIST_BL_HEAD(&s->s_roots);
         mutex_init(&s->s_sync_lock);
-        INIT_LIST_HEAD(&s->s_inodes);
-        spin_lock_init(&s->s_inode_list_lock);
         INIT_LIST_HEAD(&s->s_inodes_wb);
         spin_lock_init(&s->s_inode_wblist_lock);
 
@@ -383,6 +382,9 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
         s->s_time_min = TIME64_MIN;
         s->s_time_max = TIME64_MAX;
 
+        if (alloc_dlock_list_heads(&s->s_inodes))
+                goto fail;
+
         s->s_shrink = shrinker_alloc(SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE,
                                      "sb-%s", type->name);
         if (!s->s_shrink)
@@ -695,7 +697,7 @@ void generic_shutdown_super(struct super_block *sb)
                 if (sop->put_super)
                         sop->put_super(sb);
 
-                if (CHECK_DATA_CORRUPTION(!list_empty(&sb->s_inodes),
+                if (CHECK_DATA_CORRUPTION(!dlock_lists_empty(&sb->s_inodes),
                                 "VFS: Busy inodes after unmount of %s (%s)",
                                 sb->s_id, sb->s_type->name)) {
                         /*
@@ -704,14 +706,13 @@ void generic_shutdown_super(struct super_block *sb)
                          * iput_final() or such crashes cleanly.
                          */
                         struct inode *inode;
+                        DEFINE_DLOCK_LIST_ITER(iter, &sb->s_inodes);
 
-                        spin_lock(&sb->s_inode_list_lock);
-                        list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+                        dlist_for_each_entry(inode, &iter, i_sb_list) {
                                 inode->i_op = VFS_PTR_POISON;
                                 inode->i_sb = VFS_PTR_POISON;
                                 inode->i_mapping = VFS_PTR_POISON;
                         }
-                        spin_unlock(&sb->s_inode_list_lock);
                 }
         }
         /*
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 98b7a7a8c42e..bb35591733f1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -43,6 +43,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <linux/dlock-list.h>
 #include <...>
 #include <...>
 
@@ -702,7 +703,7 @@ struct inode {
         u16                     i_wb_frn_history;
 #endif
         struct list_head        i_lru;          /* inode LRU list */
-        struct list_head        i_sb_list;
+        struct dlock_list_node  i_sb_list;
         struct list_head        i_wb_list;      /* backing dev writeback list */
         union {
                 struct hlist_head       i_dentry;
@@ -1315,9 +1316,8 @@ struct super_block {
          */
         int s_stack_depth;
 
-        /* s_inode_list_lock protects s_inodes */
-        spinlock_t              s_inode_list_lock ____cacheline_aligned_in_smp;
-        struct list_head        s_inodes;       /* all inodes */
+        /* The internal per-list locks protect s_inodes */
+        struct dlock_list_heads s_inodes;       /* all inodes */
         spinlock_t              s_inode_wblist_lock;
         struct list_head        s_inodes_wb;    /* writeback inodes */
 
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index bc7c126deea2..4269d9938c09 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -844,12 +844,12 @@ static void hook_inode_free_security(struct inode *const inode)
 static void hook_sb_delete(struct super_block *const sb)
 {
         struct inode *inode, *prev_inode = NULL;
+        DEFINE_DLOCK_LIST_ITER(iter, &sb->s_inodes);
 
         if (!landlock_initialized)
                 return;
 
-        spin_lock(&sb->s_inode_list_lock);
-        list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+        dlist_for_each_entry(inode, &iter, i_sb_list) {
dlist_for_each_entry(inode, &iter, i_sb_list) { struct landlock_object *object; /* Only handles referenced inodes. */ @@ -883,6 +883,7 @@ static void hook_sb_delete(struct super_block *const sb) /* Keeps a reference to this inode until the next loop walk. */ __iget(inode); spin_unlock(&inode->i_lock); + dlock_list_unlock(&iter); /* * If there is no concurrent release_inode() ongoing, then we @@ -917,25 +918,11 @@ static void hook_sb_delete(struct super_block *const sb) rcu_read_unlock(); } - if (prev_inode) { - /* - * At this point, we still own the __iget() reference - * that we just set in this loop walk. Therefore we - * can drop the list lock and know that the inode won't - * disappear from under us until the next loop walk. - */ - spin_unlock(&sb->s_inode_list_lock); - /* - * We can now actually put the inode reference from the - * previous loop walk, which is not needed anymore. - */ - iput(prev_inode); - cond_resched(); - spin_lock(&sb->s_inode_list_lock); - } + iput(prev_inode); prev_inode = inode; + cond_resched(); + dlock_list_relock(&iter); } - spin_unlock(&sb->s_inode_list_lock); /* Puts the inode reference from the last loop walk, if any. */ if (prev_inode) From patchwork Wed Dec 6 06:05:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 174313 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp3907729vqy; Tue, 5 Dec 2023 22:07:21 -0800 (PST) X-Google-Smtp-Source: AGHT+IFiR4JJuPhsDDc+5xwbMWjhf+NcXUv7zGOpuE/Rq/qeRv9zk1lh7OkM8+iRYL7OmYRZmE9J X-Received: by 2002:a05:6808:219c:b0:3b8:43fe:96d9 with SMTP id be28-20020a056808219c00b003b843fe96d9mr645921oib.35.1701842841669; Tue, 05 Dec 2023 22:07:21 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701842841; cv=none; d=google.com; s=arc-20160816; b=wnhMPI3OD4sdBx6bJPeTiVSMDWzmVtN6hayo/XQ6nsZiaOLIVhChxzVhayXgQmrHTA DnGuASgbnbVTcpGA4WtaJovU1rTGIxsjSjtLmOwMCT4haTm/23kuaJqz+Gjq8NuUpCiM y2/6rABC78AhlJ2Bzo99QY4yPRcNmVotHevH0e0C12/TYwKXNQA1qZYczAFpMyTNzWAp ij9Mf+iPd1rNgp/nhq/P9o0pObl7w2GLlOO26LlekTkdqK3/EmvGDR3vPpKsY/xOEEEZ TkMJC8mQUxajktMeDq5nJwFPNb79RJuFw88UAgMFfutuQkeGHYwK9e2OrU/BNrwfBTq1 sNcw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=5fxgjHIY+i+wILRYpSYSEatYEwrJqDdx6PzPhkDcv2Y=; fh=CW5pnDTDkJR8Vn9NaUbLrrp28PUVapFEjMCn+HkS0lE=; b=WEi3mANrRZ56PsEQpm9q3SKtOj2LcTQNscHHWky0quohWNuZ4xdsQzhlLTTvNkruiY bBS6DyxcZbuZn2egiuun5TXP+O4R0kLoLYzSAEptMIqVkNIo0hHXKCPNoqCdlSBfVDqF rKjeF4IUHi/IGNiTK+CjJ/gfGCFl11I9BwGHj5FSaVst9s3iZIB2pHz9e7HVwwDxBf3p NY9ZcK7yRAKJ3J0HFjylEtkaypP6IoJeNbwwqIVJdcEfSiwtHTSR+YNn+GXt/xqOOIdg SlgeIphKqXNcMk4NS6nNktcsj+M6m1NufjBN2bMEUJbSgMg96qYt9bFr4K7a/bEtd6KN bApQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@fromorbit-com.20230601.gappssmtp.com header.s=20230601 header.b=OkeXp78t; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=fromorbit.com Received: from snail.vger.email (snail.vger.email. 
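All of the s_inodes conversions above reduce to one shape: iterate with
dlist_for_each_entry(), drop the per-sublist lock with dlock_list_unlock()
before anything that may sleep (iput(), cond_resched()), then resume with
dlock_list_relock(). A minimal sketch of that shape, using only the
dlock-list API shown in the hunks above; walk_sb_inodes_example() and the
deferred-iput variable are illustrative names, not part of the series:

/* Sketch only: assumes <linux/dlock-list.h> from this series. */
#include <linux/dlock-list.h>
#include <linux/fs.h>

static void walk_sb_inodes_example(struct super_block *sb)
{
	struct inode *inode, *deferred_iput = NULL;
	DEFINE_DLOCK_LIST_ITER(iter, &sb->s_inodes);

	dlist_for_each_entry(inode, &iter, i_sb_list) {
		spin_lock(&inode->i_lock);
		if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) {
			spin_unlock(&inode->i_lock);
			continue;
		}
		__iget(inode);
		spin_unlock(&inode->i_lock);
		dlock_list_unlock(&iter);	/* safe to sleep from here */

		iput(deferred_iput);		/* iput() outside the list lock */
		deferred_iput = inode;		/* hold a ref so the inode stays
						   on the list across relock */
		cond_resched();
		dlock_list_relock(&iter);	/* resume on the same sublist */
	}
	iput(deferred_iput);
}

Deferring the iput() by one iteration is what makes dropping the lock safe:
the held reference pins the current cursor position on the sublist.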
From patchwork Wed Dec 6 06:05:33 2023
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 174313

From: Dave Chinner
To: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org, linux-cachefs@redhat.com,
    dhowells@redhat.com, gfs2@lists.linux.dev, dm-devel@lists.linux.dev,
    linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH 04/11] lib/dlock-list: Make sibling CPUs share the same linked list
Date: Wed, 6 Dec 2023 17:05:33 +1100
Message-ID: <20231206060629.2827226-5-david@fromorbit.com>
In-Reply-To: <20231206060629.2827226-1-david@fromorbit.com>

From: Waiman Long

The dlock list needs one list for each of the CPUs available. However,
sibling CPUs share the L2 and probably the L1 caches too, so there is
not much contention-avoidance to gain from giving them separate lists,
while the extra lists do increase the cacheline footprint of the L1/L2
caches.

This patch makes all sibling CPUs share the same list, reducing the
number of lists that need to be maintained in each dlock list without
any noticeable impact on performance. It also improves dlock list
iteration performance as fewer lists need to be iterated.

Signed-off-by: Waiman Long
Reviewed-by: Jan Kara
Reviewed-by: Kent Overstreet
---
 lib/dlock-list.c | 74 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 59 insertions(+), 15 deletions(-)

diff --git a/lib/dlock-list.c b/lib/dlock-list.c
index f64ea4cc5e79..e2860944ec9f 100644
--- a/lib/dlock-list.c
+++ b/lib/dlock-list.c
@@ -25,31 +25,65 @@
  * The distributed and locked list is a distributed set of lists each of
  * which is protected by its own spinlock, but acts like a single
  * consolidated list to the callers. For scaling purpose, the number of
- * lists used is equal to the number of possible CPUs in the system to
- * minimize contention.
+ * lists used is equal to the number of possible cores in the system to
+ * minimize contention. All threads of the same CPU core will share the
+ * same list.
  *
- * However, it is possible that individual CPU numbers may be equal to
- * or greater than the number of possible CPUs when there are holes in
- * the CPU number list. As a result, we need to map the CPU number to a
- * list index.
+ * We need to map each CPU number to a list index.
  */
 static DEFINE_PER_CPU_READ_MOSTLY(int, cpu2idx);
+static int nr_dlock_lists __read_mostly;

 /*
- * Initialize cpu2idx mapping table
+ * Initialize cpu2idx mapping table & nr_dlock_lists.
  *
  * It is possible that a dlock-list can be allocated before the cpu2idx is
  * initialized. In this case, all the cpus are mapped to the first entry
  * before initialization.
  *
+ * All the sibling CPUs of a sibling group will map to the same dlock list so
+ * as to reduce the number of dlock lists to be maintained while minimizing
+ * cacheline contention.
+ *
+ * As the sibling masks are set up in the core initcall phase, this function
+ * has to be done in the postcore phase to get the right data.
  */
 static int __init cpu2idx_init(void)
 {
	int idx, cpu;
+	struct cpumask *sibling_mask;
+	static struct cpumask mask __initdata;

+	cpumask_clear(&mask);
	idx = 0;
-	for_each_possible_cpu(cpu)
-		per_cpu(cpu2idx, cpu) = idx++;
+	for_each_possible_cpu(cpu) {
+		int scpu;
+
+		if (cpumask_test_cpu(cpu, &mask))
+			continue;
+		per_cpu(cpu2idx, cpu) = idx;
+		cpumask_set_cpu(cpu, &mask);
+
+		sibling_mask = topology_sibling_cpumask(cpu);
+		if (sibling_mask) {
+			for_each_cpu(scpu, sibling_mask) {
+				per_cpu(cpu2idx, scpu) = idx;
+				cpumask_set_cpu(scpu, &mask);
+			}
+		}
+		idx++;
+	}
+
+	/*
+	 * nr_dlock_lists can only be set after cpu2idx is properly
+	 * initialized.
+	 */
+	smp_mb();
+	nr_dlock_lists = idx;
+	WARN_ON(nr_dlock_lists > nr_cpu_ids);
+
+	pr_info("dlock-list: %d head entries per dlock list.\n",
+		nr_dlock_lists);
	return 0;
 }
 postcore_initcall(cpu2idx_init);
@@ -67,19 +101,23 @@ postcore_initcall(cpu2idx_init);
  *
  * Dynamically allocated locks need to have their own special lock class
  * to avoid lockdep warning.
+ *
+ * Since nr_dlock_lists will always be <= nr_cpu_ids, having more lists
+ * than necessary allocated is not a problem other than some wasted memory.
+ * The extra lists will never be used as all the cpu2idx entries will be
+ * 0 before initialization.
  */
 int __alloc_dlock_list_heads(struct dlock_list_heads *dlist,
			     struct lock_class_key *key)
 {
-	int idx;
+	int idx, cnt = nr_dlock_lists ? nr_dlock_lists : nr_cpu_ids;

-	dlist->heads = kcalloc(nr_cpu_ids, sizeof(struct dlock_list_head),
-			       GFP_KERNEL);
+	dlist->heads = kcalloc(cnt, sizeof(struct dlock_list_head), GFP_KERNEL);

	if (!dlist->heads)
		return -ENOMEM;

-	for (idx = 0; idx < nr_cpu_ids; idx++) {
+	for (idx = 0; idx < cnt; idx++) {
		struct dlock_list_head *head = &dlist->heads[idx];

		INIT_LIST_HEAD(&head->list);
@@ -117,7 +155,10 @@ bool dlock_lists_empty(struct dlock_list_heads *dlist)
 {
	int idx;

-	for (idx = 0; idx < nr_cpu_ids; idx++)
+	/* Shouldn't be called before nr_dlock_lists is initialized */
+	WARN_ON_ONCE(!nr_dlock_lists);
+
+	for (idx = 0; idx < nr_dlock_lists; idx++)
		if (!list_empty(&dlist->heads[idx].list))
			return false;
	return true;
@@ -199,6 +240,9 @@ struct dlock_list_node *__dlock_list_next_list(struct dlock_list_iter *iter)
	struct dlock_list_node *next;
	struct dlock_list_head *head;

+	/* Shouldn't be called before nr_dlock_lists is initialized */
+	WARN_ON_ONCE(!nr_dlock_lists);
+
 restart:
	if (iter->entry) {
		spin_unlock(&iter->entry->lock);
@@ -209,7 +253,7 @@ struct dlock_list_node *__dlock_list_next_list(struct dlock_list_iter *iter)
	/*
	 * Try next list
	 */
-	if (++iter->index >= nr_cpu_ids)
+	if (++iter->index >= nr_dlock_lists)
		return NULL;	/* All the entries iterated */

	if (list_empty(&iter->head[iter->index].list))
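The effect of cpu2idx_init() is easiest to see with a worked example. On a
16-logical-CPU machine where each hyperthread pairs with one sibling, the
dedup over topology_sibling_cpumask() collapses 16 CPUs onto 8 list
indices. A user-space sketch of the same dedup-by-sibling logic; the
topology here is hard-coded for illustration, where the kernel reads it
from the scheduler topology:

#include <stdio.h>

/* Illustrative topology: CPU i and CPU i^1 are SMT siblings. */
static int sibling_of(int cpu) { return cpu ^ 1; }

int main(void)
{
	int nr_cpus = 16, cpu2idx[16], seen[16] = {0};
	int idx = 0;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (seen[cpu])
			continue;	/* already mapped via a sibling */
		cpu2idx[cpu] = idx;
		seen[cpu] = 1;
		/* kernel: for_each_cpu() over topology_sibling_cpumask() */
		int scpu = sibling_of(cpu);
		cpu2idx[scpu] = idx;
		seen[scpu] = 1;
		idx++;
	}
	printf("nr_dlock_lists = %d\n", idx);	/* 8 for this topology */
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		printf("cpu %2d -> list %d\n", cpu, cpu2idx[cpu]);
	return 0;
}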
From patchwork Wed Dec 6 06:05:34 2023
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 174309

From: Dave Chinner
To: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org, linux-cachefs@redhat.com,
    dhowells@redhat.com, gfs2@lists.linux.dev, dm-devel@lists.linux.dev,
    linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH 05/11] selinux: use dlist for isec inode list
Date: Wed, 6 Dec 2023 17:05:34 +1100
Message-ID: <20231206060629.2827226-6-david@fromorbit.com>
In-Reply-To: <20231206060629.2827226-1-david@fromorbit.com>

From: Dave Chinner

Because it's a horrible point of lock contention under heavily
concurrent directory traversals...

   - 12.14% d_instantiate
      - 12.06% security_d_instantiate
         - 12.13% selinux_d_instantiate
            - 12.16% inode_doinit_with_dentry
               - 15.45% _raw_spin_lock
                  - do_raw_spin_lock
                       14.68% __pv_queued_spin_lock_slowpath

Signed-off-by: Dave Chinner
Acked-by: Paul Moore
---
 include/linux/dlock-list.h        |  9 ++++
 security/selinux/hooks.c          | 72 +++++++++++++++----------------
 security/selinux/include/objsec.h |  6 +--
 3 files changed, 47 insertions(+), 40 deletions(-)

diff --git a/include/linux/dlock-list.h b/include/linux/dlock-list.h
index 327cb9edc7e3..7ad933b8875d 100644
--- a/include/linux/dlock-list.h
+++ b/include/linux/dlock-list.h
@@ -132,6 +132,15 @@ extern void dlock_lists_add(struct dlock_list_node *node,
			    struct dlock_list_heads *dlist);
 extern void dlock_lists_del(struct dlock_list_node *node);

+static inline void
+dlock_list_del_iter(struct dlock_list_iter *iter,
+		    struct dlock_list_node *node)
+{
+	WARN_ON_ONCE((iter->entry != READ_ONCE(node->head)));
+	list_del_init(&node->list);
+	WRITE_ONCE(node->head, NULL);
+}
+
 /*
  * Find the first entry of the next available list.
  */
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index feda711c6b7b..0358d7c66949 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -340,26 +340,11 @@ static struct inode_security_struct *backing_inode_security(struct dentry *dentr
 static void inode_free_security(struct inode *inode)
 {
	struct inode_security_struct *isec = selinux_inode(inode);
-	struct superblock_security_struct *sbsec;

	if (!isec)
		return;
-	sbsec = selinux_superblock(inode->i_sb);
-	/*
-	 * As not all inode security structures are in a list, we check for
-	 * empty list outside of the lock to make sure that we won't waste
-	 * time taking a lock doing nothing.
-	 *
-	 * The list_del_init() function can be safely called more than once.
-	 * It should not be possible for this function to be called with
-	 * concurrent list_add(), but for better safety against future changes
-	 * in the code, we use list_empty_careful() here.
-	 */
-	if (!list_empty_careful(&isec->list)) {
-		spin_lock(&sbsec->isec_lock);
-		list_del_init(&isec->list);
-		spin_unlock(&sbsec->isec_lock);
-	}
+	if (!list_empty(&isec->list.list))
+		dlock_lists_del(&isec->list);
 }

 struct selinux_mnt_opts {
@@ -547,6 +532,8 @@ static int sb_finish_set_opts(struct super_block *sb)
	struct superblock_security_struct *sbsec = selinux_superblock(sb);
	struct dentry *root = sb->s_root;
	struct inode *root_inode = d_backing_inode(root);
+	struct dlock_list_iter iter;
+	struct inode_security_struct *isec, *n;
	int rc = 0;

	if (sbsec->behavior == SECURITY_FS_USE_XATTR) {
@@ -570,27 +557,28 @@ static int sb_finish_set_opts(struct super_block *sb)
	/* Initialize the root inode. */
	rc = inode_doinit_with_dentry(root_inode, root);

-	/* Initialize any other inodes associated with the superblock, e.g.
-	   inodes created prior to initial policy load or inodes created
-	   during get_sb by a pseudo filesystem that directly
-	   populates itself. */
-	spin_lock(&sbsec->isec_lock);
-	while (!list_empty(&sbsec->isec_head)) {
-		struct inode_security_struct *isec =
-				list_first_entry(&sbsec->isec_head,
-					   struct inode_security_struct, list);
+	/*
+	 * Initialize any other inodes associated with the superblock, e.g.
+	 * inodes created prior to initial policy load or inodes created during
+	 * get_sb by a pseudo filesystem that directly populates itself.
+	 */
+	init_dlock_list_iter(&iter, &sbsec->isec_head);
+	dlist_for_each_entry_safe(isec, n, &iter, list) {
		struct inode *inode = isec->inode;
-		list_del_init(&isec->list);
-		spin_unlock(&sbsec->isec_lock);
+
+		dlock_list_del_iter(&iter, &isec->list);
+		dlock_list_unlock(&iter);
+
		inode = igrab(inode);
		if (inode) {
			if (!IS_PRIVATE(inode))
				inode_doinit_with_dentry(inode, NULL);
			iput(inode);
		}
-		spin_lock(&sbsec->isec_lock);
+
+		dlock_list_relock(&iter);
	}
-	spin_unlock(&sbsec->isec_lock);
+	WARN_ON_ONCE(!dlock_lists_empty(&sbsec->isec_head));
	return rc;
 }
@@ -1428,10 +1416,8 @@ static int inode_doinit_with_dentry(struct inode *inode, struct dentry *opt_dent
		/* Defer initialization until selinux_complete_init,
		   after the initial policy is loaded and the security
		   server is ready to handle calls. */
-		spin_lock(&sbsec->isec_lock);
-		if (list_empty(&isec->list))
-			list_add(&isec->list, &sbsec->isec_head);
-		spin_unlock(&sbsec->isec_lock);
+		if (list_empty(&isec->list.list))
+			dlock_lists_add(&isec->list, &sbsec->isec_head);
		goto out_unlock;
	}
@@ -2548,9 +2534,10 @@ static int selinux_sb_alloc_security(struct super_block *sb)
 {
	struct superblock_security_struct *sbsec = selinux_superblock(sb);

+	if (alloc_dlock_list_heads(&sbsec->isec_head))
+		return -ENOMEM;
+
	mutex_init(&sbsec->lock);
-	INIT_LIST_HEAD(&sbsec->isec_head);
-	spin_lock_init(&sbsec->isec_lock);
	sbsec->sid = SECINITSID_UNLABELED;
	sbsec->def_sid = SECINITSID_FILE;
	sbsec->mntpoint_sid = SECINITSID_UNLABELED;
@@ -2558,6 +2545,15 @@ static int selinux_sb_alloc_security(struct super_block *sb)
	return 0;
 }

+static void selinux_sb_free_security(struct super_block *sb)
+{
+	struct superblock_security_struct *sbsec = selinux_superblock(sb);
+
+	if (!sbsec)
+		return;
+	free_dlock_list_heads(&sbsec->isec_head);
+}
+
 static inline int opt_len(const char *s)
 {
	bool open_quote = false;
@@ -2841,7 +2837,7 @@ static int selinux_inode_alloc_security(struct inode *inode)
	u32 sid = current_sid();

	spin_lock_init(&isec->lock);
-	INIT_LIST_HEAD(&isec->list);
+	init_dlock_list_node(&isec->list);
	isec->inode = inode;
	isec->sid = SECINITSID_UNLABELED;
	isec->sclass = SECCLASS_FILE;
@@ -2920,6 +2916,7 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
	if (rc)
		return rc;

+	/* Possibly defer initialization to selinux_complete_init. */
	if (sbsec->flags & SE_SBINITIALIZED) {
		struct inode_security_struct *isec = selinux_inode(inode);
@@ -7215,6 +7212,7 @@ static struct security_hook_list selinux_hooks[] __ro_after_init = {
		      selinux_msg_queue_alloc_security),
	LSM_HOOK_INIT(shm_alloc_security, selinux_shm_alloc_security),
	LSM_HOOK_INIT(sb_alloc_security, selinux_sb_alloc_security),
+	LSM_HOOK_INIT(sb_free_security, selinux_sb_free_security),
	LSM_HOOK_INIT(inode_alloc_security, selinux_inode_alloc_security),
	LSM_HOOK_INIT(sem_alloc_security, selinux_sem_alloc_security),
	LSM_HOOK_INIT(secid_to_secctx, selinux_secid_to_secctx),
diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
index 8159fd53c3de..e0709f429c56 100644
--- a/security/selinux/include/objsec.h
+++ b/security/selinux/include/objsec.h
@@ -24,6 +24,7 @@
 #include
 #include
 #include
+#include <linux/dlock-list.h>
 #include
 #include
 #include "flask.h"
 #include "avc.h"
@@ -45,7 +46,7 @@ enum label_initialized {

 struct inode_security_struct {
	struct inode *inode;		/* back pointer to inode object */
-	struct list_head list;		/* list of inode_security_struct */
+	struct dlock_list_node list;	/* list of inode_security_struct */
	u32 task_sid;			/* SID of creating task */
	u32 sid;			/* SID of this object */
	u16 sclass;			/* security class of this object */
@@ -67,8 +68,7 @@ struct superblock_security_struct {
	unsigned short behavior;	/* labeling behavior */
	unsigned short flags;		/* which mount options were specified */
	struct mutex lock;
-	struct list_head isec_head;
-	spinlock_t isec_lock;
+	struct dlock_list_heads isec_head;
 };

 struct msg_security_struct {
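sb_finish_set_opts() above is the one place that deletes entries while
iterating, which is why this patch adds dlock_list_del_iter(): plain
dlock_lists_del() would try to take the per-sublist lock that the iterator
already holds. A condensed sketch of that drain-while-iterating shape;
isec_like and process_isec() are illustrative stand-ins for
inode_security_struct and the igrab()/inode_doinit_with_dentry()/iput()
body:

/* Sketch only: assumes the dlock-list API from this series. */
#include <linux/dlock-list.h>
#include <linux/kernel.h>

struct isec_like {			/* stand-in for inode_security_struct */
	struct dlock_list_node list;
};

static void process_isec(struct isec_like *isec)
{
	/* stand-in for the igrab()/doinit/iput() work; may sleep */
}

static void drain_list_example(struct dlock_list_heads *head)
{
	struct dlock_list_iter iter;
	struct isec_like *isec, *n;

	init_dlock_list_iter(&iter, head);
	dlist_for_each_entry_safe(isec, n, &iter, list) {
		/*
		 * dlock_list_del_iter() requires that the iterator holds
		 * the lock for the sublist the node is on - exactly the
		 * state inside the loop body.
		 */
		dlock_list_del_iter(&iter, &isec->list);
		dlock_list_unlock(&iter);

		process_isec(isec);		/* lock dropped: can sleep */

		dlock_list_relock(&iter);
	}
	WARN_ON_ONCE(!dlock_lists_empty(head));
}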
From patchwork Wed Dec 6 06:05:35 2023
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 174307

From: Dave Chinner
To: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org, linux-cachefs@redhat.com,
    dhowells@redhat.com, gfs2@lists.linux.dev, dm-devel@lists.linux.dev,
    linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH 06/11] vfs: factor out inode hash head calculation
Date: Wed, 6 Dec 2023 17:05:35 +1100
Message-ID: <20231206060629.2827226-7-david@fromorbit.com>
In-Reply-To: <20231206060629.2827226-1-david@fromorbit.com>

From: Dave Chinner

In preparation for changing the inode hash table implementation.

Signed-off-by: Dave Chinner
Acked-by: Al Viro
---
 fs/inode.c | 44 +++++++++++++++++++++++++-------------------
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 3426691fa305..fead81550cf4 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -59,6 +59,22 @@ static unsigned int i_hash_shift __ro_after_init;
 static struct hlist_head *inode_hashtable __ro_after_init;
 static __cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_hash_lock);

+static unsigned long hash(struct super_block *sb, unsigned long hashval)
+{
+	unsigned long tmp;
+
+	tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) /
+			L1_CACHE_BYTES;
+	tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> i_hash_shift);
+	return tmp & i_hash_mask;
+}
+
+static inline struct hlist_head *i_hash_head(struct super_block *sb,
+		unsigned int hashval)
+{
+	return inode_hashtable + hash(sb, hashval);
+}
+
 /*
  * Empty aops. Can be used for the cases where the user does not
  * define any of the address_space operations.
@@ -502,16 +518,6 @@ static inline void inode_sb_list_del(struct inode *inode)
	dlock_lists_del(&inode->i_sb_list);
 }

-static unsigned long hash(struct super_block *sb, unsigned long hashval)
-{
-	unsigned long tmp;
-
-	tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) /
-			L1_CACHE_BYTES;
-	tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> i_hash_shift);
-	return tmp & i_hash_mask;
-}
-
 /**
  * __insert_inode_hash - hash an inode
  * @inode: unhashed inode
@@ -1187,7 +1193,7 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval,
			    int (*test)(struct inode *, void *),
			    int (*set)(struct inode *, void *), void *data)
 {
-	struct hlist_head *head = inode_hashtable + hash(inode->i_sb, hashval);
+	struct hlist_head *head = i_hash_head(inode->i_sb, hashval);
	struct inode *old;

 again:
@@ -1291,7 +1297,7 @@ EXPORT_SYMBOL(iget5_locked);
  */
 struct inode *iget_locked(struct super_block *sb, unsigned long ino)
 {
-	struct hlist_head *head = inode_hashtable + hash(sb, ino);
+	struct hlist_head *head = i_hash_head(sb, ino);
	struct inode *inode;
 again:
	spin_lock(&inode_hash_lock);
@@ -1359,7 +1365,7 @@ EXPORT_SYMBOL(iget_locked);
  */
 static int test_inode_iunique(struct super_block *sb, unsigned long ino)
 {
-	struct hlist_head *b = inode_hashtable + hash(sb, ino);
+	struct hlist_head *b = i_hash_head(sb, ino);
	struct inode *inode;

	hlist_for_each_entry_rcu(inode, b, i_hash) {
@@ -1446,7 +1452,7 @@ EXPORT_SYMBOL(igrab);
 struct inode *ilookup5_nowait(struct super_block *sb, unsigned long hashval,
		int (*test)(struct inode *, void *), void *data)
 {
-	struct hlist_head *head = inode_hashtable + hash(sb, hashval);
+	struct hlist_head *head = i_hash_head(sb, hashval);
	struct inode *inode;

	spin_lock(&inode_hash_lock);
@@ -1501,7 +1507,7 @@ EXPORT_SYMBOL(ilookup5);
  */
 struct inode *ilookup(struct super_block *sb, unsigned long ino)
 {
-	struct hlist_head *head = inode_hashtable + hash(sb, ino);
+	struct hlist_head *head = i_hash_head(sb, ino);
	struct inode *inode;
 again:
	spin_lock(&inode_hash_lock);
@@ -1550,7 +1556,7 @@ struct inode *find_inode_nowait(struct super_block *sb,
					     void *),
				void *data)
 {
-	struct hlist_head *head = inode_hashtable + hash(sb, hashval);
+	struct hlist_head *head = i_hash_head(sb, hashval);
	struct inode *inode, *ret_inode = NULL;
	int mval;
@@ -1595,7 +1601,7 @@ EXPORT_SYMBOL(find_inode_nowait);
 struct inode *find_inode_rcu(struct super_block *sb, unsigned long hashval,
			     int (*test)(struct inode *, void *), void *data)
 {
-	struct hlist_head *head = inode_hashtable + hash(sb, hashval);
+	struct hlist_head *head = i_hash_head(sb, hashval);
	struct inode *inode;

	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
@@ -1633,7 +1639,7 @@ EXPORT_SYMBOL(find_inode_rcu);
 struct inode *find_inode_by_ino_rcu(struct super_block *sb, unsigned long ino)
 {
-	struct hlist_head *head = inode_hashtable + hash(sb, ino);
+	struct hlist_head *head = i_hash_head(sb, ino);
	struct inode *inode;

	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
@@ -1653,7 +1659,7 @@ int insert_inode_locked(struct inode *inode)
 {
	struct super_block *sb = inode->i_sb;
	ino_t ino = inode->i_ino;
-	struct hlist_head *head = inode_hashtable + hash(sb, ino);
+	struct hlist_head *head = i_hash_head(sb, ino);

	while (1) {
		struct inode *old = NULL;
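The helper being factored out does two rounds of multiplicative mixing so
that (sb, hashval) pairs differing only in low bits still spread across
buckets. A user-space rendition that can be compiled and experimented
with; GOLDEN_RATIO_PRIME, the shift and the mask are stand-in values for
what the kernel derives at boot from the hash table size:

#include <stdio.h>

/* Stand-ins for the kernel's boot-time values (illustrative only). */
#define GOLDEN_RATIO_PRIME	0x9e37fffffffc0001UL
#define L1_CACHE_BYTES		64
static unsigned int i_hash_shift = 16;		/* log2(table size), example */
static unsigned long i_hash_mask = (1UL << 16) - 1;

/* Same mixing as the kernel's hash(), with sb reduced to a raw address. */
static unsigned long hash(unsigned long sb, unsigned long hashval)
{
	unsigned long tmp;

	tmp = (hashval * sb) ^ (GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES;
	tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> i_hash_shift);
	return tmp & i_hash_mask;
}

int main(void)
{
	unsigned long sb = 0xffff888004a1d000UL;	/* fake superblock address */

	/* Sequential inode numbers land in well-spread buckets. */
	for (unsigned long ino = 100; ino < 105; ino++)
		printf("ino %lu -> bucket %lu\n", ino, hash(sb, ino));
	return 0;
}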
From patchwork Wed Dec 6 06:05:36 2023
X-Patchwork-Submitter: Dave Chinner
X-Patchwork-Id: 174308

From: Dave Chinner
To: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org, linux-cachefs@redhat.com,
    dhowells@redhat.com, gfs2@lists.linux.dev, dm-devel@lists.linux.dev,
    linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH 07/11] hlist-bl: add hlist_bl_fake()
Date: Wed, 6 Dec 2023 17:05:36 +1100
Message-ID: <20231206060629.2827226-8-david@fromorbit.com>
In-Reply-To: <20231206060629.2827226-1-david@fromorbit.com>

From: Dave Chinner

In preparation for switching the VFS inode cache over to hlist_bl
lists, we need to be able to fake a list node that looks like it is
hashed, for correct operation of filesystems that don't directly use
the VFS inode cache.

Signed-off-by: Dave Chinner
---
 include/linux/list_bl.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/include/linux/list_bl.h b/include/linux/list_bl.h
index ae1b541446c9..8ee2bf5af131 100644
--- a/include/linux/list_bl.h
+++ b/include/linux/list_bl.h
@@ -143,6 +143,28 @@ static inline void hlist_bl_del_init(struct hlist_bl_node *n)
	}
 }

+/**
+ * hlist_bl_add_fake - create a fake list consisting of a single headless node
+ * @n: Node to make a fake list out of
+ *
+ * This makes @n appear to be its own predecessor on a headless hlist.
+ * The point of this is to allow things like hlist_bl_del() to work correctly
+ * in cases where there is no list.
+ */
+static inline void hlist_bl_add_fake(struct hlist_bl_node *n)
+{
+	n->pprev = &n->next;
+}
+
+/**
+ * hlist_bl_fake - Is this node a fake hlist_bl?
+ * @n: Node to check for being a self-referential fake hlist.
+ */
+static inline bool hlist_bl_fake(struct hlist_bl_node *n)
+{
+	return n->pprev == &n->next;
+}
+
 static inline void hlist_bl_lock(struct hlist_bl_head *b)
 {
	bit_spin_lock(0, (unsigned long *)b);
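The point of a fake node is that generic teardown paths which
unconditionally unhash an object keep working for objects that were never
inserted into any real list: the node simply unlinks from itself. A
minimal sketch of that usage; struct my_obj and the helper names are
illustrative, not an API from this series:

#include <linux/list_bl.h>

struct my_obj {
	struct hlist_bl_node node;
};

static void init_unhashed(struct my_obj *obj)
{
	/* Make the node its own predecessor: looks hashed, is on no list. */
	hlist_bl_add_fake(&obj->node);
}

static void teardown(struct my_obj *obj)
{
	/*
	 * A fake node is not "unhashed" (pprev != NULL), so the normal
	 * del path runs; it only touches the node's own pointers, so no
	 * list head lock is required and no real list is disturbed.
	 */
	if (!hlist_bl_unhashed(&obj->node))
		hlist_bl_del_init(&obj->node);
}

This mirrors how the dentry cache already uses fake hashing for pseudo
dentries that must pass "is hashed?" checks without living in the dcache
hash.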
[49.180.125.5]) by smtp.gmail.com with ESMTPSA id m13-20020a17090a858d00b00286920600a9sm5350682pjn.32.2023.12.05.22.06.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Dec 2023 22:06:35 -0800 (PST) Received: from [192.168.253.23] (helo=devoid.disaster.area) by dread.disaster.area with esmtp (Exim 4.96) (envelope-from ) id 1rAl3I-004VP3-0n; Wed, 06 Dec 2023 17:06:31 +1100 Received: from dave by devoid.disaster.area with local (Exim 4.97-RC0) (envelope-from ) id 1rAl3H-0000000BrVa-3MP5; Wed, 06 Dec 2023 17:06:31 +1100 From: Dave Chinner To: linux-fsdevel@vger.kernel.org Cc: linux-block@vger.kernel.org, linux-cachefs@redhat.com, dhowells@redhat.com, gfs2@lists.linux.dev, dm-devel@lists.linux.dev, linux-security-module@vger.kernel.org, selinux@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 08/11] vfs: inode cache conversion to hash-bl Date: Wed, 6 Dec 2023 17:05:37 +1100 Message-ID: <20231206060629.2827226-9-david@fromorbit.com> X-Mailer: git-send-email 2.42.0 In-Reply-To: <20231206060629.2827226-1-david@fromorbit.com> References: <20231206060629.2827226-1-david@fromorbit.com> MIME-Version: 1.0 X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Tue, 05 Dec 2023 22:07:54 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1784511599608551644 X-GMAIL-MSGID: 1784511599608551644 From: Dave Chinner Scalability of the global inode_hash_lock really sucks for filesystems that use the vfs inode cache (i.e. everything but XFS). Profiles of a 32-way concurrent sharded directory walk (no contended directories) on a couple of different filesystems. All numbers from a 6.7-rc4 kernel. Bcachefs: - 98.78% vfs_statx - 97.74% filename_lookup - 97.70% path_lookupat - 97.54% walk_component - 97.06% lookup_slow - 97.03% __lookup_slow - 96.21% bch2_lookup - 91.87% bch2_vfs_inode_get - 84.10% iget5_locked - 44.09% ilookup5 - 43.50% _raw_spin_lock - 43.49% do_raw_spin_lock 42.75% __pv_queued_spin_lock_slowpath - 39.06% inode_insert5 - 38.46% _raw_spin_lock - 38.46% do_raw_spin_lock 37.51% __pv_queued_spin_lock_slowpath ext4: - 93.75% vfs_statx - 92.39% filename_lookup - 92.34% path_lookupat - 92.09% walk_component - 91.48% lookup_slow - 91.43% __lookup_slow - 90.18% ext4_lookup - 84.84% __ext4_iget - 83.67% iget_locked - 81.24% _raw_spin_lock - 81.23% do_raw_spin_lock - 78.90% __pv_queued_spin_lock_slowpath Both bcachefs and ext4 demonstrate poor scaling at >=8 threads on concurrent lookup or create workloads. Hence convert the inode hash table to a RCU-aware hash-bl table just like the dentry cache. Note that we need to store a pointer to the hlist_bl_head the inode has been added to in the inode so that when it comes to unhash the inode we know what list to lock. We need to do this because, unlike the dentry cache, the hash value that is used to hash the inode is not generated from the inode itself. i.e. filesystems can provide this themselves so we have to either store the hashval or the hlist head pointer in the inode to be able to find the right list head for removal... 
Concurrent walk of 400k files per thread with varying thread count, walk time in seconds, is as follows. Perfect scaling is an unchanged walk time as thread count increases.

	                 ext4                 bcachefs
	threads    vanilla   patched     vanilla   patched
	      2      7.923     7.358       8.003     7.276
	      4      8.152     7.530       9.097     8.506
	      8     13.090     7.871      11.752    10.015
	     16     24.602     9.540      24.614    13.989
	     32     49.536    19.314      49.179    25.982

The big wins here are at >= 8 threads, with both filesystems now being limited by internal filesystem algorithms, not the VFS inode cache scalability.

Ext4 contention moves to the buffer cache on directory block lookups:

  - 66.45% 0.44% [kernel] [k] __ext4_read_dirblock
     - 66.01% __ext4_read_dirblock
        - 66.01% ext4_bread
           - ext4_getblk
              - 64.77% bdev_getblk
                 - 64.69% __find_get_block
                    - 63.01% _raw_spin_lock
                       - 62.96% do_raw_spin_lock
                            59.21% __pv_queued_spin_lock_slowpath

bcachefs contention moves to internal btree traversal locks:

  - 95.37% __lookup_slow
     - 93.95% bch2_lookup
        - 82.57% bch2_vfs_inode_get
           - 65.44% bch2_inode_find_by_inum_trans
              - 65.41% bch2_inode_peek_nowarn
                 - 64.60% bch2_btree_iter_peek_slot
                    - 64.55% bch2_btree_path_traverse_one
                       - bch2_btree_path_traverse_cached
                          - 63.02% bch2_btree_path_traverse_cached_slowpath
                             - 56.60% mutex_lock
                                - 55.29% __mutex_lock_slowpath
                                   - 55.25% __mutex_lock
                                        50.29% osq_lock
                                        1.84% __raw_callee_save___kvm_vcpu_is_preempted
                                        0.54% mutex_spin_on_owner

Signed-off-by: Dave Chinner
Reviewed-by: Kent Overstreet
---
 fs/inode.c         | 200 ++++++++++++++++++++++++++++-----------------
 include/linux/fs.h |   9 +-
 2 files changed, 132 insertions(+), 77 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c index fead81550cf4..3eb9c4e5b279 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -56,8 +56,7 @@ static unsigned int i_hash_mask __ro_after_init; static unsigned int i_hash_shift __ro_after_init; -static struct hlist_head *inode_hashtable __ro_after_init; -static __cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_hash_lock); +static struct hlist_bl_head *inode_hashtable __ro_after_init; static unsigned long hash(struct super_block *sb, unsigned long hashval) { @@ -69,7 +68,7 @@ static unsigned long hash(struct super_block *sb, unsigned long hashval) return tmp & i_hash_mask; } -static inline struct hlist_head *i_hash_head(struct super_block *sb, +static inline struct hlist_bl_head *i_hash_head(struct super_block *sb, unsigned int hashval) { return inode_hashtable + hash(sb, hashval); } @@ -434,7 +433,7 @@ EXPORT_SYMBOL(address_space_init_once); void inode_init_once(struct inode *inode) { memset(inode, 0, sizeof(*inode)); - INIT_HLIST_NODE(&inode->i_hash); + INIT_HLIST_BL_NODE(&inode->i_hash); INIT_LIST_HEAD(&inode->i_devices); INIT_LIST_HEAD(&inode->i_io_list); INIT_LIST_HEAD(&inode->i_wb_list); @@ -518,6 +517,17 @@ static inline void inode_sb_list_del(struct inode *inode) dlock_lists_del(&inode->i_sb_list); } +/* + * Ensure that we store the hash head in the inode when we insert the inode into + * the hlist_bl_head...
+ */ +static inline void +__insert_inode_hash_head(struct inode *inode, struct hlist_bl_head *b) +{ + hlist_bl_add_head_rcu(&inode->i_hash, b); + inode->i_hash_head = b; +} + /** * __insert_inode_hash - hash an inode * @inode: unhashed inode @@ -528,13 +538,13 @@ static inline void inode_sb_list_del(struct inode *inode) */ void __insert_inode_hash(struct inode *inode, unsigned long hashval) { - struct hlist_head *b = inode_hashtable + hash(inode->i_sb, hashval); + struct hlist_bl_head *b = i_hash_head(inode->i_sb, hashval); - spin_lock(&inode_hash_lock); + hlist_bl_lock(b); spin_lock(&inode->i_lock); - hlist_add_head_rcu(&inode->i_hash, b); + __insert_inode_hash_head(inode, b); spin_unlock(&inode->i_lock); - spin_unlock(&inode_hash_lock); + hlist_bl_unlock(b); } EXPORT_SYMBOL(__insert_inode_hash); @@ -546,11 +556,44 @@ EXPORT_SYMBOL(__insert_inode_hash); */ void __remove_inode_hash(struct inode *inode) { - spin_lock(&inode_hash_lock); - spin_lock(&inode->i_lock); - hlist_del_init_rcu(&inode->i_hash); - spin_unlock(&inode->i_lock); - spin_unlock(&inode_hash_lock); + struct hlist_bl_head *b = inode->i_hash_head; + + /* + * There are some callers that come through here without synchronisation + * and potentially with multiple references to the inode. Hence we have + * to handle the case that we might race with a remove and insert to a + * different list. Coda, in particular, seems to have a userspace API + * that can directly trigger "unhash/rehash to different list" behaviour + * without any serialisation at all. + * + * Hence we have to handle the situation where the inode->i_hash_head + * might point to a different list than what we expect, indicating that + * we raced with another unhash and potentially a new insertion. This + * means we have to retest the head once we have everything locked up + * and loop again if it doesn't match. + */ + while (b) { + hlist_bl_lock(b); + spin_lock(&inode->i_lock); + if (b != inode->i_hash_head) { + hlist_bl_unlock(b); + b = inode->i_hash_head; + spin_unlock(&inode->i_lock); + continue; + } + /* + * Need to set the pprev pointer to NULL after list removal so + * that both RCU traversals and hlist_bl_unhashed() work + * correctly at this point. + */ + hlist_bl_del_rcu(&inode->i_hash); + inode->i_hash.pprev = NULL; + inode->i_hash_head = NULL; + spin_unlock(&inode->i_lock); + hlist_bl_unlock(b); + break; + } + } EXPORT_SYMBOL(__remove_inode_hash); @@ -886,26 +929,28 @@ long prune_icache_sb(struct super_block *sb, struct shrink_control *sc) return freed; } -static void __wait_on_freeing_inode(struct inode *inode); +static void __wait_on_freeing_inode(struct hlist_bl_head *b, + struct inode *inode); /* * Called with the inode lock held. */ static struct inode *find_inode(struct super_block *sb, - struct hlist_head *head, + struct hlist_bl_head *b, int (*test)(struct inode *, void *), void *data) { + struct hlist_bl_node *node; struct inode *inode = NULL; repeat: - hlist_for_each_entry(inode, head, i_hash) { + hlist_bl_for_each_entry(inode, node, b, i_hash) { if (inode->i_sb != sb) continue; if (!test(inode, data)) continue; spin_lock(&inode->i_lock); if (inode->i_state & (I_FREEING|I_WILL_FREE)) { - __wait_on_freeing_inode(inode); + __wait_on_freeing_inode(b, inode); goto repeat; } if (unlikely(inode->i_state & I_CREATING)) { @@ -924,19 +969,20 @@ static struct inode *find_inode(struct super_block *sb, * iget_locked for details. 
*/ static struct inode *find_inode_fast(struct super_block *sb, - struct hlist_head *head, unsigned long ino) + struct hlist_bl_head *b, unsigned long ino) { + struct hlist_bl_node *node; struct inode *inode = NULL; repeat: - hlist_for_each_entry(inode, head, i_hash) { + hlist_bl_for_each_entry(inode, node, b, i_hash) { if (inode->i_ino != ino) continue; if (inode->i_sb != sb) continue; spin_lock(&inode->i_lock); if (inode->i_state & (I_FREEING|I_WILL_FREE)) { - __wait_on_freeing_inode(inode); + __wait_on_freeing_inode(b, inode); goto repeat; } if (unlikely(inode->i_state & I_CREATING)) { @@ -1186,25 +1232,25 @@ EXPORT_SYMBOL(unlock_two_nondirectories); * return it locked, hashed, and with the I_NEW flag set. The file system gets * to fill it in before unlocking it via unlock_new_inode(). * - * Note both @test and @set are called with the inode_hash_lock held, so can't - * sleep. + * Note both @test and @set are called with the inode hash chain lock held, + * so can't sleep. */ struct inode *inode_insert5(struct inode *inode, unsigned long hashval, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void *data) { - struct hlist_head *head = i_hash_head(inode->i_sb, hashval); + struct hlist_bl_head *b = i_hash_head(inode->i_sb, hashval); struct inode *old; again: - spin_lock(&inode_hash_lock); - old = find_inode(inode->i_sb, head, test, data); + hlist_bl_lock(b); + old = find_inode(inode->i_sb, b, test, data); if (unlikely(old)) { /* * Uhhuh, somebody else created the same inode under us. * Use the old inode instead of the preallocated one. */ - spin_unlock(&inode_hash_lock); + hlist_bl_unlock(b); if (IS_ERR(old)) return NULL; wait_on_inode(old); @@ -1226,7 +1272,7 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval, */ spin_lock(&inode->i_lock); inode->i_state |= I_NEW; - hlist_add_head_rcu(&inode->i_hash, head); + __insert_inode_hash_head(inode, b); spin_unlock(&inode->i_lock); /* @@ -1236,7 +1282,7 @@ struct inode *inode_insert5(struct inode *inode, unsigned long hashval, if (list_empty(&inode->i_sb_list.list)) inode_sb_list_add(inode); unlock: - spin_unlock(&inode_hash_lock); + hlist_bl_unlock(b); return inode; } @@ -1297,12 +1343,12 @@ EXPORT_SYMBOL(iget5_locked); */ struct inode *iget_locked(struct super_block *sb, unsigned long ino) { - struct hlist_head *head = i_hash_head(sb, ino); + struct hlist_bl_head *b = i_hash_head(sb, ino); struct inode *inode; again: - spin_lock(&inode_hash_lock); - inode = find_inode_fast(sb, head, ino); - spin_unlock(&inode_hash_lock); + hlist_bl_lock(b); + inode = find_inode_fast(sb, b, ino); + hlist_bl_unlock(b); if (inode) { if (IS_ERR(inode)) return NULL; @@ -1318,17 +1364,17 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino) if (inode) { struct inode *old; - spin_lock(&inode_hash_lock); + hlist_bl_lock(b); /* We released the lock, so.. */ - old = find_inode_fast(sb, head, ino); + old = find_inode_fast(sb, b, ino); if (!old) { inode->i_ino = ino; spin_lock(&inode->i_lock); inode->i_state = I_NEW; - hlist_add_head_rcu(&inode->i_hash, head); + __insert_inode_hash_head(inode, b); spin_unlock(&inode->i_lock); inode_sb_list_add(inode); - spin_unlock(&inode_hash_lock); + hlist_bl_unlock(b); /* Return the locked inode with I_NEW set, the * caller is responsible for filling in the contents @@ -1341,7 +1387,7 @@ struct inode *iget_locked(struct super_block *sb, unsigned long ino) * us. Use the old inode instead of the one we just * allocated. 
*/ - spin_unlock(&inode_hash_lock); + hlist_bl_unlock(b); destroy_inode(inode); if (IS_ERR(old)) return NULL; @@ -1365,10 +1411,11 @@ EXPORT_SYMBOL(iget_locked); */ static int test_inode_iunique(struct super_block *sb, unsigned long ino) { - struct hlist_head *b = i_hash_head(sb, ino); + struct hlist_bl_head *b = i_hash_head(sb, ino); + struct hlist_bl_node *node; struct inode *inode; - hlist_for_each_entry_rcu(inode, b, i_hash) { + hlist_bl_for_each_entry_rcu(inode, node, b, i_hash) { if (inode->i_ino == ino && inode->i_sb == sb) return 0; } @@ -1452,12 +1499,12 @@ EXPORT_SYMBOL(igrab); struct inode *ilookup5_nowait(struct super_block *sb, unsigned long hashval, int (*test)(struct inode *, void *), void *data) { - struct hlist_head *head = i_hash_head(sb, hashval); + struct hlist_bl_head *b = i_hash_head(sb, hashval); struct inode *inode; - spin_lock(&inode_hash_lock); - inode = find_inode(sb, head, test, data); - spin_unlock(&inode_hash_lock); + hlist_bl_lock(b); + inode = find_inode(sb, b, test, data); + hlist_bl_unlock(b); return IS_ERR(inode) ? NULL : inode; } @@ -1507,12 +1554,12 @@ EXPORT_SYMBOL(ilookup5); */ struct inode *ilookup(struct super_block *sb, unsigned long ino) { - struct hlist_head *head = i_hash_head(sb, ino); + struct hlist_bl_head *b = i_hash_head(sb, ino); struct inode *inode; again: - spin_lock(&inode_hash_lock); - inode = find_inode_fast(sb, head, ino); - spin_unlock(&inode_hash_lock); + hlist_bl_lock(b); + inode = find_inode_fast(sb, b, ino); + hlist_bl_unlock(b); if (inode) { if (IS_ERR(inode)) @@ -1556,12 +1603,13 @@ struct inode *find_inode_nowait(struct super_block *sb, void *), void *data) { - struct hlist_head *head = i_hash_head(sb, hashval); + struct hlist_bl_head *b = i_hash_head(sb, hashval); + struct hlist_bl_node *node; struct inode *inode, *ret_inode = NULL; int mval; - spin_lock(&inode_hash_lock); - hlist_for_each_entry(inode, head, i_hash) { + hlist_bl_lock(b); + hlist_bl_for_each_entry(inode, node, b, i_hash) { if (inode->i_sb != sb) continue; mval = match(inode, hashval, data); @@ -1572,7 +1620,7 @@ struct inode *find_inode_nowait(struct super_block *sb, goto out; } out: - spin_unlock(&inode_hash_lock); + hlist_bl_unlock(b); return ret_inode; } EXPORT_SYMBOL(find_inode_nowait); @@ -1601,13 +1649,14 @@ EXPORT_SYMBOL(find_inode_nowait); struct inode *find_inode_rcu(struct super_block *sb, unsigned long hashval, int (*test)(struct inode *, void *), void *data) { - struct hlist_head *head = i_hash_head(sb, hashval); + struct hlist_bl_head *b = i_hash_head(sb, hashval); + struct hlist_bl_node *node; struct inode *inode; RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "suspicious find_inode_rcu() usage"); - hlist_for_each_entry_rcu(inode, head, i_hash) { + hlist_bl_for_each_entry_rcu(inode, node, b, i_hash) { if (inode->i_sb == sb && !(READ_ONCE(inode->i_state) & (I_FREEING | I_WILL_FREE)) && test(inode, data)) @@ -1639,13 +1688,14 @@ EXPORT_SYMBOL(find_inode_rcu); struct inode *find_inode_by_ino_rcu(struct super_block *sb, unsigned long ino) { - struct hlist_head *head = i_hash_head(sb, ino); + struct hlist_bl_head *b = i_hash_head(sb, ino); + struct hlist_bl_node *node; struct inode *inode; RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "suspicious find_inode_by_ino_rcu() usage"); - hlist_for_each_entry_rcu(inode, head, i_hash) { + hlist_bl_for_each_entry_rcu(inode, node, b, i_hash) { if (inode->i_ino == ino && inode->i_sb == sb && !(READ_ONCE(inode->i_state) & (I_FREEING | I_WILL_FREE))) @@ -1659,39 +1709,42 @@ int insert_inode_locked(struct inode *inode) { struct 
super_block *sb = inode->i_sb; ino_t ino = inode->i_ino; - struct hlist_head *head = i_hash_head(sb, ino); + struct hlist_bl_head *b = i_hash_head(sb, ino); while (1) { - struct inode *old = NULL; - spin_lock(&inode_hash_lock); - hlist_for_each_entry(old, head, i_hash) { - if (old->i_ino != ino) + struct hlist_bl_node *node; + struct inode *old = NULL, *t; + + hlist_bl_lock(b); + hlist_bl_for_each_entry(t, node, b, i_hash) { + if (t->i_ino != ino) continue; - if (old->i_sb != sb) + if (t->i_sb != sb) continue; - spin_lock(&old->i_lock); - if (old->i_state & (I_FREEING|I_WILL_FREE)) { - spin_unlock(&old->i_lock); + spin_lock(&t->i_lock); + if (t->i_state & (I_FREEING|I_WILL_FREE)) { + spin_unlock(&t->i_lock); continue; } + old = t; break; } if (likely(!old)) { spin_lock(&inode->i_lock); inode->i_state |= I_NEW | I_CREATING; - hlist_add_head_rcu(&inode->i_hash, head); + __insert_inode_hash_head(inode, b); spin_unlock(&inode->i_lock); - spin_unlock(&inode_hash_lock); + hlist_bl_unlock(b); return 0; } if (unlikely(old->i_state & I_CREATING)) { spin_unlock(&old->i_lock); - spin_unlock(&inode_hash_lock); + hlist_bl_unlock(b); return -EBUSY; } __iget(old); spin_unlock(&old->i_lock); - spin_unlock(&inode_hash_lock); + hlist_bl_unlock(b); wait_on_inode(old); if (unlikely(!inode_unhashed(old))) { iput(old); @@ -2271,17 +2324,18 @@ EXPORT_SYMBOL(inode_needs_sync); * wake_up_bit(&inode->i_state, __I_NEW) after removing from the hash list * will DTRT. */ -static void __wait_on_freeing_inode(struct inode *inode) +static void __wait_on_freeing_inode(struct hlist_bl_head *b, + struct inode *inode) { wait_queue_head_t *wq; DEFINE_WAIT_BIT(wait, &inode->i_state, __I_NEW); wq = bit_waitqueue(&inode->i_state, __I_NEW); prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE); spin_unlock(&inode->i_lock); - spin_unlock(&inode_hash_lock); + hlist_bl_unlock(b); schedule(); finish_wait(wq, &wait.wq_entry); - spin_lock(&inode_hash_lock); + hlist_bl_lock(b); } static __initdata unsigned long ihash_entries; @@ -2307,7 +2361,7 @@ void __init inode_init_early(void) inode_hashtable = alloc_large_system_hash("Inode-cache", - sizeof(struct hlist_head), + sizeof(struct hlist_bl_head), ihash_entries, 14, HASH_EARLY | HASH_ZERO, @@ -2333,7 +2387,7 @@ void __init inode_init(void) inode_hashtable = alloc_large_system_hash("Inode-cache", - sizeof(struct hlist_head), + sizeof(struct hlist_bl_head), ihash_entries, 14, HASH_ZERO, diff --git a/include/linux/fs.h b/include/linux/fs.h index bb35591733f1..0ef1b72340c7 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -692,7 +692,8 @@ struct inode { unsigned long dirtied_when; /* jiffies of first dirtying */ unsigned long dirtied_time_when; - struct hlist_node i_hash; + struct hlist_bl_node i_hash; + struct hlist_bl_head *i_hash_head; struct list_head i_io_list; /* backing dev IO list */ #ifdef CONFIG_CGROUP_WRITEBACK struct bdi_writeback *i_wb; /* the associated cgroup wb */ @@ -758,7 +759,7 @@ static inline unsigned int i_blocksize(const struct inode *node) static inline int inode_unhashed(struct inode *inode) { - return hlist_unhashed(&inode->i_hash); + return hlist_bl_unhashed(&inode->i_hash); } /* @@ -769,7 +770,7 @@ static inline int inode_unhashed(struct inode *inode) */ static inline void inode_fake_hash(struct inode *inode) { - hlist_add_fake(&inode->i_hash); + hlist_bl_add_fake(&inode->i_hash); } /* @@ -2946,7 +2947,7 @@ static inline void insert_inode_hash(struct inode *inode) extern void __remove_inode_hash(struct inode *); static inline void 
remove_inode_hash(struct inode *inode) { - if (!inode_unhashed(inode) && !hlist_fake(&inode->i_hash)) + if (!inode_unhashed(inode) && !hlist_bl_fake(&inode->i_hash)) __remove_inode_hash(inode); } From patchwork Wed Dec 6 06:05:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 174312 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp3907665vqy; Tue, 5 Dec 2023 22:07:11 -0800 (PST) X-Google-Smtp-Source: AGHT+IFF4A1/83gN38D24DvZAhbZ1t/BlRt71OueZV6d+mg8O7xlshMjdcKjfrMX1OY1059nQaWv X-Received: by 2002:a05:6a20:e101:b0:18a:d766:cae1 with SMTP id kr1-20020a056a20e10100b0018ad766cae1mr309706pzb.54.1701842831569; Tue, 05 Dec 2023 22:07:11 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701842831; cv=none; d=google.com; s=arc-20160816; b=To4+n03hfMTwNIqU4l18Gfyub92VWiEYHhRYeo8I9R3qujfMTuQBme3dpl6aG84e4J B8qvHtRbKtqtVz4oxsmX32L1s+O7EmXJwy7HTSTOv+CcsPplegDfj5jw0xIGLxDTbqTn CpsUM7ITZHqPCujOziuq2fmXcpNOHIMvdF7ktCMYCk0p0aJxzgIokVNjAkXpju89U81K cFmKpm5p7tyqSGeoFzK2EcZKG/Q5DqdtDzXK1kvdktoIPqjgVNHVTMyaipssceU0jZ8K QWrlR9FfqOR1i4rFIWsAcKHYahhRE2D4GRQE7omC076QYiPaiSRL8X0QrvSmTAqTiTZo gtOg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=6abx9zF8o8O5TeS/ywwBDTAT+8QpF2rP4pNJ9ZpbgvE=; fh=CW5pnDTDkJR8Vn9NaUbLrrp28PUVapFEjMCn+HkS0lE=; b=KyxfIEEfWtnxY1+X9CLoLvFQgIa3vMqjeqC55q+fqWWM6mTZPLCbqFrAWHINMyCSk/ 1fUfpmz7KJO6ytI6kEbIsFQ+MJrLsOihActfOdiEzCxhlovSfxYyHjGylEAEt2Vr4VWl 7ETedqCbxOESa+dSiW/RSync9yLXZwslL4FLUuzNwXXHGVtUMaIa4YH4ZvbwGD5CzL5I q9fMMvvRgfG3ctZI6QOGDYXGXCYzNveQcdOONTykpvgG4fu0hXezUnl7oQeOzAyRDV5+ yDM2J03TQCzG6WokZO8zBRFDx0TGMqZOtTfQc6Qh+6gZgv90HjdS0VyoeAMkF6E0GboI m58Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@fromorbit-com.20230601.gappssmtp.com header.s=20230601 header.b=nLK+P+X6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=fromorbit.com Received: from snail.vger.email (snail.vger.email. 
[2620:137:e000::3:7]) by mx.google.com with ESMTPS id c18-20020a170902849200b001d07b6a0aeasi6168634plo.325.2023.12.05.22.07.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Dec 2023 22:07:11 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) client-ip=2620:137:e000::3:7; Authentication-Results: mx.google.com; dkim=pass header.i=@fromorbit-com.20230601.gappssmtp.com header.s=20230601 header.b=nLK+P+X6; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=fromorbit.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 5E76C802F187; Tue, 5 Dec 2023 22:07:10 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1376985AbjLFGG5 (ORCPT + 99 others); Wed, 6 Dec 2023 01:06:57 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49900 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1376904AbjLFGGe (ORCPT ); Wed, 6 Dec 2023 01:06:34 -0500 Received: from mail-pl1-x62b.google.com (mail-pl1-x62b.google.com [IPv6:2607:f8b0:4864:20::62b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 21071D6D for ; Tue, 5 Dec 2023 22:06:38 -0800 (PST) Received: by mail-pl1-x62b.google.com with SMTP id d9443c01a7336-1d069b1d127so26084855ad.0 for ; Tue, 05 Dec 2023 22:06:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20230601.gappssmtp.com; s=20230601; t=1701842797; x=1702447597; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=6abx9zF8o8O5TeS/ywwBDTAT+8QpF2rP4pNJ9ZpbgvE=; b=nLK+P+X6MBvI4uK779aodJifGNiro1QhuSp848soYZsUF7Y8tiAietL/WwwiSPgS84 s7NwROAcdxsuYb9dXJgkNockQQyfVQrCL9/31a3AC9N8Fn5xjzDzFeJtdh70Y4cZhoYS WBkFTzhqamosXgvoln0cb3daNPiwuIsChm9CWTfw8Ds2hFVe9W1Mp0UIJPoMY13eXrr7 yXC1T9qugab1tcrvHcnZk2etfRIkKHeOMtJleVbg+nPbsdy3X35iqymk8zJkH0vSFR2L sVS5fgP8cbUEPiteZChwcZh/ubBTAGen6Yd+G4W54CWgIpT2iTzeXaM62lcuddZ5NvGO C1iw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701842797; x=1702447597; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=6abx9zF8o8O5TeS/ywwBDTAT+8QpF2rP4pNJ9ZpbgvE=; b=hGIutnKqwFQXOwkgAz3mIJHqTupxQ3vvJsVuQ/+xcQ/J4YVq8gqVFK43STR7n7b8Ra CWhSz+AKHfotE/09Ddreg5O6eyl2tvB3yeH4zjdy9UWJNfqUMXLxVF9DR6bzzt2wa1FS cX4idEUyFNz+j6c0ZP0iK7iotofqRfp15Oy3lQsbLvmi1EX7g4uu9/Ztkgj7mrpK63Hv 7w/qWEghcokLkG2cClwRNlet3bw+xirMaVtwtMHaATsFbw00EgALiNCVp7ql+Nf8q9GK rH5BGi1/3puMXLOe70b1tlqZy7Wc9KRzDwhxfFd5W9wicDHWvx0DlZevopj4BabTxvHe Davg== X-Gm-Message-State: AOJu0YzOqXboKZkHU4bCfVNnLAKw0ArZXlS33EwBRuKukTubmPfMy/Y6 JSS9sQ5a7HzbZzbhQajrwcmESQ== X-Received: by 2002:a17:902:bc8c:b0:1d0:6ffd:f222 with SMTP id bb12-20020a170902bc8c00b001d06ffdf222mr231131plb.120.1701842797392; Tue, 05 Dec 2023 22:06:37 -0800 (PST) Received: from dread.disaster.area (pa49-180-125-5.pa.nsw.optusnet.com.au. 
[49.180.125.5]) by smtp.gmail.com with ESMTPSA id 13-20020a170902c10d00b001d0b4693539sm3945964pli.189.2023.12.05.22.06.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Dec 2023 22:06:34 -0800 (PST)
Received: from [192.168.253.23] (helo=devoid.disaster.area) by dread.disaster.area with esmtp (Exim 4.96) (envelope-from ) id 1rAl3I-004VP7-13; Wed, 06 Dec 2023 17:06:31 +1100
Received: from dave by devoid.disaster.area with local (Exim 4.97-RC0) (envelope-from ) id 1rAl3H-0000000BrVf-3cUt; Wed, 06 Dec 2023 17:06:31 +1100
From: Dave Chinner
To: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org, linux-cachefs@redhat.com, dhowells@redhat.com, gfs2@lists.linux.dev, dm-devel@lists.linux.dev, linux-security-module@vger.kernel.org, selinux@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 09/11] hash-bl: explicitly initialise hash-bl heads
Date: Wed, 6 Dec 2023 17:05:38 +1100
Message-ID: <20231206060629.2827226-10-david@fromorbit.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20231206060629.2827226-1-david@fromorbit.com>
References: <20231206060629.2827226-1-david@fromorbit.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org
X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Tue, 05 Dec 2023 22:07:10 -0800 (PST)
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1784511549009919955
X-GMAIL-MSGID: 1784511549009919955

From: Dave Chinner

Because we are going to change how the structure is laid out to support PREEMPT_RT and LOCKDEP, just assuming that the hash table is allocated as zeroed memory is no longer sufficient to initialise a hash-bl table.

Signed-off-by: Dave Chinner
---
 fs/dcache.c           | 21 ++++++++++++++++++++-
 fs/fscache/cookie.c   |  8 ++++++++
 fs/fscache/internal.h |  6 ++++--
 fs/fscache/main.c     |  3 +++
 fs/fscache/volume.c   |  8 ++++++++
 fs/inode.c            | 19 ++++++++++++++++++-
 6 files changed, 61 insertions(+), 4 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c index c82ae731df9a..9059b3a55370 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -3284,7 +3284,10 @@ __setup("dhash_entries=", set_dhash_entries); static void __init dcache_init_early(void) { - /* If hashes are distributed across NUMA nodes, defer + int i; + + /* + * If hashes are distributed across NUMA nodes, defer * hash allocation until vmalloc space is available. */ if (hashdist) @@ -3300,11 +3303,20 @@ static void __init dcache_init_early(void) NULL, 0, 0); + /* + * The value returned in d_hash_shift tells us the size of the + * hash table that was allocated as a log2 value. + */ + for (i = 0; i < (1 << d_hash_shift); i++) + INIT_HLIST_BL_HEAD(&dentry_hashtable[i]); + d_hash_shift = 32 - d_hash_shift; } static void __init dcache_init(void) { + int i; + /* * A constructor could be added for stable state like the lists, * but it is probably not worth it because of the cache nature @@ -3328,6 +3340,13 @@ static void __init dcache_init(void) NULL, 0, 0); + /* + * The value returned in d_hash_shift tells us the size of the + * hash table that was allocated as a log2 value.
+ */ + for (i = 0; i < (1 << d_hash_shift); i++) + INIT_HLIST_BL_HEAD(&dentry_hashtable[i]); + d_hash_shift = 32 - d_hash_shift; } diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index bce2492186d0..21617f7c88e4 100644 --- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -32,6 +32,14 @@ static DECLARE_WORK(fscache_cookie_lru_work, fscache_cookie_lru_worker); static const char fscache_cookie_states[FSCACHE_COOKIE_STATE__NR] = "-LCAIFUWRD"; static unsigned int fscache_lru_cookie_timeout = 10 * HZ; +void fscache_cookie_hash_init(void) +{ + int i; + + for (i = 0; i < (1 << fscache_cookie_hash_shift); i++) + INIT_HLIST_BL_HEAD(&fscache_cookie_hash[i]); +} + void fscache_print_cookie(struct fscache_cookie *cookie, char prefix) { const u8 *k; diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h index 1336f517e9b1..6cbe07decc11 100644 --- a/fs/fscache/internal.h +++ b/fs/fscache/internal.h @@ -61,8 +61,9 @@ extern const struct seq_operations fscache_cookies_seq_ops; #endif extern struct timer_list fscache_cookie_lru_timer; -extern void fscache_print_cookie(struct fscache_cookie *cookie, char prefix); -extern bool fscache_begin_cookie_access(struct fscache_cookie *cookie, +void fscache_cookie_hash_init(void); +void fscache_print_cookie(struct fscache_cookie *cookie, char prefix); +bool fscache_begin_cookie_access(struct fscache_cookie *cookie, enum fscache_access_trace why); static inline void fscache_see_cookie(struct fscache_cookie *cookie, @@ -143,6 +144,7 @@ int fscache_stats_show(struct seq_file *m, void *v); extern const struct seq_operations fscache_volumes_seq_ops; #endif +void fscache_volume_hash_init(void); struct fscache_volume *fscache_get_volume(struct fscache_volume *volume, enum fscache_volume_trace where); void fscache_put_volume(struct fscache_volume *volume, diff --git a/fs/fscache/main.c b/fs/fscache/main.c index dad85fd84f6f..7db2a4423315 100644 --- a/fs/fscache/main.c +++ b/fs/fscache/main.c @@ -92,6 +92,9 @@ static int __init fscache_init(void) goto error_cookie_jar; } + fscache_volume_hash_init(); + fscache_cookie_hash_init(); + pr_notice("Loaded\n"); return 0; diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c index cdf991bdd9de..8b029c46a3a3 100644 --- a/fs/fscache/volume.c +++ b/fs/fscache/volume.c @@ -17,6 +17,14 @@ static LIST_HEAD(fscache_volumes); static void fscache_create_volume_work(struct work_struct *work); +void fscache_volume_hash_init(void) +{ + int i; + + for (i = 0; i < (1 << fscache_volume_hash_shift); i++) + INIT_HLIST_BL_HEAD(&fscache_volume_hash[i]); +} + struct fscache_volume *fscache_get_volume(struct fscache_volume *volume, enum fscache_volume_trace where) { diff --git a/fs/inode.c b/fs/inode.c index 3eb9c4e5b279..57c1030ccad3 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -2353,7 +2353,10 @@ __setup("ihash_entries=", set_ihash_entries); */ void __init inode_init_early(void) { - /* If hashes are distributed across NUMA nodes, defer + int i; + + /* + * If hashes are distributed across NUMA nodes, defer * hash allocation until vmalloc space is available. */ if (hashdist) @@ -2369,10 +2372,18 @@ void __init inode_init_early(void) &i_hash_mask, 0, 0); + /* + * The value returned in i_hash_shift tells us the size of the + * hash table that was allocated as a log2 value. 
+ */ + for (i = 0; i < (1 << i_hash_shift); i++) + INIT_HLIST_BL_HEAD(&inode_hashtable[i]); } void __init inode_init(void) { + int i; + /* inode slab cache */ inode_cachep = kmem_cache_create("inode_cache", sizeof(struct inode), @@ -2395,6 +2406,12 @@ void __init inode_init(void) &i_hash_mask, 0, 0); + /* + * The value returned in i_hash_shift tells us the size of the + * hash table that was allocated as a log2 value. + */ + for (i = 0; i < (1 << i_hash_shift); i++) + INIT_HLIST_BL_HEAD(&inode_hashtable[i]); } void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev) From patchwork Wed Dec 6 06:05:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 174315 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp3907863vqy; Tue, 5 Dec 2023 22:07:43 -0800 (PST) X-Google-Smtp-Source: AGHT+IGQvw3FAiAK4Ok3hwrNK9y6WVM5TkwclQzNrnLo3WEej3Yw755Z+3Ccrt/rDvciUDo35L3Q X-Received: by 2002:a17:90b:3a8e:b0:286:6cc0:62a5 with SMTP id om14-20020a17090b3a8e00b002866cc062a5mr508401pjb.36.1701842863553; Tue, 05 Dec 2023 22:07:43 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701842863; cv=none; d=google.com; s=arc-20160816; b=dJQLvvmpLj5V1zrPR5XyDLjNECZv2gMcna9MRz9wPDeGcbVcgsUVamoESjhlCGaOqJ DJMhD6ltH/rxEm46/b/j7czMeKnM/4LdhZgwZKJyw2TcF7tzsGmBM32VGPwIrzjSL4GK RA/fhi8IuWZkzDEig+V4+E/P4bo4PlTuEtPp+ApzJs+8/0y1OWQinIvyy1kS1ewqBkCm jHToy46CiPLCC6oC4arVVw3zU64nv32+1orhY2kTzWu3FtHc5bBb4vbJybhBvHaI3m9Y Ml77GDfM0PFUznmFbq3yxRlnnu86xL/6AdzYxWSH495eLFmwB0XY9ZRbpFY43HpItdEf 77GQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=Rdb9xvX76/xANEogpQRhxIokBrxEBw+rSVcTgSipqa8=; fh=CW5pnDTDkJR8Vn9NaUbLrrp28PUVapFEjMCn+HkS0lE=; b=jBYmgPxccBboYXE8frKc4ko+5VNvyj0lgTRICE4QtDSZTli67iith1u740NZq24/PV a2Q4B4ap0gf2pqzJcHw9BaYZrTQ+dol9+BWnrRBuzFYDECVePCOIDZ5ND1ZV4YPY4xF7 MtuNydh42+d0qENK5xfw4pCc8pG9Ax09FNBxzvVxnm0cA++9cNgZ3TBVQib3yHWiI8ve oxWuE30G1J6L0oouY/V10VdCXPiL/Dl6L9upus2uZ1JtRzDWK5GRlKbEcWYDT/63uVrX vpvbm0vVZ6YMkAdfw7dslOH53kqWW5P7IKZngtPbx9/g7xhqsyrXFE4MnxeXJeBx+s8X yz9Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@fromorbit-com.20230601.gappssmtp.com header.s=20230601 header.b=kuoM7EmE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=fromorbit.com Received: from fry.vger.email (fry.vger.email. 
[23.128.96.38]) by mx.google.com with ESMTPS id gi15-20020a17090b110f00b00280479459f7si3276547pjb.50.2023.12.05.22.07.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Dec 2023 22:07:43 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) client-ip=23.128.96.38; Authentication-Results: mx.google.com; dkim=pass header.i=@fromorbit-com.20230601.gappssmtp.com header.s=20230601 header.b=kuoM7EmE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=fromorbit.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by fry.vger.email (Postfix) with ESMTP id 6CEBE804B02A; Tue, 5 Dec 2023 22:07:32 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at fry.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1377058AbjLFGHX (ORCPT + 99 others); Wed, 6 Dec 2023 01:07:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49954 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1376911AbjLFGGk (ORCPT ); Wed, 6 Dec 2023 01:06:40 -0500 Received: from mail-pg1-x532.google.com (mail-pg1-x532.google.com [IPv6:2607:f8b0:4864:20::532]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4ECE910D3 for ; Tue, 5 Dec 2023 22:06:40 -0800 (PST) Received: by mail-pg1-x532.google.com with SMTP id 41be03b00d2f7-5c66988c2eeso399911a12.1 for ; Tue, 05 Dec 2023 22:06:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20230601.gappssmtp.com; s=20230601; t=1701842799; x=1702447599; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Rdb9xvX76/xANEogpQRhxIokBrxEBw+rSVcTgSipqa8=; b=kuoM7EmEWDdEqIKN8PcgL+FALJqsEhHgZOoMjXlxKXkW6Tnwr2qqTNBjHhUb1gU36y WALXNcQPAOvMx/uaf4IgZwfEMKF+6JtdlWMVO0lP4fB4yL6cWGBfZS90bY8mnP+BtfZG bXaRHjWZB3VEMDp6mh3RcS3IWzgJhpMNcyUAygik6u1ESFJAVbVC+d46UvhF4lVdJI5x gVaoQaNxcXuw6FnXl3BovN3q/Z9Yo140FTTzq0HCLsi3wmrOFhXbK2Y+NPM6jOnz9mN8 G6HzpAd3Q3zFhG3sfgPEFEPDb6BdaGPeYfyMQ0mvXMb/38KP0BWlkl3s+7C+myZ6NcRO I15Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701842799; x=1702447599; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Rdb9xvX76/xANEogpQRhxIokBrxEBw+rSVcTgSipqa8=; b=Z5oLqlBtFCIple1YplEtueVM7fASo/VxbVNGwtHlzwVG+4N1xgX5Hbn0ptiLtPfvo8 HQDT1Vn6uVhSGh5brv+Vf0CLselqzFNbFjekMY78kHWrhDY9/fYXQkonjrlrjd9eXZh4 zaEmY+QME1H5vMHqqHtzL1EV5hMLUupiK6+uxRQ+7lXdSLtxATAC6dwjX62Zk8XTMLVc TQkIzXiIGta5arilpmLbQrZcv6DcZJQK6Jjl0x4PQ6No1bDmIUOQMUeN70AexiCXpJO4 c9XxUzRQpG3r8LTfV4fCRj7uHkNdvWbGJdC0C6OkvOTolPWXh0e/Pdlqjq4Rpjsx9rtf zh3Q== X-Gm-Message-State: AOJu0YzTo+6lihntfnKQlcwqKG9tNlZX9OBx4Tw3B0WfxFJLOzPcPAM9 YIJcvDrKAm3EFhDVHouKPV7p5Q== X-Received: by 2002:a17:90b:1650:b0:286:d498:1ad with SMTP id il16-20020a17090b165000b00286d49801admr507619pjb.3.1701842799111; Tue, 05 Dec 2023 22:06:39 -0800 (PST) Received: from dread.disaster.area (pa49-180-125-5.pa.nsw.optusnet.com.au. 
[49.180.125.5]) by smtp.gmail.com with ESMTPSA id iy3-20020a170903130300b001cfed5524easm7574769plb.288.2023.12.05.22.06.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Dec 2023 22:06:36 -0800 (PST)
Received: from [192.168.253.23] (helo=devoid.disaster.area) by dread.disaster.area with esmtp (Exim 4.96) (envelope-from ) id 1rAl3I-004VPB-1D; Wed, 06 Dec 2023 17:06:32 +1100
Received: from dave by devoid.disaster.area with local (Exim 4.97-RC0) (envelope-from ) id 1rAl3H-0000000BrVk-3uln; Wed, 06 Dec 2023 17:06:31 +1100
From: Dave Chinner
To: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org, linux-cachefs@redhat.com, dhowells@redhat.com, gfs2@lists.linux.dev, dm-devel@lists.linux.dev, linux-security-module@vger.kernel.org, selinux@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 10/11] list_bl: don't use bit locks for PREEMPT_RT or lockdep
Date: Wed, 6 Dec 2023 17:05:39 +1100
Message-ID: <20231206060629.2827226-11-david@fromorbit.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20231206060629.2827226-1-david@fromorbit.com>
References: <20231206060629.2827226-1-david@fromorbit.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on fry.vger.email
Precedence: bulk
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org
X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (fry.vger.email [0.0.0.0]); Tue, 05 Dec 2023 22:07:32 -0800 (PST)
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1784511582401744894
X-GMAIL-MSGID: 1784511582401744894

From: Dave Chinner

hash-bl nests spinlocks inside the bit locks. This causes problems for CONFIG_PREEMPT_RT, which converts spin locks to sleeping locks, and we're not allowed to sleep while holding a spinning lock.

Further, lockdep does not support bit locks, so we lose lockdep coverage of the inode hash table with the hash-bl conversion.

To enable these configs to work, add an external per-chain spinlock to struct hlist_bl_head and add helpers to use it instead of the bit spinlock when PREEMPT_RT or lockdep are enabled. This converts all users of hlist-bl to use the external spinlock in these situations, so we also gain lockdep coverage of things like the dentry cache hash table with this change.

Signed-off-by: Dave Chinner
Reviewed-by: Kent Overstreet
---
 include/linux/list_bl.h    | 126 ++++++++++++++++++++++++++++---------
 include/linux/rculist_bl.h |  13 ++++
 2 files changed, 110 insertions(+), 29 deletions(-)

diff --git a/include/linux/list_bl.h b/include/linux/list_bl.h index 8ee2bf5af131..990ad8e24e0b 100644 --- a/include/linux/list_bl.h +++ b/include/linux/list_bl.h @@ -4,14 +4,27 @@ #include #include +#include /* * Special version of lists, where head of the list has a lock in the lowest * bit. This is useful for scalable hash tables without increasing memory * footprint overhead. * - * For modification operations, the 0 bit of hlist_bl_head->first - * pointer must be set. + * Whilst the general use of bit spin locking is considered safe, PREEMPT_RT + * introduces a problem with nesting spin locks inside bit locks: spin locks + * become sleeping locks, and we can't sleep inside spinning locks such as bit + * locks.
However, for RTPREEMPT, performance is less of an issue than + * correctness, so we trade off the memory and cache footprint of a spinlock per + * list so the list locks are converted to sleeping locks and work correctly + * with PREEMPT_RT kernels. + * + * An added advantage of this is that we can use the same trick when lockdep is + * enabled (again, performance doesn't matter) and gain lockdep coverage of all + * the hash-bl operations. + * + * For modification operations when using pure bit locking, the 0 bit of + * hlist_bl_head->first pointer must be set. * * With some small modifications, this can easily be adapted to store several * arbitrary bits (not just a single lock bit), if the need arises to store @@ -30,16 +43,21 @@ #define LIST_BL_BUG_ON(x) #endif +#undef LIST_BL_USE_SPINLOCKS +#if defined(CONFIG_PREEMPT_RT) || defined(CONFIG_LOCKDEP) +#define LIST_BL_USE_SPINLOCKS 1 +#endif struct hlist_bl_head { struct hlist_bl_node *first; +#ifdef LIST_BL_USE_SPINLOCKS + spinlock_t lock; +#endif }; struct hlist_bl_node { struct hlist_bl_node *next, **pprev; }; -#define INIT_HLIST_BL_HEAD(ptr) \ - ((ptr)->first = NULL) static inline void INIT_HLIST_BL_NODE(struct hlist_bl_node *h) { @@ -54,6 +72,69 @@ static inline bool hlist_bl_unhashed(const struct hlist_bl_node *h) return !h->pprev; } +#ifdef LIST_BL_USE_SPINLOCKS +#define INIT_HLIST_BL_HEAD(ptr) do { \ + (ptr)->first = NULL; \ + spin_lock_init(&(ptr)->lock); \ +} while (0) + +static inline void hlist_bl_lock(struct hlist_bl_head *b) +{ + spin_lock(&b->lock); +} + +static inline void hlist_bl_unlock(struct hlist_bl_head *b) +{ + spin_unlock(&b->lock); +} + +static inline bool hlist_bl_is_locked(struct hlist_bl_head *b) +{ + return spin_is_locked(&b->lock); +} + +static inline struct hlist_bl_node *hlist_bl_first(struct hlist_bl_head *h) +{ + return h->first; +} + +static inline void hlist_bl_set_first(struct hlist_bl_head *h, + struct hlist_bl_node *n) +{ + h->first = n; +} + +static inline void hlist_bl_set_before(struct hlist_bl_node **pprev, + struct hlist_bl_node *n) +{ + WRITE_ONCE(*pprev, n); +} + +static inline bool hlist_bl_empty(const struct hlist_bl_head *h) +{ + return !READ_ONCE(h->first); +} + +#else /* !LIST_BL_USE_SPINLOCKS */ + +#define INIT_HLIST_BL_HEAD(ptr) \ + ((ptr)->first = NULL) + +static inline void hlist_bl_lock(struct hlist_bl_head *b) +{ + bit_spin_lock(0, (unsigned long *)b); +} + +static inline void hlist_bl_unlock(struct hlist_bl_head *b) +{ + __bit_spin_unlock(0, (unsigned long *)b); +} + +static inline bool hlist_bl_is_locked(struct hlist_bl_head *b) +{ + return bit_spin_is_locked(0, (unsigned long *)b); +} + static inline struct hlist_bl_node *hlist_bl_first(struct hlist_bl_head *h) { return (struct hlist_bl_node *) @@ -69,11 +150,21 @@ static inline void hlist_bl_set_first(struct hlist_bl_head *h, h->first = (struct hlist_bl_node *)((unsigned long)n | LIST_BL_LOCKMASK); } +static inline void hlist_bl_set_before(struct hlist_bl_node **pprev, + struct hlist_bl_node *n) +{ + WRITE_ONCE(*pprev, + (struct hlist_bl_node *) + ((uintptr_t)n | ((uintptr_t)*pprev & LIST_BL_LOCKMASK))); +} + static inline bool hlist_bl_empty(const struct hlist_bl_head *h) { return !((unsigned long)READ_ONCE(h->first) & ~LIST_BL_LOCKMASK); } +#endif /* LIST_BL_USE_SPINLOCKS */ + static inline void hlist_bl_add_head(struct hlist_bl_node *n, struct hlist_bl_head *h) { @@ -94,11 +185,7 @@ static inline void hlist_bl_add_before(struct hlist_bl_node *n, n->pprev = pprev; n->next = next; next->pprev = &n->next; - - /* pprev may be 
`first`, so be careful not to lose the lock bit */ - WRITE_ONCE(*pprev, - (struct hlist_bl_node *) - ((uintptr_t)n | ((uintptr_t)*pprev & LIST_BL_LOCKMASK))); + hlist_bl_set_before(pprev, n); } static inline void hlist_bl_add_behind(struct hlist_bl_node *n, @@ -119,11 +206,7 @@ static inline void __hlist_bl_del(struct hlist_bl_node *n) LIST_BL_BUG_ON((unsigned long)n & LIST_BL_LOCKMASK); - /* pprev may be `first`, so be careful not to lose the lock bit */ - WRITE_ONCE(*pprev, - (struct hlist_bl_node *) - ((unsigned long)next | - ((unsigned long)*pprev & LIST_BL_LOCKMASK))); + hlist_bl_set_before(pprev, next); if (next) next->pprev = pprev; } @@ -165,21 +248,6 @@ static inline bool hlist_bl_fake(struct hlist_bl_node *n) return n->pprev == &n->next; } -static inline void hlist_bl_lock(struct hlist_bl_head *b) -{ - bit_spin_lock(0, (unsigned long *)b); -} - -static inline void hlist_bl_unlock(struct hlist_bl_head *b) -{ - __bit_spin_unlock(0, (unsigned long *)b); -} - -static inline bool hlist_bl_is_locked(struct hlist_bl_head *b) -{ - return bit_spin_is_locked(0, (unsigned long *)b); -} - /** * hlist_bl_for_each_entry - iterate over list of given type * @tpos: the type * to use as a loop cursor. diff --git a/include/linux/rculist_bl.h b/include/linux/rculist_bl.h index 0b952d06eb0b..2d5eb5153121 100644 --- a/include/linux/rculist_bl.h +++ b/include/linux/rculist_bl.h @@ -8,6 +8,18 @@ #include #include +#ifdef LIST_BL_USE_SPINLOCKS +static inline void hlist_bl_set_first_rcu(struct hlist_bl_head *h, + struct hlist_bl_node *n) +{ + rcu_assign_pointer(h->first, n); +} + +static inline struct hlist_bl_node *hlist_bl_first_rcu(struct hlist_bl_head *h) +{ + return rcu_dereference_check(h->first, hlist_bl_is_locked(h)); +} +#else /* !LIST_BL_USE_SPINLOCKS */ static inline void hlist_bl_set_first_rcu(struct hlist_bl_head *h, struct hlist_bl_node *n) { @@ -23,6 +35,7 @@ static inline struct hlist_bl_node *hlist_bl_first_rcu(struct hlist_bl_head *h) return (struct hlist_bl_node *) ((unsigned long)rcu_dereference_check(h->first, hlist_bl_is_locked(h)) & ~LIST_BL_LOCKMASK); } +#endif /* LIST_BL_USE_SPINLOCKS */ /** * hlist_bl_del_rcu - deletes entry from hash list without re-initialization From patchwork Wed Dec 6 06:05:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 174310 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp3907583vqy; Tue, 5 Dec 2023 22:06:58 -0800 (PST) X-Google-Smtp-Source: AGHT+IGBTnuCinLHF5JTsJtMqkV/JNIItfjaTDdul3JZye7+otatEDnqsfZ7KzQpicz+IMDZAZdg X-Received: by 2002:a05:6a00:291a:b0:6ce:2732:588 with SMTP id cg26-20020a056a00291a00b006ce27320588mr414156pfb.57.1701842818599; Tue, 05 Dec 2023 22:06:58 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701842818; cv=none; d=google.com; s=arc-20160816; b=umyz2Ar+JzuGIBduWlHTwcvGIKGox5im9wSMZAg1cMsIh6p8JT6yB7KJeM+MWYdNND gx8KgxJ3Dpmiw5VtTo8BpiUt2WjlB4yWLPBGXYL9pvVgb37/7WSDfb0rA6QloryxQRvz RPyXoMmxCrj9pf1du+TwUSO0mt6oHHMDDFBSLmzSkbWU5rjQdf7Y++N8dvMSuAMxXpTL ROmEWDayoVaOV45oMdbcYx2HfS66SPOLs9VRUrxZ/5K9FGH3HCt4FcbVghmq+drXQ4sy CSx4c3dXFoN4y8fUNuhVdYLPZP6ITjIuyJRBu0Z7nEPUNdbUijD4xetExMrjGnWiI8Su qh3A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=k4fn8znMBnHx32svyrnbDfUnHhzlVQ7G5pzlVkycieA=; 
fh=CW5pnDTDkJR8Vn9NaUbLrrp28PUVapFEjMCn+HkS0lE=; b=gl2vl71bUDxBGjd/zIhxy7JuOEm3R4ge0Kp0U9+uFG26I+SBv5sViUSxsxZwMqm/SG ShflHxTJ6D8vGxqkejr+VysQUo+4o3uhwdAvEeA9yLuqwRIx4iyHXDox1Uqzi9yGorXV 0p1nTrYKLIVTsB0VD9O5+9UMAf/xuTL3lFd6nkphrHyW2+KNrsL8AP6qgBGplQD6+Lkn CowQ7NVKNIJtL+o6T61KBxV0iZcdwtT3cbF4TKppMCxZF9fCZPt6S20GTgWrvaNeQcGH VukdwOL/+M6l5dHZUZCigAP53hp5+i8bYWvZNo3YOpq3HWZzsshmDIVfaCADKRCmRuX3 zLlw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@fromorbit-com.20230601.gappssmtp.com header.s=20230601 header.b=vueYdWwm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=fromorbit.com Received: from snail.vger.email (snail.vger.email. [2620:137:e000::3:7]) by mx.google.com with ESMTPS id e20-20020a056a001a9400b006cd905021d4si10844099pfv.367.2023.12.05.22.06.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Dec 2023 22:06:58 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) client-ip=2620:137:e000::3:7; Authentication-Results: mx.google.com; dkim=pass header.i=@fromorbit-com.20230601.gappssmtp.com header.s=20230601 header.b=vueYdWwm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=fromorbit.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 30C92802F801; Tue, 5 Dec 2023 22:06:57 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1376936AbjLFGGq (ORCPT + 99 others); Wed, 6 Dec 2023 01:06:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49892 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1376902AbjLFGGe (ORCPT ); Wed, 6 Dec 2023 01:06:34 -0500 Received: from mail-pg1-x534.google.com (mail-pg1-x534.google.com [IPv6:2607:f8b0:4864:20::534]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D1BE4D67 for ; Tue, 5 Dec 2023 22:06:37 -0800 (PST) Received: by mail-pg1-x534.google.com with SMTP id 41be03b00d2f7-5c659db0ce2so3399984a12.0 for ; Tue, 05 Dec 2023 22:06:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20230601.gappssmtp.com; s=20230601; t=1701842797; x=1702447597; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=k4fn8znMBnHx32svyrnbDfUnHhzlVQ7G5pzlVkycieA=; b=vueYdWwmYeDx3LEn4Nye1xXvMy+5G/E9jQShJCw83qUlCniD1y2t9oPEohyUqOoltY BjztqHStsw58EMoS2Fzudh7ZaOmxKl+XuDLNgEERc24vCJmuRpwn4ATB7dw2ZJa2gU7V GbtC74J7H1chY2cXaAU40V1cAXazNakqkxvvq3ZuFN2OpBuW+MBVe46LKfN7qnsZQKZr h5YOvGSFHgiVt6fK7J+ktLmvqMMG8QFe32Ok1HGAFWKmqrGJN+5gcFJkJVGNqHR68nP1 fK/GzXz9L8qquvU9zCn4hnzaF09K7Ci+9Qh5nsk3oJMkO1gWs8LmURpMWBFBCX9xMGI/ /fkQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701842797; x=1702447597; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; 
bh=k4fn8znMBnHx32svyrnbDfUnHhzlVQ7G5pzlVkycieA=; b=U/dv5weYklLZgYEsLFs3K8X4UoeybGYVz6LS0TWtVPb5oCKoZbI+Tz3thxTic28Fiw kmCGnYqufAWS8cNV4AXttGU/hZBrRX90sHQXgl8clCPl2N0adsi82cDXDs1E+vjgMj9/ LRVvGBZEbP34WPk1n9uTRULM5qPcxTwjwxV56r/1FDA4soZqVOSHqo7ZbdAA4Mdtob78 3Gx+fQEqng2hXj5E3ecFe1juaWAth/KRTtlC7khEYehkq6u5zcI7xkdDiCoUawyDbrSw 5TRTaOiwfKor7tFqsc45VINHsClSR5WEsLXO7XCkkQZ7xXCkcOLdXNsm9QGaAaDzuG9h l2jw==
X-Gm-Message-State: AOJu0YxBDZ9UUlC/XX4HDYeu+yU4eeaGfvwoUA7LQIg6wNcfZaK8Hv30 1o/zx/bQnyi9SQVHmvGnvDpdLA==
X-Received: by 2002:a17:902:7795:b0:1d0:5806:f45d with SMTP id o21-20020a170902779500b001d05806f45dmr351927pll.42.1701842796981; Tue, 05 Dec 2023 22:06:36 -0800 (PST)
Received: from dread.disaster.area (pa49-180-125-5.pa.nsw.optusnet.com.au. [49.180.125.5]) by smtp.gmail.com with ESMTPSA id k10-20020a170902c40a00b001d087f68ef8sm2308763plk.37.2023.12.05.22.06.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Dec 2023 22:06:34 -0800 (PST)
Received: from [192.168.253.23] (helo=devoid.disaster.area) by dread.disaster.area with esmtp (Exim 4.96) (envelope-from ) id 1rAl3I-004VPF-1Q; Wed, 06 Dec 2023 17:06:32 +1100
Received: from dave by devoid.disaster.area with local (Exim 4.97-RC0) (envelope-from ) id 1rAl3H-0000000BrVp-49wq; Wed, 06 Dec 2023 17:06:32 +1100
From: Dave Chinner
To: linux-fsdevel@vger.kernel.org
Cc: linux-block@vger.kernel.org, linux-cachefs@redhat.com, dhowells@redhat.com, gfs2@lists.linux.dev, dm-devel@lists.linux.dev, linux-security-module@vger.kernel.org, selinux@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 11/11] hlist-bl: introduce nested locking for dm-snap
Date: Wed, 6 Dec 2023 17:05:40 +1100
Message-ID: <20231206060629.2827226-12-david@fromorbit.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20231206060629.2827226-1-david@fromorbit.com>
References: <20231206060629.2827226-1-david@fromorbit.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org
X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Tue, 05 Dec 2023 22:06:57 -0800 (PST)
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1784511535593312585
X-GMAIL-MSGID: 1784511535593312585

From: Dave Chinner

Testing with lockdep enabled threw this warning from generic/081 in fstests:

[ 2369.724151] ============================================
[ 2369.725805] WARNING: possible recursive locking detected
[ 2369.727125] 6.7.0-rc2-dgc+ #1952 Not tainted
[ 2369.728647] --------------------------------------------
[ 2369.730197] systemd-udevd/389493 is trying to acquire lock:
[ 2369.732378] ffff888116a1a320 (&(et->table + i)->lock){+.+.}-{2:2}, at: snapshot_map+0x13e/0x7f0
[ 2369.736197] but task is already holding lock:
[ 2369.738657] ffff8881098a4fd0 (&(et->table + i)->lock){+.+.}-{2:2}, at: snapshot_map+0x136/0x7f0
[ 2369.742118] other info that might help us debug this:
[ 2369.744403]  Possible unsafe locking scenario:
[ 2369.746814]        CPU0
[ 2369.747675]        ----
[ 2369.748496]   lock(&(et->table + i)->lock);
[ 2369.749877]   lock(&(et->table + i)->lock);
[ 2369.751241]  *** DEADLOCK ***
[ 2369.753173]  May be due to missing lock nesting notation
[ 2369.754963] 4 locks held by systemd-udevd/389493:
[ 2369.756124] #0:
               ffff88811b3a1f48 (mapping.invalidate_lock#2){++++}-{3:3}, at: page_cache_ra_unbounded+0x69/0x190
[ 2369.758516] #1: ffff888121ceff10 (&md->io_barrier){.+.+}-{0:0}, at: dm_get_live_table+0x52/0xd0
[ 2369.760888] #2: ffff888110240078 (&s->lock#2){++++}-{3:3}, at: snapshot_map+0x12e/0x7f0
[ 2369.763254] #3: ffff8881098a4fd0 (&(et->table + i)->lock){+.+.}-{2:2}, at: snapshot_map+0x136/0x7f0
[ 2369.765896] stack backtrace:
[ 2369.767429] CPU: 3 PID: 389493 Comm: systemd-udevd Not tainted 6.7.0-rc2-dgc+ #1952
[ 2369.770203] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 2369.773771] Call Trace:
[ 2369.774657]
[ 2369.775494] dump_stack_lvl+0x5c/0xc0
[ 2369.776765] dump_stack+0x10/0x20
[ 2369.778031] print_deadlock_bug+0x220/0x2f0
[ 2369.779568] __lock_acquire+0x1255/0x2180
[ 2369.781013] lock_acquire+0xb9/0x2c0
[ 2369.782456] ? snapshot_map+0x13e/0x7f0
[ 2369.783927] ? snapshot_map+0x136/0x7f0
[ 2369.785240] _raw_spin_lock+0x34/0x70
[ 2369.786413] ? snapshot_map+0x13e/0x7f0
[ 2369.787482] snapshot_map+0x13e/0x7f0
[ 2369.788462] ? lockdep_init_map_type+0x75/0x250
[ 2369.789650] __map_bio+0x1d7/0x200
[ 2369.790364] dm_submit_bio+0x17d/0x570
[ 2369.791387] __submit_bio+0x4a/0x80
[ 2369.792215] submit_bio_noacct_nocheck+0x108/0x350
[ 2369.793357] submit_bio_noacct+0x115/0x450
[ 2369.794334] submit_bio+0x43/0x60
[ 2369.795112] mpage_readahead+0xf1/0x130
[ 2369.796037] ? blkdev_write_begin+0x30/0x30
[ 2369.797007] blkdev_readahead+0x15/0x20
[ 2369.797893] read_pages+0x5c/0x230
[ 2369.798703] page_cache_ra_unbounded+0x143/0x190
[ 2369.799810] force_page_cache_ra+0x9a/0xc0
[ 2369.800754] page_cache_sync_ra+0x2e/0x50
[ 2369.801704] filemap_get_pages+0x112/0x630
[ 2369.802696] ? __lock_acquire+0x413/0x2180
[ 2369.803663] filemap_read+0xfc/0x3a0
[ 2369.804527] ? __might_sleep+0x42/0x70
[ 2369.805443] blkdev_read_iter+0x6d/0x150
[ 2369.806370] vfs_read+0x1a6/0x2d0
[ 2369.807148] ksys_read+0x71/0xf0
[ 2369.807936] __x64_sys_read+0x19/0x20
[ 2369.808810] do_syscall_64+0x3c/0xe0
[ 2369.809746] entry_SYSCALL_64_after_hwframe+0x63/0x6b
[ 2369.810914] RIP: 0033:0x7f9f14dbb03d

It turns out that dm-snap holds two hash-bl locks at the same time, so we need nesting semantics to ensure lockdep understands what is going on.
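As a usage sketch of the new annotation (an editorial illustration: lock_two_chains()/unlock_two_chains() and their arguments are hypothetical, while hlist_bl_lock(), hlist_bl_lock_nested() and SINGLE_DEPTH_NESTING are what this patch provides and dm_exception_table_lock() uses):

	#include <linux/list_bl.h>
	#include <linux/lockdep.h>

	/*
	 * Hold two hash-bl chain locks of the same lock class at once, as
	 * dm-snap does with its pending and complete exception table
	 * slots. The nested annotation tells lockdep that the second
	 * acquisition is a deliberate ordering, not a recursive deadlock.
	 */
	static void lock_two_chains(struct hlist_bl_head *complete,
				    struct hlist_bl_head *pending)
	{
		hlist_bl_lock(complete);
		hlist_bl_lock_nested(pending, SINGLE_DEPTH_NESTING);
	}

	static void unlock_two_chains(struct hlist_bl_head *complete,
				      struct hlist_bl_head *pending)
	{
		/* Release in the reverse order of acquisition. */
		hlist_bl_unlock(pending);
		hlist_bl_unlock(complete);
	}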
Signed-off-by: Dave Chinner --- drivers/md/dm-snap.c | 2 +- include/linux/list_bl.h | 10 ++++++++++ 2 files changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c index bf7a574499a3..cd97d5cb295d 100644 --- a/drivers/md/dm-snap.c +++ b/drivers/md/dm-snap.c @@ -645,7 +645,7 @@ static void dm_exception_table_lock_init(struct dm_snapshot *s, chunk_t chunk, static void dm_exception_table_lock(struct dm_exception_table_lock *lock) { hlist_bl_lock(lock->complete_slot); - hlist_bl_lock(lock->pending_slot); + hlist_bl_lock_nested(lock->pending_slot, SINGLE_DEPTH_NESTING); } static void dm_exception_table_unlock(struct dm_exception_table_lock *lock) diff --git a/include/linux/list_bl.h b/include/linux/list_bl.h index 990ad8e24e0b..0e3e60c10563 100644 --- a/include/linux/list_bl.h +++ b/include/linux/list_bl.h @@ -83,6 +83,11 @@ static inline void hlist_bl_lock(struct hlist_bl_head *b) spin_lock(&b->lock); } +static inline void hlist_bl_lock_nested(struct hlist_bl_head *b, int subclass) +{ + spin_lock_nested(&b->lock, subclass); +} + static inline void hlist_bl_unlock(struct hlist_bl_head *b) { spin_unlock(&b->lock); @@ -125,6 +130,11 @@ static inline void hlist_bl_lock(struct hlist_bl_head *b) bit_spin_lock(0, (unsigned long *)b); } +static inline void hlist_bl_lock_nested(struct hlist_bl_head *b, int subclass) +{ + hlist_bl_lock(b); +} + static inline void hlist_bl_unlock(struct hlist_bl_head *b) { __bit_spin_unlock(0, (unsigned long *)b);