Message ID: <20230726080916.17454-5-zhangpeng.00@bytedance.com>
State: New
Headers:
From: Peng Zhang <zhangpeng.00@bytedance.com>
To: Liam.Howlett@oracle.com, corbet@lwn.net, akpm@linux-foundation.org, willy@infradead.org, brauner@kernel.org, surenb@google.com, michael.christie@oracle.com, peterz@infradead.org, mathieu.desnoyers@efficios.com, npiggin@gmail.com, avagin@gmail.com
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Peng Zhang <zhangpeng.00@bytedance.com>
Subject: [PATCH 04/11] maple_tree: Introduce interfaces __mt_dup() and mt_dup()
Date: Wed, 26 Jul 2023 16:09:09 +0800
Message-Id: <20230726080916.17454-5-zhangpeng.00@bytedance.com>
In-Reply-To: <20230726080916.17454-1-zhangpeng.00@bytedance.com>
References: <20230726080916.17454-1-zhangpeng.00@bytedance.com>
Series: Introduce mt_dup() to improve the performance of fork()
Commit Message
Peng Zhang
July 26, 2023, 8:09 a.m. UTC
Introduce the interfaces __mt_dup() and mt_dup(), which duplicate a
maple tree. Compared with traversing the source tree and reinserting
each entry into the new tree one by one, this performs better.
The difference between __mt_dup() and mt_dup() is that mt_dup()
handles the internal lock itself.
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
---
include/linux/maple_tree.h | 3 +
lib/maple_tree.c | 211 +++++++++++++++++++++++++++++++++++++
2 files changed, 214 insertions(+)
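The performance claim in the commit message comes from bulk-copying whole nodes instead of searching and reinserting each entry into an empty tree. The kernel code is not runnable in isolation, so the following is only a user-space sketch of that idea on a toy multi-slot tree; all names here (`tnode`, `dup_tree`, `SLOTS`) are invented for illustration, and the sketch is recursive for brevity while the real `mt_dup_build()` walks iteratively in DFS preorder.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for struct maple_node: a few slots plus children. */
#define SLOTS 4
struct tnode {
	int keys[SLOTS];
	struct tnode *child[SLOTS];	/* all NULL for a leaf */
};

/* Duplicate by bulk-copying each node (one memcpy) and then fixing up
 * the child pointers -- no per-entry search/insert in the new tree. */
static struct tnode *dup_tree(const struct tnode *src)
{
	struct tnode *n;
	int i;

	if (!src)
		return NULL;
	n = malloc(sizeof(*n));
	memcpy(n, src, sizeof(*n));	/* keys and slot layout in one shot */
	for (i = 0; i < SLOTS; i++)
		n->child[i] = dup_tree(src->child[i]);
	return n;
}
```

The per-node cost is one allocation and one `memcpy()`, rather than an O(log n) descent per entry, which is where the fork() speedup in this series comes from.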
Comments
* Peng Zhang <zhangpeng.00@bytedance.com> [230726 04:10]: > Introduce interfaces __mt_dup() and mt_dup(), which are used to > duplicate a maple tree. Compared with traversing the source tree and > reinserting entry by entry in the new tree, it has better performance. > The difference between __mt_dup() and mt_dup() is that mt_dup() holds > an internal lock. > > Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> > --- > include/linux/maple_tree.h | 3 + > lib/maple_tree.c | 211 +++++++++++++++++++++++++++++++++++++ > 2 files changed, 214 insertions(+) > > diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h > index c962af188681..229fe78e4c89 100644 > --- a/include/linux/maple_tree.h > +++ b/include/linux/maple_tree.h > @@ -327,6 +327,9 @@ int mtree_store(struct maple_tree *mt, unsigned long index, > void *entry, gfp_t gfp); > void *mtree_erase(struct maple_tree *mt, unsigned long index); > > +int mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp); > +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp); > + > void mtree_destroy(struct maple_tree *mt); > void __mt_destroy(struct maple_tree *mt); > > diff --git a/lib/maple_tree.c b/lib/maple_tree.c > index da3a2fb405c0..efac6761ae37 100644 > --- a/lib/maple_tree.c > +++ b/lib/maple_tree.c > @@ -6595,6 +6595,217 @@ void *mtree_erase(struct maple_tree *mt, unsigned long index) > } > EXPORT_SYMBOL(mtree_erase); > > +/* > + * mt_dup_free() - Free the nodes of a incomplete maple tree. > + * @mt: The incomplete maple tree > + * @node: Free nodes from @node > + * > + * This function frees all nodes starting from @node in the reverse order of > + * mt_dup_build(). At this point we don't need to hold the source tree lock. 
> + */ > +static void mt_dup_free(struct maple_tree *mt, struct maple_node *node) > +{ > + void **slots; > + unsigned char offset; > + struct maple_enode *enode; > + enum maple_type type; > + unsigned char count = 0, i; > + Can we make these labels inline functions and try to make this a loop? > +try_ascend: > + if (ma_is_root(node)) { > + mt_free_one(node); > + return; > + } > + > + offset = ma_parent_slot(node); > + type = ma_parent_type(mt, node); > + node = ma_parent(node); > + if (!offset) > + goto free; > + > + offset--; > + > +descend: > + slots = (void **)ma_slots(node, type); > + enode = slots[offset]; > + if (mte_is_leaf(enode)) > + goto free; > + > + type = mte_node_type(enode); > + node = mte_to_node(enode); > + offset = ma_nonleaf_data_end_nocheck(node, type); > + goto descend; > + > +free: > + slots = (void **)ma_slots(node, type); > + count = ma_nonleaf_data_end_nocheck(node, type) + 1; > + for (i = 0; i < count; i++) > + ((unsigned long *)slots)[i] &= ~MAPLE_NODE_MASK; > + > + /* Cast to __rcu to avoid sparse checker complaining. */ > + mt_free_bulk(count, (void __rcu **)slots); > + goto try_ascend; > +} > + > +/* > + * mt_dup_build() - Build a new maple tree from a source tree > + * @mt: The source maple tree to copy from > + * @new: The new maple tree > + * @gfp: The GFP_FLAGS to use for allocations > + * @to_free: Free nodes starting from @to_free if the build fails > + * > + * This function builds a new tree in DFS preorder. If it fails due to memory > + * allocation, @to_free will store the last failed node to free the incomplete > + * tree. Use mt_dup_free() to free nodes. > + * > + * Return: 0 on success, -ENOMEM if memory could not be allocated. > + */ > +static inline int mt_dup_build(struct maple_tree *mt, struct maple_tree *new, > + gfp_t gfp, struct maple_node **to_free) I am trying to change the functions to be two tabs of indent for arguments from now on. 
It allows for more to fit on a single line and still maintains a clear separation between code and argument lists. > +{ > + struct maple_enode *enode; > + struct maple_node *new_node, *new_parent = NULL, *node; > + enum maple_type type; > + void __rcu **slots; > + void **new_slots; > + unsigned char count, request, i, offset; > + unsigned long *set_parent; > + unsigned long new_root; > + > + mt_init_flags(new, mt->ma_flags); > + enode = mt_root_locked(mt); > + if (unlikely(!xa_is_node(enode))) { > + rcu_assign_pointer(new->ma_root, enode); > + return 0; > + } > + > + new_node = mt_alloc_one(gfp); > + if (!new_node) > + return -ENOMEM; > + > + new_root = (unsigned long)new_node; > + new_root |= (unsigned long)enode & MAPLE_NODE_MASK; > + > +copy_node: Can you make copy_node, descend, ascend inline functions instead of the goto jumping please? It's better to have loops over jumping around a lot. Gotos are good for undoing things and retry, but constructing loops with them makes it difficult to follow. > + node = mte_to_node(enode); > + type = mte_node_type(enode); > + memcpy(new_node, node, sizeof(struct maple_node)); > + > + set_parent = (unsigned long *)&(new_node->parent); > + *set_parent &= MAPLE_NODE_MASK; > + *set_parent |= (unsigned long)new_parent; Maybe make a small inline to set the parent instead of this? There are some defined helpers for setting the types like ma_parent_ptr() and ma_enode_ptr() to make casting more type-safe. 
> + if (ma_is_leaf(type)) > + goto ascend; > + > + new_slots = (void **)ma_slots(new_node, type); > + slots = ma_slots(node, type); > + request = ma_nonleaf_data_end(mt, node, type) + 1; > + count = mt_alloc_bulk(gfp, request, new_slots); > + if (!count) { > + *to_free = new_node; > + return -ENOMEM; > + } > + > + for (i = 0; i < count; i++) > + ((unsigned long *)new_slots)[i] |= > + ((unsigned long)mt_slot_locked(mt, slots, i) & > + MAPLE_NODE_MASK); > + offset = 0; > + > +descend: > + new_parent = new_node; > + enode = mt_slot_locked(mt, slots, offset); > + new_node = mte_to_node(new_slots[offset]); > + goto copy_node; > + > +ascend: > + if (ma_is_root(node)) { > + new_node = mte_to_node((void *)new_root); > + new_node->parent = ma_parent_ptr((unsigned long)new | > + MA_ROOT_PARENT); > + rcu_assign_pointer(new->ma_root, (void *)new_root); > + return 0; > + } > + > + offset = ma_parent_slot(node); > + type = ma_parent_type(mt, node); > + node = ma_parent(node); > + new_node = ma_parent(new_node); > + if (offset < ma_nonleaf_data_end(mt, node, type)) { > + offset++; > + new_slots = (void **)ma_slots(new_node, type); > + slots = ma_slots(node, type); > + goto descend; > + } > + > + goto ascend; > +} > + > +/** > + * __mt_dup(): Duplicate a maple tree > + * @mt: The source maple tree > + * @new: The new maple tree > + * @gfp: The GFP_FLAGS to use for allocations > + * > + * This function duplicates a maple tree using a faster method than traversing > + * the source tree and inserting entries into the new tree one by one. The user > + * needs to lock the source tree manually. Before calling this function, @new > + * must be an empty tree or an uninitialized tree. If @mt uses an external lock, > + * we may also need to manually set @new's external lock using > + * mt_set_external_lock(). > + * > + * Return: 0 on success, -ENOMEM if memory could not be allocated. 
> + */ > +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp) We use mas_ for things that won't handle the locking and pass in a maple state. Considering the leaves need to be altered once this is returned, I would expect passing in a maple state should be feasible? > +{ > + int ret; > + struct maple_node *to_free = NULL; > + > + ret = mt_dup_build(mt, new, gfp, &to_free); > + > + if (unlikely(ret == -ENOMEM)) { On other errors, will the half constructed tree be returned? Is this safe? > + if (to_free) > + mt_dup_free(new, to_free); > + } > + > + return ret; > +} > +EXPORT_SYMBOL(__mt_dup); > + > +/** > + * mt_dup(): Duplicate a maple tree > + * @mt: The source maple tree > + * @new: The new maple tree > + * @gfp: The GFP_FLAGS to use for allocations > + * > + * This function duplicates a maple tree using a faster method than traversing > + * the source tree and inserting entries into the new tree one by one. The > + * function will lock the source tree with an internal lock, and the user does > + * not need to manually handle the lock. Before calling this function, @new must > + * be an empty tree or an uninitialized tree. If @mt uses an external lock, we > + * may also need to manually set @new's external lock using > + * mt_set_external_lock(). > + * > + * Return: 0 on success, -ENOMEM if memory could not be allocated. > + */ > +int mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp) mtree_ ususually used to indicate locking is handled. > +{ > + int ret; > + struct maple_node *to_free = NULL; > + > + mtree_lock(mt); > + ret = mt_dup_build(mt, new, gfp, &to_free); > + mtree_unlock(mt); > + > + if (unlikely(ret == -ENOMEM)) { > + if (to_free) > + mt_dup_free(new, to_free); Again, is a half constructed tree safe to return? Since each caller checks to_free is NULL, could that be in mt_dup_free() instead? 
> + } > + > + return ret; > +} > +EXPORT_SYMBOL(mt_dup); > + > /** > * __mt_destroy() - Walk and free all nodes of a locked maple tree. > * @mt: The maple tree > -- > 2.20.1 > >
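The main structural request in the review above is to replace the `copy_node`/`descend`/`ascend` goto labels with loops and small inline helpers. A hedged user-space sketch of what a loop-shaped DFS preorder over parent-linked nodes can look like (names such as `pnode` and `preorder` are invented here, not kernel code; the parent/slot fields mirror what `ma_parent()` and `ma_parent_slot()` recover in the maple tree):

```c
/* Nodes know their parent and their slot index within it. */
struct pnode {
	int val;
	struct pnode *parent;
	int pslot;		/* this node's index in parent->child[] */
	int nchild;
	struct pnode *child[4];
};

/* DFS preorder without recursion and without goto labels: one outer
 * loop that visits a node, descends when possible, and otherwise
 * ascends to the next unvisited sibling. Returns the visit count. */
static int preorder(struct pnode *root, int *out, int max)
{
	struct pnode *n = root;
	int cnt = 0;

	while (n && cnt < max) {
		out[cnt++] = n->val;	/* the "copy_node" step */
		if (n->nchild) {	/* the "descend" step */
			n = n->child[0];
			continue;
		}
		/* the "ascend" step: climb past exhausted siblings */
		while (n != root && n->pslot + 1 >= n->parent->nchild)
			n = n->parent;
		if (n == root)
			break;
		n = n->parent->child[n->pslot + 1];
	}
	return cnt;
}
```

The same three phases exist in the goto version of `mt_dup_build()`; the loop form just makes the back-edges explicit, which is the readability point being argued in the thread.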
在 2023/7/27 00:03, Liam R. Howlett 写道: > * Peng Zhang <zhangpeng.00@bytedance.com> [230726 04:10]: >> Introduce interfaces __mt_dup() and mt_dup(), which are used to >> duplicate a maple tree. Compared with traversing the source tree and >> reinserting entry by entry in the new tree, it has better performance. >> The difference between __mt_dup() and mt_dup() is that mt_dup() holds >> an internal lock. >> >> Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> >> --- >> include/linux/maple_tree.h | 3 + >> lib/maple_tree.c | 211 +++++++++++++++++++++++++++++++++++++ >> 2 files changed, 214 insertions(+) >> >> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h >> index c962af188681..229fe78e4c89 100644 >> --- a/include/linux/maple_tree.h >> +++ b/include/linux/maple_tree.h >> @@ -327,6 +327,9 @@ int mtree_store(struct maple_tree *mt, unsigned long index, >> void *entry, gfp_t gfp); >> void *mtree_erase(struct maple_tree *mt, unsigned long index); >> >> +int mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp); >> +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp); >> + >> void mtree_destroy(struct maple_tree *mt); >> void __mt_destroy(struct maple_tree *mt); >> >> diff --git a/lib/maple_tree.c b/lib/maple_tree.c >> index da3a2fb405c0..efac6761ae37 100644 >> --- a/lib/maple_tree.c >> +++ b/lib/maple_tree.c >> @@ -6595,6 +6595,217 @@ void *mtree_erase(struct maple_tree *mt, unsigned long index) >> } >> EXPORT_SYMBOL(mtree_erase); >> >> +/* >> + * mt_dup_free() - Free the nodes of a incomplete maple tree. >> + * @mt: The incomplete maple tree >> + * @node: Free nodes from @node >> + * >> + * This function frees all nodes starting from @node in the reverse order of >> + * mt_dup_build(). At this point we don't need to hold the source tree lock. 
>> + */ >> +static void mt_dup_free(struct maple_tree *mt, struct maple_node *node) >> +{ >> + void **slots; >> + unsigned char offset; >> + struct maple_enode *enode; >> + enum maple_type type; >> + unsigned char count = 0, i; >> + > > Can we make these labels inline functions and try to make this a loop? I did this just to make things easier. Refer to the implementation of walk_tg_tree_from() in sched/core.c. Using some loops and inline functions probably doesn't simplify things. I'll try to do that and give up if it complicates things. > >> +try_ascend: >> + if (ma_is_root(node)) { >> + mt_free_one(node); >> + return; >> + } >> + >> + offset = ma_parent_slot(node); >> + type = ma_parent_type(mt, node); >> + node = ma_parent(node); >> + if (!offset) >> + goto free; >> + >> + offset--; >> + >> +descend: >> + slots = (void **)ma_slots(node, type); >> + enode = slots[offset]; >> + if (mte_is_leaf(enode)) >> + goto free; >> + >> + type = mte_node_type(enode); >> + node = mte_to_node(enode); >> + offset = ma_nonleaf_data_end_nocheck(node, type); >> + goto descend; >> + >> +free: >> + slots = (void **)ma_slots(node, type); >> + count = ma_nonleaf_data_end_nocheck(node, type) + 1; >> + for (i = 0; i < count; i++) >> + ((unsigned long *)slots)[i] &= ~MAPLE_NODE_MASK; >> + >> + /* Cast to __rcu to avoid sparse checker complaining. */ >> + mt_free_bulk(count, (void __rcu **)slots); >> + goto try_ascend; >> +} >> + >> +/* >> + * mt_dup_build() - Build a new maple tree from a source tree >> + * @mt: The source maple tree to copy from >> + * @new: The new maple tree >> + * @gfp: The GFP_FLAGS to use for allocations >> + * @to_free: Free nodes starting from @to_free if the build fails >> + * >> + * This function builds a new tree in DFS preorder. If it fails due to memory >> + * allocation, @to_free will store the last failed node to free the incomplete >> + * tree. Use mt_dup_free() to free nodes. >> + * >> + * Return: 0 on success, -ENOMEM if memory could not be allocated. 
>> + */ >> +static inline int mt_dup_build(struct maple_tree *mt, struct maple_tree *new, >> + gfp_t gfp, struct maple_node **to_free) > > I am trying to change the functions to be two tabs of indent for > arguments from now on. It allows for more to fit on a single line and > still maintains a clear separation between code and argument lists. I'm not too concerned about code formatting. . . At least in this patchset. > >> +{ >> + struct maple_enode *enode; >> + struct maple_node *new_node, *new_parent = NULL, *node; >> + enum maple_type type; >> + void __rcu **slots; >> + void **new_slots; >> + unsigned char count, request, i, offset; >> + unsigned long *set_parent; >> + unsigned long new_root; >> + >> + mt_init_flags(new, mt->ma_flags); >> + enode = mt_root_locked(mt); >> + if (unlikely(!xa_is_node(enode))) { >> + rcu_assign_pointer(new->ma_root, enode); >> + return 0; >> + } >> + >> + new_node = mt_alloc_one(gfp); >> + if (!new_node) >> + return -ENOMEM; >> + >> + new_root = (unsigned long)new_node; >> + new_root |= (unsigned long)enode & MAPLE_NODE_MASK; >> + >> +copy_node: > > Can you make copy_node, descend, ascend inline functions instead of the > goto jumping please? It's better to have loops over jumping around a > lot. Gotos are good for undoing things and retry, but constructing > loops with them makes it difficult to follow. Same as above. > >> + node = mte_to_node(enode); >> + type = mte_node_type(enode); >> + memcpy(new_node, node, sizeof(struct maple_node)); >> + >> + set_parent = (unsigned long *)&(new_node->parent); >> + *set_parent &= MAPLE_NODE_MASK; >> + *set_parent |= (unsigned long)new_parent; > > Maybe make a small inline to set the parent instead of this? > > There are some defined helpers for setting the types like > ma_parent_ptr() and ma_enode_ptr() to make casting more type-safe. Ok, I'll try to do that. 
> >> + if (ma_is_leaf(type)) >> + goto ascend; >> + >> + new_slots = (void **)ma_slots(new_node, type); >> + slots = ma_slots(node, type); >> + request = ma_nonleaf_data_end(mt, node, type) + 1; >> + count = mt_alloc_bulk(gfp, request, new_slots); >> + if (!count) { >> + *to_free = new_node; >> + return -ENOMEM; >> + } >> + >> + for (i = 0; i < count; i++) >> + ((unsigned long *)new_slots)[i] |= >> + ((unsigned long)mt_slot_locked(mt, slots, i) & >> + MAPLE_NODE_MASK); >> + offset = 0; >> + >> +descend: >> + new_parent = new_node; >> + enode = mt_slot_locked(mt, slots, offset); >> + new_node = mte_to_node(new_slots[offset]); >> + goto copy_node; >> + >> +ascend: >> + if (ma_is_root(node)) { >> + new_node = mte_to_node((void *)new_root); >> + new_node->parent = ma_parent_ptr((unsigned long)new | >> + MA_ROOT_PARENT); >> + rcu_assign_pointer(new->ma_root, (void *)new_root); >> + return 0; >> + } >> + >> + offset = ma_parent_slot(node); >> + type = ma_parent_type(mt, node); >> + node = ma_parent(node); >> + new_node = ma_parent(new_node); >> + if (offset < ma_nonleaf_data_end(mt, node, type)) { >> + offset++; >> + new_slots = (void **)ma_slots(new_node, type); >> + slots = ma_slots(node, type); >> + goto descend; >> + } >> + >> + goto ascend; >> +} >> + >> +/** >> + * __mt_dup(): Duplicate a maple tree >> + * @mt: The source maple tree >> + * @new: The new maple tree >> + * @gfp: The GFP_FLAGS to use for allocations >> + * >> + * This function duplicates a maple tree using a faster method than traversing >> + * the source tree and inserting entries into the new tree one by one. The user >> + * needs to lock the source tree manually. Before calling this function, @new >> + * must be an empty tree or an uninitialized tree. If @mt uses an external lock, >> + * we may also need to manually set @new's external lock using >> + * mt_set_external_lock(). >> + * >> + * Return: 0 on success, -ENOMEM if memory could not be allocated. 
>> + */ >> +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp) > > We use mas_ for things that won't handle the locking and pass in a maple > state. Considering the leaves need to be altered once this is returned, > I would expect passing in a maple state should be feasible? But we don't really need mas here. What do you think the state of mas should be when this function returns? Make it point to the first entry, or the last entry? > >> +{ >> + int ret; >> + struct maple_node *to_free = NULL; >> + >> + ret = mt_dup_build(mt, new, gfp, &to_free); >> + >> + if (unlikely(ret == -ENOMEM)) { > > On other errors, will the half constructed tree be returned? Is this > safe? Of course, mt_dup_free() is carefully designed to handle this. > >> + if (to_free) >> + mt_dup_free(new, to_free); >> + } >> + >> + return ret; >> +} >> +EXPORT_SYMBOL(__mt_dup); >> + >> +/** >> + * mt_dup(): Duplicate a maple tree >> + * @mt: The source maple tree >> + * @new: The new maple tree >> + * @gfp: The GFP_FLAGS to use for allocations >> + * >> + * This function duplicates a maple tree using a faster method than traversing >> + * the source tree and inserting entries into the new tree one by one. The >> + * function will lock the source tree with an internal lock, and the user does >> + * not need to manually handle the lock. Before calling this function, @new must >> + * be an empty tree or an uninitialized tree. If @mt uses an external lock, we >> + * may also need to manually set @new's external lock using >> + * mt_set_external_lock(). >> + * >> + * Return: 0 on success, -ENOMEM if memory could not be allocated. >> + */ >> +int mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp) > > mtree_ ususually used to indicate locking is handled. Before unifying mtree_* and mt_*, I don't think I can see any difference between them. At least mt_set_in_rcu() and mt_clear_in_rcu() will hold the lock. 
> >> +{ >> + int ret; >> + struct maple_node *to_free = NULL; >> + >> + mtree_lock(mt); >> + ret = mt_dup_build(mt, new, gfp, &to_free); >> + mtree_unlock(mt); >> + >> + if (unlikely(ret == -ENOMEM)) { >> + if (to_free) >> + mt_dup_free(new, to_free); > > Again, is a half constructed tree safe to return? Since each caller > checks to_free is NULL, could that be in mt_dup_free() instead? Yes, this check can be put in mt_dup_free(). > >> + } >> + >> + return ret; >> +} >> +EXPORT_SYMBOL(mt_dup); >> + >> /** >> * __mt_destroy() - Walk and free all nodes of a locked maple tree. >> * @mt: The maple tree >> -- >> 2.20.1 >> >>
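One concrete suggestion in the exchange above is to wrap the open-coded parent assignment (`*set_parent &= MAPLE_NODE_MASK; *set_parent |= ...`) in a small typed helper. The underlying trick is that aligned pointers have free low bits for metadata. A hedged user-space sketch with an invented mask (`META_MASK` is a stand-in; the kernel's `MAPLE_NODE_MASK` and its helpers like `ma_parent_ptr()` differ):

```c
#include <stdint.h>

/* Pointers to 8-byte-aligned objects have three free low bits, used
 * here to carry metadata (an invented stand-in for MAPLE_NODE_MASK). */
#define META_MASK ((uintptr_t)0x7)

/* Install a new parent pointer while preserving the metadata bits that
 * were packed into the old parent word. */
static void set_parent(void **parent_field, void *new_parent)
{
	uintptr_t meta = (uintptr_t)*parent_field & META_MASK;

	*parent_field = (void *)(((uintptr_t)new_parent & ~META_MASK) | meta);
}
```

Hiding the mask arithmetic behind one helper is what makes such casts auditable in one place, which is the type-safety point the reviewer raises about `ma_parent_ptr()` and `ma_enode_ptr()`.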
* Peng Zhang <zhangpeng.00@bytedance.com> [230731 08:24]: > > > 在 2023/7/27 00:03, Liam R. Howlett 写道: > > * Peng Zhang <zhangpeng.00@bytedance.com> [230726 04:10]: > > > Introduce interfaces __mt_dup() and mt_dup(), which are used to > > > duplicate a maple tree. Compared with traversing the source tree and > > > reinserting entry by entry in the new tree, it has better performance. > > > The difference between __mt_dup() and mt_dup() is that mt_dup() holds > > > an internal lock. > > > > > > Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com> > > > --- > > > include/linux/maple_tree.h | 3 + > > > lib/maple_tree.c | 211 +++++++++++++++++++++++++++++++++++++ > > > 2 files changed, 214 insertions(+) > > > > > > diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h > > > index c962af188681..229fe78e4c89 100644 > > > --- a/include/linux/maple_tree.h > > > +++ b/include/linux/maple_tree.h > > > @@ -327,6 +327,9 @@ int mtree_store(struct maple_tree *mt, unsigned long index, > > > void *entry, gfp_t gfp); > > > void *mtree_erase(struct maple_tree *mt, unsigned long index); > > > +int mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp); > > > +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp); > > > + > > > void mtree_destroy(struct maple_tree *mt); > > > void __mt_destroy(struct maple_tree *mt); > > > diff --git a/lib/maple_tree.c b/lib/maple_tree.c > > > index da3a2fb405c0..efac6761ae37 100644 > > > --- a/lib/maple_tree.c > > > +++ b/lib/maple_tree.c > > > @@ -6595,6 +6595,217 @@ void *mtree_erase(struct maple_tree *mt, unsigned long index) > > > } > > > EXPORT_SYMBOL(mtree_erase); > > > +/* > > > + * mt_dup_free() - Free the nodes of a incomplete maple tree. > > > + * @mt: The incomplete maple tree > > > + * @node: Free nodes from @node > > > + * > > > + * This function frees all nodes starting from @node in the reverse order of > > > + * mt_dup_build(). At this point we don't need to hold the source tree lock. 
> > > + */ > > > +static void mt_dup_free(struct maple_tree *mt, struct maple_node *node) > > > +{ > > > + void **slots; > > > + unsigned char offset; > > > + struct maple_enode *enode; > > > + enum maple_type type; > > > + unsigned char count = 0, i; > > > + > > > > Can we make these labels inline functions and try to make this a loop? > I did this just to make things easier. Refer to the implementation of > walk_tg_tree_from() in sched/core.c. Using some loops and inline > functions probably doesn't simplify things. I'll try to do that and give > up if it complicates things. Thanks, I'd like to try and simplify the code instead of adding goto label loops. The code you are referencing is from 2008 and goto loops are not common. > > > > > +try_ascend: > > > + if (ma_is_root(node)) { > > > + mt_free_one(node); > > > + return; > > > + } > > > + > > > + offset = ma_parent_slot(node); > > > + type = ma_parent_type(mt, node); > > > + node = ma_parent(node); > > > + if (!offset) > > > + goto free; > > > + > > > + offset--; > > > + > > > +descend: > > > + slots = (void **)ma_slots(node, type); > > > + enode = slots[offset]; > > > + if (mte_is_leaf(enode)) > > > + goto free; > > > + > > > + type = mte_node_type(enode); > > > + node = mte_to_node(enode); > > > + offset = ma_nonleaf_data_end_nocheck(node, type); > > > + goto descend; > > > + > > > +free: > > > + slots = (void **)ma_slots(node, type); > > > + count = ma_nonleaf_data_end_nocheck(node, type) + 1; > > > + for (i = 0; i < count; i++) > > > + ((unsigned long *)slots)[i] &= ~MAPLE_NODE_MASK; > > > + > > > + /* Cast to __rcu to avoid sparse checker complaining. 
*/ > > > + mt_free_bulk(count, (void __rcu **)slots); > > > + goto try_ascend; > > > +} > > > + > > > +/* > > > + * mt_dup_build() - Build a new maple tree from a source tree > > > + * @mt: The source maple tree to copy from > > > + * @new: The new maple tree > > > + * @gfp: The GFP_FLAGS to use for allocations > > > + * @to_free: Free nodes starting from @to_free if the build fails > > > + * > > > + * This function builds a new tree in DFS preorder. If it fails due to memory > > > + * allocation, @to_free will store the last failed node to free the incomplete > > > + * tree. Use mt_dup_free() to free nodes. > > > + * > > > + * Return: 0 on success, -ENOMEM if memory could not be allocated. > > > + */ > > > +static inline int mt_dup_build(struct maple_tree *mt, struct maple_tree *new, > > > + gfp_t gfp, struct maple_node **to_free) > > > > I am trying to change the functions to be two tabs of indent for > > arguments from now on. It allows for more to fit on a single line and > > still maintains a clear separation between code and argument lists. > I'm not too concerned about code formatting. . . At least in this > patchset. I have a mess of it in the tree and wanted to communicate my desire to shift to using two tabs for extra arguments in the future. 
> > > > > +{ > > > + struct maple_enode *enode; > > > + struct maple_node *new_node, *new_parent = NULL, *node; > > > + enum maple_type type; > > > + void __rcu **slots; > > > + void **new_slots; > > > + unsigned char count, request, i, offset; > > > + unsigned long *set_parent; > > > + unsigned long new_root; > > > + > > > + mt_init_flags(new, mt->ma_flags); > > > + enode = mt_root_locked(mt); > > > + if (unlikely(!xa_is_node(enode))) { > > > + rcu_assign_pointer(new->ma_root, enode); > > > + return 0; > > > + } > > > + > > > + new_node = mt_alloc_one(gfp); > > > + if (!new_node) > > > + return -ENOMEM; > > > + > > > + new_root = (unsigned long)new_node; > > > + new_root |= (unsigned long)enode & MAPLE_NODE_MASK; > > > + > > > +copy_node: > > > > Can you make copy_node, descend, ascend inline functions instead of the > > goto jumping please? It's better to have loops over jumping around a > > lot. Gotos are good for undoing things and retry, but constructing > > loops with them makes it difficult to follow. > Same as above. > > > > > + node = mte_to_node(enode); > > > + type = mte_node_type(enode); > > > + memcpy(new_node, node, sizeof(struct maple_node)); > > > + > > > + set_parent = (unsigned long *)&(new_node->parent); > > > + *set_parent &= MAPLE_NODE_MASK; > > > + *set_parent |= (unsigned long)new_parent; > > > > Maybe make a small inline to set the parent instead of this? > > > > There are some defined helpers for setting the types like > > ma_parent_ptr() and ma_enode_ptr() to make casting more type-safe. > Ok, I'll try to do that. 
> 
> > 
> > > +	if (ma_is_leaf(type))
> > > +		goto ascend;
> > > +
> > > +	new_slots = (void **)ma_slots(new_node, type);
> > > +	slots = ma_slots(node, type);
> > > +	request = ma_nonleaf_data_end(mt, node, type) + 1;
> > > +	count = mt_alloc_bulk(gfp, request, new_slots);
> > > +	if (!count) {
> > > +		*to_free = new_node;
> > > +		return -ENOMEM;
> > > +	}
> > > +
> > > +	for (i = 0; i < count; i++)
> > > +		((unsigned long *)new_slots)[i] |=
> > > +			((unsigned long)mt_slot_locked(mt, slots, i) &
> > > +			 MAPLE_NODE_MASK);
> > > +	offset = 0;
> > > +
> > > +descend:
> > > +	new_parent = new_node;
> > > +	enode = mt_slot_locked(mt, slots, offset);
> > > +	new_node = mte_to_node(new_slots[offset]);
> > > +	goto copy_node;
> > > +
> > > +ascend:
> > > +	if (ma_is_root(node)) {
> > > +		new_node = mte_to_node((void *)new_root);
> > > +		new_node->parent = ma_parent_ptr((unsigned long)new |
> > > +						 MA_ROOT_PARENT);
> > > +		rcu_assign_pointer(new->ma_root, (void *)new_root);
> > > +		return 0;
> > > +	}
> > > +
> > > +	offset = ma_parent_slot(node);
> > > +	type = ma_parent_type(mt, node);
> > > +	node = ma_parent(node);
> > > +	new_node = ma_parent(new_node);
> > > +	if (offset < ma_nonleaf_data_end(mt, node, type)) {
> > > +		offset++;
> > > +		new_slots = (void **)ma_slots(new_node, type);
> > > +		slots = ma_slots(node, type);
> > > +		goto descend;
> > > +	}
> > > +
> > > +	goto ascend;
> > > +}
> > > +
> > > +/**
> > > + * __mt_dup(): Duplicate a maple tree
> > > + * @mt: The source maple tree
> > > + * @new: The new maple tree
> > > + * @gfp: The GFP_FLAGS to use for allocations
> > > + *
> > > + * This function duplicates a maple tree using a faster method than traversing
> > > + * the source tree and inserting entries into the new tree one by one. The user
> > > + * needs to lock the source tree manually. Before calling this function, @new
> > > + * must be an empty tree or an uninitialized tree. If @mt uses an external lock,
> > > + * we may also need to manually set @new's external lock using
> > > + * mt_set_external_lock().
> > > + *
> > > + * Return: 0 on success, -ENOMEM if memory could not be allocated.
> > > + */
> > > +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp)
> > 
> > We use mas_ for things that won't handle the locking and pass in a maple
> > state.  Considering the leaves need to be altered once this is returned,
> > I would expect passing in a maple state should be feasible?
> But we don't really need mas here. What do you think the state of mas
> should be when this function returns? Make it point to the first entry,
> or the last entry?

I would write it to point to the first element so that the call to
replace the first element can just do that without an extra walk, and
document the maple state end point.

> 
> > 
> > > +{
> > > +	int ret;
> > > +	struct maple_node *to_free = NULL;
> > > +
> > > +	ret = mt_dup_build(mt, new, gfp, &to_free);
> > > +
> > > +	if (unlikely(ret == -ENOMEM)) {
> > 
> > On other errors, will the half constructed tree be returned?  Is this
> > safe?
> Of course, mt_dup_free() is carefully designed to handle this.
> 
> > 
> > > +		if (to_free)
> > > +			mt_dup_free(new, to_free);
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +EXPORT_SYMBOL(__mt_dup);
> > > +
> > > +/**
> > > + * mt_dup(): Duplicate a maple tree
> > > + * @mt: The source maple tree
> > > + * @new: The new maple tree
> > > + * @gfp: The GFP_FLAGS to use for allocations
> > > + *
> > > + * This function duplicates a maple tree using a faster method than traversing
> > > + * the source tree and inserting entries into the new tree one by one. The
> > > + * function will lock the source tree with an internal lock, and the user does
> > > + * not need to manually handle the lock. Before calling this function, @new must
> > > + * be an empty tree or an uninitialized tree. If @mt uses an external lock, we
> > > + * may also need to manually set @new's external lock using
> > > + * mt_set_external_lock().
> > > + *
> > > + * Return: 0 on success, -ENOMEM if memory could not be allocated.
> > > + */
> > > +int mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp)
> > 
> > mtree_ is usually used to indicate locking is handled.
> Before unifying mtree_* and mt_*, I don't think I can see any difference
> between them. At least mt_set_in_rcu() and mt_clear_in_rcu() will hold
> the lock.

Fair enough.  I was thinking this closely matches __mt_destroy() and
mtree_destroy().  We could be consistent in our inconsistency, at least.

> 
> > 
> > > +{
> > > +	int ret;
> > > +	struct maple_node *to_free = NULL;
> > > +
> > > +	mtree_lock(mt);
> > > +	ret = mt_dup_build(mt, new, gfp, &to_free);
> > > +	mtree_unlock(mt);
> > > +
> > > +	if (unlikely(ret == -ENOMEM)) {
> > > +		if (to_free)
> > > +			mt_dup_free(new, to_free);
> > 
> > Again, is a half constructed tree safe to return?  Since each caller
> > checks to_free is NULL, could that be in mt_dup_free() instead?
> Yes, this check can be put in mt_dup_free().
> 
> > 
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +EXPORT_SYMBOL(mt_dup);
> > > +
> > >  /**
> > >   * __mt_destroy() - Walk and free all nodes of a locked maple tree.
> > >   * @mt: The maple tree
> > > --
> > > 2.20.1
* Peng Zhang <zhangpeng.00@bytedance.com> [230816 09:42]:
> > ...
> > > > > +/**
> > > > > + * __mt_dup(): Duplicate a maple tree
> > > > > + * @mt: The source maple tree
> > > > > + * @new: The new maple tree
> > > > > + * @gfp: The GFP_FLAGS to use for allocations
> > > > > + *
> > > > > + * This function duplicates a maple tree using a faster method than traversing
> > > > > + * the source tree and inserting entries into the new tree one by one. The user
> > > > > + * needs to lock the source tree manually. Before calling this function, @new
> > > > > + * must be an empty tree or an uninitialized tree. If @mt uses an external lock,
> > > > > + * we may also need to manually set @new's external lock using
> > > > > + * mt_set_external_lock().
> > > > > + *
> > > > > + * Return: 0 on success, -ENOMEM if memory could not be allocated.
> > > > > + */
> > > > > +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp)
> > > > 
> > > > We use mas_ for things that won't handle the locking and pass in a maple
> > > > state.  Considering the leaves need to be altered once this is returned,
> > > > I would expect passing in a maple state should be feasible?
> > > But we don't really need mas here. What do you think the state of mas
> > > should be when this function returns? Make it point to the first entry,
> > > or the last entry?
> > 
> > I would write it to point to the first element so that the call to
> > replace the first element can just do that without an extra walk and
> > document the maple state end point.
> Unfortunately, this does not seem to be convenient. Users usually use
> mas_for_each() to replace elements. If we set mas to the first element,
> the first call to mas_find() in mas_for_each() will get the next
> element.

This sounds like the need for another iterator specifically for
duplicating.

> 
> There may also be other scenarios where the user does not necessarily
> have to replace every element.
Do you mean a limit or elements that need to be skipped?  We could have
a limit on the iteration.

> 
> Finally, getting the first element in __mt_dup() requires an additional
> check to check whether the first element has already been recorded. Such
> a check will be performed at each leaf node, which is unnecessary
> overhead.
> 
> Of course, the first reason is the main reason, which prevents us from
> using mas_for_each(). So I don't want to record the first element.

I don't like the interface because it can easily be misunderstood and
used incorrectly.  I don't know how to make a cleaner interface, but
I've gone through a few thoughts.

The first was to hide _all of it_ in a new iterator:

mas_dup_each(old, new, old_entry) {
	if (don't_dup(old_entry)) {
		mas_erase(new);
		continue;
	}
	mas_dup_insert(new, new_entry);
}

This iterator would check if mas_is_start(old) and dup the tree in that
event, leaving both new trees pointing to the first element and setting
old_entry.

I don't know how to handle the failure in duplicating the tree in this
case - I guess we could return old_entry = NULL and check if
mas_is_err(old) after the loop.  Do you see a problem with this?

The second idea was an init of the old tree.  This is closest to what
you have:

if (mas_dup_init(old, new))
	goto -ENOMEM;

mas_dup_each(old, new) {
	if (don't_dup(old_entry)) {
		mas_erase(new);
		continue;
	}
	mas_dup_insert(new, new_entry);
}

This would duplicate the tree at the start and leave both pointing at
the first element so that mas_dup_each() could start on that element.
Each subsequent call would go to the next element in both maple states.

It sounds like you don't want this for performance reasons?  Although,
looking at mas_find() today, I think this could still work since we are
checking the maple state for a lot.
Both ideas could be even faster than what you have if we handle the
special cases of mas_is_none()/mas_is_ptr() in a smarter way, because we
don't need to be as worried about the entry point of the maple state as
we do with mas_find()/mas_for_each().  I mean, is it possible to get to
a mas_is_none() or mas_is_ptr() on duplicating a tree?  How do we handle
these users?

Both ideas still suffer from someone saying "Gee, that {insert function
name here} is used in the forking code, so I can totally use that in my
code because that's how it works!" and finding out it works for the
limited testing they do.  Then it fails later and the emails start
flying.

I almost think we should do something like this on insert:

void mas_dup_insert(old, new, new_entry)
{
	WARN_ON_ONCE(old == new);
	WARN_ON_ONCE(old->index != new->index);
	WARN_ON_ONCE(old->last != new->last);
	...
}

This would at least _require_ someone to have two maple states and
hopefully make them think twice about using it where it should not be
used.

The bottom line is that this code is close to what we need to make
forking better, but I fear the misuse of the interface.

Something else to think about: in the work items for the Maple Tree,
there is a plan to have an enum to specify the type of write that is
going to happen.  The idea was for mas_preallocate() to set this type of
write so we can just go right to the correct function.  We could use
that here and set the maple state write type to a direct replacement.

Thanks,
Liam
diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index c962af188681..229fe78e4c89 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -327,6 +327,9 @@ int mtree_store(struct maple_tree *mt, unsigned long index, void *entry,
 		gfp_t gfp);
 void *mtree_erase(struct maple_tree *mt, unsigned long index);
 
+int mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp);
+int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp);
+
 void mtree_destroy(struct maple_tree *mt);
 void __mt_destroy(struct maple_tree *mt);
 
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index da3a2fb405c0..efac6761ae37 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -6595,6 +6595,217 @@ void *mtree_erase(struct maple_tree *mt, unsigned long index)
 }
 EXPORT_SYMBOL(mtree_erase);
 
+/*
+ * mt_dup_free() - Free the nodes of an incomplete maple tree.
+ * @mt: The incomplete maple tree
+ * @node: Free nodes from @node
+ *
+ * This function frees all nodes starting from @node in the reverse order of
+ * mt_dup_build(). At this point we don't need to hold the source tree lock.
+ */
+static void mt_dup_free(struct maple_tree *mt, struct maple_node *node)
+{
+	void **slots;
+	unsigned char offset;
+	struct maple_enode *enode;
+	enum maple_type type;
+	unsigned char count = 0, i;
+
+try_ascend:
+	if (ma_is_root(node)) {
+		mt_free_one(node);
+		return;
+	}
+
+	offset = ma_parent_slot(node);
+	type = ma_parent_type(mt, node);
+	node = ma_parent(node);
+	if (!offset)
+		goto free;
+
+	offset--;
+
+descend:
+	slots = (void **)ma_slots(node, type);
+	enode = slots[offset];
+	if (mte_is_leaf(enode))
+		goto free;
+
+	type = mte_node_type(enode);
+	node = mte_to_node(enode);
+	offset = ma_nonleaf_data_end_nocheck(node, type);
+	goto descend;
+
+free:
+	slots = (void **)ma_slots(node, type);
+	count = ma_nonleaf_data_end_nocheck(node, type) + 1;
+	for (i = 0; i < count; i++)
+		((unsigned long *)slots)[i] &= ~MAPLE_NODE_MASK;
+
+	/* Cast to __rcu to avoid sparse checker complaining. */
+	mt_free_bulk(count, (void __rcu **)slots);
+	goto try_ascend;
+}
+
+/*
+ * mt_dup_build() - Build a new maple tree from a source tree
+ * @mt: The source maple tree to copy from
+ * @new: The new maple tree
+ * @gfp: The GFP_FLAGS to use for allocations
+ * @to_free: Free nodes starting from @to_free if the build fails
+ *
+ * This function builds a new tree in DFS preorder. If it fails due to memory
+ * allocation, @to_free will store the last failed node to free the incomplete
+ * tree. Use mt_dup_free() to free nodes.
+ *
+ * Return: 0 on success, -ENOMEM if memory could not be allocated.
+ */
+static inline int mt_dup_build(struct maple_tree *mt, struct maple_tree *new,
+			       gfp_t gfp, struct maple_node **to_free)
+{
+	struct maple_enode *enode;
+	struct maple_node *new_node, *new_parent = NULL, *node;
+	enum maple_type type;
+	void __rcu **slots;
+	void **new_slots;
+	unsigned char count, request, i, offset;
+	unsigned long *set_parent;
+	unsigned long new_root;
+
+	mt_init_flags(new, mt->ma_flags);
+	enode = mt_root_locked(mt);
+	if (unlikely(!xa_is_node(enode))) {
+		rcu_assign_pointer(new->ma_root, enode);
+		return 0;
+	}
+
+	new_node = mt_alloc_one(gfp);
+	if (!new_node)
+		return -ENOMEM;
+
+	new_root = (unsigned long)new_node;
+	new_root |= (unsigned long)enode & MAPLE_NODE_MASK;
+
+copy_node:
+	node = mte_to_node(enode);
+	type = mte_node_type(enode);
+	memcpy(new_node, node, sizeof(struct maple_node));
+
+	set_parent = (unsigned long *)&(new_node->parent);
+	*set_parent &= MAPLE_NODE_MASK;
+	*set_parent |= (unsigned long)new_parent;
+	if (ma_is_leaf(type))
+		goto ascend;
+
+	new_slots = (void **)ma_slots(new_node, type);
+	slots = ma_slots(node, type);
+	request = ma_nonleaf_data_end(mt, node, type) + 1;
+	count = mt_alloc_bulk(gfp, request, new_slots);
+	if (!count) {
+		*to_free = new_node;
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < count; i++)
+		((unsigned long *)new_slots)[i] |=
+			((unsigned long)mt_slot_locked(mt, slots, i) &
+			 MAPLE_NODE_MASK);
+	offset = 0;
+
+descend:
+	new_parent = new_node;
+	enode = mt_slot_locked(mt, slots, offset);
+	new_node = mte_to_node(new_slots[offset]);
+	goto copy_node;
+
+ascend:
+	if (ma_is_root(node)) {
+		new_node = mte_to_node((void *)new_root);
+		new_node->parent = ma_parent_ptr((unsigned long)new |
+						 MA_ROOT_PARENT);
+		rcu_assign_pointer(new->ma_root, (void *)new_root);
+		return 0;
+	}
+
+	offset = ma_parent_slot(node);
+	type = ma_parent_type(mt, node);
+	node = ma_parent(node);
+	new_node = ma_parent(new_node);
+	if (offset < ma_nonleaf_data_end(mt, node, type)) {
+		offset++;
+		new_slots = (void **)ma_slots(new_node, type);
+		slots = ma_slots(node, type);
+		goto descend;
+	}
+
+	goto ascend;
+}
+
+/**
+ * __mt_dup(): Duplicate a maple tree
+ * @mt: The source maple tree
+ * @new: The new maple tree
+ * @gfp: The GFP_FLAGS to use for allocations
+ *
+ * This function duplicates a maple tree using a faster method than traversing
+ * the source tree and inserting entries into the new tree one by one. The user
+ * needs to lock the source tree manually. Before calling this function, @new
+ * must be an empty tree or an uninitialized tree. If @mt uses an external lock,
+ * we may also need to manually set @new's external lock using
+ * mt_set_external_lock().
+ *
+ * Return: 0 on success, -ENOMEM if memory could not be allocated.
+ */
+int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp)
+{
+	int ret;
+	struct maple_node *to_free = NULL;
+
+	ret = mt_dup_build(mt, new, gfp, &to_free);
+
+	if (unlikely(ret == -ENOMEM)) {
+		if (to_free)
+			mt_dup_free(new, to_free);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(__mt_dup);
+
+/**
+ * mt_dup(): Duplicate a maple tree
+ * @mt: The source maple tree
+ * @new: The new maple tree
+ * @gfp: The GFP_FLAGS to use for allocations
+ *
+ * This function duplicates a maple tree using a faster method than traversing
+ * the source tree and inserting entries into the new tree one by one. The
+ * function will lock the source tree with an internal lock, and the user does
+ * not need to manually handle the lock. Before calling this function, @new must
+ * be an empty tree or an uninitialized tree. If @mt uses an external lock, we
+ * may also need to manually set @new's external lock using
+ * mt_set_external_lock().
+ *
+ * Return: 0 on success, -ENOMEM if memory could not be allocated.
+ */
+int mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp)
+{
+	int ret;
+	struct maple_node *to_free = NULL;
+
+	mtree_lock(mt);
+	ret = mt_dup_build(mt, new, gfp, &to_free);
+	mtree_unlock(mt);
+
+	if (unlikely(ret == -ENOMEM)) {
+		if (to_free)
+			mt_dup_free(new, to_free);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(mt_dup);
+
 /**
  * __mt_destroy() - Walk and free all nodes of a locked maple tree.
  * @mt: The maple tree