From patchwork Mon Dec 12 12:31:49 2022
X-Patchwork-Submitter: "wuqiang.matt"
X-Patchwork-Id: 32349
From: wuqiang
To: mhiramat@kernel.org, davem@davemloft.net, anil.s.keshavamurthy@intel.com, naveen.n.rao@linux.ibm.com, rostedt@goodmis.org, peterz@infradead.org, akpm@linux-foundation.org, sander@svanheule.net, ebiggers@google.com, dan.j.williams@intel.com, jpoimboe@kernel.org
Cc: linux-kernel@vger.kernel.org, lkp@intel.com, mattwu@163.com, wuqiang
Subject: [PATCH v7 1/5] lib: objpool added: ring-array based lockless MPMC queue
Date: Mon, 12 Dec 2022 20:31:49 +0800
Message-Id: <20221212123153.190888-2-wuqiang.matt@bytedance.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20221212123153.190888-1-wuqiang.matt@bytedance.com>
References: <20221212123153.190888-1-wuqiang.matt@bytedance.com>

The object pool is a scalable implementation of a high performance queue
for object allocation and reclamation, such as kretprobe instances.

By leveraging per-cpu ring-arrays to mitigate the hot spots of memory
contention, it can deliver near-linear scalability in highly parallel
scenarios. The ring-array is compactly managed in a single cache-line
to benefit from a warm L1 cache in most cases (<= 4 objects per core).
The bodies of pre-allocated objects are stored in contiguous cache-lines
just after the ring-array.

The object pool is interrupt safe. Both allocation and reclamation
(object pop and push operations) can be preempted or interrupted.

It's best suited for the following cases:
1) Memory allocation or reclamation is prohibited or too expensive
2) Consumers are of different priorities, such as irqs and threads

Limitations:
1) The maximum number of objects (capacity) is determined at pool
   initialization
2) The memory of objects won't be freed until the pool is finalized
3) Object allocation (pop) may fail after trying all cpu slots
4) Object reclamation (push) won't fail but may take a long time to
   finish in imbalanced scenarios.
   You can try a larger max_entries to mitigate this, or (>= CPUS * nr_objs)
   to avoid it.

Signed-off-by: wuqiang
---
 include/linux/objpool.h | 109 ++++++++++++++
 lib/Makefile            |   2 +-
 lib/objpool.c           | 320 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 430 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/objpool.h
 create mode 100644 lib/objpool.c

diff --git a/include/linux/objpool.h b/include/linux/objpool.h
new file mode 100644
index 000000000000..922e1bc96f2b
--- /dev/null
+++ b/include/linux/objpool.h
@@ -0,0 +1,109 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _LINUX_OBJPOOL_H
+#define _LINUX_OBJPOOL_H
+
+#include 
+
+/*
+ * objpool: ring-array based lockless MPMC queue
+ *
+ * Copyright: wuqiang.matt@bytedance.com
+ *
+ * The object pool is a scalable implementation of a high performance queue
+ * for object allocation and reclamation, such as kretprobe instances.
+ *
+ * By leveraging per-cpu ring-arrays to mitigate the hot spots of memory
+ * contention, it can deliver near-linear scalability in highly parallel
+ * scenarios. The ring-array is compactly managed in a single cache-line
+ * to benefit from a warm L1 cache in most cases (<= 4 objects per core).
+ * The bodies of pre-allocated objects are stored in contiguous cache-lines
+ * just after the ring-array.
+ *
+ * The object pool is interrupt safe. Both allocation and reclamation
+ * (object pop and push operations) can be preempted or interrupted.
+ *
+ * It's best suited for the following cases:
+ * 1) Memory allocation or reclamation is prohibited or too expensive
+ * 2) Consumers are of different priorities, such as irqs and threads
+ *
+ * Limitations:
+ * 1) The maximum number of objects (capacity) is determined at pool init
+ * 2) The memory of objects won't be freed until the pool is finalized
+ * 3) Object allocation (pop) may fail after trying all cpu slots
+ */
+
+/*
+ * objpool_slot: per-cpu ring array
+ *
+ * Represents a cpu-local array-based ring buffer; its size is determined
+ * during initialization of the object pool.
+ *
+ * The objpool_slot is allocated from node-local memory on NUMA systems and
+ * is kept compact in a single cache-line. ages[] is stored just after the
+ * body of objpool_slot, followed by entries[]. ages[] records the revision
+ * of each item and is used solely to avoid ABA, while entries[] holds the
+ * pointers to the objects.
+ *
+ * The default size of objpool_slot is a single cache-line, i.e. 64 bytes.
+ *
+ * 64bit:
+ *        4      8     12     16        32                 64
+ * | head | tail | size | mask | ages[4] | ents[4]: (8 * 4) | objects
+ *
+ * 32bit:
+ *        4      8     12     16        32        48       64
+ * | head | tail | size | mask | ages[4] | ents[4] | unused | objects
+ *
+ */
+
+struct objpool_slot {
+	uint32_t head;	/* head of ring array */
+	uint32_t tail;	/* tail of ring array */
+	uint32_t size;	/* array size, pow of 2 */
+	uint32_t mask;	/* size - 1 */
+} __packed;
+
+struct objpool_head;
+
+/* caller-specified callback for object initial setup, only called once */
+typedef int (*objpool_init_obj_cb)(void *obj, void *context);
+
+/* caller-specified cleanup callback for objpool destruction */
+typedef int (*objpool_fini_cb)(struct objpool_head *head, void *context);
+
+/*
+ * objpool_head: object pooling metadata
+ */
+
+struct objpool_head {
+	int obj_size;			/* object & element size */
+	int nr_objs;			/* total objs (to be pre-allocated) */
+	int nr_cpus;			/* nr_cpu_ids */
+	int capacity;			/* max objects per cpuslot */
+	gfp_t gfp;			/* gfp flags for kmalloc & vmalloc */
+	unsigned long flags;		/* flags for objpool management */
+	struct objpool_slot **cpu_slots;	/* array of percpu slots */
+	int *slot_sizes;		/* size in bytes of slots */
+	objpool_fini_cb release;	/* resource cleanup callback */
+	void *context;			/* caller-provided context */
+};
+
+#define OBJPOOL_FROM_VMALLOC	(0x80000000) /* objpool allocated from vmalloc area */
+#define OBJPOOL_HAVE_OBJECTS	(0x40000000) /* objects allocated along with objpool */
+
+/* initialize object pool and pre-allocate objects */
+int objpool_init(struct objpool_head *head, int nr_objs, int object_size,
+		gfp_t gfp, void *context, objpool_init_obj_cb objinit,
+		objpool_fini_cb release);
+
+/* allocate an object from the object pool */
+void *objpool_pop(struct objpool_head *head);
+
+/* reclaim an object back to the object pool */
+int objpool_push(void *node, struct objpool_head *head);
+
+/* clean up the whole object pool (including objects) */
+void objpool_fini(struct objpool_head *head);
+
+#endif /* _LINUX_OBJPOOL_H */
diff --git a/lib/Makefile b/lib/Makefile
index 59bd7c2f793a..f23d9c4fe639 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -34,7 +34,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
 	 earlycpio.o seq_buf.o siphash.o dec_and_lock.o \
 	 nmi_backtrace.o win_minmax.o memcat_p.o \
-	 buildid.o
+	 buildid.o objpool.o
 
 lib-$(CONFIG_PRINTK) += dump_stack.o
 lib-$(CONFIG_SMP) += cpumask.o
diff --git a/lib/objpool.c b/lib/objpool.c
new file mode 100644
index 000000000000..bab8b27e75d7
--- /dev/null
+++ b/lib/objpool.c
@@ -0,0 +1,320 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * objpool: ring-array based lockless MPMC/FIFO queues
+ *
+ * Copyright: wuqiang.matt@bytedance.com
+ */
+
+/* compute the suitable number of objects to be managed by a slot */
+static inline int objpool_nobjs(int size)
+{
+	return rounddown_pow_of_two((size - sizeof(struct objpool_slot)) /
+				    (sizeof(uint32_t) + sizeof(void *)));
+}
+
+#define SLOT_AGES(s) ((uint32_t *)((char *)(s) + sizeof(struct objpool_slot)))
+#define SLOT_ENTS(s) ((void **)((char *)(s) + sizeof(struct objpool_slot) + \
+			sizeof(uint32_t) * (s)->size))
+#define SLOT_OBJS(s) ((void *)((char *)(s) + sizeof(struct objpool_slot) + \
+			(sizeof(uint32_t) + sizeof(void *)) * (s)->size))
+#define SLOT_CORE(n) cpumask_nth((n) % num_possible_cpus(), cpu_possible_mask)
+
+/* allocate and initialize percpu slots */
+static inline int
+objpool_init_percpu_slots(struct objpool_head *head, int nobjs,
+			void *context, objpool_init_obj_cb objinit)
+{
+	int i, j, n, size, objsz, cpu = 0, nents = head->capacity;
+
+	/* object size aligned to sizeof(void *) */
+	objsz = ALIGN(head->obj_size, sizeof(void *));
+	/* shall we allocate objects along with objpool_slot */
+	if (objsz)
+		head->flags |= OBJPOOL_HAVE_OBJECTS;
+
+	for (i = 0; i < head->nr_cpus; i++) {
+		struct objpool_slot *os;
+
+		/* skip the cpus which could never be present */
+		if (!cpu_possible(i))
+			continue;
+
+		/* compute how many objects to be managed by this slot */
+		n = nobjs / num_possible_cpus();
+		if (cpu < (nobjs % num_possible_cpus()))
+			n++;
+		size = sizeof(struct objpool_slot) + sizeof(void *) * nents +
+		       sizeof(uint32_t) * nents + objsz * n;
+
+		/* large slots go to vmalloc, but only if gfp may block */
+		if (!cpu && gfpflags_allow_blocking(head->gfp) && size > PAGE_SIZE / 2)
+			head->flags |= OBJPOOL_FROM_VMALLOC;
+
+		/* allocate percpu slot & objects from local memory */
+		if (head->flags & OBJPOOL_FROM_VMALLOC)
+			os = __vmalloc_node(size, sizeof(void *), head->gfp,
+				cpu_to_node(i), __builtin_return_address(0));
+		else
+			os = kmalloc_node(size, head->gfp, cpu_to_node(i));
+		if (!os)
+			return -ENOMEM;
+
+		/* initialize percpu slot for the i-th slot */
+		memset(os, 0, size);
+		os->size = head->capacity;
+		os->mask = os->size - 1;
+		head->cpu_slots[i] = os;
+		head->slot_sizes[i] = size;
+		cpu = cpu + 1;
+
+		/*
+		 * start from the 2nd round to avoid a conflict on the 1st item.
+		 * we assume that the head item is ready for retrieval
+		 * iff head is equal to ages[head & mask]. but ages[] is
+		 * initialized to 0, so from the point of view of pop(),
+		 * the 1st (0th) item would always look ready, while in fact
+		 * push() may be stalled before its final update, and then
+		 * the item being inserted would be lost forever.
+		 */
+		os->head = os->tail = head->capacity;
+
+		if (!objsz)
+			continue;
+
+		for (j = 0; j < n; j++) {
+			uint32_t *ages = SLOT_AGES(os);
+			void **ents = SLOT_ENTS(os);
+			void *obj = SLOT_OBJS(os) + j * objsz;
+			uint32_t ie = os->tail & os->mask;
+
+			/* perform object initialization */
+			if (objinit) {
+				int rc = objinit(obj, context);
+				if (rc)
+					return rc;
+			}
+
+			/* add obj into the ring array */
+			ents[ie] = obj;
+			ages[ie] = os->tail;
+			os->tail++;
+			head->nr_objs++;
+		}
+	}
+
+	return 0;
+}
+
+/* cleanup all percpu slots of the object pool */
+static inline void objpool_fini_percpu_slots(struct objpool_head *head)
+{
+	int i;
+
+	if (!head->cpu_slots)
+		return;
+
+	for (i = 0; i < head->nr_cpus; i++) {
+		if (!head->cpu_slots[i])
+			continue;
+		if (head->flags & OBJPOOL_FROM_VMALLOC)
+			vfree(head->cpu_slots[i]);
+		else
+			kfree(head->cpu_slots[i]);
+	}
+	kfree(head->cpu_slots);
+	head->cpu_slots = NULL;
+	head->slot_sizes = NULL;
+}
+
+/**
+ * objpool_init: initialize object pool and pre-allocate objects
+ *
+ * args:
+ * @head:    the object pool to be initialized, declared by caller
+ * @nr_objs: total objects to be pre-allocated by this object pool
+ * @object_size: size of an object, no objects pre-allocated if 0
+ * @gfp:     flags for memory allocation (via kmalloc or vmalloc)
+ * @context: user context for object initialization callback
+ * @objinit: object initialization callback for extra setting-up
+ * @release: cleanup callback for private objects/pool/context
+ *
+ * return:
+ *         0 for success, otherwise error code
+ *
+ * All pre-allocated objects are zeroed. The caller can do extra
+ * initialization in the objinit callback, which is called exactly once
+ * per object, right after the slot allocation. After that, objpool
+ * never touches the contents of the objects. It is the caller's duty
+ * to reinitialize an object after allocation (pop), or to clear it
+ * before reclamation (push), if required.
+ */
+int objpool_init(struct objpool_head *head, int nr_objs, int object_size,
+		gfp_t gfp, void *context, objpool_init_obj_cb objinit,
+		objpool_fini_cb release)
+{
+	int nents, rc;
+
+	/* check input parameters */
+	if (nr_objs <= 0 || object_size < 0)
+		return -EINVAL;
+
+	/* calculate percpu slot size (rounded to pow of 2) */
+	nents = max_t(int, roundup_pow_of_two(nr_objs),
+			objpool_nobjs(L1_CACHE_BYTES));
+
+	/* initialize objpool head */
+	memset(head, 0, sizeof(struct objpool_head));
+	head->nr_cpus = nr_cpu_ids;
+	head->obj_size = object_size;
+	head->capacity = nents;
+	head->gfp = gfp & ~__GFP_ZERO;
+	head->context = context;
+	head->release = release;
+
+	/* allocate array for percpu slots */
+	head->cpu_slots = kzalloc(head->nr_cpus * sizeof(void *) +
+			       head->nr_cpus * sizeof(uint32_t), head->gfp);
+	if (!head->cpu_slots)
+		return -ENOMEM;
+	head->slot_sizes = (int *)&head->cpu_slots[head->nr_cpus];
+
+	/* initialize per-cpu slots */
+	rc = objpool_init_percpu_slots(head, nr_objs, context, objinit);
+	if (rc)
+		objpool_fini_percpu_slots(head);
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(objpool_init);
+
+/* add object to slot tail; the given slot must NOT be full */
+static inline int objpool_add_slot(void *obj, struct objpool_slot *os)
+{
+	uint32_t *ages = SLOT_AGES(os);
+	void **ents = SLOT_ENTS(os);
+	uint32_t tail = atomic_inc_return((atomic_t *)&os->tail) - 1;
+
+	WRITE_ONCE(ents[tail & os->mask], obj);
+
+	/* order matters: obj must be stored before ages[] is updated */
+	smp_store_release(&ages[tail & os->mask], tail);
+	return 0;
+}
+
+/**
+ * objpool_push: reclaim the object and return it to the object pool
+ *
+ * args:
+ * @obj:  object pointer to be pushed to object pool
+ * @head: object pool
+ *
+ * return:
+ *     0 (push cannot fail: each per-cpu slot can hold all pool objects)
+ *
+ * objpool_push is non-blocking and can be nested
+ */
+int objpool_push(void *obj, struct objpool_head *head)
+{
+	int cpu = raw_smp_processor_id();
+
+	return objpool_add_slot(obj, head->cpu_slots[cpu]);
+}
+EXPORT_SYMBOL_GPL(objpool_push);
+
+/* try to retrieve object from slot */
+static inline void *objpool_try_get_slot(struct objpool_slot *os)
+{
+	uint32_t *ages = SLOT_AGES(os);
+	void **ents = SLOT_ENTS(os);
+	/* do memory load of head to local head */
+	uint32_t head = smp_load_acquire(&os->head);
+
+	/* loop if slot isn't empty */
+	while (head != READ_ONCE(os->tail)) {
+		uint32_t id = head & os->mask, prev = head;
+
+		/* do prefetching of object ents */
+		prefetch(&ents[id]);
+
+		/*
+		 * check whether this item is ready for retrieval. In theory
+		 * we might retrieve the wrong object if ages[id] wraps around
+		 * while the current task is sleeping, but it would take a
+		 * very long time to overflow a uint32_t
+		 */
+		if (smp_load_acquire(&ages[id]) == head) {
+			/* node must have been updated by push() */
+			void *node = READ_ONCE(ents[id]);
+			/* commit and move forward head of the slot */
+			if (try_cmpxchg_release(&os->head, &head, head + 1))
+				return node;
+		}
+
+		/* re-load head from memory and continue trying */
+		head = READ_ONCE(os->head);
+		/*
+		 * head stays unchanged, so it's very likely current pop()
+		 * just preempted/interrupted an ongoing push() operation
+		 */
+		if (head == prev)
+			break;
+	}
+
+	return NULL;
+}
+
+/**
+ * objpool_pop: allocate an object from the object pool
+ *
+ * args:
+ * @head: object pool
+ *
+ * return:
+ *   object: NULL if failed (object pool is empty)
+ *
+ * objpool_pop can be nested, so can be used in any context.
+ */
+void *objpool_pop(struct objpool_head *head)
+{
+	int i, cpu = raw_smp_processor_id();
+	void *obj = NULL;
+
+	for (i = 0; i < num_possible_cpus(); i++) {
+		obj = objpool_try_get_slot(head->cpu_slots[cpu]);
+		if (obj)
+			break;
+		cpu = cpumask_next_wrap(cpu, cpu_possible_mask, -1, 1);
+	}
+
+	return obj;
+}
+EXPORT_SYMBOL_GPL(objpool_pop);
+
+/**
+ * objpool_fini: cleanup the whole object pool (releasing all objects)
+ *
+ * args:
+ * @head: object pool to be released
+ *
+ */
+void objpool_fini(struct objpool_head *head)
+{
+	if (!head->cpu_slots)
+		return;
+
+	/* release percpu slots */
+	objpool_fini_percpu_slots(head);
+
+	/* call user's cleanup callback if provided */
+	if (head->release)
+		head->release(head, head->context);
+}
+EXPORT_SYMBOL_GPL(objpool_fini);
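The per-slot protocol in patch 1 works as follows. objpool_add_slot() reserves a cell by atomically incrementing tail, writes the entry, and then publishes it through ages[]. objpool_try_get_slot() accepts the head cell only when ages[head & mask] == head. This protocol can be exercised outside the kernel. The following is a minimal user-space sketch in C11, under several assumptions. The <stdatomic.h> release/acquire operations stand in for smp_store_release(), smp_load_acquire() and try_cmpxchg_release(). The sketch models a single slot of fixed capacity 4 rather than the per-cpu slot array, and it stores no object bodies. demo_slot, demo_slot_init(), demo_push() and demo_pop() are hypothetical names, not part of the objpool API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Ring capacity; must be a power of two so that (pos & SLOT_MASK) wraps. */
#define SLOT_SIZE 4u
#define SLOT_MASK (SLOT_SIZE - 1u)
static_assert((SLOT_SIZE & SLOT_MASK) == 0, "SLOT_SIZE must be a power of two");

/* Hypothetical user-space model of one objpool_slot with its ages[]/ents[]. */
struct demo_slot {
	_Atomic uint32_t head;            /* next cell to pop */
	_Atomic uint32_t tail;            /* next cell to reserve for push */
	_Atomic uint32_t ages[SLOT_SIZE]; /* revision that published each cell */
	void *_Atomic ents[SLOT_SIZE];    /* object pointers */
};

static void demo_slot_init(struct demo_slot *s)
{
	/* Start from the 2nd round: ages[] are all 0, so head must not be 0. */
	atomic_init(&s->head, SLOT_SIZE);
	atomic_init(&s->tail, SLOT_SIZE);
	for (uint32_t i = 0; i < SLOT_SIZE; i++) {
		atomic_init(&s->ages[i], 0);
		atomic_init(&s->ents[i], NULL);
	}
}

/* Like objpool_add_slot(): the caller guarantees the slot is not full. */
static void demo_push(struct demo_slot *s, void *obj)
{
	uint32_t tail = atomic_fetch_add(&s->tail, 1); /* reserve a cell */

	atomic_store_explicit(&s->ents[tail & SLOT_MASK], obj, memory_order_relaxed);
	/* Publish: the release store orders the ents[] write before it. */
	atomic_store_explicit(&s->ages[tail & SLOT_MASK], tail, memory_order_release);
}

/* Like objpool_try_get_slot(): NULL if empty or head cell not yet published. */
static void *demo_pop(struct demo_slot *s)
{
	uint32_t head = atomic_load_explicit(&s->head, memory_order_acquire);

	while (head != atomic_load_explicit(&s->tail, memory_order_relaxed)) {
		uint32_t id = head & SLOT_MASK, prev = head;

		if (atomic_load_explicit(&s->ages[id], memory_order_acquire) == head) {
			void *obj = atomic_load_explicit(&s->ents[id], memory_order_relaxed);

			/* Commit: only one popper can advance head from this value. */
			if (atomic_compare_exchange_strong_explicit(&s->head, &head, head + 1,
					memory_order_release, memory_order_relaxed))
				return obj;
		}
		head = atomic_load_explicit(&s->head, memory_order_relaxed);
		if (head == prev)
			break; /* head cell reserved by a push that has not published yet */
	}
	return NULL;
}
```

After demo_slot_init(), two demo_push() calls followed by demo_pop() calls return the objects in FIFO order. A push that has advanced tail but not yet stored ages[] (a stalled push) makes demo_pop() return NULL instead of a stale entry. This is the reason head and tail start at the capacity rather than at 0.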
From: wuqiang
To: mhiramat@kernel.org, davem@davemloft.net, anil.s.keshavamurthy@intel.com, naveen.n.rao@linux.ibm.com, rostedt@goodmis.org, peterz@infradead.org, akpm@linux-foundation.org, sander@svanheule.net, ebiggers@google.com, dan.j.williams@intel.com, jpoimboe@kernel.org
Cc: linux-kernel@vger.kernel.org, lkp@intel.com, mattwu@163.com, wuqiang
Subject: [PATCH v7 2/5] lib: objpool test module added
Date: Mon, 12 Dec 2022 20:31:50 +0800
Message-Id: <20221212123153.190888-3-wuqiang.matt@bytedance.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20221212123153.190888-1-wuqiang.matt@bytedance.com>
References: <20221212123153.190888-1-wuqiang.matt@bytedance.com>

The test_objpool module runs several testcases for objpool stress and
performance evaluation. Each testcase involves all available cpu cores
to create a highly parallel, highly contended situation.

As of now there are 3 groups and 3 * 2 testcases in total:

1) group 1: synchronous mode
   The objpool is managed synchronously, that is, all objects are
   reclaimed before objpool finalization and the objpool owner makes
   sure of it. All threads on different cores run at the same pace.

2) group 2: synchronous + miss mode
   This test group is mainly for performance evaluation of missing
   cases, when fewer objects are pre-allocated than requested.

3) group 3: asynchronous mode
   This case is an emulation of kretprobe. The objpool owner has no
   control of an object after it's allocated. An hrtimer irq is
   introduced to stress objpool with thread preemption.

Signed-off-by: wuqiang
---
 lib/Kconfig.debug  |  11 +
 lib/Makefile       |   2 +
 lib/test_objpool.c | 696 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 709 insertions(+)
 create mode 100644 lib/test_objpool.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 3638b3424be5..840903b51434 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2750,6 +2750,17 @@ config TEST_CLOCKSOURCE_WATCHDOG
 
 	  If unsure, say N.
 
+config TEST_OBJPOOL
+	tristate "Test module for correctness and stress of objpool"
+	default n
+	depends on m
+	help
+	  This builds the "test_objpool" module that should be used for
+	  correctness verification and concurrent testing of object
+	  allocation and reclamation.
+
+	  If unsure, say N.
+
 endif # RUNTIME_TESTING_MENU
 
 config ARCH_USE_MEMTEST
diff --git a/lib/Makefile b/lib/Makefile
index f23d9c4fe639..c078dc5f64ac 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -100,6 +100,8 @@ obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
 obj-$(CONFIG_TEST_REF_TRACKER) += test_ref_tracker.o
 CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE)
 obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o
+obj-$(CONFIG_TEST_OBJPOOL) += test_objpool.o
+
 #
 # CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns
 # off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS
diff --git a/lib/test_objpool.c b/lib/test_objpool.c
new file mode 100644
index 000000000000..733b557c25b1
--- /dev/null
+++ b/lib/test_objpool.c
@@ -0,0 +1,696 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Test module for lockless object pool
+ * (C) 2022 Matt Wu
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define OT_NR_MAX_BULK (16)
+
+struct ot_ctrl {
+	int mode; /* test no */
+	int objsz; /* object size */
+	int duration; /* ms */
+	int delay; /* ms */
+	int bulk_normal;
+	int bulk_irq;
+	unsigned long hrtimer; /* ms */
+	const char *name;
+};
+
+struct ot_stat {
+	unsigned long nhits;
+	unsigned long nmiss;
+};
+
+struct ot_item {
+	struct objpool_head *pool; /* pool head */
+	struct ot_ctrl *ctrl;      /* ctrl parameters */
+
+	void (*worker)(struct ot_item *item, int irq);
+
+	/* hrtimer control */
+	ktime_t hrtcycle;
+	struct hrtimer hrtimer;
+
+	int bulk[2]; /* for thread and irq */
+	int delay;
+	u32 niters;
+
+	/* results summary */
+	struct ot_stat stat[2]; /* thread and irq */
+
+	u64 duration;
+};
+
+struct ot_mem_stat {
+	atomic_long_t alloc;
+	atomic_long_t free;
+};
+
+struct ot_data {
+	struct rw_semaphore start;
+	struct completion wait;
+	struct completion rcu;
+	atomic_t nthreads ____cacheline_aligned_in_smp;
+	atomic_t stop ____cacheline_aligned_in_smp;
+	struct ot_mem_stat kmalloc;
+	struct ot_mem_stat vmalloc;
+} g_ot_data;
+
+/*
+ * memory leakage checking
+ */
+
+static void *ot_kzalloc(long size)
+{
+	void *ptr = kzalloc(size, GFP_KERNEL);
+
+	if (ptr)
+		atomic_long_add(size, &g_ot_data.kmalloc.alloc);
+	return ptr;
+}
+
+static void ot_kfree(void *ptr, long size)
+{
+	if (!ptr)
+		return;
+	atomic_long_add(size, &g_ot_data.kmalloc.free);
+	kfree(ptr);
+}
+
+static void ot_mem_report(struct ot_ctrl *ctrl)
+{
+	long alloc, free;
+
+	pr_info("memory allocation summary for %s\n", ctrl->name);
+
+	alloc = atomic_long_read(&g_ot_data.kmalloc.alloc);
+	free = atomic_long_read(&g_ot_data.kmalloc.free);
+	pr_info(" kmalloc: %lu - %lu = %lu\n", alloc, free, alloc - free);
+
+	alloc = atomic_long_read(&g_ot_data.vmalloc.alloc);
+	free = atomic_long_read(&g_ot_data.vmalloc.free);
+	pr_info(" vmalloc: %lu - %lu = %lu\n", alloc, free, alloc - free);
+}
+
+/*
+ * general structs & routines
+ */
+
+struct ot_node {
+	void *owner;
+	unsigned long data;
+	unsigned long refs;
+	unsigned long payload[32];
+};
+
+struct ot_context {
+	struct objpool_head pool; /* objpool head */
+	struct ot_ctrl *ctrl;     /* ctrl parameters */
+	void *ptr;                /* user pool buffer */
+	unsigned long size;       /* buffer size */
+	refcount_t refs;
+	struct rcu_head rcu;
+};
+
+static DEFINE_PER_CPU(struct ot_item, ot_pcup_items);
+
+static int ot_init_data(struct ot_data *data)
+{
+	memset(data, 0, sizeof(*data));
+	init_rwsem(&data->start);
+	init_completion(&data->wait);
+	init_completion(&data->rcu);
+	atomic_set(&data->nthreads, 1);
+
+	return 0;
+}
+
+static void ot_reset_data(struct ot_data *data)
+{
+	reinit_completion(&data->wait);
+	reinit_completion(&data->rcu);
+	atomic_set(&data->nthreads, 1);
+	atomic_set(&data->stop, 0);
+	memset(&data->kmalloc, 0, sizeof(data->kmalloc));
+	memset(&data->vmalloc, 0, sizeof(data->vmalloc));
+}
+
+static int ot_init_node(void *nod, void *context)
+{
+	struct ot_context *sop = context;
+	struct ot_node *on = nod;
+
+	on->owner = &sop->pool;
+	return 0;
+}
+
+static enum hrtimer_restart ot_hrtimer_handler(struct hrtimer *hrt)
+{
+	struct ot_item *item = container_of(hrt, struct ot_item, hrtimer);
+
+	if (atomic_read_acquire(&g_ot_data.stop))
+		return HRTIMER_NORESTART;
+
+	/* do bulk-testings for objects pop/push */
+	item->worker(item, 1);
+
+	hrtimer_forward(hrt, hrt->base->get_time(), item->hrtcycle);
+	return HRTIMER_RESTART;
+}
+
+static void ot_start_hrtimer(struct ot_item *item)
+{
+ if (!item->ctrl->hrtimer) + return; + hrtimer_start(&item->hrtimer, item->hrtcycle, HRTIMER_MODE_REL); +} + +static void ot_stop_hrtimer(struct ot_item *item) +{ + if (!item->ctrl->hrtimer) + return; + hrtimer_cancel(&item->hrtimer); +} + +static int ot_init_hrtimer(struct ot_item *item, unsigned long hrtimer) +{ + struct hrtimer *hrt = &item->hrtimer; + + if (!hrtimer) + return -ENOENT; + + item->hrtcycle = ktime_set(0, hrtimer * 1000000UL); + hrtimer_init(hrt, CLOCK_MONOTONIC, HRTIMER_MODE_REL); + hrt->function = ot_hrtimer_handler; + return 0; +} + +static int ot_init_cpu_item(struct ot_item *item, + struct ot_ctrl *ctrl, + struct objpool_head *pool, + void (*worker)(struct ot_item *, int)) +{ + memset(item, 0, sizeof(*item)); + item->pool = pool; + item->ctrl = ctrl; + item->worker = worker; + + item->bulk[0] = ctrl->bulk_normal; + item->bulk[1] = ctrl->bulk_irq; + item->delay = ctrl->delay; + + /* initialize hrtimer */ + ot_init_hrtimer(item, item->ctrl->hrtimer); + return 0; +} + +static int ot_thread_worker(void *arg) +{ + struct ot_item *item = arg; + ktime_t start; + + sched_set_normal(current, 50); + + atomic_inc(&g_ot_data.nthreads); + down_read(&g_ot_data.start); + up_read(&g_ot_data.start); + start = ktime_get(); + ot_start_hrtimer(item); + do { + if (atomic_read_acquire(&g_ot_data.stop)) + break; + /* do bulk-testings for objects pop/push */ + item->worker(item, 0); + } while (!kthread_should_stop()); + ot_stop_hrtimer(item); + item->duration = (u64) ktime_us_delta(ktime_get(), start); + if (atomic_dec_and_test(&g_ot_data.nthreads)) + complete(&g_ot_data.wait); + + return 0; +} + +static void ot_perf_report(struct ot_ctrl *ctrl, u64 duration) +{ + struct ot_stat total, normal = {0}, irq = {0}; + int cpu, nthreads = 0; + + pr_info("\n"); + pr_info("Testing summary for %s\n", ctrl->name); + + for_each_possible_cpu(cpu) { + struct ot_item *item = per_cpu_ptr(&ot_pcup_items, cpu); + if (!item->duration) + continue; + normal.nhits += item->stat[0].nhits; 
+ normal.nmiss += item->stat[0].nmiss; + irq.nhits += item->stat[1].nhits; + irq.nmiss += item->stat[1].nmiss; + pr_info("CPU: %d duration: %lluus\n", cpu, item->duration); + pr_info("\tthread:\t%16lu hits \t%16lu miss\n", + item->stat[0].nhits, item->stat[0].nmiss); + pr_info("\tirq: \t%16lu hits \t%16lu miss\n", + item->stat[1].nhits, item->stat[1].nmiss); + pr_info("\ttotal: \t%16lu hits \t%16lu miss\n", + item->stat[0].nhits + item->stat[1].nhits, + item->stat[0].nmiss + item->stat[1].nmiss); + nthreads++; + } + + total.nhits = normal.nhits + irq.nhits; + total.nmiss = normal.nmiss + irq.nmiss; + + pr_info("ALL: \tnthreads: %d duration: %lluus\n", nthreads, duration); + pr_info("SUM: \t%16lu hits \t%16lu miss\n", + total.nhits, total.nmiss); +} + +/* + * synchronous test cases for objpool manipulation + */ + +/* objpool manipulation for synchronous mode 0 (percpu objpool) */ +static struct ot_context *ot_init_sync_m0(struct ot_ctrl *ctrl) +{ + struct ot_context *sop = NULL; + int max = num_possible_cpus() << 3; + + sop = (struct ot_context *)ot_kzalloc(sizeof(*sop)); + if (!sop) + return NULL; + sop->ctrl = ctrl; + + if (objpool_init(&sop->pool, max, ctrl->objsz, + GFP_KERNEL, sop, ot_init_node, NULL)) { + ot_kfree(sop, sizeof(*sop)); + return NULL; + } + WARN_ON(max != sop->pool.nr_objs); + + return sop; +} + +static void ot_fini_sync_m0(struct ot_context *sop) +{ + objpool_fini(&sop->pool); + ot_kfree(sop, sizeof(*sop)); +} + +struct { + struct ot_context * (*init)(struct ot_ctrl *oc); + void (*fini)(struct ot_context *sop); +} g_ot_sync_ops[] = { + {.init = ot_init_sync_m0, .fini = ot_fini_sync_m0}, +}; + +/* + * synchronous test cases: performance mode + */ + +static void ot_bulk_sync(struct ot_item *item, int irq) +{ + struct ot_node *nods[OT_NR_MAX_BULK]; + int i; + + for (i = 0; i < item->bulk[irq]; i++) + nods[i] = objpool_pop(item->pool); + + if (!irq && (item->delay || !(++(item->niters) & 0x7FFF))) + msleep(item->delay); + + while (i-- > 0) { + 
struct ot_node *on = nods[i]; + if (on) { + on->refs++; + objpool_push(on, item->pool); + item->stat[irq].nhits++; + } else { + item->stat[irq].nmiss++; + } + } +} + +static int ot_start_sync(struct ot_ctrl *ctrl) +{ + struct ot_context *sop; + ktime_t start; + u64 duration; + unsigned long timeout; + int cpu, rc; + + /* initialize objpool for synchronous testcase */ + sop = g_ot_sync_ops[ctrl->mode].init(ctrl); + if (!sop) + return -ENOMEM; + + /* grab rwsem to block testing threads */ + down_write(&g_ot_data.start); + + for_each_possible_cpu(cpu) { + struct ot_item *item = per_cpu_ptr(&ot_pcup_items, cpu); + struct task_struct *work; + + ot_init_cpu_item(item, ctrl, &sop->pool, ot_bulk_sync); + + /* skip offline cpus */ + if (!cpu_online(cpu)) + continue; + + work = kthread_create_on_node(ot_thread_worker, item, + cpu_to_node(cpu), "ot_worker_%d", cpu); + if (IS_ERR(work)) { + pr_err("failed to create thread for cpu %d\n", cpu); + } else { + kthread_bind(work, cpu); + wake_up_process(work); + } + } + + /* wait a while so that all threads are waiting at the start line */ + msleep(20); + + /* in case no threads were created (insufficient memory?)
*/ + if (atomic_dec_and_test(&g_ot_data.nthreads)) + complete(&g_ot_data.wait); + + // sched_set_fifo_low(current); + + /* start objpool testing threads */ + start = ktime_get(); + up_write(&g_ot_data.start); + + /* yield cpu to worker threads for duration ms */ + timeout = msecs_to_jiffies(ctrl->duration); + rc = schedule_timeout_interruptible(timeout); + + /* tell worker threads to quit */ + atomic_set_release(&g_ot_data.stop, 1); + + /* wait for all worker threads to finish and quit */ + wait_for_completion(&g_ot_data.wait); + duration = (u64) ktime_us_delta(ktime_get(), start); + + /* cleanup objpool */ + g_ot_sync_ops[ctrl->mode].fini(sop); + + /* report testing summary and performance results */ + ot_perf_report(ctrl, duration); + + /* report memory allocation summary */ + ot_mem_report(ctrl); + + return rc; +} + +/* + * asynchronous test cases: pool lifecycle controlled by refcount + */ + +static void ot_fini_async_rcu(struct rcu_head *rcu) +{ + struct ot_context *sop = container_of(rcu, struct ot_context, rcu); + struct ot_node *on; + + /* here all cpus are aware of the stop event: g_ot_data.stop = 1 */ + WARN_ON(!atomic_read_acquire(&g_ot_data.stop)); + + do { + /* release all objects remaining in objpool */ + on = objpool_pop(&sop->pool); + + /* deref anyway since we've one extra ref grabbed */ + if (refcount_dec_and_test(&sop->refs)) { + objpool_fini(&sop->pool); + break; + } + } while (on); + + complete(&g_ot_data.rcu); +} + +static void ot_fini_async(struct ot_context *sop) +{ + /* make sure the stop event is acknowledged by all cores */ + call_rcu(&sop->rcu, ot_fini_async_rcu); +} + +static int ot_objpool_release(struct objpool_head *head, void *context) +{ + struct ot_context *sop = context; + + WARN_ON(!head || !sop || head != &sop->pool); + + /* do context cleaning if needed */ + if (sop) + ot_kfree(sop, sizeof(*sop)); + + return 0; +} + +static struct ot_context *ot_init_async_m0(struct ot_ctrl *ctrl) +{ + struct ot_context *sop = NULL; + int max =
num_possible_cpus() << 3; + + sop = (struct ot_context *)ot_kzalloc(sizeof(*sop)); + if (!sop) + return NULL; + sop->ctrl = ctrl; + + if (objpool_init(&sop->pool, max, ctrl->objsz, GFP_KERNEL, + sop, ot_init_node, ot_objpool_release)) { + ot_kfree(sop, sizeof(*sop)); + return NULL; + } + WARN_ON(max != sop->pool.nr_objs); + refcount_set(&sop->refs, max + 1); + + return sop; +} + +struct { + struct ot_context * (*init)(struct ot_ctrl *oc); + void (*fini)(struct ot_context *sop); +} g_ot_async_ops[] = { + {.init = ot_init_async_m0, .fini = ot_fini_async}, +}; + +static void ot_nod_recycle(struct ot_node *on, struct objpool_head *pool, + int release) +{ + struct ot_context *sop; + + on->refs++; + + if (!release) { + /* push object back to objpool for reuse */ + objpool_push(on, pool); + return; + } + + sop = container_of(pool, struct ot_context, pool); + WARN_ON(sop != pool->context); + + /* node is removed for good: drop its ref on objpool */ + if (refcount_dec_and_test(&sop->refs)) + objpool_fini(pool); +} + +static void ot_bulk_async(struct ot_item *item, int irq) +{ + struct ot_node *nods[OT_NR_MAX_BULK]; + int i, stop; + + for (i = 0; i < item->bulk[irq]; i++) + nods[i] = objpool_pop(item->pool); + + if (!irq) { + if (item->delay || !(++(item->niters) & 0x7FFF)) + msleep(item->delay); + get_cpu(); + } + + stop = atomic_read_acquire(&g_ot_data.stop); + + /* recycle objects, or drop them and deref objpool once stop is set */ + while (i-- > 0) { + struct ot_node *on = nods[i]; + + if (on) { + on->refs++; + ot_nod_recycle(on, item->pool, stop); + item->stat[irq].nhits++; + } else { + item->stat[irq].nmiss++; + } + } + + if (!irq) + put_cpu(); +} + +static int ot_start_async(struct ot_ctrl *ctrl) +{ + struct ot_context *sop; + ktime_t start; + u64 duration; + unsigned long timeout; + int cpu, rc; + + /* initialize objpool for asynchronous testcase */ + sop = g_ot_async_ops[ctrl->mode].init(ctrl); + if (!sop) + return -ENOMEM; + + /* grab rwsem to block testing threads */ + down_write(&g_ot_data.start); +
for_each_possible_cpu(cpu) { + struct ot_item *item = per_cpu_ptr(&ot_pcup_items, cpu); + struct task_struct *work; + + ot_init_cpu_item(item, ctrl, &sop->pool, ot_bulk_async); + + /* skip offline cpus */ + if (!cpu_online(cpu)) + continue; + + work = kthread_create_on_node(ot_thread_worker, item, + cpu_to_node(cpu), "ot_worker_%d", cpu); + if (IS_ERR(work)) { + pr_err("failed to create thread for cpu %d\n", cpu); + } else { + kthread_bind(work, cpu); + wake_up_process(work); + } + } + + /* wait a while so that all threads are waiting at the start line */ + msleep(20); + + /* in case no threads were created (insufficient memory?) */ + if (atomic_dec_and_test(&g_ot_data.nthreads)) + complete(&g_ot_data.wait); + + /* start objpool testing threads */ + start = ktime_get(); + up_write(&g_ot_data.start); + + /* yield cpu to worker threads for duration ms */ + timeout = msecs_to_jiffies(ctrl->duration); + rc = schedule_timeout_interruptible(timeout); + + /* tell worker threads to quit */ + atomic_set_release(&g_ot_data.stop, 1); + + /* do async-finalization */ + g_ot_async_ops[ctrl->mode].fini(sop); + + /* wait for all worker threads to finish and quit */ + wait_for_completion(&g_ot_data.wait); + duration = (u64) ktime_us_delta(ktime_get(), start); + + /* make sure the rcu callback has run */ + wait_for_completion(&g_ot_data.rcu); + + /* + * now we are sure that objpool is finalized either + * by rcu callback or by worker threads + */ + + /* report testing summary and performance results */ + ot_perf_report(ctrl, duration); + + /* report memory allocation summary */ + ot_mem_report(ctrl); + + return rc; +} + +/* + * predefined testing cases: + * synchronous case / overrun case / async case + * + * mode: int, currently only mode 0 is supported + * duration: int, total test time in ms + * delay: int, delay (in ms) between each iteration + * bulk_normal: int, objects popped per round by thread worker (at most OT_NR_MAX_BULK) + * bulk_irq: int, objects popped per round by hrtimer (irq) consumer (at most OT_NR_MAX_BULK) + * hrtimer: unsigned long, hrtimer interval
in ms + * name: char *, tag for current test ot_item + */ + +#define NODE_COMPACT sizeof(struct ot_node) +#define NODE_VMALLOC (512) + +struct ot_ctrl g_ot_sync[] = { + {0, NODE_COMPACT, 1000, 0, 1, 0, 0, "sync: percpu objpool"}, + {0, NODE_VMALLOC, 1000, 0, 1, 0, 0, "sync: percpu objpool from vmalloc"}, +}; + +struct ot_ctrl g_ot_miss[] = { + {0, NODE_COMPACT, 1000, 0, 16, 0, 0, "sync overrun: percpu objpool"}, + {0, NODE_VMALLOC, 1000, 0, 16, 0, 0, "sync overrun: percpu objpool from vmalloc"}, +}; + +struct ot_ctrl g_ot_async[] = { + {0, NODE_COMPACT, 1000, 4, 8, 8, 6, "async: percpu objpool"}, + {0, NODE_VMALLOC, 1000, 4, 8, 8, 6, "async: percpu objpool from vmalloc"}, +}; + +static int __init ot_mod_init(void) +{ + int i; + + ot_init_data(&g_ot_data); + + for (i = 0; i < ARRAY_SIZE(g_ot_sync); i++) { + if (ot_start_sync(&g_ot_sync[i])) + goto out; + ot_reset_data(&g_ot_data); + } + + for (i = 0; i < ARRAY_SIZE(g_ot_miss); i++) { + if (ot_start_sync(&g_ot_miss[i])) + goto out; + ot_reset_data(&g_ot_data); + } + + for (i = 0; i < ARRAY_SIZE(g_ot_async); i++) { + if (ot_start_async(&g_ot_async[i])) + goto out; + ot_reset_data(&g_ot_data); + } + +out: + return -EAGAIN; +} + +static void __exit ot_mod_exit(void) +{ +} + +module_init(ot_mod_init); +module_exit(ot_mod_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Matt Wu");
From patchwork Mon Dec 12 12:31:51 2022
X-Patchwork-Id: 32353
From: wuqiang
To: mhiramat@kernel.org, davem@davemloft.net, anil.s.keshavamurthy@intel.com, naveen.n.rao@linux.ibm.com, rostedt@goodmis.org, peterz@infradead.org, akpm@linux-foundation.org, sander@svanheule.net, ebiggers@google.com, dan.j.williams@intel.com, jpoimboe@kernel.org
Cc: linux-kernel@vger.kernel.org, lkp@intel.com, mattwu@163.com, wuqiang
Subject: [PATCH v7 3/5] kprobes: kretprobe scalability improvement with objpool
Date: Mon, 12 Dec 2022 20:31:51 +0800
Message-Id: <20221212123153.190888-4-wuqiang.matt@bytedance.com>
In-Reply-To: <20221212123153.190888-1-wuqiang.matt@bytedance.com>
References: <20221212123153.190888-1-wuqiang.matt@bytedance.com>
kretprobe uses a freelist to manage return instances. The freelist is a
LIFO queue built on a singly linked list with a single shared head, so it
scales badly and reduces the overall throughput of kretprobed routines,
especially in high-contention scenarios.

Here's a typical throughput test of sys_flock (counts in 10 seconds,
measured with perf stat -a -I 10000 -e syscalls:sys_enter_flock):

OS: Debian 10 X86_64, Linux 6.1rc2
HW: XEON 8336C x 2, 64 cores/128 threads, DDR4 3200MT/s

       1X       2X       4X       6X       8X      12X      16X
 34762430 36546920 17949900 13101899 12569595 12646601 14729195
      24X      32X      48X      64X      72X      96X     128X
 19263546 10102064  8985418 11936495 11493980  7127789  9330985

This patch replaces the original freelist in kretprobe and rethook with
objpool, bringing near-linear scalability to kretprobed routines. In the
kretprobe throughput tests, the biggest improvement is 333.9x over the
original freelist.
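The 333.9x figure is the largest objpool/freelist ratio in the x86_64 comparison that follows (the 96X column), and the 642.2x ARM64 figure is likewise the ratio in its 96X column. As a quick cross-check, here is a small C sketch that recomputes both from the published counts; the array names and the max_speedup() helper are illustrative only and are not part of this patch.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* x86_64 counts; columns: 1X 2X 4X 8X 16X 32X 48X 64X 96X 128X */
static const double x86_freelist[] = {
	34762430, 36546920, 17949900, 12569595, 14729195,
	10102064, 8985418, 11936495, 7127789, 9330985,
};
static const double x86_objpool[] = {
	35627544, 72182095, 144068494, 287564688, 576903916,
	1158876372, 1737828164, 2324371724, 2380310472, 2463182819,
};

/* ARM64 counts; columns: 1X 2X 4X 8X 16X 24X 32X 48X 64X 96X */
static const double arm64_freelist[] = {
	17498299, 10887037, 10224710, 8499132, 6421751,
	5339868, 4819116, 3593919, 3121575, 2687167,
};
static const double arm64_objpool[] = {
	18715726, 35549845, 71615884, 144258971, 283707220,
	419830913, 571609748, 877456139, 1143316315, 1725668029,
};

/*
 * Hypothetical helper: return the index of the column with the largest
 * objpool/freelist throughput ratio and store that ratio in *best.
 */
static size_t max_speedup(const double *freelist, const double *objpool,
			  size_t n, double *best)
{
	size_t i, idx = 0;

	assert(n > 0);
	*best = objpool[0] / freelist[0];
	for (i = 1; i < n; i++) {
		double r = objpool[i] / freelist[i];

		if (r > *best) {
			*best = r;
			idx = i;
		}
	}
	return idx;
}
```

With these counts, max_speedup(x86_freelist, x86_objpool, ARRAY_LEN(x86_freelist), &r) returns index 8 (the 96X column) with r of about 333.95, and the ARM64 arrays return index 9 (96X) with r of about 642.19. On x86_64 the peak is not at the highest thread count: at 128X the ratio drops to about 264x.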
Here's the comparison:

                  1X         2X         4X         8X        16X
freelist:   34762430   36546920   17949900   12569595   14729195
objpool:    35627544   72182095  144068494  287564688  576903916
                 32X        48X        64X        96X       128X
freelist:   10102064    8985418   11936495    7127789    9330985
objpool:  1158876372 1737828164 2324371724 2380310472 2463182819

Tests on a 96-core ARM64 system show similar results, with the biggest
improvement reaching 642.2x:

OS: Debian 10 AARCH64, Linux 6.1rc2
HW: Kunpeng-920 96 cores/2 sockets/4 NUMA nodes, DDR4 2933 MT/s

                  1X         2X         4X         8X        16X
freelist:   17498299   10887037   10224710    8499132    6421751
objpool:    18715726   35549845   71615884  144258971  283707220
                 24X        32X        48X        64X        96X
freelist:    5339868    4819116    3593919    3121575    2687167
objpool:   419830913  571609748  877456139 1143316315 1725668029

Signed-off-by: wuqiang
---
 include/linux/kprobes.h |   9 ++--
 include/linux/rethook.h |  14 ++----
 kernel/kprobes.c        | 101 +++++++++++++++++++---------------------
 kernel/trace/fprobe.c   |  37 ++++++---------
 kernel/trace/rethook.c  |  99 ++++++++++++++++++++-------------------
 5 files changed, 118 insertions(+), 142 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index a0b92be98984..122b1f21f3a9 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -27,7 +27,7 @@ #include #include #include -#include +#include #include #include @@ -141,6 +141,7 @@ static inline bool kprobe_ftrace(struct kprobe *p) */ struct kretprobe_holder { struct kretprobe *rp; + struct objpool_head pool; refcount_t ref; }; @@ -154,7 +155,6 @@ struct kretprobe { #ifdef CONFIG_KRETPROBE_ON_RETHOOK struct rethook *rh; #else - struct freelist_head freelist; struct kretprobe_holder *rph; #endif }; @@ -165,10 +165,7 @@ struct kretprobe_instance { #ifdef CONFIG_KRETPROBE_ON_RETHOOK struct rethook_node node; #else - union { - struct freelist_node freelist; - struct rcu_head rcu; - }; + struct rcu_head rcu; struct llist_node llist; struct kretprobe_holder *rph; kprobe_opcode_t *ret_addr; diff --git a/include/linux/rethook.h b/include/linux/rethook.h
index c8ac1e5afcd1..f97283c622b7 100644 --- a/include/linux/rethook.h +++ b/include/linux/rethook.h @@ -6,7 +6,7 @@ #define _LINUX_RETHOOK_H #include -#include +#include #include #include #include @@ -30,14 +30,13 @@ typedef void (*rethook_handler_t) (struct rethook_node *, void *, struct pt_regs struct rethook { void *data; rethook_handler_t handler; - struct freelist_head pool; + struct objpool_head pool; refcount_t ref; struct rcu_head rcu; }; /** * struct rethook_node - The rethook shadow-stack entry node. - * @freelist: The freelist, linked to struct rethook::pool. * @rcu: The rcu_head for deferred freeing. * @llist: The llist, linked to a struct task_struct::rethooks. * @rethook: The pointer to the struct rethook. @@ -48,19 +47,15 @@ struct rethook { * on each entry of the shadow stack. */ struct rethook_node { - union { - struct freelist_node freelist; - struct rcu_head rcu; - }; + struct rcu_head rcu; struct llist_node llist; struct rethook *rethook; unsigned long ret_addr; unsigned long frame; }; -struct rethook *rethook_alloc(void *data, rethook_handler_t handler); +struct rethook *rethook_alloc(void *data, rethook_handler_t handler, int size, int num); void rethook_free(struct rethook *rh); -void rethook_add_node(struct rethook *rh, struct rethook_node *node); struct rethook_node *rethook_try_get(struct rethook *rh); void rethook_recycle(struct rethook_node *node); void rethook_hook(struct rethook_node *node, struct pt_regs *regs, bool mcount); @@ -97,4 +92,3 @@ void rethook_flush_task(struct task_struct *tk); #endif #endif - diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 3050631e528d..5f35997b61f7 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -1868,13 +1868,28 @@ static struct notifier_block kprobe_exceptions_nb = { #ifdef CONFIG_KRETPROBES #if !defined(CONFIG_KRETPROBE_ON_RETHOOK) + +/* callbacks for objpool of kretprobe instances */ +static int kretprobe_init_inst(void *nod, void *context) +{ + struct kretprobe_instance *ri = nod; + 
+ ri->rph = context; + return 0; +} +static int kretprobe_fini_pool(struct objpool_head *head, void *context) +{ + kfree(context); + return 0; +} + static void free_rp_inst_rcu(struct rcu_head *head) { struct kretprobe_instance *ri = container_of(head, struct kretprobe_instance, rcu); + struct kretprobe_holder *rph = ri->rph; - if (refcount_dec_and_test(&ri->rph->ref)) - kfree(ri->rph); - kfree(ri); + if (refcount_dec_and_test(&rph->ref)) + objpool_fini(&rph->pool); } NOKPROBE_SYMBOL(free_rp_inst_rcu); @@ -1883,7 +1898,7 @@ static void recycle_rp_inst(struct kretprobe_instance *ri) struct kretprobe *rp = get_kretprobe(ri); if (likely(rp)) - freelist_add(&ri->freelist, &rp->freelist); + objpool_push(ri, &rp->rph->pool); else call_rcu(&ri->rcu, free_rp_inst_rcu); } @@ -1920,23 +1935,18 @@ NOKPROBE_SYMBOL(kprobe_flush_task); static inline void free_rp_inst(struct kretprobe *rp) { - struct kretprobe_instance *ri; - struct freelist_node *node; - int count = 0; - - node = rp->freelist.head; - while (node) { - ri = container_of(node, struct kretprobe_instance, freelist); - node = node->next; - - kfree(ri); - count++; - } + struct kretprobe_holder *rph = rp->rph; + void *nod; - if (refcount_sub_and_test(count, &rp->rph->ref)) { - kfree(rp->rph); - rp->rph = NULL; - } + rp->rph = NULL; + do { + nod = objpool_pop(&rph->pool); + /* deref anyway since we've one extra ref grabbed */ + if (refcount_dec_and_test(&rph->ref)) { + objpool_fini(&rph->pool); + break; + } + } while (nod); } /* This assumes the 'tsk' is the current task or the is not running. 
*/ @@ -2078,19 +2088,17 @@ NOKPROBE_SYMBOL(__kretprobe_trampoline_handler) static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs) { struct kretprobe *rp = container_of(p, struct kretprobe, kp); + struct kretprobe_holder *rph = rp->rph; struct kretprobe_instance *ri; - struct freelist_node *fn; - fn = freelist_try_get(&rp->freelist); - if (!fn) { + ri = objpool_pop(&rph->pool); + if (!ri) { rp->nmissed++; return 0; } - ri = container_of(fn, struct kretprobe_instance, freelist); - if (rp->entry_handler && rp->entry_handler(ri, regs)) { - freelist_add(&ri->freelist, &rp->freelist); + objpool_push(ri, &rph->pool); return 0; } @@ -2183,7 +2191,6 @@ int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long o int register_kretprobe(struct kretprobe *rp) { int ret; - struct kretprobe_instance *inst; int i; void *addr; @@ -2221,20 +2228,12 @@ int register_kretprobe(struct kretprobe *rp) #endif } #ifdef CONFIG_KRETPROBE_ON_RETHOOK - rp->rh = rethook_alloc((void *)rp, kretprobe_rethook_handler); - if (!rp->rh) - return -ENOMEM; + rp->rh = rethook_alloc((void *)rp, kretprobe_rethook_handler, + sizeof(struct kretprobe_instance) + + rp->data_size, rp->maxactive); + if (IS_ERR(rp->rh)) + return PTR_ERR(rp->rh); - for (i = 0; i < rp->maxactive; i++) { - inst = kzalloc(sizeof(struct kretprobe_instance) + - rp->data_size, GFP_KERNEL); - if (inst == NULL) { - rethook_free(rp->rh); - rp->rh = NULL; - return -ENOMEM; - } - rethook_add_node(rp->rh, &inst->node); - } rp->nmissed = 0; /* Establish function entry probe point */ ret = register_kprobe(&rp->kp); @@ -2243,25 +2242,19 @@ int register_kretprobe(struct kretprobe *rp) rp->rh = NULL; } #else /* !CONFIG_KRETPROBE_ON_RETHOOK */ - rp->freelist.head = NULL; rp->rph = kzalloc(sizeof(struct kretprobe_holder), GFP_KERNEL); if (!rp->rph) return -ENOMEM; - rp->rph->rp = rp; - for (i = 0; i < rp->maxactive; i++) { - inst = kzalloc(sizeof(struct kretprobe_instance) + - rp->data_size, GFP_KERNEL); - if 
(inst == NULL) { - refcount_set(&rp->rph->ref, i); - free_rp_inst(rp); - return -ENOMEM; - } - inst->rph = rp->rph; - freelist_add(&inst->freelist, &rp->freelist); + if (objpool_init(&rp->rph->pool, rp->maxactive, rp->data_size + + sizeof(struct kretprobe_instance), GFP_KERNEL, + rp->rph, kretprobe_init_inst, kretprobe_fini_pool)) { + kfree(rp->rph); + rp->rph = NULL; + return -ENOMEM; } - refcount_set(&rp->rph->ref, i); - + refcount_set(&rp->rph->ref, rp->maxactive + 1); + rp->rph->rp = rp; rp->nmissed = 0; /* Establish function entry probe point */ ret = register_kprobe(&rp->kp); diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c index e8143e368074..9b685d6921d1 100644 --- a/kernel/trace/fprobe.c +++ b/kernel/trace/fprobe.c @@ -125,41 +125,32 @@ static void fprobe_init(struct fprobe *fp) static int fprobe_init_rethook(struct fprobe *fp, int num) { - int i, size; - - if (num < 0) - return -EINVAL; + int max; if (!fp->exit_handler) { fp->rethook = NULL; return 0; } - /* Initialize rethook if needed */ - size = num * num_possible_cpus() * 2; - if (size < 0) + if (num <= 0) + return -EINVAL; + max = num * num_possible_cpus() * 2; + /* Fail if max overflows */ + if (max <= 0) return -E2BIG; - fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler); - if (!fp->rethook) - return -ENOMEM; - for (i = 0; i < size; i++) { - struct fprobe_rethook_node *node; - - node = kzalloc(sizeof(*node), GFP_KERNEL); - if (!node) { - rethook_free(fp->rethook); - fp->rethook = NULL; - return -ENOMEM; - } - rethook_add_node(fp->rethook, &node->node); - } + /* Initialize rethook */ + fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler, + sizeof(struct fprobe_rethook_node), max); + if (IS_ERR(fp->rethook)) + return PTR_ERR(fp->rethook); + return 0; } static void fprobe_fail_cleanup(struct fprobe *fp) { - if (fp->rethook) { + if (!IS_ERR_OR_NULL(fp->rethook)) { /* Don't need to cleanup rethook->handler because this is not used. 
*/ rethook_free(fp->rethook); fp->rethook = NULL; @@ -313,7 +304,7 @@ int unregister_fprobe(struct fprobe *fp) * current running handlers are finished, call unregister_ftrace_function() * after this. */ - if (fp->rethook) + if (!IS_ERR_OR_NULL(fp->rethook)) rethook_free(fp->rethook); ret = unregister_ftrace_function(&fp->ops); diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c index 32c3dfdb4d6a..6e1014e4f2f7 100644 --- a/kernel/trace/rethook.c +++ b/kernel/trace/rethook.c @@ -36,21 +36,16 @@ void rethook_flush_task(struct task_struct *tk) static void rethook_free_rcu(struct rcu_head *head) { struct rethook *rh = container_of(head, struct rethook, rcu); - struct rethook_node *rhn; - struct freelist_node *node; - int count = 1; - - node = rh->pool.head; - while (node) { - rhn = container_of(node, struct rethook_node, freelist); - node = node->next; - kfree(rhn); - count++; - } + struct rethook_node *nod; - /* The rh->ref is the number of pooled node + 1 */ - if (refcount_sub_and_test(count, &rh->ref)) - kfree(rh); + do { + nod = objpool_pop(&rh->pool); + /* deref anyway since we've one extra ref grabbed */ + if (refcount_dec_and_test(&rh->ref)) { + objpool_fini(&rh->pool); + break; + } + } while (nod); } /** @@ -70,54 +65,65 @@ void rethook_free(struct rethook *rh) call_rcu(&rh->rcu, rethook_free_rcu); } +static int rethook_init_node(void *nod, void *context) +{ + struct rethook_node *node = nod; + + node->rethook = context; + return 0; +} + +static int rethook_fini_pool(struct objpool_head *head, void *context) +{ + kfree(context); + return 0; +} + /** * rethook_alloc() - Allocate struct rethook. * @data: a data to pass the @handler when hooking the return. - * @handler: the return hook callback function. 
+ * @handler: the return hook callback function, must NOT be NULL
+ * @gfp: default gfp for objpool allocation
+ * @size: node size: rethook node and additional data
+ * @num: number of rethook nodes to be preallocated
  *
  * Allocate and initialize a new rethook with @data and @handler.
- * Return NULL if memory allocation fails or @handler is NULL.
+ * Return pointer of new rethook, or error codes for failures.
+ *
  * Note that @handler == NULL means this rethook is going to be freed.
  */
-struct rethook *rethook_alloc(void *data, rethook_handler_t handler)
+struct rethook *rethook_alloc(void *data, rethook_handler_t handler,
+			      int size, int num)
 {
-	struct rethook *rh = kzalloc(sizeof(struct rethook), GFP_KERNEL);
+	struct rethook *rh;
 
-	if (!rh || !handler) {
-		kfree(rh);
-		return NULL;
-	}
+	if (!handler || num <= 0 || size < sizeof(struct rethook_node))
+		return ERR_PTR(-EINVAL);
+
+	rh = kzalloc(sizeof(struct rethook), GFP_KERNEL);
+	if (!rh)
+		return ERR_PTR(-ENOMEM);
 
 	rh->data = data;
 	rh->handler = handler;
-	rh->pool.head = NULL;
-	refcount_set(&rh->ref, 1);
 
+	/* initialize the objpool for rethook nodes */
+	if (objpool_init(&rh->pool, num, size, GFP_KERNEL, rh,
+			 rethook_init_node, rethook_fini_pool)) {
+		kfree(rh);
+		return ERR_PTR(-ENOMEM);
+	}
+	refcount_set(&rh->ref, num + 1);
 	return rh;
 }
 
-/**
- * rethook_add_node() - Add a new node to the rethook.
- * @rh: the struct rethook.
- * @node: the struct rethook_node to be added.
- *
- * Add @node to @rh. User must allocate @node (as a part of user's
- * data structure.) The @node fields are initialized in this function.
- */
-void rethook_add_node(struct rethook *rh, struct rethook_node *node)
-{
-	node->rethook = rh;
-	freelist_add(&node->freelist, &rh->pool);
-	refcount_inc(&rh->ref);
-}
-
 static void free_rethook_node_rcu(struct rcu_head *head)
 {
 	struct rethook_node *node = container_of(head, struct rethook_node, rcu);
+	struct rethook *rh = node->rethook;
 
-	if (refcount_dec_and_test(&node->rethook->ref))
-		kfree(node->rethook);
-	kfree(node);
+	if (refcount_dec_and_test(&rh->ref))
+		objpool_fini(&rh->pool);
 }
 
 /**
@@ -132,7 +138,7 @@ void rethook_recycle(struct rethook_node *node)
 	lockdep_assert_preemption_disabled();
 
 	if (likely(READ_ONCE(node->rethook->handler)))
-		freelist_add(&node->freelist, &node->rethook->pool);
+		objpool_push(node, &node->rethook->pool);
 	else
 		call_rcu(&node->rcu, free_rethook_node_rcu);
 }
@@ -148,7 +154,6 @@ NOKPROBE_SYMBOL(rethook_recycle);
 struct rethook_node *rethook_try_get(struct rethook *rh)
 {
 	rethook_handler_t handler = READ_ONCE(rh->handler);
-	struct freelist_node *fn;
 
 	lockdep_assert_preemption_disabled();
 
@@ -165,11 +170,7 @@ struct rethook_node *rethook_try_get(struct rethook *rh)
 	if (unlikely(!rcu_is_watching()))
 		return NULL;
 
-	fn = freelist_try_get(&rh->pool);
-	if (!fn)
-		return NULL;
-
-	return container_of(fn, struct rethook_node, freelist);
+	return (struct rethook_node *)objpool_pop(&rh->pool);
 }
 NOKPROBE_SYMBOL(rethook_try_get);
 

From patchwork Mon Dec 12 12:31:52 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "wuqiang.matt"
X-Patchwork-Id: 32352
From: wuqiang
To: mhiramat@kernel.org, davem@davemloft.net, anil.s.keshavamurthy@intel.com, naveen.n.rao@linux.ibm.com, rostedt@goodmis.org, peterz@infradead.org, akpm@linux-foundation.org, sander@svanheule.net, ebiggers@google.com, dan.j.williams@intel.com, jpoimboe@kernel.org
Cc: linux-kernel@vger.kernel.org, lkp@intel.com, mattwu@163.com, wuqiang
Subject: [PATCH v7 4/5] kprobes: freelist.h removed
Date: Mon, 12 Dec 2022 20:31:52 +0800
Message-Id: <20221212123153.190888-5-wuqiang.matt@bytedance.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20221212123153.190888-1-wuqiang.matt@bytedance.com>
References: <20221212123153.190888-1-wuqiang.matt@bytedance.com>

Remove freelist.h from the kernel source tree, since its only users
(kretprobe and rethook) have been converted to objpool.

Signed-off-by: wuqiang
---
 include/linux/freelist.h | 129 ---------------------------------------
 1 file changed, 129 deletions(-)
 delete mode 100644 include/linux/freelist.h

diff --git a/include/linux/freelist.h b/include/linux/freelist.h
deleted file mode 100644
index fc1842b96469..000000000000
--- a/include/linux/freelist.h
+++ /dev/null
@@ -1,129 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause */
-#ifndef FREELIST_H
-#define FREELIST_H
-
-#include
-
-/*
- * Copyright: cameron@moodycamel.com
- *
- * A simple CAS-based lock-free free list. Not the fastest thing in the world
- * under heavy contention, but simple and correct (assuming nodes are never
- * freed until after the free list is destroyed), and fairly speedy under low
- * contention.
- *
- * Adapted from: https://moodycamel.com/blog/2014/solving-the-aba-problem-for-lock-free-free-lists
- */
-
-struct freelist_node {
-	atomic_t		refs;
-	struct freelist_node	*next;
-};
-
-struct freelist_head {
-	struct freelist_node	*head;
-};
-
-#define REFS_ON_FREELIST 0x80000000
-#define REFS_MASK	 0x7FFFFFFF
-
-static inline void __freelist_add(struct freelist_node *node, struct freelist_head *list)
-{
-	/*
-	 * Since the refcount is zero, and nobody can increase it once it's
-	 * zero (except us, and we run only one copy of this method per node at
-	 * a time, i.e. the single thread case), then we know we can safely
-	 * change the next pointer of the node; however, once the refcount is
-	 * back above zero, then other threads could increase it (happens under
-	 * heavy contention, when the refcount goes to zero in between a load
-	 * and a refcount increment of a node in try_get, then back up to
-	 * something non-zero, then the refcount increment is done by the other
-	 * thread) -- so if the CAS to add the node to the actual list fails,
-	 * decrese the refcount and leave the add operation to the next thread
-	 * who puts the refcount back to zero (which could be us, hence the
-	 * loop).
-	 */
-	struct freelist_node *head = READ_ONCE(list->head);
-
-	for (;;) {
-		WRITE_ONCE(node->next, head);
-		atomic_set_release(&node->refs, 1);
-
-		if (!try_cmpxchg_release(&list->head, &head, node)) {
-			/*
-			 * Hmm, the add failed, but we can only try again when
-			 * the refcount goes back to zero.
-			 */
-			if (atomic_fetch_add_release(REFS_ON_FREELIST - 1, &node->refs) == 1)
-				continue;
-		}
-		return;
-	}
-}
-
-static inline void freelist_add(struct freelist_node *node, struct freelist_head *list)
-{
-	/*
-	 * We know that the should-be-on-freelist bit is 0 at this point, so
-	 * it's safe to set it using a fetch_add.
-	 */
-	if (!atomic_fetch_add_release(REFS_ON_FREELIST, &node->refs)) {
-		/*
-		 * Oh look! We were the last ones referencing this node, and we
-		 * know we want to add it to the free list, so let's do it!
-		 */
-		__freelist_add(node, list);
-	}
-}
-
-static inline struct freelist_node *freelist_try_get(struct freelist_head *list)
-{
-	struct freelist_node *prev, *next, *head = smp_load_acquire(&list->head);
-	unsigned int refs;
-
-	while (head) {
-		prev = head;
-		refs = atomic_read(&head->refs);
-		if ((refs & REFS_MASK) == 0 ||
-		    !atomic_try_cmpxchg_acquire(&head->refs, &refs, refs+1)) {
-			head = smp_load_acquire(&list->head);
-			continue;
-		}
-
-		/*
-		 * Good, reference count has been incremented (it wasn't at
-		 * zero), which means we can read the next and not worry about
-		 * it changing between now and the time we do the CAS.
-		 */
-		next = READ_ONCE(head->next);
-		if (try_cmpxchg_acquire(&list->head, &head, next)) {
-			/*
-			 * Yay, got the node. This means it was on the list,
-			 * which means should-be-on-freelist must be false no
-			 * matter the refcount (because nobody else knows it's
-			 * been taken off yet, it can't have been put back on).
-			 */
-			WARN_ON_ONCE(atomic_read(&head->refs) & REFS_ON_FREELIST);
-
-			/*
-			 * Decrease refcount twice, once for our ref, and once
-			 * for the list's ref.
-			 */
-			atomic_fetch_add(-2, &head->refs);
-
-			return head;
-		}
-
-		/*
-		 * OK, the head must have changed on us, but we still need to decrement
-		 * the refcount we increased.
-		 */
-		refs = atomic_fetch_add(-1, &prev->refs);
-		if (refs == REFS_ON_FREELIST + 1)
-			__freelist_add(prev, list);
-	}
-
-	return NULL;
-}
-
-#endif /* FREELIST_H */

From patchwork Mon Dec 12 12:31:53 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "wuqiang.matt"
X-Patchwork-Id: 32354
From: wuqiang
To: mhiramat@kernel.org, davem@davemloft.net, anil.s.keshavamurthy@intel.com, naveen.n.rao@linux.ibm.com, rostedt@goodmis.org, peterz@infradead.org, akpm@linux-foundation.org, sander@svanheule.net, ebiggers@google.com, dan.j.williams@intel.com, jpoimboe@kernel.org
Cc: linux-kernel@vger.kernel.org, lkp@intel.com, mattwu@163.com, wuqiang
Subject: [PATCH v7 5/5] MAINTAINERS: objpool added
Date: Mon, 12 Dec 2022 20:31:53 +0800
Message-Id: <20221212123153.190888-6-wuqiang.matt@bytedance.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20221212123153.190888-1-wuqiang.matt@bytedance.com>
References: <20221212123153.190888-1-wuqiang.matt@bytedance.com>

objpool, a scalable and lockless ring-array based object pool, was
introduced to replace the original freelist (a LIFO queue based on a
singly linked list) to improve kretprobe scalability.

Signed-off-by: wuqiang
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 886d3f69ee64..9584aa440eb9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14914,6 +14914,13 @@ F:	include/linux/objagg.h
 F:	lib/objagg.c
 F:	lib/test_objagg.c
 
+OBJPOOL
+M:	Matt Wu
+S:	Supported
+F:	include/linux/objpool.h
+F:	lib/objpool.c
+F:	lib/test_objpool.c
+
 OBJTOOL
 M:	Josh Poimboeuf
 M:	Peter Zijlstra
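For readers comparing the two designs: freelist.h links free nodes through a next pointer and guards against ABA with per-node reference counts, whereas a ring-array pool keeps object pointers in a fixed array and hands them out by advancing indices. The following is a minimal userspace sketch in C11 of the ring-array idea only, not objpool itself. It uses a single shared ring built on Dmitry Vyukov's well-known bounded MPMC queue (per-slot sequence numbers), which may differ from objpool's actual layout and algorithm. All names (struct ring_pool, ring_pool_init(), ring_pool_push(), ring_pool_pop(), ring_pool_destroy()) are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical single shared ring; not objpool's real data structure. */
struct ring_slot {
	atomic_size_t seq;	/* which "lap" this slot is ready for */
	void *obj;		/* pooled object pointer */
};

struct ring_pool {
	struct ring_slot *slots;
	size_t mask;		/* capacity - 1 (capacity is a power of two) */
	atomic_size_t head;	/* next position to pop from */
	atomic_size_t tail;	/* next position to push into */
};

static int ring_pool_init(struct ring_pool *p, size_t capacity)
{
	if (capacity < 2 || (capacity & (capacity - 1)))
		return -1;	/* require a power of two so "& mask" works */
	p->slots = calloc(capacity, sizeof(*p->slots));
	if (!p->slots)
		return -1;
	for (size_t i = 0; i < capacity; i++)
		atomic_init(&p->slots[i].seq, i);	/* slot i free for push #i */
	p->mask = capacity - 1;
	atomic_init(&p->head, 0);
	atomic_init(&p->tail, 0);
	return 0;
}

static void ring_pool_destroy(struct ring_pool *p)
{
	free(p->slots);
	p->slots = NULL;
}

/* Return an object to the pool; false if the ring is full. */
static bool ring_pool_push(struct ring_pool *p, void *obj)
{
	size_t pos = atomic_load_explicit(&p->tail, memory_order_relaxed);

	for (;;) {
		struct ring_slot *s = &p->slots[pos & p->mask];
		size_t seq = atomic_load_explicit(&s->seq, memory_order_acquire);
		intptr_t diff = (intptr_t)seq - (intptr_t)pos;

		if (diff == 0) {
			/* slot is free for this lap: try to claim position pos */
			if (atomic_compare_exchange_weak_explicit(&p->tail, &pos, pos + 1,
					memory_order_relaxed, memory_order_relaxed)) {
				s->obj = obj;
				/* publish: poppers wait for seq == pos + 1 */
				atomic_store_explicit(&s->seq, pos + 1, memory_order_release);
				return true;
			}
			/* CAS failed: pos now holds the current tail; retry */
		} else if (diff < 0) {
			return false;	/* previous lap not popped yet: full */
		} else {
			pos = atomic_load_explicit(&p->tail, memory_order_relaxed);
		}
	}
}

/* Take an object from the pool; NULL if the ring is empty. */
static void *ring_pool_pop(struct ring_pool *p)
{
	size_t pos = atomic_load_explicit(&p->head, memory_order_relaxed);

	for (;;) {
		struct ring_slot *s = &p->slots[pos & p->mask];
		size_t seq = atomic_load_explicit(&s->seq, memory_order_acquire);
		intptr_t diff = (intptr_t)seq - (intptr_t)(pos + 1);

		if (diff == 0) {
			if (atomic_compare_exchange_weak_explicit(&p->head, &pos, pos + 1,
					memory_order_relaxed, memory_order_relaxed)) {
				void *obj = s->obj;
				/* free the slot for the push one full lap later */
				atomic_store_explicit(&s->seq, pos + p->mask + 1,
						      memory_order_release);
				return obj;
			}
		} else if (diff < 0) {
			return NULL;	/* slot not filled yet: empty */
		} else {
			pos = atomic_load_explicit(&p->head, memory_order_relaxed);
		}
	}
}
```

In use, the ring would be filled once at init (much as register_kretprobe() preallocates rp->maxactive instances), popped on a probe hit and pushed back on recycle. Because head and tail only ever increase and each slot's sequence number records which lap it belongs to, a thread holding a stale position simply fails its CAS or re-reads, so no per-node reference count is needed. Note that this ring hands objects out in FIFO order, while freelist.h was LIFO.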
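The rethook_free_rcu() and free_rethook_node_rcu() changes in patch 3/5 rely on a counting invariant. rethook_alloc() sets rh->ref to num + 1 (one reference per preallocated node plus one held by the owner), and register_kretprobe() does the same with rp->maxactive + 1. On teardown, every node popped from the pool drops one reference, the final empty pop drops the owner's extra one, and every node still in flight drops its own when it is recycled after the handler has been cleared. The following is a hypothetical, single-threaded model using plain ints (no RCU, no concurrency; struct model and the model_*() helpers are illustrative names only). It shows that the pool is finalized exactly once, whether or not nodes are still in flight.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the rethook reference counting. */
struct model {
	int ref;	/* models rh->ref */
	int pooled;	/* nodes currently sitting in the pool */
	int in_flight;	/* nodes taken by try_get and not yet recycled */
	bool handler;	/* models rh->handler != NULL */
	int fini_calls;	/* how many times objpool_fini() would run */
};

static void model_alloc(struct model *m, int num)
{
	m->ref = num + 1;	/* one per node plus one for the owner */
	m->pooled = num;
	m->in_flight = 0;
	m->handler = true;
	m->fini_calls = 0;
}

static bool model_dec_and_test(struct model *m)
{
	return --m->ref == 0;
}

/* models rethook_try_get(): fails once the handler is cleared */
static bool model_get(struct model *m)
{
	if (!m->handler || m->pooled == 0)
		return false;
	m->pooled--;
	m->in_flight++;
	return true;
}

/* models rethook_recycle(), then free_rethook_node_rcu() if freeing */
static void model_recycle(struct model *m)
{
	assert(m->in_flight > 0);
	m->in_flight--;
	if (m->handler)
		m->pooled++;		/* back into the pool */
	else if (model_dec_and_test(m))
		m->fini_calls++;	/* last reference: finalize the pool */
}

/* models rethook_free(): clear the handler, then rethook_free_rcu() */
static void model_free(struct model *m)
{
	bool got;

	m->handler = false;
	do {
		got = m->pooled > 0;	/* models objpool_pop() != NULL */
		if (got)
			m->pooled--;
		/* one decrement per popped node, plus one for the owner */
		if (model_dec_and_test(m)) {
			m->fini_calls++;
			break;
		}
	} while (got);
}
```

With four nodes and two in flight at teardown, model_free() leaves ref at 2, and the two later model_recycle() calls bring it to zero and trigger the single finalization. With no nodes in flight, the empty pop that drops the owner's reference does it instead.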