Message ID | 871qn1wofe.ffs@tglx |
---|---|
State | New |
Headers |
From: Thomas Gleixner <tglx@linutronix.de>
To: Vlastimil Babka <vbabka@suse.cz>, kernel test robot <oliver.sang@intel.com>, Shanker Donthineni <sdonthineni@nvidia.com>
Cc: oe-lkp@lists.linux.dev, lkp@intel.com, linux-kernel@vger.kernel.org, Marc Zyngier <maz@kernel.org>, Michael Walle <michael@walle.cc>, Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Hans de Goede <hdegoede@redhat.com>, Wolfram Sang <wsa+renesas@sang-engineering.com>, linux-mm@kvack.org, "Liam R. Howlett" <Liam.Howlett@oracle.com>, Matthew Wilcox <willy@infradead.org>
Subject: mm, slab/slub: Ensure kmem_cache_alloc_bulk() is available early
In-Reply-To: <875ycdwyx6.ffs@tglx>
References: <202302011308.f53123d2-oliver.sang@intel.com> <87o7qdzfay.ffs@tglx> <9a682773-df56-f36c-f582-e8eeef55d7f8@suse.cz> <875ycdwyx6.ffs@tglx>
Date: Tue, 07 Feb 2023 15:16:53 +0100
Message-ID: <871qn1wofe.ffs@tglx> |
Series |
mm, slab/slub: Ensure kmem_cache_alloc_bulk() is available early
|
|
Commit Message
Thomas Gleixner
Feb. 7, 2023, 2:16 p.m. UTC
The memory allocators are available during early boot even in the phase
where interrupts are disabled and scheduling is not yet possible.
The setup is such that GFP_KERNEL allocations work in this phase without
causing might_alloc() splats to be emitted, because the system state is
SYSTEM_BOOTING at that point, which prevents the warnings from triggering.
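As a simplified sketch (not the verbatim kernel code, just an illustration
of the suppression described above):

/*
 * Illustration only: the debug check behind might_alloc() bails out while
 * the system is still booting, so a GFP_KERNEL allocation made with
 * interrupts disabled does not splat during this phase.
 */
static inline void might_alloc_sketch(gfp_t gfp_mask)
{
	/* Early boot: warnings are suppressed entirely. */
	if (system_state == SYSTEM_BOOTING)
		return;

	/* Otherwise complain about blocking allocations in atomic context. */
	might_sleep_if(gfpflags_allow_blocking(gfp_mask));
}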
Most allocation/free functions use local_irq_save()/restore() or a lock
variant of that. But kmem_cache_alloc_bulk() and kmem_cache_free_bulk() use
local_[lock]_irq_disable()/enable(), which leads to a lockdep warning because
they unconditionally enable interrupts during the early boot phase.
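A minimal illustration of the difference (hypothetical helper, not part of
the patch):

/* Illustrative only. */
static void bulk_irq_pattern_sketch(void)
{
	unsigned long flags;

	/*
	 * Old pattern: unconditionally re-enables interrupts on the way
	 * out, even if the caller (e.g. early boot) had them disabled.
	 */
	local_irq_disable();
	/* ... bulk alloc/free work ... */
	local_irq_enable();

	/*
	 * New pattern: saves and restores the caller's interrupt state,
	 * so an interrupts-off caller stays interrupts-off.
	 */
	local_irq_save(flags);
	/* ... bulk alloc/free work ... */
	local_irq_restore(flags);
}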
This went unnoticed so far as there are no early users of these
interfaces. The upcoming conversion of the interrupt descriptor store from
radix_tree to maple_tree triggered this warning as maple_tree uses the bulk
interface.
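For reference, a hedged usage sketch of the bulk interface in question; the
cache pointer and object count here are placeholders, not taken from the
maple_tree conversion:

/* Illustration only. */
static int bulk_usage_sketch(struct kmem_cache *cache)
{
	void *objs[16];

	/* Returns the requested count on success, 0 on failure. */
	if (!kmem_cache_alloc_bulk(cache, GFP_KERNEL, ARRAY_SIZE(objs), objs))
		return -ENOMEM;

	/* ... use the objects ... */

	kmem_cache_free_bulk(cache, ARRAY_SIZE(objs), objs);
	return 0;
}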
Cure this by moving the kmem_cache_alloc/free() bulk variants of SLUB and
SLAB to local[_lock]_irq_save()/restore().
There is obviously no reclaim possible or required at this point, so there
is no need to expand this coverage further.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
Initial version: https://lore.kernel.org/r/87o7qdzfay.ffs@tglx
Changes: Update SLAB as well and add changelog
---
mm/slab.c | 18 ++++++++++--------
mm/slub.c | 9 +++++----
2 files changed, 15 insertions(+), 12 deletions(-)
Comments
On 2/7/23 15:16, Thomas Gleixner wrote:
> The memory allocators are available during early boot even in the phase
> where interrupts are disabled and scheduling is not yet possible.
>
> [...]
>
> No functional change.
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

+Cc rest of slab folks

Thanks, added to slab/for-6.3/fixes
On 2/7/23 15:45, Vlastimil Babka wrote:
> On 2/7/23 15:16, Thomas Gleixner wrote:
>> [...]
>
> +Cc rest of slab folks
>
> Thanks, added to slab/for-6.3/fixes

After your patch, I think it also makes sense to do the following:

----8<----
From 340d7c7b99f3e67780f6dec480ed1d27e6f325eb Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Tue, 7 Feb 2023 15:34:53 +0100
Subject: [PATCH] mm, slab/slub: remove notes that bulk alloc/free needs
 interrupts enabled

The slab functions kmem_cache_[alloc|free]_bulk() have been documented
as requiring interrupts to be enabled, since their addition in 2015.
It's unclear whether that was a fundamental restriction, or an attempt
to save some cpu cycles by not having to save and restore the irq
flags.

However, it appears that most of the code involved was/became safe to
be called with interrupts disabled, and the remaining bits were fixed
by commit f244b0182b8e ("mm, slab/slub: Ensure kmem_cache_alloc_bulk()
is available early"). While the commit was aimed at early boot
scenario, we can now also remove the documented restrictions for any
interrupt disabled scenarios.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/slab.h | 2 --
 mm/slub.c            | 2 --
 2 files changed, 4 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 45af70315a94..ea439b4e2b34 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -481,8 +481,6 @@ void kmem_cache_free(struct kmem_cache *s, void *objp);
  * Bulk allocation and freeing operations. These are accelerated in an
  * allocator specific way to avoid taking locks repeatedly or building
  * metadata structures unnecessarily.
- *
- * Note that interrupts must be enabled when calling these functions.
  */
 void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p);
 int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, void **p);
diff --git a/mm/slub.c b/mm/slub.c
index c16d78698e3f..23b3fb86045d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3889,7 +3889,6 @@ int build_detached_freelist(struct kmem_cache *s, size_t size,
 	return same;
 }
 
-/* Note that interrupts must be enabled when calling this function. */
 void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
 {
 	if (!size)
@@ -4009,7 +4008,6 @@ static int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags,
 }
 #endif /* CONFIG_SLUB_TINY */
 
-/* Note that interrupts must be enabled when calling this function. */
 int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			  void **p)
 {
On Tue, Feb 07 2023 at 15:47, Vlastimil Babka wrote:
> Subject: [PATCH] mm, slab/slub: remove notes that bulk alloc/free needs
>  interrupts enabled
>
> The slab functions kmem_cache_[alloc|free]_bulk() have been documented
> as requiring interrupts to be enabled, since their addition in 2015.
> It's unclear whether that was a fundamental restriction, or an attempt
> to save some cpu cycles by not having to save and restore the irq
> flags.

I don't think so. The restriction is rather meant to avoid huge
allocations in atomic context which causes latencies and also might
deplete the atomic reserves.

So I rather avoid that and enforce !ATOMIC mode despite the
local_irq_save/restore() change which is really only to accommodate
early boot.

Thanks,

        tglx
On 2/7/23 19:20, Thomas Gleixner wrote:
> On Tue, Feb 07 2023 at 15:47, Vlastimil Babka wrote:
>> It's unclear whether that was a fundamental restriction, or an attempt
>> to save some cpu cycles by not having to save and restore the irq
>> flags.
>
> I don't think so. The restriction is rather meant to avoid huge
> allocations in atomic context which causes latencies and also might
> deplete the atomic reserves.

Fair enough.

> So I rather avoid that and enforce !ATOMIC mode despite the
> local_irq_save/restore() change which is really only to accommodate
> early boot.

We could add some warning then? People might use the bulk alloc
unknowingly again, e.g. via maple tree. GFP_KERNEL would warn through
the existing warning, but e.g. GFP_ATOMIC currently does not.

Some maple tree users could use its preallocation instead outside of
the atomic context, when possible.

> Thanks,
>
>         tglx
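A rough sketch of the kind of warning suggested above; the helper name,
placement and condition are assumptions, not an actual patch from this
thread:

/*
 * Hypothetical check: once the system is past early boot, complain if the
 * bulk API is called with a gfp mask that does not allow blocking
 * (e.g. GFP_ATOMIC).
 */
static inline void bulk_context_check(gfp_t flags)
{
	WARN_ON_ONCE(system_state != SYSTEM_BOOTING &&
		     !gfpflags_allow_blocking(flags));
}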
On Tue, Feb 07, 2023 at 03:16:53PM +0100, Thomas Gleixner wrote:
> The memory allocators are available during early boot even in the phase
> where interrupts are disabled and scheduling is not yet possible.
>
> [...]
>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Looks good to me.

Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Thanks!
On Wed, Feb 08 2023 at 10:15, Vlastimil Babka wrote:

Cc+ Willy

> On 2/7/23 19:20, Thomas Gleixner wrote:
>> [...]
>> So I rather avoid that and enforce !ATOMIC mode despite the
>> local_irq_save/restore() change which is really only to accommodate
>> early boot.
>
> We could add some warning then? People might use the bulk alloc
> unknowingly again, e.g. via maple tree. GFP_KERNEL would warn through
> the existing warning, but e.g. GFP_ATOMIC currently does not.

Correct.

> Some maple tree users could use its preallocation instead outside of
> the atomic context, when possible.

Right.

The issue is that there might be maple_tree users which depend on
GFP_ATOMIC, but call in from interrupt enabled context, which is
legitimate today.

Willy might have some insight on that.

Thanks,

        tglx
On Wed, Feb 08, 2023 at 09:46:30PM +0100, Thomas Gleixner wrote:
> On Wed, Feb 08 2023 at 10:15, Vlastimil Babka wrote:
>
> Cc+ Willy
>
>> [...]
>> Some maple tree users could use its preallocation instead outside of
>> the atomic context, when possible.
>
> Right.
>
> The issue is that there might be maple_tree users which depend on
> GFP_ATOMIC, but call in from interrupt enabled context, which is
> legitimate today.
>
> Willy might have some insight on that.

Not today, but eventually. There are XArray users which modify the tree
in interrupt context or under some other spinlock that we can't drop
for them in order to do an allocation. And I want to replace the radix
tree underpinnings of the XArray with the maple tree. In my copious
spare time.
On Thu, Feb 09 2023 at 20:28, Matthew Wilcox wrote:
> On Wed, Feb 08, 2023 at 09:46:30PM +0100, Thomas Gleixner wrote:
>> The issue is that there might be maple_tree users which depend on
>> GFP_ATOMIC, but call in from interrupt enabled context, which is
>> legitimate today.
>>
>> Willy might have some insight on that.
>
> Not today, but eventually. There are XArray users which modify the tree
> in interrupt context or under some other spinlock that we can't drop
> for them in order to do an allocation. And I want to replace the radix
> tree underpinnings of the XArray with the maple tree. In my copious
> spare time.

If any usage which you described, i.e. interrupt context or with a
spinlock held where interrupts were disabled on acquisition of the
lock, ends up calling into kmem_cache_alloc_bulk() today, then that's
broken because kmem_cache_alloc_bulk() reenables interrupts
unconditionally.

So either such code does not exist as of today, or it just gets lucky
not to run into the code path leading up to kmem_cache_alloc_bulk().

We have to clarify what the valid calling convention of
kmem_cache_alloc_bulk() is in the regular kernel context, i.e. outside
of early boot.

Thanks,

        tglx
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3479,14 +3479,15 @@ cache_alloc_debugcheck_after_bulk(struct
 int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			  void **p)
 {
-	size_t i;
 	struct obj_cgroup *objcg = NULL;
+	unsigned long irqflags;
+	size_t i;
 
 	s = slab_pre_alloc_hook(s, NULL, &objcg, size, flags);
 	if (!s)
 		return 0;
 
-	local_irq_disable();
+	local_irq_save(irqflags);
 	for (i = 0; i < size; i++) {
 		void *objp = kfence_alloc(s, s->object_size, flags) ?:
 			     __do_cache_alloc(s, flags, NUMA_NO_NODE);
@@ -3495,7 +3496,7 @@ int kmem_cache_alloc_bulk(struct kmem_ca
 			goto error;
 		p[i] = objp;
 	}
-	local_irq_enable();
+	local_irq_restore(irqflags);
 
 	cache_alloc_debugcheck_after_bulk(s, flags, size, p, _RET_IP_);
 
@@ -3508,7 +3509,7 @@ int kmem_cache_alloc_bulk(struct kmem_ca
 	/* FIXME: Trace call missing. Christoph would like a bulk variant */
 	return size;
 error:
-	local_irq_enable();
+	local_irq_restore(irqflags);
 	cache_alloc_debugcheck_after_bulk(s, flags, i, p, _RET_IP_);
 	slab_post_alloc_hook(s, objcg, flags, i, p, false, s->object_size);
 	kmem_cache_free_bulk(s, i, p);
@@ -3610,8 +3611,9 @@ EXPORT_SYMBOL(kmem_cache_free);
 
 void kmem_cache_free_bulk(struct kmem_cache *orig_s, size_t size, void **p)
 {
+	unsigned long flags;
 
-	local_irq_disable();
+	local_irq_save(flags);
 	for (int i = 0; i < size; i++) {
 		void *objp = p[i];
 		struct kmem_cache *s;
@@ -3621,9 +3623,9 @@ void kmem_cache_free_bulk(struct kmem_ca
 
 		/* called via kfree_bulk */
 		if (!folio_test_slab(folio)) {
-			local_irq_enable();
+			local_irq_restore(flags);
 			free_large_kmalloc(folio, objp);
-			local_irq_disable();
+			local_irq_save(flags);
 			continue;
 		}
 		s = folio_slab(folio)->slab_cache;
@@ -3640,7 +3642,7 @@ void kmem_cache_free_bulk(struct kmem_ca
 
 		__cache_free(s, objp, _RET_IP_);
 	}
-	local_irq_enable();
+	local_irq_restore(flags);
 
 	/* FIXME: add tracing */
 }
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3913,6 +3913,7 @@ static inline int __kmem_cache_alloc_bul
 			size_t size, void **p, struct obj_cgroup *objcg)
 {
 	struct kmem_cache_cpu *c;
+	unsigned long irqflags;
 	int i;
 
 	/*
@@ -3921,7 +3922,7 @@ static inline int __kmem_cache_alloc_bul
 	 * handlers invoking normal fastpath.
 	 */
 	c = slub_get_cpu_ptr(s->cpu_slab);
-	local_lock_irq(&s->cpu_slab->lock);
+	local_lock_irqsave(&s->cpu_slab->lock, irqflags);
 
 	for (i = 0; i < size; i++) {
 		void *object = kfence_alloc(s, s->object_size, flags);
@@ -3942,7 +3943,7 @@ static inline int __kmem_cache_alloc_bul
 			 */
 			c->tid = next_tid(c->tid);
 
-			local_unlock_irq(&s->cpu_slab->lock);
+			local_unlock_irqrestore(&s->cpu_slab->lock, irqflags);
 
 			/*
 			 * Invoking slow path likely have side-effect
@@ -3956,7 +3957,7 @@ static inline int __kmem_cache_alloc_bul
 			c = this_cpu_ptr(s->cpu_slab);
 			maybe_wipe_obj_freeptr(s, p[i]);
 
-			local_lock_irq(&s->cpu_slab->lock);
+			local_lock_irqsave(&s->cpu_slab->lock, irqflags);
 
 			continue; /* goto for-loop */
 		}
@@ -3965,7 +3966,7 @@ static inline int __kmem_cache_alloc_bul
 		maybe_wipe_obj_freeptr(s, p[i]);
 	}
 	c->tid = next_tid(c->tid);
-	local_unlock_irq(&s->cpu_slab->lock);
+	local_unlock_irqrestore(&s->cpu_slab->lock, irqflags);
 	slub_put_cpu_ptr(s->cpu_slab);
 
 	return i;