Message ID | 20230605201107.83298-1-lstoakes@gmail.com |
---|---|
State | New |
Headers |
From: Lorenzo Stoakes <lstoakes@gmail.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>
Cc: Baoquan He <bhe@redhat.com>, Uladzislau Rezki <urezki@gmail.com>, Christoph Hellwig <hch@infradead.org>, Vlastimil Babka <vbabka@suse.cz>, Lorenzo Stoakes <lstoakes@gmail.com>
Subject: [PATCH] mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
Date: Mon, 5 Jun 2023 21:11:07 +0100
Message-Id: <20230605201107.83298-1-lstoakes@gmail.com>
X-Mailer: git-send-email 2.40.1 |
Series | mm/vmalloc: do not output a spurious warning when huge vmalloc() fails |
Commit Message
Lorenzo Stoakes
June 5, 2023, 8:11 p.m. UTC
In __vmalloc_area_node() we always warn_alloc() when an allocation
performed by vm_area_alloc_pages() fails unless it was due to a pending
fatal signal.
However, huge page allocations instigated either by vmalloc_huge() or
__vmalloc_node_range() (or a caller that invokes these, such as kvmalloc() or
kvmalloc_node()) always fall back to order-0 allocations if the huge page
allocation fails.
This renders the warning useless and noisy, especially as all callers
appear to be aware that this may fall back. This has already resulted in at
least one bug report from a user who was confused by it (see link).
Therefore, simply update the code to only output this warning for order-0
pages when no fatal signal is pending.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1211410
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
mm/vmalloc.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
Comments
On 6/5/23 22:11, Lorenzo Stoakes wrote:
> In __vmalloc_area_node() we always warn_alloc() when an allocation
> performed by vm_area_alloc_pages() fails unless it was due to a pending
> fatal signal.
>
> However, huge page allocations instigated either by vmalloc_huge() or
> __vmalloc_node_range() (or a caller that invokes this like kvmalloc() or
> kvmalloc_node()) always falls back to order-0 allocations if the huge page
> allocation fails.
>
> This renders the warning useless and noisy, especially as all callers
> appear to be aware that this may fallback. This has already resulted in at
> least one bug report from a user who was confused by this (see link).
>
> Therefore, simply update the code to only output this warning for order-0
> pages when no fatal signal is pending.
>
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1211410
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!

> ---
>  mm/vmalloc.c | 17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index ab606a80f475..e563f40ad379 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3149,11 +3149,20 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	 * allocation request, free them via vfree() if any.
>  	 */
>  	if (area->nr_pages != nr_small_pages) {
> -		/* vm_area_alloc_pages() can also fail due to a fatal signal */
> -		if (!fatal_signal_pending(current))
> +		/*
> +		 * vm_area_alloc_pages() can fail due to insufficient memory but
> +		 * also:-
> +		 *
> +		 * - a pending fatal signal
> +		 * - insufficient huge page-order pages
> +		 *
> +		 * Since we always retry allocations at order-0 in the huge page
> +		 * case a warning for either is spurious.
> +		 */
> +		if (!fatal_signal_pending(current) && page_order == 0)
>  			warn_alloc(gfp_mask, NULL,
> -				"vmalloc error: size %lu, page order %u, failed to allocate pages",
> -				area->nr_pages * PAGE_SIZE, page_order);
> +				"vmalloc error: size %lu, failed to allocate pages",
> +				area->nr_pages * PAGE_SIZE);
>  		goto fail;
>  	}
On 06/05/23 at 09:11pm, Lorenzo Stoakes wrote:
> In __vmalloc_area_node() we always warn_alloc() when an allocation
> performed by vm_area_alloc_pages() fails unless it was due to a pending
> fatal signal.
>
> [...]
>
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1211410
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>

LGTM,

Reviewed-by: Baoquan He <bhe@redhat.com>
On 6/5/23 22:11, Lorenzo Stoakes wrote:
> In __vmalloc_area_node() we always warn_alloc() when an allocation
> performed by vm_area_alloc_pages() fails unless it was due to a pending
> fatal signal.
>
> [...]
>
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1211410
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>

I think there are more reports of the same thing from the btrfs context, that
appear to be a 6.3 regression

https://bugzilla.kernel.org/show_bug.cgi?id=217466
Link: https://lore.kernel.org/all/efa04d56-cd7f-6620-bca7-1df89f49bf4b@gmail.com/

If this indeed helps, it would make sense to Cc: stable here. Although I
don't see what caused the regression, the warning itself is not new, so is
it a new source of order-9 attempts in vmalloc() or new reasons why order-9
pages would not be possible to allocate?
On 05.06.23 22:11, Lorenzo Stoakes wrote:
> In __vmalloc_area_node() we always warn_alloc() when an allocation
> performed by vm_area_alloc_pages() fails unless it was due to a pending
> fatal signal.
>
> [...]
>
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1211410
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>

Reviewed-by: David Hildenbrand <david@redhat.com>
On Tue, Jun 06, 2023 at 09:13:24AM +0200, Vlastimil Babka wrote:
> On 6/5/23 22:11, Lorenzo Stoakes wrote:
> > [...]
>
> I think there are more reports of same thing from the btrfs context, that
> appear to be a 6.3 regression
>
> https://bugzilla.kernel.org/show_bug.cgi?id=217466
> Link: https://lore.kernel.org/all/efa04d56-cd7f-6620-bca7-1df89f49bf4b@gmail.com/
>
> If this indeed helps, it would make sense to Cc: stable here. Although I
> don't see what caused the regression, the warning itself is not new, so is
> it new source of order-9 attempts in vmalloc() or new reasons why order-9
> pages would not be possible to allocate?

Linus updated kvmalloc() to use huge vmalloc() allocations in 9becb6889130
("kvmalloc: use vmalloc_huge for vmalloc allocations") and Song updated
alloc_large_system_hash() to do so as well in f2edd118d02d ("page_alloc:
use vmalloc_huge for large system hash"), both of which are ~1y old;
however these would impact ~5.18, so it's weird to see reports citing
6.2 -> 6.3. Will dig to see if something else changed that would increase
the prevalence of this.

Also while we're here, ugh at us immediately splitting the non-compound
(also ugh) huge page. Nicholas explains why in the patch that introduces
it - 3b8000ae185c ("mm/vmalloc: huge vmalloc backing pages should be split
rather than compound") - but it'd be nice if we could find a way to avoid
this. If only there were a data type (perhaps beginning with 'f') that
abstracted the order of the page entirely and could be guaranteed to
always be the one with which you manipulated ref count, etc... ;)

> > [...]
On Tue, Jun 06, 2023 at 09:13:24AM +0200, Vlastimil Babka wrote:
> I think there are more reports of same thing from the btrfs context, that
> appear to be a 6.3 regression
>
> https://bugzilla.kernel.org/show_bug.cgi?id=217466
> Link: https://lore.kernel.org/all/efa04d56-cd7f-6620-bca7-1df89f49bf4b@gmail.com/
>
I had a look at that report. The btrfs complains due to the fact that a
high-order page (1 << 9) can not be obtained. In the vm_area_alloc_pages()
code we do not fall back to the 0-order allocator if there is a request
for a high order.

I provided a patch to fall back if a high-order request fails. A
reproducer, after applying the patch, started to get oopses in other
places.

IMO, we should fall back even for high-order requests, because it is
highly likely they can not be accomplished.

Any thoughts?

<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 31ff782d368b..7a06452f7807 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2957,14 +2957,18 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 			page = alloc_pages(alloc_gfp, order);
 		else
 			page = alloc_pages_node(nid, alloc_gfp, order);
+
 		if (unlikely(!page)) {
-			if (!nofail)
-				break;
+			if (nofail)
+				alloc_gfp |= __GFP_NOFAIL;
 
-			/* fall back to the zero order allocations */
-			alloc_gfp |= __GFP_NOFAIL;
-			order = 0;
-			continue;
+			/* Fall back to the zero order allocations. */
+			if (order || nofail) {
+				order = 0;
+				continue;
+			}
+
+			break;
 		}
 
 		/*
<snip>

--
Uladzislau Rezki
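The fallback decision in the proposed diff above can be sketched as a small userspace model. This is an illustrative simplification, not kernel code; `struct fallback` and `on_alloc_failure()` are hypothetical names modelling what the proposed loop does when an allocation attempt fails:

```c
#include <stdbool.h>

/*
 * Model of the proposed vm_area_alloc_pages() failure path: on an
 * allocation failure, retry at order-0 if the failed attempt was
 * high-order OR the caller demanded __GFP_NOFAIL; only a plain
 * order-0 failure gives up. A nofail retry also gains __GFP_NOFAIL.
 */
struct fallback {
	bool retry;      /* retry the allocation at order-0? */
	bool nofail_gfp; /* add __GFP_NOFAIL to the retry gfp mask? */
};

static struct fallback on_alloc_failure(unsigned int order, bool nofail)
{
	struct fallback f = { .retry = false, .nofail_gfp = false };

	if (nofail)
		f.nofail_gfp = true;

	/* Fall back to order-0 for high-order or nofail requests. */
	if (order > 0 || nofail)
		f.retry = true;

	return f;
}
```

A plain order-0 failure (`order == 0`, `nofail == false`) is the only case that breaks out of the loop, which matches the `break` in the proposed diff.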
On Mon, Jun 05, 2023 at 09:11:07PM +0100, Lorenzo Stoakes wrote:
> In __vmalloc_area_node() we always warn_alloc() when an allocation
> performed by vm_area_alloc_pages() fails unless it was due to a pending
> fatal signal.
>
> [...]
>
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1211410
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
> --
> 2.40.1

Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
On Tue, Jun 06, 2023 at 10:17:02AM +0200, Uladzislau Rezki wrote:
> I had a look at that report. The btrfs complains due to the
> fact that a high-order page (1 << 9) can not be obtained. In the
> vmalloc code we do not fall to 0-order allocator if there is
> a request of getting a high-order.

This isn't true, we _do_ fall back to order-0 (this is the basis of my
patch), in __vmalloc_node_range():

	/* Allocate physical pages and map them into vmalloc space. */
	ret = __vmalloc_area_node(area, gfp_mask, prot, shift, node);
	if (!ret)
		goto fail;

	...

fail:
	if (shift > PAGE_SHIFT) {
		shift = PAGE_SHIFT;
		align = real_align;
		size = real_size;
		goto again;
	}

With the order being derived from shift, and __vmalloc_area_node() only
being called from __vmalloc_node_range().

> I provided a patch to fallback if a high-order. A reproducer, after
> applying the patch, started to get oppses in another places.
>
> IMO, we should fallback even for high-order requests. Because it is
> highly likely it can not be accomplished.
>
> Any thoughts?
>
> [...]

I saw that, it seems to be duplicating the same thing as the original
fallback code does (which was originally designed to permit higher order
non-__GFP_NOFAIL allocations before trying order-0 __GFP_NOFAIL). I don't
think it is really useful to change this as it confuses that logic and
duplicates something we already do.

Honestly though, moreover, I think this whole area needs some refactoring.
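The retry path quoted above can be modelled in userspace. This is a sketch under assumed names: `alloc_at_shift()` is a fake stand-in for __vmalloc_area_node() that only succeeds up to a given shift, and `vmalloc_with_fallback()` mirrors the `goto again` structure of __vmalloc_node_range():

```c
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12  /* order-0: 4 KiB pages (common config) */
#define MODEL_PMD_SHIFT  21  /* order-9: 2 MiB huge mapping */

/* Fake allocator: succeeds only when the requested shift is small enough. */
static bool alloc_at_shift(unsigned int shift, unsigned int max_ok_shift)
{
	return shift <= max_ok_shift;
}

/*
 * Model of __vmalloc_node_range(): a failed huge attempt
 * (shift > PAGE_SHIFT) is retried once at PAGE_SHIFT (order-0);
 * only an order-0 failure is a genuine failure.
 */
static bool vmalloc_with_fallback(unsigned int shift, unsigned int max_ok_shift)
{
again:
	if (alloc_at_shift(shift, max_ok_shift))
		return true;

	if (shift > MODEL_PAGE_SHIFT) {
		shift = MODEL_PAGE_SHIFT; /* fall back to order-0 */
		goto again;
	}

	return false; /* order-0 also failed */
}
```

This is why the patch suppresses the warning for `page_order != 0`: a huge-shift failure in the model above never reaches the caller, since the order-0 retry either succeeds or produces its own (legitimate) order-0 failure.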
On Tue, Jun 06, 2023 at 09:24:33AM +0100, Lorenzo Stoakes wrote:
> This isn't true, we _do_ fallback to order-0 (this is the basis of my
> patch), in __vmalloc_node_range():
>
> [...]
>
> With the order being derived from shift, and __vmalloc_area_node() only
> being called from __vmalloc_node_range().
>
Correct. It is done in an upper layer, whereas I checked the
vm_area_alloc_pages() function. But as you mentioned, the refactoring has
to be done as it looks a bit messy.

--
Uladzislau Rezki
On Mon 05-06-23 21:11:07, Lorenzo Stoakes wrote:
> In __vmalloc_area_node() we always warn_alloc() when an allocation
> performed by vm_area_alloc_pages() fails unless it was due to a pending
> fatal signal.
>
> [...]
>
> Therefore, simply update the code to only output this warning for order-0
> pages when no fatal signal is pending.

The way how high order allocations are grafted in is just horrendous. Sigh.

> Link: https://bugzilla.suse.com/show_bug.cgi?id=1211410
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!
On 6/6/23 09:40, Lorenzo Stoakes wrote:
> On Tue, Jun 06, 2023 at 09:13:24AM +0200, Vlastimil Babka wrote:
>>
>> On 6/5/23 22:11, Lorenzo Stoakes wrote:
>>> In __vmalloc_area_node() we always warn_alloc() when an allocation
>>> performed by vm_area_alloc_pages() fails unless it was due to a pending
>>> fatal signal.
>>>
>>> However, huge page allocations instigated either by vmalloc_huge() or
>>> __vmalloc_node_range() (or a caller that invokes this, like kvmalloc() or
>>> kvmalloc_node()) always fall back to order-0 allocations if the huge page
>>> allocation fails.
>>>
>>> This renders the warning useless and noisy, especially as all callers
>>> appear to be aware that this may fall back. This has already resulted in at
>>> least one bug report from a user who was confused by this (see link).
>>>
>>> Therefore, simply update the code to only output this warning for order-0
>>> pages when no fatal signal is pending.
>>>
>>> Link: https://bugzilla.suse.com/show_bug.cgi?id=1211410
>>> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
>>
>> I think there are more reports of the same thing from the btrfs context,
>> that appear to be a 6.3 regression:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=217466
>> Link: https://lore.kernel.org/all/efa04d56-cd7f-6620-bca7-1df89f49bf4b@gmail.com/
>>
>> If this indeed helps, it would make sense to Cc: stable here. Although I
>> don't see what caused the regression, the warning itself is not new, so is
>> it a new source of order-9 attempts in vmalloc() or new reasons why order-9
>> pages would not be possible to allocate?
>
> Linus updated kvmalloc() to use huge vmalloc() allocations in 9becb6889130
> ("kvmalloc: use vmalloc_huge for vmalloc allocations") and Song updated
> alloc_large_system_hash() to as well in f2edd118d02d ("page_alloc: use
> vmalloc_huge for large system hash"), both of which are ~1y old. However
> these would impact ~5.18, so it's weird to see reports citing 6.2 -> 6.3.
>
> Will dig to see if something else changed that would increase the
> prevalence of this.

I think I found the commit from 6.3 that effectively exposed this warning.
As this is a tracked regression I would really suggest moving the fix to
mm-hotfixes instead of mm-unstable, and

Fixes: 80b1d8fdfad1 ("mm: vmalloc: correct use of __GFP_NOWARN mask in __vmalloc_area_node()")
Cc: <stable@vger.kernel.org>

> Also while we're here, ugh at us immediately splitting the non-compound
> (also ugh) huge page. Nicholas explains why in the patch that introduces it
> - 3b8000ae185c ("mm/vmalloc: huge vmalloc backing pages should be split
> rather than compound") - but it'd be nice if we could find a way to avoid
> this.
>
> If only there were a data type (perhaps beginning with 'f') that abstracted
> the order of the page entirely and could be guaranteed to always be the one
> with which you manipulated ref count, etc... ;)
>
>>
>>> ---
>>>  mm/vmalloc.c | 17 +++++++++++++----
>>>  1 file changed, 13 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>> index ab606a80f475..e563f40ad379 100644
>>> --- a/mm/vmalloc.c
>>> +++ b/mm/vmalloc.c
>>> @@ -3149,11 +3149,20 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>>>  	 * allocation request, free them via vfree() if any.
>>>  	 */
>>>  	if (area->nr_pages != nr_small_pages) {
>>> -		/* vm_area_alloc_pages() can also fail due to a fatal signal */
>>> -		if (!fatal_signal_pending(current))
>>> +		/*
>>> +		 * vm_area_alloc_pages() can fail due to insufficient memory but
>>> +		 * also:-
>>> +		 *
>>> +		 * - a pending fatal signal
>>> +		 * - insufficient huge page-order pages
>>> +		 *
>>> +		 * Since we always retry allocations at order-0 in the huge page
>>> +		 * case a warning for either is spurious.
>>> +		 */
>>> +		if (!fatal_signal_pending(current) && page_order == 0)
>>>  			warn_alloc(gfp_mask, NULL,
>>> -				"vmalloc error: size %lu, page order %u, failed to allocate pages",
>>> -				area->nr_pages * PAGE_SIZE, page_order);
>>> +				"vmalloc error: size %lu, failed to allocate pages",
>>> +				area->nr_pages * PAGE_SIZE);
>>>  		goto fail;
>>>  	}
>>>
>>
On Wed, Jun 07, 2023 at 10:58:40AM +0200, Vlastimil Babka wrote:
> I think I found the commit from 6.3 that effectively exposed this warning.
> As this is a tracked regression I would really suggest moving the fix to
> mm-hotfixes instead of mm-unstable, and
>
> Fixes: 80b1d8fdfad1 ("mm: vmalloc: correct use of __GFP_NOWARN mask in __vmalloc_area_node()")
> Cc: <stable@vger.kernel.org>

Yeah, ugh. What's irritating is that this is not incorrect - invoking
warn_alloc() in such a way that it does literally nothing is not right, so
that fix was required, but it simply exposed another issue.

But completely agree this is technically a regression, and yes Andrew, it'd
be great if we could move this to hotfixes and append the stable cc if
possible, thanks!

(We definitely need to refactor a lot of this code!)
On Wed, 7 Jun 2023 10:58:40 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:

> I would really suggest moving the fix to
> mm-hotfixes instead of mm-unstable, and
>
> Fixes: 80b1d8fdfad1 ("mm: vmalloc: correct use of __GFP_NOWARN mask in __vmalloc_area_node()")
> Cc: <stable@vger.kernel.org>

I've made those changes.
---- From: Andrew Morton <akpm@linux-foundation.org> -- Sent: 2023-06-07 - 18:33 ----

> On Wed, 7 Jun 2023 10:58:40 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:
>
>> I would really suggest moving the fix to
>> mm-hotfixes instead of mm-unstable, and
>>
>> Fixes: 80b1d8fdfad1 ("mm: vmalloc: correct use of __GFP_NOWARN mask in __vmalloc_area_node()")
>> Cc: <stable@vger.kernel.org>
>
> I've made those changes.

Did the change go into 6.3 stable? I saw these issues with kernels
6.3.0-6.3.7. I now updated to 6.3.9 and have had no more warnings since.
On Sun, Jun 25, 2023 at 05:40:17PM +0200, Forza wrote:
>
> ---- From: Andrew Morton <akpm@linux-foundation.org> -- Sent: 2023-06-07 - 18:33 ----
>
> > On Wed, 7 Jun 2023 10:58:40 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> >> I would really suggest moving the fix to
> >> mm-hotfixes instead of mm-unstable, and
> >>
> >> Fixes: 80b1d8fdfad1 ("mm: vmalloc: correct use of __GFP_NOWARN mask in __vmalloc_area_node()")
> >> Cc: <stable@vger.kernel.org>
> >
> > I've made those changes.
>
> Did the change go into 6.3 stable? I saw these issues with kernels
> 6.3.0-6.3.7. I now updated to 6.3.9 and have had no more warnings since.

Yeah, got the notification from Greg's script that it landed in 6.3 stable.
On 6/25/23 17:59, Lorenzo Stoakes wrote:
> On Sun, Jun 25, 2023 at 05:40:17PM +0200, Forza wrote:
>>
>> ---- From: Andrew Morton <akpm@linux-foundation.org> -- Sent: 2023-06-07 - 18:33 ----
>>
>> > On Wed, 7 Jun 2023 10:58:40 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:
>> >
>> >> I would really suggest moving the fix to
>> >> mm-hotfixes instead of mm-unstable, and
>> >>
>> >> Fixes: 80b1d8fdfad1 ("mm: vmalloc: correct use of __GFP_NOWARN mask in __vmalloc_area_node()")
>> >> Cc: <stable@vger.kernel.org>
>> >
>> > I've made those changes.
>>
>> Did the change go into 6.3 stable? I saw these issues with kernels
>> 6.3.0-6.3.7. I now updated to 6.3.9 and have had no more warnings since.
>
> Yeah, got the notification from Greg's script that it landed in 6.3 stable.

It did, but was not yet released. 6.3.9 from Wed Jun 21 doesn't have it yet,
so it's interesting the warnings are gone already.
---- From: Vlastimil Babka <vbabka@suse.cz> -- Sent: 2023-06-26 - 11:08 ---- > On 6/25/23 17:59, Lorenzo Stoakes wrote: >> On Sun, Jun 25, 2023 at 05:40:17PM +0200, Forza wrote: >>> >>> >>> ---- From: Andrew Morton <akpm@linux-foundation.org> -- Sent: 2023-06-07 - 18:33 ---- >>> >>> > On Wed, 7 Jun 2023 10:58:40 +0200 Vlastimil Babka <vbabka@suse.cz> wrote: >>> > >>> >> I would really suggest moving the fix to >>> >> mm-hotfixes instead of mm-unstable, and >>> >> >>> >> Fixes: 80b1d8fdfad1 ("mm: vmalloc: correct use of __GFP_NOWARN mask in __vmalloc_area_node()") >>> >> Cc: <stable@vger.kernel.org> >>> > >>> > I've made those changes. >>> >>> Did the chabge go into 6.3 stable? I saw these issues with kernels 6.3.0-6 3.7. I now updated to 6.3.9 and have had no more warnings since. >> >> Yeah, got the notification from Greg's script that it landed in 6.3 stable. > > It did, but was not yet released. 6.3.9 from Wed Jun 21 doesn't have it yet, > so it's interesting the warnings are gone already. > > Oh dang it. I jinxed the thing... At least there was 4 days uptime before this happened. I did run with vm.swappiness=0, and started a new VM in QEMU, which must have put extra pressure on allocations. # dmesg | tail -n +1550 [286405.332000] lan: port 5(vnet10) entered blocking state [286405.332008] lan: port 5(vnet10) entered forwarding state [286405.686587] qemu:deb12-virt: vmalloc error: size 0, page order 9, failed to allocate pages, mode:0xdc2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null),cpuset=emulator,mems_allowed=0 [286405.686604] CPU: 1 PID: 16084 Comm: qemu:deb12-virt Not tainted 6.3.9-gentoo-e350 #2 [286405.686608] Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F64 06/08/2023 [286405.686610] Call Trace: [286405.686612] <TASK> [286405.686616] dump_stack_lvl+0x32/0x50 [286405.686622] warn_alloc+0x132/0x1b0 [286405.686627] __vmalloc_node_range+0x639/0x880 [286405.686633] ? mas_wr_bnode+0x123/0x1060 [286405.686637] ? 
amdgpu_bo_create+0xd6/0x480 [amdgpu] [286405.686919] kvmalloc_node+0x92/0xb0 [286405.686923] ? amdgpu_bo_create+0xd6/0x480 [amdgpu] [286405.687171] amdgpu_bo_create+0xd6/0x480 [amdgpu] [286405.687408] amdgpu_bo_create_vm+0x2e/0x60 [amdgpu] [286405.687663] amdgpu_vm_pt_create+0x12b/0x2a0 [amdgpu] [286405.687941] amdgpu_vm_init+0x245/0x4d0 [amdgpu] [286405.688193] amdgpu_driver_open_kms+0x94/0x230 [amdgpu] [286405.688440] drm_file_alloc+0x196/0x240 [286405.688445] drm_open_helper+0x74/0x120 [286405.688448] drm_open+0x7b/0x140 [286405.688450] drm_stub_open+0xa4/0xe0 [286405.688454] chrdev_open+0xbd/0x210 [286405.688458] ? __pfx_chrdev_open+0x10/0x10 [286405.688461] do_dentry_open+0x1e5/0x460 [286405.688465] path_openat+0xc91/0x1080 [286405.688469] do_filp_open+0xb4/0x160 [286405.688472] ? __check_object_size+0x23a/0x2b0 [286405.688475] do_sys_openat2+0x95/0x150 [286405.688478] __x64_sys_openat+0x6a/0xa0 [286405.688480] do_syscall_64+0x3a/0x90 [286405.688484] entry_SYSCALL_64_after_hwframe+0x72/0xdc [286405.688488] RIP: 0033:0x7fc725f0ae59 [286405.688504] Code: 24 18 48 8d 44 24 30 48 89 44 24 20 75 95 e8 1e e9 f8 ff 45 89 e2 89 da 48 89 ee 41 89 c0 bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3f 44 89 c7 89 44 24 0c e8 73 e9 f8 ff 8b 44 [286405.688506] RSP: 002b:00007ffffffe6840 EFLAGS: 00000293 ORIG_RAX: 0000000000000101 [286405.688509] RAX: ffffffffffffffda RBX: 0000000000080902 RCX: 00007fc725f0ae59 [286405.688511] RDX: 0000000000080902 RSI: 00007fc72453ad20 RDI: 00000000ffffff9c [286405.688513] RBP: 00007fc72453ad20 R08: 0000000000000000 R09: 000000000000000c [286405.688514] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 [286405.688516] R13: 0000559e9cfe4708 R14: 00007ffffffe6ac0 R15: 0000559e9cfe4708 [286405.688518] </TASK> [286405.688519] Mem-Info: [286405.688521] active_anon:2351704 inactive_anon:2000415 isolated_anon:0 active_file:35015 inactive_file:28668 isolated_file:0 unevictable:5145 dirty:129 writeback:0 
slab_reclaimable:70205 slab_unreclaimable:80481 mapped:982607 shmem:1063997 pagetables:18273 sec_pagetables:3080 bounce:0 kernel_misc_reclaimable:0 free:1389338 free_pcp:259 free_cma:0 [286405.688526] Node 0 active_anon:9406816kB inactive_anon:8001660kB active_file:140060kB inactive_file:114672kB unevictable:20580kB isolated(anon):0kB isolated(file):0kB mapped:3930428kB dirty:516kB writeback:0kB shmem:4255988kB shmem_thp: 4151296kB shmem_pmdmapped: 2641920kB anon_thp: 10160128kB writeback_tmp:0kB kernel_stack:18384kB pagetables:73092kB sec_pagetables:12320kB all_unreclaimable? no [286405.688532] DMA free:15372kB boost:0kB min:40kB low:52kB high:64kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15372kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [286405.688536] lowmem_reserve[]: 0 2671 23694 23694 23694 [286405.688541] DMA32 free:933372kB boost:0kB min:7616kB low:10352kB high:13088kB reserved_highatomic:0KB active_anon:1140744kB inactive_anon:634600kB active_file:0kB inactive_file:324kB unevictable:0kB writepending:0kB present:2801616kB managed:2736072kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [286405.688546] lowmem_reserve[]: 0 0 21022 21022 21022 [286405.688550] Normal free:4608608kB boost:0kB min:59924kB low:81448kB high:102972kB reserved_highatomic:2048KB active_anon:8265464kB inactive_anon:7367108kB active_file:139392kB inactive_file:114328kB unevictable:20580kB writepending:516kB present:22007040kB managed:21527488kB mlocked:20580kB bounce:0kB free_pcp:1036kB local_pcp:0kB free_cma:0kB [286405.688555] lowmem_reserve[]: 0 0 0 0 0 [286405.688558] DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15372kB [286405.688571] DMA32: 1421*4kB (UME) 1447*8kB (UME) 1443*16kB (UME) 1491*32kB (UME) 1279*64kB (UME) 1024*128kB (UME) 667*256kB (UM) 424*512kB (UM) 239*1024kB 
(UM) 0*2048kB 0*4096kB = 933564kB [286405.688585] Normal: 34288*4kB (UME) 25137*8kB (UME) 18613*16kB (UME) 13225*32kB (UME) 8674*64kB (UME) 5360*128kB (UME) 3163*256kB (UME) 1722*512kB (UM) 601*1024kB (UM) 1*2048kB (H) 0*4096kB = 4609336kB [286405.688600] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [286405.688603] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [286405.688606] 1129365 total pagecache pages [286405.688608] 0 pages in swap cache [286405.688609] Free swap = 16576496kB [286405.688610] Total swap = 16576496kB [286405.688611] 6206163 pages RAM [286405.688612] 0 pages HighMem/MovableOnly [286405.688613] 136430 pages reserved [286405.688613] 0 pages hwpoisoned [289047.869189] lan: port 5(vnet10) entered disabled state [289047.871407] vnet10 (unregistering): left allmulticast mode [289047.871412] vnet10 (unregistering): left promiscuous mode [289047.871416] lan: port 5(vnet10) entered disabled state [290840.031863] kworker/u16:5: vmalloc error: size 0, page order 9, failed to allocate pages, mode:0xcc2(GFP_KERNEL|__GFP_HIGHMEM), nodemask=(null),cpuset=/,mems_allowed=0 [290840.031877] CPU: 2 PID: 24909 Comm: kworker/u16:5 Not tainted 6.3.9-gentoo-e350 #2 [290840.031880] Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F64 06/08/2023 [290840.031882] Workqueue: btrfs-delalloc btrfs_work_helper [290840.031887] Call Trace: [290840.031900] <TASK> [290840.031903] dump_stack_lvl+0x32/0x50 [290840.031912] warn_alloc+0x132/0x1b0 [290840.031917] __vmalloc_node_range+0x639/0x880 [290840.031921] ? zstd_alloc_workspace+0x6a/0xe0 [290840.031925] kvmalloc_node+0x92/0xb0 [290840.031928] ? zstd_alloc_workspace+0x6a/0xe0 [290840.031931] zstd_alloc_workspace+0x6a/0xe0 [290840.031934] zstd_get_workspace+0xfc/0x230 [290840.031939] btrfs_compress_pages+0x4c/0x110 [290840.031944] compress_file_range+0x37c/0x8d0 [290840.031948] async_cow_start+0x12/0x40 [290840.031950] ? 
__pfx_async_cow_submit+0x10/0x10 [290840.031953] btrfs_work_helper+0xde/0x300 [290840.031955] process_one_work+0x20f/0x3e0 [290840.031959] worker_thread+0x4a/0x3c0 [290840.031962] ? __pfx_worker_thread+0x10/0x10 [290840.031964] kthread+0xc3/0xe0 [290840.031968] ? __pfx_kthread+0x10/0x10 [290840.031970] ret_from_fork+0x2c/0x50 [290840.031975] </TASK> [290840.031976] Mem-Info: [290840.031978] active_anon:2339909 inactive_anon:2064196 isolated_anon:0 active_file:65663 inactive_file:55179 isolated_file:0 unevictable:5145 dirty:25418 writeback:0 slab_reclaimable:70164 slab_unreclaimable:80864 mapped:986684 shmem:1076299 pagetables:18629 sec_pagetables:3104 bounce:0 kernel_misc_reclaimable:0 free:1284125 free_pcp:64 free_cma:0 [290840.031983] Node 0 active_anon:9359636kB inactive_anon:8256784kB active_file:262652kB inactive_file:220716kB unevictable:20580kB isolated(anon):0kB isolated(file):0kB mapped:3946736kB dirty:101672kB writeback:0kB shmem:4305196kB shmem_thp: 4149248kB shmem_pmdmapped: 2641920kB anon_thp: 10182656kB writeback_tmp:0kB kernel_stack:18560kB pagetables:74516kB sec_pagetables:12416kB all_unreclaimable? 
no [290840.031988] DMA free:15372kB boost:0kB min:40kB low:52kB high:64kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15372kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [290840.031995] lowmem_reserve[]: 0 2671 23694 23694 23694 [290840.032001] DMA32 free:929580kB boost:0kB min:7616kB low:10352kB high:13088kB reserved_highatomic:0KB active_anon:690416kB inactive_anon:1084604kB active_file:332kB inactive_file:212kB unevictable:0kB writepending:0kB present:2801616kB managed:2736072kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [290840.032007] lowmem_reserve[]: 0 0 21022 21022 21022 [290840.032012] Normal free:4191548kB boost:0kB min:59924kB low:81448kB high:102972kB reserved_highatomic:2048KB active_anon:8669148kB inactive_anon:7172160kB active_file:262204kB inactive_file:219956kB unevictable:20580kB writepending:101672kB present:22007040kB managed:21527488kB mlocked:20580kB bounce:0kB free_pcp:512kB local_pcp:0kB free_cma:0kB [290840.032018] lowmem_reserve[]: 0 0 0 0 0 [290840.032023] DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15372kB [290840.032039] DMA32: 1429*4kB (UME) 1449*8kB (UME) 1443*16kB (UME) 1491*32kB (UME) 1278*64kB (UME) 1021*128kB (UME) 667*256kB (UM) 421*512kB (UM) 237*1024kB (UM) 0*2048kB 0*4096kB = 929580kB [290840.032052] Normal: 3292*4kB (UME) 8398*8kB (UME) 16602*16kB (UME) 13421*32kB (UME) 8881*64kB (UME) 5345*128kB (UME) 3056*256kB (UME) 1553*512kB (UM) 571*1024kB (UM) 1*2048kB (H) 0*4096kB = 4192224kB [290840.032069] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [290840.032071] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [290840.032073] 1198652 total pagecache pages [290840.032074] 0 pages in swap cache [290840.032074] Free swap = 16576496kB [290840.032075] Total 
swap = 16576496kB [290840.032076] 6206163 pages RAM [290840.032077] 0 pages HighMem/MovableOnly [290840.032077] 136430 pages reserved [290840.032078] 0 pages hwpoisoned [294419.578589] warn_alloc: 3 callbacks suppressed [294419.578592] kworker/u16:7: vmalloc error: size 0, page order 9, failed to allocate pages, mode:0xcc2(GFP_KERNEL|__GFP_HIGHMEM), nodemask=(null),cpuset=/,mems_allowed=0 [294419.578603] CPU: 2 PID: 24910 Comm: kworker/u16:7 Not tainted 6.3.9-gentoo-e350 #2 [294419.578606] Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F64 06/08/2023 [294419.578607] Workqueue: btrfs-delalloc btrfs_work_helper [294419.578612] Call Trace: [294419.578615] <TASK> [294419.578617] dump_stack_lvl+0x32/0x50 [294419.578623] warn_alloc+0x132/0x1b0 [294419.578627] __vmalloc_node_range+0x639/0x880 [294419.578631] ? zstd_alloc_workspace+0x6a/0xe0 [294419.578635] kvmalloc_node+0x92/0xb0 [294419.578638] ? zstd_alloc_workspace+0x6a/0xe0 [294419.578642] zstd_alloc_workspace+0x6a/0xe0 [294419.578646] zstd_get_workspace+0xfc/0x230 [294419.578650] btrfs_compress_pages+0x4c/0x110 [294419.578654] compress_file_range+0x37c/0x8d0 [294419.578658] async_cow_start+0x12/0x40 [294419.578661] ? __pfx_async_cow_submit+0x10/0x10 [294419.578664] btrfs_work_helper+0xde/0x300 [294419.578667] process_one_work+0x20f/0x3e0 [294419.578671] worker_thread+0x4a/0x3c0 [294419.578673] ? __pfx_worker_thread+0x10/0x10 [294419.578676] ? __pfx_worker_thread+0x10/0x10 [294419.578678] kthread+0xc3/0xe0 [294419.578682] ? 
__pfx_kthread+0x10/0x10 [294419.578686] ret_from_fork+0x2c/0x50 [294419.578692] </TASK> [294419.578694] Mem-Info: [294419.578696] active_anon:1869566 inactive_anon:2491416 isolated_anon:0 active_file:82836 inactive_file:33435 isolated_file:6 unevictable:5145 dirty:10368 writeback:0 slab_reclaimable:70163 slab_unreclaimable:79069 mapped:992200 shmem:1077022 pagetables:18383 sec_pagetables:3104 bounce:0 kernel_misc_reclaimable:0 free:1330442 free_pcp:5208 free_cma:0 [294419.578704] Node 0 active_anon:7478264kB inactive_anon:9965664kB active_file:331344kB inactive_file:133740kB unevictable:20580kB isolated(anon):0kB isolated(file):24kB mapped:3968800kB dirty:41472kB writeback:0kB shmem:4308088kB shmem_thp: 4149248kB shmem_pmdmapped: 2646016kB anon_thp: 10196992kB writeback_tmp:0kB kernel_stack:18432kB pagetables:73532kB sec_pagetables:12416kB all_unreclaimable? no [294419.578709] DMA free:15372kB boost:0kB min:40kB low:52kB high:64kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15372kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [294419.578714] lowmem_reserve[]: 0 2671 23694 23694 23694 [294419.578724] DMA32 free:927704kB boost:0kB min:7616kB low:10352kB high:13088kB reserved_highatomic:0KB active_anon:973788kB inactive_anon:807332kB active_file:104kB inactive_file:220kB unevictable:0kB writepending:0kB present:2801616kB managed:2736072kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [294419.578730] lowmem_reserve[]: 0 0 21022 21022 21022 [294419.578736] Normal free:4380452kB boost:0kB min:59924kB low:81448kB high:102972kB reserved_highatomic:2048KB active_anon:6504476kB inactive_anon:9158332kB active_file:331240kB inactive_file:133520kB unevictable:20580kB writepending:41472kB present:22007040kB managed:21527488kB mlocked:20580kB bounce:0kB free_pcp:19148kB local_pcp:13588kB free_cma:0kB [294419.578741] lowmem_reserve[]: 0 0 0 0 
0 [294419.578744] DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15372kB [294419.578760] DMA32: 1432*4kB (UME) 1445*8kB (UME) 1437*16kB (UME) 1484*32kB (UME) 1276*64kB (UME) 1019*128kB (UME) 665*256kB (UM) 420*512kB (UM) 237*1024kB (UM) 0*2048kB 0*4096kB = 927832kB [294419.578777] Normal: 7051*4kB (UME) 11342*8kB (UME) 16910*16kB (UME) 13380*32kB (UME) 9035*64kB (UME) 5516*128kB (UME) 3230*256kB (UME) 1696*512kB (UM) 581*1024kB (UM) 1*2048kB (H) 0*4096kB = 4394172kB [294419.579034] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [294419.579036] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [294419.579037] 1194726 total pagecache pages [294419.579038] 0 pages in swap cache [294419.579038] Free swap = 16576496kB [294419.579039] Total swap = 16576496kB [294419.579040] 6206163 pages RAM [294419.579040] 0 pages HighMem/MovableOnly [294419.579041] 136430 pages reserved [294419.579041] 0 pages hwpoisoned
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ab606a80f475..e563f40ad379 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3149,11 +3149,20 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	 * allocation request, free them via vfree() if any.
 	 */
 	if (area->nr_pages != nr_small_pages) {
-		/* vm_area_alloc_pages() can also fail due to a fatal signal */
-		if (!fatal_signal_pending(current))
+		/*
+		 * vm_area_alloc_pages() can fail due to insufficient memory but
+		 * also:-
+		 *
+		 * - a pending fatal signal
+		 * - insufficient huge page-order pages
+		 *
+		 * Since we always retry allocations at order-0 in the huge page
+		 * case a warning for either is spurious.
+		 */
+		if (!fatal_signal_pending(current) && page_order == 0)
 			warn_alloc(gfp_mask, NULL,
-				"vmalloc error: size %lu, page order %u, failed to allocate pages",
-				area->nr_pages * PAGE_SIZE, page_order);
+				"vmalloc error: size %lu, failed to allocate pages",
+				area->nr_pages * PAGE_SIZE);
 		goto fail;
 	}