mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
Message ID | 20240221114357.13655-2-vbabka@suse.cz |
---|---|
State | New |
Headers |
From: Vlastimil Babka <vbabka@suse.cz> To: Andrew Morton <akpm@linux-foundation.org>, svenva@chromium.org Cc: bgeffon@google.com, cujomalainey@chromium.org, kramasub@chromium.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-sound@vger.kernel.org, perex@perex.cz, stable@vger.kernel.org, tiwai@suse.com, tiwai@suse.de, vbabka@suse.cz, Michal Hocko <mhocko@kernel.org>, Mel Gorman <mgorman@techsingularity.net> Subject: [PATCH] mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations Date: Wed, 21 Feb 2024 12:43:58 +0100 Message-ID: <20240221114357.13655-2-vbabka@suse.cz> In-Reply-To: <CAG-rBihs_xMKb3wrMO1+-+p4fowP9oy1pa_OTkfxBzPUVOZF+g@mail.gmail.com> References: <CAG-rBihs_xMKb3wrMO1+-+p4fowP9oy1pa_OTkfxBzPUVOZF+g@mail.gmail.com> X-Mailer: git-send-email 2.43.1 |
Series | mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations |
Commit Message
Vlastimil Babka
Feb. 21, 2024, 11:43 a.m. UTC
Sven reports an infinite loop in __alloc_pages_slowpath() for costly
order __GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO. Such a
combination can happen in a suspend/resume context where a GFP_KERNEL
allocation can have __GFP_IO masked out via gfp_allowed_mask.
Quoting Sven:
1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
with __GFP_RETRY_MAYFAIL set.
2. page alloc's __alloc_pages_slowpath tries to get a page from the
freelist. This fails because there is nothing free of that costly
order.
3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
which bails out because a zone is ready to be compacted; it pretends
to have made a single page of progress.
4. page alloc tries to compact, but this always bails out early because
__GFP_IO is not set (it's not passed by the snd allocator, and even
if it were, we are suspending so the __GFP_IO flag would be cleared
anyway).
5. page alloc believes reclaim progress was made (because of the
pretense in item 3) and so it checks whether it should retry
compaction. The compaction retry logic thinks it should try again,
because:
a) reclaim is needed because of the early bail-out in item 4
b) a zonelist is suitable for compaction
6. goto 2. indefinite stall.
(end quote)
The immediate root cause is that the COMPACT_SKIPPED returned by
__alloc_pages_direct_compact() in step 4, caused by the lack of __GFP_IO,
is misinterpreted as indicating a lack of order-0 pages, and in step 5
should_compact_retry() treats it as a reason to retry before the number
of retries is incremented and capped. There are, however, other places
that wrongly assume that compaction can happen while we lack __GFP_IO.
To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
evaluation and switch the open-coded test in try_to_compact_pages() to
use it.
Also use the new helper in:
- compaction_ready(), which will make reclaim not bail out in step 3, so
there's at least one attempt to actually reclaim, even if chances are
small for a costly order
- in_reclaim_compaction() which will make should_continue_reclaim()
return false and we don't over-reclaim unnecessarily
- in __alloc_pages_slowpath() to set a local variable can_compact,
which is then used to avoid retrying reclaim/compaction for costly
allocations (step 5) if we can't compact and also to skip the early
compaction attempt that we do in some cases
Reported-by: Sven van Ashbrook <svenva@chromium.org>
Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBzPUVOZF%2Bg@mail.gmail.com/
Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
Cc: <stable@vger.kernel.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
include/linux/gfp.h | 9 +++++++++
mm/compaction.c | 7 +------
mm/page_alloc.c | 10 ++++++----
mm/vmscan.c | 5 ++++-
4 files changed, 20 insertions(+), 11 deletions(-)
Comments
Thanks so much ! We will stress test this on our side. We do this by exhausting memory and triggering many suspend/resume cycles. This reliably reproduces the problem (before this patch). Of course, as we all know, absence of evidence (no more stalls in stress tests) does not equal evidence of absence (stalls are gone in all code paths).
With this patch, the test results have been promising so far. After 10 hours of stress testing, we have not hit the reported problem yet. We will keep testing and report here if we hit the problem again. Thanks for engaging with us.

On Wed, Feb 21, 2024 at 8:53 AM Sven van Ashbrook <svenva@chromium.org> wrote:
> [previous message quoted in full; trimmed]
Vlastimil,

We noticed that this patch is now added to Andrew Morton's mm-hotfixes-unstable branch. How can we help to get this into mm-hotfixes-stable and, from there, into mainline?

We are still stress-testing using low memory suspend. Anything else that is required, and we can help with?

Sven

On Wed, Feb 21, 2024 at 6:44 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> [full patch quoted; trimmed]
On 2/26/24 17:09, Sven van Ashbrook wrote:
> Vlastimil,
>
> We noticed that this patch is now added to Andrew Morton's
> mm-hotfixes-unstable branch.
>
> How can we help to get this into mm-hotfixes-stable
> and from there, into mainline ?

A Tested-by: can't hurt, but of course you need to finish the testing first. It's encouraging that you haven't hit the bug anymore, though!

> We are still stress-testing using low memory suspend. Anything else
> that is required, and we can help with?

I think that's already great enough, thanks!

> Sven
On Wed, Feb 21, 2024 at 4:44 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> [commit message quoted in full; trimmed]
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Tested-by: Karthikeyan Ramasubramanian <kramasub@chromium.org>

> [diff quoted in full; trimmed]
Patch

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index de292a007138..e2a916cf29c4 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -353,6 +353,15 @@ static inline bool gfp_has_io_fs(gfp_t gfp)
 	return (gfp & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS);
 }
 
+/*
+ * Check if the gfp flags allow compaction - GFP_NOIO is a really
+ * tricky context because the migration might require IO.
+ */
+static inline bool gfp_compaction_allowed(gfp_t gfp_mask)
+{
+	return IS_ENABLED(CONFIG_COMPACTION) && (gfp_mask & __GFP_IO);
+}
+
 extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
 
 #ifdef CONFIG_CONTIG_ALLOC
diff --git a/mm/compaction.c b/mm/compaction.c
index 4add68d40e8d..b961db601df4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2723,16 +2723,11 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 		unsigned int alloc_flags, const struct alloc_context *ac,
 		enum compact_priority prio, struct page **capture)
 {
-	int may_perform_io = (__force int)(gfp_mask & __GFP_IO);
 	struct zoneref *z;
 	struct zone *zone;
 	enum compact_result rc = COMPACT_SKIPPED;
 
-	/*
-	 * Check if the GFP flags allow compaction - GFP_NOIO is really
-	 * tricky context because the migration might require IO
-	 */
-	if (!may_perform_io)
+	if (!gfp_compaction_allowed(gfp_mask))
 		return COMPACT_SKIPPED;
 
 	trace_mm_compaction_try_to_compact_pages(order, gfp_mask, prio);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 150d4f23b010..a663202045dc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4041,6 +4041,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 						struct alloc_context *ac)
 {
 	bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
+	bool can_compact = gfp_compaction_allowed(gfp_mask);
 	const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
 	struct page *page = NULL;
 	unsigned int alloc_flags;
@@ -4111,7 +4112,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * Don't try this for allocations that are allowed to ignore
 	 * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
 	 */
-	if (can_direct_reclaim &&
+	if (can_direct_reclaim && can_compact &&
 			(costly_order ||
 			   (order > 0 && ac->migratetype != MIGRATE_MOVABLE))
 			&& !gfp_pfmemalloc_allowed(gfp_mask)) {
@@ -4209,9 +4210,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 
 	/*
 	 * Do not retry costly high order allocations unless they are
-	 * __GFP_RETRY_MAYFAIL
+	 * __GFP_RETRY_MAYFAIL and we can compact
 	 */
-	if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL))
+	if (costly_order && (!can_compact ||
+			     !(gfp_mask & __GFP_RETRY_MAYFAIL)))
 		goto nopage;
 
 	if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
@@ -4224,7 +4226,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * implementation of the compaction depends on the sufficient amount
 	 * of free memory (see __compaction_suitable)
 	 */
-	if (did_some_progress > 0 &&
+	if (did_some_progress > 0 && can_compact &&
 	    should_compact_retry(ac, order, alloc_flags,
 				 compact_result, &compact_priority,
 				 &compaction_retries))
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4f9c854ce6cc..4255619a1a31 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5753,7 +5753,7 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 /* Use reclaim/compaction for costly allocs or under memory pressure */
 static bool in_reclaim_compaction(struct scan_control *sc)
 {
-	if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
+	if (gfp_compaction_allowed(sc->gfp_mask) && sc->order &&
 			(sc->order > PAGE_ALLOC_COSTLY_ORDER ||
 			 sc->priority < DEF_PRIORITY - 2))
 		return true;
@@ -5998,6 +5998,9 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
 {
 	unsigned long watermark;
 
+	if (!gfp_compaction_allowed(sc->gfp_mask))
+		return false;
+
 	/* Allocation can already succeed, nothing to do */
 	if (zone_watermark_ok(zone, sc->order, min_wmark_pages(zone),
 			      sc->reclaim_idx, 0))