From patchwork Fri Jun 23 17:12:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sebastian Andrzej Siewior X-Patchwork-Id: 112245 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp5962152vqr; Fri, 23 Jun 2023 11:24:59 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5bJxssfdbQC6DD+J1qYVfrmuxvacQ+uThrizwCOu02n9Uzgog8oNtDf4Xev95Sz0//ox8P X-Received: by 2002:a05:6358:1a8f:b0:131:d52f:ca6b with SMTP id gm15-20020a0563581a8f00b00131d52fca6bmr7153842rwb.24.1687544698954; Fri, 23 Jun 2023 11:24:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1687544698; cv=none; d=google.com; s=arc-20160816; b=0OMR0nnD/oY91LQmPlsRBXI7x1pDM4RdRwa10yqv+JGvpAPva+su9p+1vSm5ysPY3w 0CAS//2R4C5vktWUspKomCGm+lXHXP2phAUApy16FUMsa3dJtUdzyM++UJXcZuhbLS4C kqL9c8kQcWn4QRJctszGqn1FWrZACLNVzBu0CeWFCZEOSP/jyXZqK5p4O/JFfMIckpFh wWv6hWIq+ZS6y+zzBuInAPVYsQDpE4FIbOSLuLa/tDh13WqpbIqx4Hucjz0fH7W0K2/6 IzjVfwjo723hT01mev+zzFqXQO0vhs5xFnSVTNt2/ch4vr8NXRus3xQzLm88m6yja9cm aeww== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:dkim-signature :dkim-signature:from; bh=BMld1PknhyoTTWqIQWTi/haDnPaStkrULY18977v1W8=; b=NqRPB0naLHMd5VybLPoG8B8U80bsDoC20o/Yt7lvPAZ7LrYH7dpp6LDcmMKQLRg38H qMz0DOkEGwVfkMprEHAMwdUhtBskVeYI03x7ypyQ8t3hDJ8fasgBaI1LUH0vsecQKbKN 2YAgjlwxOFANvm5MDIVtZrkzGp95OeK0PJnkwbgwTkqgLmB6iyIJfviFkZjJSvj9soys raD9xCXfm0kxI0zf1P8Yl0hNWQZdEcTBBZB9/BWXsSYAvodxFMcY5EDUbx6WlIF1CVnH LmiV/LfeGDAEd3QIV4tEec0S0z3PrPttW8t75d/+dkzk4A1ycRwZpWslcZheRCpkkrnQ naig== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b="UooxLz/z"; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id i28-20020a633c5c000000b005572aaee706si7103541pgn.689.2023.06.23.11.24.42; Fri, 23 Jun 2023 11:24:58 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b="UooxLz/z"; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231761AbjFWRMx (ORCPT + 99 others); Fri, 23 Jun 2023 13:12:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42628 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230501AbjFWRMq (ORCPT ); Fri, 23 Jun 2023 13:12:46 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 08BD01993 for ; Fri, 23 Jun 2023 10:12:43 -0700 (PDT) From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1687540361; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=BMld1PknhyoTTWqIQWTi/haDnPaStkrULY18977v1W8=; b=UooxLz/zkoWhq+XAmNcit5ZQDI4n/Y+KVronARo5RtTZ+Tn6qjAbf+SOx/FC241sDV8f4t t8HJpjsAu06tlvLNMcBnutuGtTbknm/McJdfqacY7/9mKIMPXaChuyDfV91Lysn8JRgd04 FEajI9b521OGBvRhaLW4XirU9DI8WhB8c6njc7IoxIwm2p33Jxy1cOxv66rkXxgB5Chru+ A/FRmn+Hy1gok2d3NWwOZ5rM/Dtjk9URSGAuXs6LWjFMo+DbMSueXkd5UnHH+D8fM0FIeG T9YGtSeiToHljMH9F6NvMNkhsbUtBC+W2HnzhVsbBO2nnoCWRzHlsR620hmdWQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1687540361; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=BMld1PknhyoTTWqIQWTi/haDnPaStkrULY18977v1W8=; b=8kTgNXsUUa80s7tfPsheHF9UbSnMTZUFyjFjS52xpo0dV5r9dRvc1KbatV5CicN60n6keS OX2i42K9YrjV0vBg== To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: "Luis Claudio R. Goncalves" , Andrew Morton , Boqun Feng , Ingo Molnar , John Ogness , Mel Gorman , Michal Hocko , Peter Zijlstra , Petr Mladek , Tetsuo Handa , Thomas Gleixner , Waiman Long , Will Deacon , Sebastian Andrzej Siewior Subject: [PATCH v2 2/2] mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save(). Date: Fri, 23 Jun 2023 19:12:32 +0200 Message-Id: <20230623171232.892937-3-bigeasy@linutronix.de> In-Reply-To: <20230623171232.892937-1-bigeasy@linutronix.de> References: <20230623171232.892937-1-bigeasy@linutronix.de> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769518870272293978?= X-GMAIL-MSGID: =?utf-8?q?1769518870272293978?= __build_all_zonelists() acquires zonelist_update_seq by first disabling interrupts via local_irq_save() and then acquiring the seqlock with write_seqlock(). This is troublesome and leads to problems on PREEMPT_RT. The problem is that the inner spinlock_t becomes a sleeping lock on PREEMPT_RT and must not be acquired with disabled interrupts. The API provides write_seqlock_irqsave() which does the right thing in one step. printk_deferred_enter() has to be invoked in non-migrate-able context to ensure that deferred printing is enabled and disabled on the same CPU. This is the case after zonelist_update_seq has been acquired. There was discussion on the first submission that the order should be: local_irq_disable(); printk_deferred_enter(); write_seqlock(); to avoid pitfalls like having an unaccounted printk() coming from write_seqlock_irqsave() before printk_deferred_enter() is invoked. The only origin of such a printk() can be a lockdep splat because the lockdep annotation happens after the sequence count is incremented. This is exceptional and subject to change. It was also pointed that PREEMPT_RT can be affected by the printk problem since its write_seqlock_irqsave() does not really disable interrupts. This isn't the case because PREEMPT_RT's printk implementation differs from the mainline implementation in two important aspects: - Printing happens in a dedicated threads and not at during the invocation of printk(). - In emergency cases where synchronous printing is used, a different driver is used which does not use tty_port::lock. Acquire zonelist_update_seq with write_seqlock_irqsave() and then defer printk output. Fixes: 1007843a91909 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock") Signed-off-by: Sebastian Andrzej Siewior Acked-by: Michal Hocko --- mm/page_alloc.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 47421bedc12b7..99b7e7d09c5c0 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5808,11 +5808,10 @@ static void __build_all_zonelists(void *data) unsigned long flags; /* - * Explicitly disable this CPU's interrupts before taking seqlock - * to prevent any IRQ handler from calling into the page allocator - * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock. + * The zonelist_update_seq must be acquired with irqsave because the + * reader can be invoked from IRQ with GFP_ATOMIC. */ - local_irq_save(flags); + write_seqlock_irqsave(&zonelist_update_seq, flags); /* * Explicitly disable this CPU's synchronous printk() before taking * seqlock to prevent any printk() from trying to hold port->lock, for @@ -5820,7 +5819,6 @@ static void __build_all_zonelists(void *data) * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held. */ printk_deferred_enter(); - write_seqlock(&zonelist_update_seq); #ifdef CONFIG_NUMA memset(node_load, 0, sizeof(node_load)); @@ -5857,9 +5855,8 @@ static void __build_all_zonelists(void *data) #endif } - write_sequnlock(&zonelist_update_seq); printk_deferred_exit(); - local_irq_restore(flags); + write_sequnlock_irqrestore(&zonelist_update_seq, flags); } static noinline void __init