From patchwork Fri Aug 4 22:38:54 2023
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 131401
Date: Fri, 4 Aug 2023 15:38:54 -0700
From: "Darrick J. Wong"
To: Dave Chinner
Cc: tglx@linutronix.de, peterz@infradead.org, xfs, linux-kernel
Subject: [RFC PATCH] xfs: fix per-cpu CIL structure aggregation racing with dying cpus
Message-ID: <20230804223854.GL11352@frogsfrogsfrogs>

From: Darrick J. Wong

In commit 7c8ade2121200 ("xfs: implement percpu cil space used
calculation"), the XFS committed (log) item list code was converted to
use per-cpu lists and space tracking to reduce cpu contention when
multiple threads are modifying different parts of the filesystem and
hence end up contending on the log structures during transaction
commit.
Each CPU tracks its own commit items and space usage, and these do not
have to be merged into the main CIL until either someone wants to push
the CIL items, or we run over a soft threshold and switch to slower
(but more accurate) accounting with atomics.

Unfortunately, the for_each_cpu iteration suffers from the same race
with cpu dying problem that was identified in commit 8b57b11cca88f
("pcpcntrs: fix dying cpu summation race") -- CPUs are removed from
cpu_online_mask before the CPUHP_XFS_DEAD callback gets called.  As a
result, both CIL percpu structure aggregation functions fail to
collect the items and accounted space usage at the correct point in
time.

If we're lucky, the items that are collected from the online cpus
exceed the space given to those cpus, and the log immediately shuts
down in xlog_cil_insert_items due to the (apparent) log reservation
overrun.  This happens periodically with generic/650, which exercises
cpu hotplug vs. the filesystem code.

Applying the same sort of fix from 8b57b11cca88f to the CIL code seems
to make the generic/650 problem go away, but I've been told that tglx
was not happy when he saw:

"...the only thing we actually need to care about is that
percpu_counter_sum() iterates dying CPUs. That's trivial to do, and
when there are no CPUs dying, it has no addition overhead except for a
cpumask_or() operation."

I have no idea what the /correct/ solution is, but I've been holding
on to this patch for 3 months.  In that time, 8b57b11cca88 hasn't been
reverted and cpu_dying_mask hasn't been removed, so I'm sending this
and we'll see what happens.

So, how /do/ we perform periodic aggregation of per-cpu data safely?
Move the xlog_cil_pcp_dead call to the dying section, where at least
the other cpus will all be stopped?  And have it dump its items into
any online cpu's data structure?

Link: https://lore.kernel.org/lkml/877cuj1mt1.ffs@tglx/
Link: https://lore.kernel.org/lkml/20230414162755.281993820@linutronix.de/
Cc: tglx@linutronix.de
Cc: peterz@infradead.org
Signed-off-by: Darrick J. Wong
---
 fs/xfs/xfs_log_cil.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index eccbfb99e8948..5c909ffcb25b7 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -124,7 +124,7 @@ xlog_cil_push_pcp_aggregate(
 	struct xlog_cil_pcp	*cilpcp;
 	int			cpu;
 
-	for_each_online_cpu(cpu) {
+	for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) {
 		cilpcp = per_cpu_ptr(cil->xc_pcp, cpu);
 
 		ctx->ticket->t_curr_res += cilpcp->space_reserved;
@@ -165,7 +165,7 @@ xlog_cil_insert_pcp_aggregate(
 	if (!test_and_clear_bit(XLOG_CIL_PCP_SPACE, &cil->xc_flags))
 		return;
 
-	for_each_online_cpu(cpu) {
+	for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) {
 		int	old, prev;
 
 		cilpcp = per_cpu_ptr(cil->xc_pcp, cpu);
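
To make the race concrete, here is a minimal sketch of the per-cpu
aggregation pattern the patch is fixing.  This is hypothetical code,
not part of this patch or of XFS: struct pcp_data, the pcp_data
variable, and pcp_fold() are made-up names used only to illustrate why
the fold must also iterate cpu_dying_mask.

	#include <linux/cpumask.h>
	#include <linux/percpu.h>

	struct pcp_data {
		long	space_used;
	};

	static DEFINE_PER_CPU(struct pcp_data, pcp_data);

	static long pcp_fold(void)
	{
		long	total = 0;
		int	cpu;

		/*
		 * Iterating only cpu_online_mask races with hotplug: a
		 * dying CPU is cleared from cpu_online_mask before the
		 * CPUHP_*_DEAD callback that would drain its per-cpu
		 * bucket runs, so its contribution is silently skipped.
		 * Widening the walk to include cpu_dying_mask, as
		 * 8b57b11cca88f did for percpu counters, covers that
		 * window.
		 */
		for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask)
			total += per_cpu_ptr(&pcp_data, cpu)->space_used;

		return total;
	}

When no CPU is going down, cpu_dying_mask is empty, so the widened
iteration visits exactly the online CPUs; per tglx's quote above, the
only extra cost in that case is the cpumask_or() hidden inside
for_each_cpu_or().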