From patchwork Tue Mar 28 22:16:40 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yosry Ahmed
X-Patchwork-Id: 76285
Date: Tue, 28 Mar 2023 22:16:40 +0000
In-Reply-To: <20230328221644.803272-1-yosryahmed@google.com>
References: <20230328221644.803272-1-yosryahmed@google.com>
X-Mailer: git-send-email 2.40.0.348.gf938b09366-goog
Message-ID: <20230328221644.803272-6-yosryahmed@google.com>
Subject: [PATCH v2 5/9] memcg: replace stats_flush_lock with an atomic
From: Yosry Ahmed
To: Tejun Heo, Josef Bacik, Jens Axboe, Zefan Li, Johannes Weiner,
 Michal Hocko, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
 Michal Koutný
Cc: Vasily Averin, cgroups@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
 Yosry Ahmed
X-Mailing-List: linux-kernel@vger.kernel.org

As Johannes notes in [1], stats_flush_lock is currently used to:
(a) Protect updates to stats_flush_threshold.
(b) Protect updates to flush_next_time.
(c) Serialize calls to cgroup_rstat_flush() based on those ratelimits.

However:

1. stats_flush_threshold is already an atomic.

2. flush_next_time is not atomic. The writer is locked, but the reader
   is lockless. If the reader races with a flush, you could see this:

                                        if (time_after(jiffies, flush_next_time))
        spin_trylock()
        flush_next_time = now + delay
        flush()
        spin_unlock()
                                        spin_trylock()
                                        flush_next_time = now + delay
                                        flush()
                                        spin_unlock()

   which means we can already get flushes at a higher frequency than
   FLUSH_TIME during races. But it isn't really a problem.

   The reader could also see garbled partial updates, so it needs at
   least READ_ONCE and WRITE_ONCE protection.

3. Serializing cgroup_rstat_flush() calls against the ratelimit
   factors is currently broken because of the race in 2. But the race
   is actually harmless: all we might get is the occasional earlier
   flush. If there is no delta, the flush won't do much. And if there
   is, the flush is justified.

So the lock can be removed altogether. However, the lock also served
the purpose of preventing a thundering herd problem for concurrent
flushers, see [2]. Use an atomic instead to serve the purpose of
unifying concurrent flushers.

[1]https://lore.kernel.org/lkml/20230323172732.GE739026@cmpxchg.org/
[2]https://lore.kernel.org/lkml/20210716212137.1391164-2-shakeelb@google.com/

Signed-off-by: Yosry Ahmed
Acked-by: Johannes Weiner
Acked-by: Shakeel Butt
Acked-by: Michal Hocko
---
 mm/memcontrol.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ff39f78f962e..65750f8b8259 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -585,8 +585,8 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
  */
 static void flush_memcg_stats_dwork(struct work_struct *w);
 static DECLARE_DEFERRABLE_WORK(stats_flush_dwork, flush_memcg_stats_dwork);
-static DEFINE_SPINLOCK(stats_flush_lock);
 static DEFINE_PER_CPU(unsigned int, stats_updates);
+static atomic_t stats_flush_ongoing = ATOMIC_INIT(0);
 static atomic_t stats_flush_threshold = ATOMIC_INIT(0);
 static u64 flush_next_time;
 
@@ -636,15 +636,19 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 
 static void __mem_cgroup_flush_stats(void)
 {
-	unsigned long flag;
-
-	if (!spin_trylock_irqsave(&stats_flush_lock, flag))
+	/*
+	 * We always flush the entire tree, so concurrent flushers can just
+	 * skip. This avoids a thundering herd problem on the rstat global lock
+	 * from memcg flushers (e.g. reclaim, refault, etc).
+	 */
+	if (atomic_read(&stats_flush_ongoing) ||
+	    atomic_xchg(&stats_flush_ongoing, 1))
 		return;
 
-	flush_next_time = jiffies_64 + 2*FLUSH_TIME;
+	WRITE_ONCE(flush_next_time, jiffies_64 + 2*FLUSH_TIME);
 	cgroup_rstat_flush_atomic(root_mem_cgroup->css.cgroup);
 	atomic_set(&stats_flush_threshold, 0);
-	spin_unlock_irqrestore(&stats_flush_lock, flag);
+	atomic_set(&stats_flush_ongoing, 0);
 }
 
 void mem_cgroup_flush_stats(void)
@@ -655,7 +659,7 @@ void mem_cgroup_flush_stats(void)
 
 void mem_cgroup_flush_stats_ratelimited(void)
 {
-	if (time_after64(jiffies_64, flush_next_time))
+	if (time_after64(jiffies_64, READ_ONCE(flush_next_time)))
 		mem_cgroup_flush_stats();
 }
 
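
For anyone who wants to poke at the skip-if-ongoing gate outside the
kernel, the following is a minimal userspace sketch using C11
<stdatomic.h>. It is not the kernel code and makes several assumptions:
flush_ongoing, flush_count, do_expensive_flush(), try_flush() and
try_flush_ratelimited() are hypothetical stand-ins, the clock is passed
in as a plain tick counter rather than read from jiffies_64, relaxed
atomic loads and stores only approximate READ_ONCE()/WRITE_ONCE(), and
the plain '>' comparison does not handle wraparound the way
time_after64() does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical tick interval; the kernel's FLUSH_TIME is defined elsewhere. */
#define FLUSH_TIME 2ULL

/* Stand-in for stats_flush_ongoing: 1 while some caller is flushing. */
static atomic_int flush_ongoing;
/* Stand-in for flush_next_time: earliest tick for a ratelimited flush. */
static _Atomic unsigned long long flush_next_time;
/* Counts how many flushes actually ran (illustration only). */
static int flush_count;

/* Stand-in for the expensive cgroup_rstat_flush_atomic() call. */
static void do_expensive_flush(void)
{
	/* The flush must only ever run while the gate is held. */
	assert(atomic_load(&flush_ongoing) == 1);
	flush_count++;
}

/*
 * Returns true if this caller performed the flush, false if it skipped
 * because another flusher was already in progress.
 */
static bool try_flush(unsigned long long now)
{
	/*
	 * Cheap read first, so that callers arriving during a flush do not
	 * all write to the flag. Only the caller whose exchange observes the
	 * old value 0 proceeds; everyone else returns without waiting.
	 */
	if (atomic_load(&flush_ongoing) || atomic_exchange(&flush_ongoing, 1))
		return false;

	/* Publish the next deadline with a single, untorn store. */
	atomic_store_explicit(&flush_next_time, now + 2 * FLUSH_TIME,
			      memory_order_relaxed);
	do_expensive_flush();
	atomic_store(&flush_ongoing, 0);
	return true;
}

/* Lockless reader of the deadline; an occasional early flush is harmless. */
static bool try_flush_ratelimited(unsigned long long now)
{
	if (now > atomic_load_explicit(&flush_next_time, memory_order_relaxed))
		return try_flush(now);
	return false;
}
```

A caller that finds the flag already set returns immediately instead of
waiting. That matches the reasoning in the patch comment: every flush
covers the entire tree, so a concurrent flusher gains nothing by
repeating the work.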