Message ID | 20230518124142.57644-1-jiahao.os@bytedance.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp462221vqo; Thu, 18 May 2023 05:44:24 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4R92b2YD8ESSyWzlu9OmzjWbX+sWOSqTzvCofIllqBduuQXoUaVwUnds/KdEDrbCUeMkck X-Received: by 2002:a17:903:2303:b0:1ac:a61c:7a14 with SMTP id d3-20020a170903230300b001aca61c7a14mr3055224plh.62.1684413863837; Thu, 18 May 2023 05:44:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1684413863; cv=none; d=google.com; s=arc-20160816; b=p/p+0Xsb0BW7ZeLAHXp+lQf9BDh0LJ6FUdjXY09vYMkmVISr+JjXdEMlB1xsj2LK+Y zL6wB3INL17asi9WBdbPXV3bZs+crFsMC5EZCKqBlqy1+Gi5kyvpsWi+0eOcwOjwiage CVh8vuGUye2y56HfVYk08frcXmBPwBLBp385MW9y1t5QZWawnnTlYLiT6aF9d1IWxStA C7rupqNEYTkxszejGUdL7iyqv+2Ggd9fPODUhorVh4Fy70sAdOOI/zQ8jerCaP/LI0/C xR8fJJ4JhTaINFWw0tP61KWunrGjyTg5ATThjrUFHoaKwzR4CnjGvr5IESWSdYv5+QX/ 01Mg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=G7R4w9fF0O96j6E7s3O266tf7WZDrfjpg6MXApwCsXo=; b=kCZfbPESnoUIyZBsyw60AN6sHLug0G5GBF/PrTw4p0BIqTRvpChPP34U6iF+FS/fQp wysaRA3oYfvarqzpgliifa4/CtZFpoI6LCbCtYhCCR1BTJfi5xGohaqUHnsVWH14sNwZ pePG3PAFUHUoEBFcoywKakmobp6g3iMpo+Tivjg1mfGQIpuaozgcuJy65NaFtPqxyk5P QkUt4RT5Ll3vn2fQrJ78hPBXWgSVrPTOvXwpskzV6UQyAiP01asUzNGVidJU+lUBQYXy RMP/bA9p4tblRRSLnHJMoxWBbM64/X613HaRmnOpDW1ZorGsJzM4uXyIAm3ZN4jL1JYs eEXw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=G337Jniv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id i5-20020a17090332c500b001ae6b1c3fb6si1200283plr.470.2023.05.18.05.44.03; Thu, 18 May 2023 05:44:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=G337Jniv; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230267AbjERMnE (ORCPT <rfc822;cscallsign@gmail.com> + 99 others); Thu, 18 May 2023 08:43:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34978 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230031AbjERMnC (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 18 May 2023 08:43:02 -0400 Received: from mail-pg1-x536.google.com (mail-pg1-x536.google.com [IPv6:2607:f8b0:4864:20::536]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E4D898F for <linux-kernel@vger.kernel.org>; Thu, 18 May 2023 05:42:14 -0700 (PDT) Received: by mail-pg1-x536.google.com with SMTP id 41be03b00d2f7-51b0f9d7d70so1747280a12.1 for <linux-kernel@vger.kernel.org>; Thu, 18 May 2023 05:42:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1684413710; x=1687005710; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=G7R4w9fF0O96j6E7s3O266tf7WZDrfjpg6MXApwCsXo=; b=G337Jniv8+DJ0J/DvJ6ENbRKNs4k874YWBFd2ohEIzp0ds70nTGJ60CnbM/XKAyfe1 BMzS7qgPK5UE40CrXFR+Dvjh5EKuLqcO6Epl6XiK879eDQMZ1wEnxro4lSDJoXzjRv0E XS39RR1GE2RxAlk8KtWS5z4p8Z8icCnHkFEr3X9FPv83Ik8dr5WBwusUrHN3Gke4l/ql OIkEwvQr5+VQGQtMZtJGeKJKjZJNXorI0J84gW+ZaEtvP1B/05L5RxoY7l824zKUzYXc WYnSl5UTbNw0lHsPJxovLA4cOdKi8pbkoOVedeC7LOd0eHTV4DvDVLvhSGWAhjkaeNYt hM6Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684413710; x=1687005710; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=G7R4w9fF0O96j6E7s3O266tf7WZDrfjpg6MXApwCsXo=; b=gMEzQ5tcWStFysGC0SnkYC3ZeVTqSxlooPTDkW9Koetn+S2aB6qR7IFRquOGMdUxqQ X2V3+gBx62V7k4OdC4uuIt+sgXo40L/HByYAhtOjcqBxIunuGH2SP56Dhk5rG4TwwIrK ZwfftSR5WfkvS5tzYTapkKpAY5UiNPK+ZaeBhTVvf9EOHQ7Jpx9uOpadO8bwOv+BRZYJ hnRhHmFeeIPWE7Rk/C6f20bw/2rTdgp02iRxSQec/03kLxZr6uS1BfKuYdJ1yPOip/cD gcm9WDlyy48skc5fcXY1OpDj7+Ur8/hU8a1MTMCacqvKPUxyl+Ga0tWkmPywRSfcod6G 1RTQ== X-Gm-Message-State: AC+VfDyU53N2/RFituHTzaA37kNb5eCXiHXvuHfW6mChtUsbouBBLpGA 7xlu/uNHj2qGoMJ9CcXH4SwbaQ== X-Received: by 2002:a17:90a:e284:b0:253:4409:3c4a with SMTP id d4-20020a17090ae28400b0025344093c4amr2751990pjz.28.1684413710262; Thu, 18 May 2023 05:41:50 -0700 (PDT) Received: from C02G87K0MD6R.bytedance.net ([139.177.225.244]) by smtp.gmail.com with ESMTPSA id cx14-20020a17090afd8e00b0025352448ba9sm2116434pjb.0.2023.05.18.05.41.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 18 May 2023 05:41:49 -0700 (PDT) From: Hao Jia <jiahao.os@bytedance.com> To: tj@kernel.org, lizefan.x@bytedance.com, hannes@cmpxchg.org Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Hao Jia <jiahao.os@bytedance.com> Subject: [PATCH] cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic Date: Thu, 18 May 2023 20:41:42 +0800 Message-Id: <20230518124142.57644-1-jiahao.os@bytedance.com> X-Mailer: git-send-email 2.37.0 (Apple Git-136) MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1766235951331318772?= X-GMAIL-MSGID: =?utf-8?q?1766235951331318772?= |
Series |
cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic
|
|
Commit Message
Hao Jia
May 18, 2023, 12:41 p.m. UTC
In cgroup_base_stat_flush() function, {rstatc, cgrp}->last_bstat
needs to be updated to the current {rstatc, cgrp}->bstat, directly
assigning values instead of adding the last value to delta.
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
---
kernel/cgroup/rstat.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Comments
On 2023/5/18 Hao Jia wrote: > In cgroup_base_stat_flush() function, {rstatc, cgrp}->last_bstat > needs to be updated to the current {rstatc, cgrp}->bstat, directly > assigning values instead of adding the last value to delta. > > Signed-off-by: Hao Jia <jiahao.os@bytedance.com> > --- > kernel/cgroup/rstat.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c > index 9c4c55228567..3e5c4c1c92c6 100644 > --- a/kernel/cgroup/rstat.c > +++ b/kernel/cgroup/rstat.c > @@ -376,14 +376,14 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu) > /* propagate percpu delta to global */ > cgroup_base_stat_sub(&delta, &rstatc->last_bstat); *(1)* > cgroup_base_stat_add(&cgrp->bstat, &delta); > - cgroup_base_stat_add(&rstatc->last_bstat, &delta); > + rstatc->last_bstat = rstatc->bstat; *(2)* Some things are wrong, the value of rstatc->bstat at (1) and (2) may not be the same, rstatc->bstat may be updated on other cpu. Sorry for the noise. > > /* propagate global delta to parent (unless that's root) */ > if (cgroup_parent(parent)) { > delta = cgrp->bstat; > cgroup_base_stat_sub(&delta, &cgrp->last_bstat); > cgroup_base_stat_add(&parent->bstat, &delta); > - cgroup_base_stat_add(&cgrp->last_bstat, &delta); > + cgrp->last_bstat = cgrp->bstat; > } > } > Maybe something like this? In cgroup_base_stat_flush() function, {rstatc, cgrp}->last_bstat needs to be updated to the current {rstatc, cgrp}->bstat after the calculation. For the rstatc->last_bstat case, rstatc->bstat may be updated on other cpus during our calculation, resulting in inconsistent rstatc->bstat statistics for the two reads. So we use the temporary variable @cur to record the read statc->bstat statistics, and use @cur to update rstatc->last_bstat. For the cgrp->last_bstat case, we already hold cgroup_rstat_lock, so cgrp->bstat will not change during the calculation process, and it can be directly used to update cgrp->last_bstat. It is better for us to assign directly instead of using cgroup_base_stat_add() to update {rstatc, cgrp}->last_bstat. Signed-off-by: Hao Jia <jiahao.os@bytedance.com> --- kernel/cgroup/rstat.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c index 9c4c55228567..17a6a1fcc2d4 100644 --- a/kernel/cgroup/rstat.c +++ b/kernel/cgroup/rstat.c @@ -360,7 +360,7 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu) { struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu); struct cgroup *parent = cgroup_parent(cgrp); - struct cgroup_base_stat delta; + struct cgroup_base_stat delta, cur; unsigned seq; /* Root-level stats are sourced from system-wide CPU stats */ @@ -370,20 +370,21 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu) /* fetch the current per-cpu values */ do { seq = __u64_stats_fetch_begin(&rstatc->bsync); - delta = rstatc->bstat; + cur = rstatc->bstat; } while (__u64_stats_fetch_retry(&rstatc->bsync, seq)); /* propagate percpu delta to global */ + delta = cur; cgroup_base_stat_sub(&delta, &rstatc->last_bstat); cgroup_base_stat_add(&cgrp->bstat, &delta); - cgroup_base_stat_add(&rstatc->last_bstat, &delta); + rstatc->last_bstat = cur; /* propagate global delta to parent (unless that's root) */ if (cgroup_parent(parent)) { delta = cgrp->bstat; cgroup_base_stat_sub(&delta, &cgrp->last_bstat); cgroup_base_stat_add(&parent->bstat, &delta); - cgroup_base_stat_add(&cgrp->last_bstat, &delta); + cgrp->last_bstat = cgrp->bstat; } }
Hello Jia. On Fri, May 19, 2023 at 12:15:57PM +0800, Hao Jia <jiahao.os@bytedance.com> wrote: > Maybe something like this? (Next time please send with a version bump in subject.) > In cgroup_base_stat_flush() function, {rstatc, cgrp}->last_bstat > needs to be updated to the current {rstatc, cgrp}->bstat after the > calculation. > > For the rstatc->last_bstat case, rstatc->bstat may be updated on other > cpus during our calculation, resulting in inconsistent rstatc->bstat > statistics for the two reads. So we use the temporary variable @cur to > record the read statc->bstat statistics, and use @cur to update > rstatc->last_bstat. If a concurrent update happens after sample of bstat was taken for calculation, it won't be reflected in the flushed result. But subsequent flush will use the updated bstat and the difference from last_bstat would account for that concurrent change (and any other changes between the flushes). IOW flushing cannot prevent concurrent updates but it will give eventually consistent (repeated without more updates) results. > It is better for us to assign directly instead of using > cgroup_base_stat_add() to update {rstatc, cgrp}->last_bstat. Or do you mean the copying is faster then arithmetics? Thanks, Michal
On 2023/5/23 Michal Koutný wrote: > Hello Jia. > > On Fri, May 19, 2023 at 12:15:57PM +0800, Hao Jia <jiahao.os@bytedance.com> wrote: >> Maybe something like this? > > (Next time please send with a version bump in subject.) Thanks for your review, I will do it in the next version. > > >> In cgroup_base_stat_flush() function, {rstatc, cgrp}->last_bstat >> needs to be updated to the current {rstatc, cgrp}->bstat after the >> calculation. >> >> For the rstatc->last_bstat case, rstatc->bstat may be updated on other >> cpus during our calculation, resulting in inconsistent rstatc->bstat >> statistics for the two reads. So we use the temporary variable @cur to >> record the read statc->bstat statistics, and use @cur to update >> rstatc->last_bstat. > > If a concurrent update happens after sample of bstat was taken for > calculation, it won't be reflected in the flushed result. > But subsequent flush will use the updated bstat and the difference from > last_bstat would account for that concurrent change (and any other > changes between the flushes). > > IOW flushing cannot prevent concurrent updates but it will give > eventually consistent (repeated without more updates) results. > Yes, so we need @curr to record the bstat value after the sequence fetch is completed. >> It is better for us to assign directly instead of using >> cgroup_base_stat_add() to update {rstatc, cgrp}->last_bstat. > > Or do you mean the copying is faster then arithmetics? > Yes, but it may not be obvious. Another reason is that when we complete an update, we snapshot last_bstat as the current bstat, which is better for readers to understand. Arithmetics is somewhat obscure. Thanks, Hao
On Wed, May 24, 2023 at 02:54:10PM +0800, Hao Jia <jiahao.os@bytedance.com> wrote: > Yes, so we need @curr to record the bstat value after the sequence fetch is > completed. No, I still don't see a problem that it solves. If you find incorrect data being reported, please explain it more/with an example. > Yes, but it may not be obvious. > Another reason is that when we complete an update, we snapshot last_bstat as > the current bstat, which is better for readers to understand. Arithmetics is > somewhat obscure. The readability here is subjective. It'd be interesting to have some data comparing arithmetics vs copying though. HTH, Michal
On 2023/5/24 Michal Koutný wrote: > On Wed, May 24, 2023 at 02:54:10PM +0800, Hao Jia <jiahao.os@bytedance.com> wrote: >> Yes, so we need @curr to record the bstat value after the sequence fetch is >> completed. > > No, I still don't see a problem that it solves. If you find incorrect > data being reported, please explain it more/with an example. Sorry to confuse you. My earliest patch is like this: diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c index 9c4c55228567..3e5c4c1c92c6 100644 --- a/kernel/cgroup/rstat.c +++ b/kernel/cgroup/rstat.c @@ -376,14 +376,14 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu) /* propagate percpu delta to global */ cgroup_base_stat_sub(&delta, &rstatc->last_bstat); (1) <--- cgroup_base_stat_add(&cgrp->bstat, &delta); - cgroup_base_stat_add(&rstatc->last_bstat, &delta); + rstatc->last_bstat = rstatc->bstat; (2) <-- /* propagate global delta to parent (unless that's root) */ if (cgroup_parent(parent)) { delta = cgrp->bstat; cgroup_base_stat_sub(&delta, &cgrp->last_bstat); cgroup_base_stat_add(&parent->bstat, &delta); - cgroup_base_stat_add(&cgrp->last_bstat, &delta); + cgrp->last_bstat = cgrp->bstat; } } If I understand correctly, the rstatc->bstat at (1) and (2) may be different. At (2) rstatc->bstat may have been updated on other CPUs. Or we should not read rstatc->bstat directly, we should pass the following way do { seq = __u64_stats_fetch_begin(&rstatc->bsync); cur = rstatc->bstat; } while (__u64_stats_fetch_retry(&rstatc->bsync, seq)); > >> Yes, but it may not be obvious. >> Another reason is that when we complete an update, we snapshot last_bstat as >> the current bstat, which is better for readers to understand. Arithmetics is >> somewhat obscure. > > The readability here is subjective. It'd be interesting to have some > data comparing arithmetics vs copying though. Thanks for your suggestion, I plan to use RDTSC to compare the time consumption of arithmetics vs copying. Do you have better suggestions or tools? Thanks, Hao
On 2023/5/24 Michal Koutný wrote: > On Wed, May 24, 2023 at 02:54:10PM +0800, Hao Jia <jiahao.os@bytedance.com> wrote: >> Yes, so we need @curr to record the bstat value after the sequence fetch is >> completed. > > No, I still don't see a problem that it solves. If you find incorrect > data being reported, please explain it more/with an example. > >> Yes, but it may not be obvious. >> Another reason is that when we complete an update, we snapshot last_bstat as >> the current bstat, which is better for readers to understand. Arithmetics is >> somewhat obscure. > > The readability here is subjective. It'd be interesting to have some > data comparing arithmetics vs copying though. > Sorry for replying you so late. I am using RDTSC on my machine (an Intel Xeon(R) Platinum 8260 CPU@2.40GHz machine with 2 NUMA nodes each of which has 24 cores with SMT2 enabled, so 96 CPUs in total.) to compare the time consumption of arithmetics vs copying. There is almost no difference in the time consumption between arithmetics and copying. > HTH, > Michal
On Mon, Jun 12, 2023 at 11:13:41AM +0800, Hao Jia <jiahao.os@bytedance.com> wrote: > Sorry for replying you so late. I am using RDTSC on my machine (an Intel > Xeon(R) Platinum 8260 CPU@2.40GHz machine with 2 NUMA nodes each of which > has 24 cores with SMT2 enabled, so 96 CPUs in total.) to compare the time > consumption of arithmetics vs copying. There is almost no difference in the > time consumption between arithmetics and copying. Thanks for carrying out and sharing this despite not convincing towards the change. Michal
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c index 9c4c55228567..3e5c4c1c92c6 100644 --- a/kernel/cgroup/rstat.c +++ b/kernel/cgroup/rstat.c @@ -376,14 +376,14 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu) /* propagate percpu delta to global */ cgroup_base_stat_sub(&delta, &rstatc->last_bstat); cgroup_base_stat_add(&cgrp->bstat, &delta); - cgroup_base_stat_add(&rstatc->last_bstat, &delta); + rstatc->last_bstat = rstatc->bstat; /* propagate global delta to parent (unless that's root) */ if (cgroup_parent(parent)) { delta = cgrp->bstat; cgroup_base_stat_sub(&delta, &cgrp->last_bstat); cgroup_base_stat_add(&parent->bstat, &delta); - cgroup_base_stat_add(&cgrp->last_bstat, &delta); + cgrp->last_bstat = cgrp->bstat; } }