cgroup: rstat: Simplified cgroup_base_stat_flush() update last_bstat logic

Message ID 20230518124142.57644-1-jiahao.os@bytedance.com
State New

Commit Message

Hao Jia May 18, 2023, 12:41 p.m. UTC
  In cgroup_base_stat_flush(), {rstatc, cgrp}->last_bstat needs to be
updated to the current {rstatc, cgrp}->bstat. Assign the value directly
instead of adding delta to the last value.

Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
---
 kernel/cgroup/rstat.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

Hao Jia May 19, 2023, 4:15 a.m. UTC | #1
On 2023/5/18 Hao Jia wrote:
> In cgroup_base_stat_flush(), {rstatc, cgrp}->last_bstat needs to be
> updated to the current {rstatc, cgrp}->bstat. Assign the value directly
> instead of adding delta to the last value.
> 
> Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
> ---
>   kernel/cgroup/rstat.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
> index 9c4c55228567..3e5c4c1c92c6 100644
> --- a/kernel/cgroup/rstat.c
> +++ b/kernel/cgroup/rstat.c
> @@ -376,14 +376,14 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
>   	/* propagate percpu delta to global */
>   	cgroup_base_stat_sub(&delta, &rstatc->last_bstat);  *(1)*
>   	cgroup_base_stat_add(&cgrp->bstat, &delta);
> -	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
> +	rstatc->last_bstat = rstatc->bstat;		    *(2)*

Something is wrong here: the value of rstatc->bstat at (1) and (2) may not
be the same, because rstatc->bstat may be updated on another CPU in
between. Sorry for the noise.

>   
>   	/* propagate global delta to parent (unless that's root) */
>   	if (cgroup_parent(parent)) {
>   		delta = cgrp->bstat;
>   		cgroup_base_stat_sub(&delta, &cgrp->last_bstat);
>   		cgroup_base_stat_add(&parent->bstat, &delta);
> -		cgroup_base_stat_add(&cgrp->last_bstat, &delta);
> +		cgrp->last_bstat = cgrp->bstat;
>   	}
>   }
>   

Maybe something like this?


In cgroup_base_stat_flush() function, {rstatc, cgrp}->last_bstat
needs to be updated to the current {rstatc, cgrp}->bstat after the
calculation.

For the rstatc->last_bstat case, rstatc->bstat may be updated on other
cpus during our calculation, resulting in inconsistent rstatc->bstat
statistics for the two reads. So we use the temporary variable @cur to
record the read rstatc->bstat statistics, and use @cur to update
rstatc->last_bstat.

For the cgrp->last_bstat case, we already hold cgroup_rstat_lock, so
cgrp->bstat will not change during the calculation process, and it can
be directly used to update cgrp->last_bstat.

It is better for us to assign directly instead of using
cgroup_base_stat_add() to update {rstatc, cgrp}->last_bstat.

Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
---
  kernel/cgroup/rstat.c | 9 +++++----
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 9c4c55228567..17a6a1fcc2d4 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -360,7 +360,7 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
  {
  	struct cgroup_rstat_cpu *rstatc = cgroup_rstat_cpu(cgrp, cpu);
  	struct cgroup *parent = cgroup_parent(cgrp);
-	struct cgroup_base_stat delta;
+	struct cgroup_base_stat delta, cur;
  	unsigned seq;

  	/* Root-level stats are sourced from system-wide CPU stats */
@@ -370,20 +370,21 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
  	/* fetch the current per-cpu values */
  	do {
  		seq = __u64_stats_fetch_begin(&rstatc->bsync);
-		delta = rstatc->bstat;
+		cur = rstatc->bstat;
  	} while (__u64_stats_fetch_retry(&rstatc->bsync, seq));

  	/* propagate percpu delta to global */
+	delta = cur;
  	cgroup_base_stat_sub(&delta, &rstatc->last_bstat);
  	cgroup_base_stat_add(&cgrp->bstat, &delta);
-	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
+	rstatc->last_bstat = cur;

  	/* propagate global delta to parent (unless that's root) */
  	if (cgroup_parent(parent)) {
  		delta = cgrp->bstat;
  		cgroup_base_stat_sub(&delta, &cgrp->last_bstat);
  		cgroup_base_stat_add(&parent->bstat, &delta);
-		cgroup_base_stat_add(&cgrp->last_bstat, &delta);
+		cgrp->last_bstat = cgrp->bstat;
  	}
  }
  
Michal Koutný May 23, 2023, 3:14 p.m. UTC | #2
Hello Jia.

On Fri, May 19, 2023 at 12:15:57PM +0800, Hao Jia <jiahao.os@bytedance.com> wrote:
> Maybe something like this?

(Next time please send with a version bump in subject.)


> In cgroup_base_stat_flush() function, {rstatc, cgrp}->last_bstat
> needs to be updated to the current {rstatc, cgrp}->bstat after the
> calculation.
> 
> For the rstatc->last_bstat case, rstatc->bstat may be updated on other
> cpus during our calculation, resulting in inconsistent rstatc->bstat
> statistics for the two reads. So we use the temporary variable @cur to
> record the read rstatc->bstat statistics, and use @cur to update
> rstatc->last_bstat.

If a concurrent update happens after sample of bstat was taken for
calculation, it won't be reflected in the flushed result.
But subsequent flush will use the updated bstat and the difference from
last_bstat would account for that concurrent change (and any other
changes between the flushes).

IOW flushing cannot prevent concurrent updates but it will give
eventually consistent (repeated without more updates) results.
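
The eventual-consistency argument can be illustrated with a minimal userspace sketch. This is not kernel code: `struct base_stat`, `struct sim`, `flush_arith()` and `global_after()` are hypothetical names, the stat is reduced to a single counter, and the concurrent update is modelled as a `racing` value that lands right after the sample is taken.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct cgroup_base_stat: a single counter. */
struct base_stat {
	uint64_t usage;
};

struct sim {
	struct base_stat bstat;		/* models rstatc->bstat */
	struct base_stat last_bstat;	/* models rstatc->last_bstat */
	struct base_stat global;	/* models cgrp->bstat */
};

/*
 * One flush using the existing arithmetic update. `racing` is added to
 * bstat after the sample was taken, so this flush does not see it.
 */
static void flush_arith(struct sim *s, uint64_t racing)
{
	struct base_stat delta = s->bstat;	/* sample */

	s->bstat.usage += racing;		/* concurrent update */

	delta.usage -= s->last_bstat.usage;
	s->global.usage += delta.usage;
	s->last_bstat.usage += delta.usage;

	/* last + (sample - last) == sample, so only `racing` is pending. */
	assert(s->last_bstat.usage + racing == s->bstat.usage);
}

/* Global value after `flushes` flushes; only the first one is raced. */
static uint64_t global_after(uint64_t initial, uint64_t racing, int flushes)
{
	struct sim s = { .bstat = { initial } };

	for (int i = 0; i < flushes; i++)
		flush_arith(&s, i == 0 ? racing : 0);
	return s.global.usage;
}
```

With `initial = 100` and `racing = 7`, the first flush propagates only 100, the second picks up the remaining 7, and further flushes without new updates leave the global value unchanged.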

> It is better for us to assign directly instead of using
> cgroup_base_stat_add() to update {rstatc, cgrp}->last_bstat.

Or do you mean the copying is faster than arithmetics?

Thanks,
Michal
  
Hao Jia May 24, 2023, 6:54 a.m. UTC | #3
On 2023/5/23 Michal Koutný wrote:
> Hello Jia.
> 
> On Fri, May 19, 2023 at 12:15:57PM +0800, Hao Jia <jiahao.os@bytedance.com> wrote:
>> Maybe something like this?
> 
> (Next time please send with a version bump in subject.)

Thanks for your review, I will do it in the next version.

> 
> 
>> In cgroup_base_stat_flush() function, {rstatc, cgrp}->last_bstat
>> needs to be updated to the current {rstatc, cgrp}->bstat after the
>> calculation.
>>
>> For the rstatc->last_bstat case, rstatc->bstat may be updated on other
>> cpus during our calculation, resulting in inconsistent rstatc->bstat
>> statistics for the two reads. So we use the temporary variable @cur to
>> record the read rstatc->bstat statistics, and use @cur to update
>> rstatc->last_bstat.
> 
> If a concurrent update happens after sample of bstat was taken for
> calculation, it won't be reflected in the flushed result.
> But subsequent flush will use the updated bstat and the difference from
> last_bstat would account for that concurrent change (and any other
> changes between the flushes).
> 
> IOW flushing cannot prevent concurrent updates but it will give
> eventually consistent (repeated without more updates) results.
> 

Yes, so we need @cur to record the bstat value after the sequence fetch 
is completed.


>> It is better for us to assign directly instead of using
>> cgroup_base_stat_add() to update {rstatc, cgrp}->last_bstat.
> 
> Or do you mean the copying is faster than arithmetics?
> 

Yes, but it may not be obvious.
Another reason is that when we complete an update, we snapshot 
last_bstat as the current bstat, which is better for readers to 
understand. Arithmetics is somewhat obscure.

Thanks,
Hao
  
Michal Koutný May 24, 2023, 8:02 a.m. UTC | #4
On Wed, May 24, 2023 at 02:54:10PM +0800, Hao Jia <jiahao.os@bytedance.com> wrote:
> Yes, so we need @cur to record the bstat value after the sequence fetch is
> completed.

No, I still don't see a problem that it solves. If you find incorrect
data being reported, please explain it more/with an example.

> Yes, but it may not be obvious.
> Another reason is that when we complete an update, we snapshot last_bstat as
> the current bstat, which is better for readers to understand. Arithmetics is
> somewhat obscure.

The readability here is subjective. It'd be interesting to have some
data comparing arithmetics vs copying though.

HTH,
Michal
  
Hao Jia May 24, 2023, 8:41 a.m. UTC | #5
On 2023/5/24 Michal Koutný wrote:
> On Wed, May 24, 2023 at 02:54:10PM +0800, Hao Jia <jiahao.os@bytedance.com> wrote:
>> Yes, so we need @cur to record the bstat value after the sequence fetch is
>> completed.
> 
> No, I still don't see a problem that it solves. If you find incorrect
> data being reported, please explain it more/with an example.

Sorry to confuse you.

My earliest patch is like this:

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 9c4c55228567..3e5c4c1c92c6 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -376,14 +376,14 @@ static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 	/* propagate percpu delta to global */
 	cgroup_base_stat_sub(&delta, &rstatc->last_bstat);	(1) <---
 	cgroup_base_stat_add(&cgrp->bstat, &delta);
-	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
+	rstatc->last_bstat = rstatc->bstat;			(2) <---
 
 	/* propagate global delta to parent (unless that's root) */
 	if (cgroup_parent(parent)) {
 		delta = cgrp->bstat;
 		cgroup_base_stat_sub(&delta, &cgrp->last_bstat);
 		cgroup_base_stat_add(&parent->bstat, &delta);
-		cgroup_base_stat_add(&cgrp->last_bstat, &delta);
+		cgrp->last_bstat = cgrp->bstat;
 	}
 }

If I understand correctly, rstatc->bstat at (1) and (2) may be
different: by (2), rstatc->bstat may have been updated on another CPU.
Alternatively, we should not read rstatc->bstat directly at all, but go
through the sequence-counter fetch loop:

     do {
        seq = __u64_stats_fetch_begin(&rstatc->bsync);
        cur = rstatc->bstat;
     } while (__u64_stats_fetch_retry(&rstatc->bsync, seq));
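
The difference between re-reading rstatc->bstat at (2) and reusing the fetched snapshot can be sketched in userspace as follows. This is only an illustration under simplifying assumptions: the stat is a single counter, there is no real concurrency or sequence counter, and the update from another CPU is modelled as a `racing` value applied between (1) and (2). All names (`percpu_sim`, `flush_reread`, `flush_snapshot`, `total_after_two_flushes`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct cgroup_base_stat: a single counter. */
struct base_stat {
	uint64_t usage;
};

struct percpu_sim {
	struct base_stat bstat;		/* models rstatc->bstat */
	struct base_stat last_bstat;	/* models rstatc->last_bstat */
};

typedef uint64_t (*flush_fn)(struct percpu_sim *pc, uint64_t racing);

/* Earliest patch: last_bstat is assigned from a second read of bstat. */
static uint64_t flush_reread(struct percpu_sim *pc, uint64_t racing)
{
	struct base_stat delta = pc->bstat;	/* fetched sample */

	delta.usage -= pc->last_bstat.usage;	/* (1) */
	pc->bstat.usage += racing;		/* update from another CPU */
	pc->last_bstat = pc->bstat;		/* (2): swallows `racing` */
	return delta.usage;			/* amount propagated upward */
}

/* Second version: last_bstat is assigned from the fetched snapshot. */
static uint64_t flush_snapshot(struct percpu_sim *pc, uint64_t racing)
{
	struct base_stat cur = pc->bstat;	/* fetched sample */
	struct base_stat delta = cur;

	delta.usage -= pc->last_bstat.usage;
	pc->bstat.usage += racing;		/* update from another CPU */
	pc->last_bstat = cur;			/* `racing` stays pending */
	return delta.usage;
}

/* Sum propagated by two flushes; only the first one is raced. */
static uint64_t total_after_two_flushes(flush_fn flush, uint64_t initial,
					uint64_t racing)
{
	struct percpu_sim pc = { .bstat = { initial } };
	uint64_t total;

	assert(flush);
	total = flush(&pc, racing);
	total += flush(&pc, 0);
	return total;
}
```

With `initial = 100` and `racing = 7`, `flush_reread` propagates 100 in total and the 7 is never accounted, while `flush_snapshot` propagates 107, the same as the existing add-the-delta code would.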


> 
>> Yes, but it may not be obvious.
>> Another reason is that when we complete an update, we snapshot last_bstat as
>> the current bstat, which is better for readers to understand. Arithmetics is
>> somewhat obscure.
> 
> The readability here is subjective. It'd be interesting to have some
> data comparing arithmetics vs copying though.

Thanks for your suggestion, I plan to use RDTSC to compare the time 
consumption of arithmetics vs copying. Do you have better suggestions or 
tools?

Thanks,
Hao
  
Hao Jia June 12, 2023, 3:13 a.m. UTC | #6
On 2023/5/24 Michal Koutný wrote:
> On Wed, May 24, 2023 at 02:54:10PM +0800, Hao Jia <jiahao.os@bytedance.com> wrote:
>> Yes, so we need @cur to record the bstat value after the sequence fetch is
>> completed.
> 
> No, I still don't see a problem that it solves. If you find incorrect
> data being reported, please explain it more/with an example.
> 
>> Yes, but it may not be obvious.
>> Another reason is that when we complete an update, we snapshot last_bstat as
>> the current bstat, which is better for readers to understand. Arithmetics is
>> somewhat obscure.
> 
> The readability here is subjective. It'd be interesting to have some
> data comparing arithmetics vs copying though.
> 

Sorry for the late reply. I used RDTSC on my machine (an Intel Xeon(R)
Platinum 8260 CPU @ 2.40GHz with 2 NUMA nodes, each with 24 cores and
SMT2 enabled, so 96 CPUs in total) to compare the time consumption of
arithmetics vs copying. There is almost no difference between the two.
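
For reference, a comparison of this kind could look roughly like the userspace sketch below. It is not the benchmark that was actually run: it uses clock_gettime() rather than RDTSC so that it is portable, `struct base_stat` is a hypothetical three-counter stand-in for struct cgroup_base_stat, and all function names are made up for illustration. The branch inside the loop also adds overhead that a real measurement would want to avoid.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for struct cgroup_base_stat (three counters). */
struct base_stat {
	uint64_t utime, stime, sum_exec_runtime;
};

struct bench_result {
	uint64_t arith_ns;	/* time with last_bstat += delta */
	uint64_t copy_ns;	/* time with last_bstat = bstat */
	int same;		/* 1 if both variants end in the same state */
};

static void stat_add(struct base_stat *dst, const struct base_stat *src)
{
	dst->utime += src->utime;
	dst->stime += src->stime;
	dst->sum_exec_runtime += src->sum_exec_runtime;
}

static void stat_sub(struct base_stat *dst, const struct base_stat *src)
{
	dst->utime -= src->utime;
	dst->stime -= src->stime;
	dst->sum_exec_runtime -= src->sum_exec_runtime;
}

static uint64_t now_ns(void)
{
	struct timespec ts;
	int ret = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(ret == 0);
	(void)ret;
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Run `iters` flush-like steps; copy != 0 selects plain assignment. */
static struct base_stat run_variant(unsigned long iters, int copy,
				    uint64_t *elapsed_ns)
{
	struct base_stat bstat = { 0, 0, 0 }, last = { 0, 0, 0 }, delta;
	volatile uint64_t sink = 0;	/* keeps the loop from being dropped */
	uint64_t start = now_ns();

	for (unsigned long i = 1; i <= iters; i++) {
		bstat.utime += i;	/* pretend the counters advanced */
		bstat.stime += 2 * i;
		bstat.sum_exec_runtime += 3 * i;

		delta = bstat;
		stat_sub(&delta, &last);
		if (copy)
			last = bstat;			/* proposed update */
		else
			stat_add(&last, &delta);	/* existing update */
		sink += delta.utime + last.stime;
	}
	*elapsed_ns = now_ns() - start;
	return last;
}

static struct bench_result run_bench(unsigned long iters)
{
	struct bench_result r;
	struct base_stat a = run_variant(iters, 0, &r.arith_ns);
	struct base_stat c = run_variant(iters, 1, &r.copy_ns);

	r.same = a.utime == c.utime && a.stime == c.stime &&
		 a.sum_exec_runtime == c.sum_exec_runtime;
	return r;
}
```

Calling `run_bench()` with a large iteration count and comparing `arith_ns` with `copy_ns` gives a rough userspace estimate only; the `same` field checks that both update styles leave last_bstat in the same state when nothing races with the flush.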



> HTH,
> Michal
  
Michal Koutný June 13, 2023, 11:52 a.m. UTC | #7
On Mon, Jun 12, 2023 at 11:13:41AM +0800, Hao Jia <jiahao.os@bytedance.com> wrote:
> Sorry for the late reply. I used RDTSC on my machine (an Intel Xeon(R)
> Platinum 8260 CPU @ 2.40GHz with 2 NUMA nodes, each with 24 cores and SMT2
> enabled, so 96 CPUs in total) to compare the time consumption of arithmetics
> vs copying. There is almost no difference between the two.

Thanks for carrying this out and sharing the results, even though they
do not make a convincing case for the change.

Michal
  

Patch

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 9c4c55228567..3e5c4c1c92c6 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -376,14 +376,14 @@  static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu)
 	/* propagate percpu delta to global */
 	cgroup_base_stat_sub(&delta, &rstatc->last_bstat);
 	cgroup_base_stat_add(&cgrp->bstat, &delta);
-	cgroup_base_stat_add(&rstatc->last_bstat, &delta);
+	rstatc->last_bstat = rstatc->bstat;
 
 	/* propagate global delta to parent (unless that's root) */
 	if (cgroup_parent(parent)) {
 		delta = cgrp->bstat;
 		cgroup_base_stat_sub(&delta, &cgrp->last_bstat);
 		cgroup_base_stat_add(&parent->bstat, &delta);
-		cgroup_base_stat_add(&cgrp->last_bstat, &delta);
+		cgrp->last_bstat = cgrp->bstat;
 	}
 }