[v9,3/3] blk-cgroup: Flush stats at blkgs destruction path

Message ID	20221104182050.342908-4-longman@redhat.com
State	New
Headers
	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20;
	From: Waiman Long <longman@redhat.com>
	To: Tejun Heo <tj@kernel.org>, Jens Axboe <axboe@kernel.dk>
	Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Ming Lei <ming.lei@redhat.com>, Andy Shevchenko <andriy.shevchenko@linux.intel.com>, Andrew Morton <akpm@linux-foundation.org>, Michal Koutný <mkoutny@suse.com>, Hillf Danton <hdanton@sina.com>, Waiman Long <longman@redhat.com>
	Subject: [PATCH v9 3/3] blk-cgroup: Flush stats at blkgs destruction path
	Date: Fri, 4 Nov 2022 14:20:50 -0400
	Message-Id: <20221104182050.342908-4-longman@redhat.com>
	In-Reply-To: <20221104182050.342908-1-longman@redhat.com>
	References: <20221104182050.342908-1-longman@redhat.com>
	MIME-Version: 1.0
	Content-Transfer-Encoding: 8bit
	Precedence: bulk
Series	blk-cgroup: Optimize blkcg_rstat_flush()
	[v9,0/3] blk-cgroup: Optimize blkcg_rstat_flush()
	[v9,1/3] blk-cgroup: Return -ENOMEM directly in blkcg_css_alloc() error path
	[v9,2/3] blk-cgroup: Optimize blkcg_rstat_flush()
	[v9,3/3] blk-cgroup: Flush stats at blkgs destruction path

Commit Message

Waiman Long Nov. 4, 2022, 6:20 p.m. UTC

As noted by Michal, the blkg_iostat_set's in the lockless list
hold references to blkg's to protect against their removal. Those
blkg's hold a reference to the blkcg. When a cgroup is being destroyed,
cgroup_rstat_flush() is only called from css_release_work_fn(), which
runs when the blkcg reference count reaches 0. This circular dependency
prevents the blkcg from being freed until some other event causes
cgroup_rstat_flush() to be called to flush out the pending blkcg stats.

To prevent this delayed blkcg removal, add a new cgroup_rstat_css_flush()
function to flush stats for a given css and cpu and call it at the blkgs
destruction path, blkcg_destroy_blkgs(), whenever there are still some
pending stats to be flushed. This will ensure that blkcg reference
count can reach 0 ASAP.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 block/blk-cgroup.c     | 15 ++++++++++++++-
 include/linux/cgroup.h |  1 +
 kernel/cgroup/rstat.c  | 20 ++++++++++++++++++++
 3 files changed, 35 insertions(+), 1 deletion(-)
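
The circular dependency described in the commit message can be illustrated with a small userspace model in C. This is only a sketch under stated assumptions, not kernel code: every `model_*` name is hypothetical, and the blkg and blkcg reference chain is collapsed into a single counter so that each queued stat entry directly pins the modelled blkcg.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; these are not the kernel's types. */
struct model_blkcg {
	int refcnt;		/* references held on the modelled blkcg */
	int pending_stats;	/* stat entries still queued for flushing */
	bool freed;		/* set once the last reference is dropped */
};

/* A queued stat entry pins its blkg, which in turn pins the blkcg. */
void model_queue_stat(struct model_blkcg *b)
{
	b->pending_stats++;
	b->refcnt++;
}

/* Drop one reference; the object is freed only when the count hits 0. */
void model_put(struct model_blkcg *b)
{
	assert(b->refcnt > 0);
	if (--b->refcnt == 0)
		b->freed = true;
}

/* Flushing drains the queue and releases the references it held. */
void model_flush(struct model_blkcg *b)
{
	while (b->pending_stats > 0) {
		b->pending_stats--;
		model_put(b);
	}
}

/*
 * Destruction path: optionally flush first, then drop the base
 * reference. Returns true if the object was actually freed.
 */
bool model_destroy(struct model_blkcg *b, bool flush_on_destroy)
{
	if (flush_on_destroy)
		model_flush(b);
	model_put(b);
	return b->freed;
}
```

In this model, an object that starts with one base reference and two queued entries is not freed by `model_destroy(b, false)`, because the queued entries still hold references and nothing drains them. With `model_destroy(b, true)` the count reaches 0 immediately, which is the effect the patch aims for.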

Comments

Tejun Heo Nov. 4, 2022, 8 p.m. UTC | #1

On Fri, Nov 04, 2022 at 02:20:50PM -0400, Waiman Long wrote:
> +/**
> + * cgroup_rstat_css_flush - flush stats for the given css and cpu
> + * @css: target css to be flushed
> + * @cpu: the cpu that holds the stats to be flushed
> + *
> + * A lightweight rstat flush operation for a given css and cpu.
> + * Only the cpu_lock is being held for mutual exclusion, the cgroup_rstat_lock
> + * isn't used.
> + */
> +void cgroup_rstat_css_flush(struct cgroup_subsys_state *css, int cpu)
> +{
> +	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
> +
> +	raw_spin_lock_irq(cpu_lock);
> +	rcu_read_lock();
> +	css->ss->css_rstat_flush(css, cpu);
> +	rcu_read_unlock();
> +	raw_spin_unlock_irq(cpu_lock);
> +}

Would it make sense to iterate CPUs within the helper rather than asking
the caller to do it? Also, in terms of patch sequencing, this introduces a
bug and then fixes it. Prolly better to not introduce the bug in the first
place.

Thanks.

Waiman Long Nov. 4, 2022, 8:12 p.m. UTC | #2

On 11/4/22 16:00, Tejun Heo wrote:
> On Fri, Nov 04, 2022 at 02:20:50PM -0400, Waiman Long wrote:
>> +/**
>> + * cgroup_rstat_css_flush - flush stats for the given css and cpu
>> + * @css: target css to be flushed
>> + * @cpu: the cpu that holds the stats to be flushed
>> + *
>> + * A lightweight rstat flush operation for a given css and cpu.
>> + * Only the cpu_lock is being held for mutual exclusion, the cgroup_rstat_lock
>> + * isn't used.
>> + */
>> +void cgroup_rstat_css_flush(struct cgroup_subsys_state *css, int cpu)
>> +{
>> +	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
>> +
>> +	raw_spin_lock_irq(cpu_lock);
>> +	rcu_read_lock();
>> +	css->ss->css_rstat_flush(css, cpu);
>> +	rcu_read_unlock();
>> +	raw_spin_unlock_irq(cpu_lock);
>> +}
> Would it make sense to iterate CPUs within the helper rather than asking
> the caller to do it? Also, in terms of patch sequencing, this introduces a
> bug and then fixes it. Prolly better to not introduce the bug in the first
> place.
>
> Thanks.

I should have named the function cgroup_rstat_css_cpu_flush() to 
indicate that the cpu is a needed parameter. We can have a 
cgroup_rstat_css_flush() in the future if the need arises.

It is an optimization to call this function only if the corresponding 
cpu has a pending lockless list. I could do cpu iteration here and call 
the flushing function for all the CPUs. It is less optimized this way. 
Since it is a slow path, I guess performance is not that critical. So I 
can go either way. Please let me know your preference.

Thanks,
Longman
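
The trade-off between the two options discussed above can be sketched with a userspace model in C. This is an illustration under assumptions, not the kernel implementation: the `model_*` names, the fixed `MODEL_NR_CPUS` value and the integer queue lengths are all hypothetical stand-ins for the per-cpu lockless lists and the real flush callback.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8		/* hypothetical CPU count for the model */

struct model_css {
	int pending[MODEL_NR_CPUS];	/* queued entries per CPU */
	int flush_calls;		/* number of per-CPU flushes performed */
};

/* Stand-in for a per-CPU flush: drains the queue of one CPU. */
void model_css_cpu_flush(struct model_css *css, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	css->pending[cpu] = 0;
	css->flush_calls++;
}

/* Caller-side iteration: flush only the CPUs with a non-empty queue. */
void model_flush_nonempty(struct model_css *css)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (css->pending[cpu] != 0)
			model_css_cpu_flush(css, cpu);
	}
}

/* Helper-side iteration: flush every CPU unconditionally. */
void model_flush_all(struct model_css *css)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_css_cpu_flush(css, cpu);
}

/* True when no CPU has entries left to flush. */
bool model_all_drained(const struct model_css *css)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (css->pending[cpu] != 0)
			return false;
	}
	return true;
}
```

Both variants leave every queue empty. In the model, the difference is only the number of per-CPU flush calls: with entries queued on two of the eight CPUs, `model_flush_nonempty()` performs two calls while `model_flush_all()` performs eight. In the real code, each call also takes the per-cpu lock, which is the cost the emptiness check avoids.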

Tejun Heo Nov. 4, 2022, 8:13 p.m. UTC | #3

On Fri, Nov 04, 2022 at 04:12:05PM -0400, Waiman Long wrote:
> I should have named the function cgroup_rstat_css_cpu_flush() to indicate
> that the cpu is a needed parameter. We can have a cgroup_rstat_css_flush()
> in the future if the need arises.
> 
> It is an optimization to call this function only if the corresponding cpu
> has a pending lockless list. I could do cpu iteration here and call the
> flushing function for all the CPUs. It is less optimized this way. Since it
> is a slow path, I guess performance is not that critical. So I can go either
> way. Please let me know your preference.

Yeah, cpu_flush is fine. Let's leave it that way.

Thanks.

Waiman Long Nov. 4, 2022, 8:21 p.m. UTC | #4

On 11/4/22 16:13, Tejun Heo wrote:
> On Fri, Nov 04, 2022 at 04:12:05PM -0400, Waiman Long wrote:
>> I should have named the function cgroup_rstat_css_cpu_flush() to indicate
>> that the cpu is a needed parameter. We can have a cgroup_rstat_css_flush()
>> in the future if the need arises.
>>
>> It is an optimization to call this function only if the corresponding cpu
>> has a pending lockless list. I could do cpu iteration here and call the
>> flushing function for all the CPUs. It is less optimized this way. Since it
>> is a slow path, I guess performance is not that critical. So I can go either
>> way. Please let me know your preference.
> Yeah, cpu_flush is fine. Let's leave it that way.
>
Will do.

Cheers,
Longman

Patch

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 3e03c0d13253..fa0a366e3476 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1084,10 +1084,12 @@  struct list_head *blkcg_get_cgwb_list(struct cgroup_subsys_state *css)
  */
 static void blkcg_destroy_blkgs(struct blkcg *blkcg)
 {
+	int cpu;
+
 	might_sleep();
 
+	css_get(&blkcg->css);
 	spin_lock_irq(&blkcg->lock);
-
 	while (!hlist_empty(&blkcg->blkg_list)) {
 		struct blkcg_gq *blkg = hlist_entry(blkcg->blkg_list.first,
 						struct blkcg_gq, blkcg_node);
@@ -1110,6 +1112,17 @@  static void blkcg_destroy_blkgs(struct blkcg *blkcg)
 	}
 
 	spin_unlock_irq(&blkcg->lock);
+
+	/*
+	 * Flush all the non-empty percpu lockless lists.
+	 */
+	for_each_possible_cpu(cpu) {
+		struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
+
+		if (!llist_empty(lhead))
+			cgroup_rstat_css_flush(&blkcg->css, cpu);
+	}
+	css_put(&blkcg->css);
 }
 
 /**
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 528bd44b59e2..4a61cc5d1952 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -766,6 +766,7 @@  void cgroup_rstat_flush(struct cgroup *cgrp);
 void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp);
 void cgroup_rstat_flush_hold(struct cgroup *cgrp);
 void cgroup_rstat_flush_release(void);
+void cgroup_rstat_css_flush(struct cgroup_subsys_state *css, int cpu);
 
 /*
  * Basic resource stats.
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 793ecff29038..28033190fb29 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -281,6 +281,26 @@  void cgroup_rstat_flush_release(void)
 	spin_unlock_irq(&cgroup_rstat_lock);
 }
 
+/**
+ * cgroup_rstat_css_flush - flush stats for the given css and cpu
+ * @css: target css to be flushed
+ * @cpu: the cpu that holds the stats to be flushed
+ *
+ * A lightweight rstat flush operation for a given css and cpu.
+ * Only the cpu_lock is being held for mutual exclusion, the cgroup_rstat_lock
+ * isn't used.
+ */
+void cgroup_rstat_css_flush(struct cgroup_subsys_state *css, int cpu)
+{
+	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
+
+	raw_spin_lock_irq(cpu_lock);
+	rcu_read_lock();
+	css->ss->css_rstat_flush(css, cpu);
+	rcu_read_unlock();
+	raw_spin_unlock_irq(cpu_lock);
+}
+
 int cgroup_rstat_init(struct cgroup *cgrp)
 {
 	int cpu;