From patchwork Fri Nov 4 18:20:48 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 15743
From: Waiman Long
To: Tejun Heo, Jens Axboe
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, Ming Lei, Andy Shevchenko,
 Andrew Morton, Michal Koutný, Hillf Danton, Waiman Long
Subject: [PATCH v9 1/3] blk-cgroup: Return -ENOMEM directly in blkcg_css_alloc() error path
Date: Fri, 4 Nov 2022 14:20:48 -0400
Message-Id: <20221104182050.342908-2-longman@redhat.com>
In-Reply-To: <20221104182050.342908-1-longman@redhat.com>
References: <20221104182050.342908-1-longman@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

For blkcg_css_alloc(), the only error that will be returned is -ENOMEM.
Simplify error handling code by returning this error directly instead
of setting an intermediate "ret" variable.

Signed-off-by: Waiman Long
Reviewed-by: Ming Lei
Acked-by: Tejun Heo
---
 block/blk-cgroup.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 6a5c849ee061..af8a4d2d1fd1 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1139,7 +1139,6 @@ static struct cgroup_subsys_state *
 blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 {
 	struct blkcg *blkcg;
-	struct cgroup_subsys_state *ret;
 	int i;
 
 	mutex_lock(&blkcg_pol_mutex);
@@ -1148,10 +1147,8 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 		blkcg = &blkcg_root;
 	} else {
 		blkcg = kzalloc(sizeof(*blkcg), GFP_KERNEL);
-		if (!blkcg) {
-			ret = ERR_PTR(-ENOMEM);
+		if (!blkcg)
 			goto unlock;
-		}
 	}
 
 	for (i = 0; i < BLKCG_MAX_POLS ; i++) {
@@ -1168,10 +1165,9 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 			continue;
 
 		cpd = pol->cpd_alloc_fn(GFP_KERNEL);
-		if (!cpd) {
-			ret = ERR_PTR(-ENOMEM);
+		if (!cpd)
 			goto free_pd_blkcg;
-		}
+
 		blkcg->cpd[i] = cpd;
 		cpd->blkcg = blkcg;
 		cpd->plid = i;
@@ -1200,7 +1196,7 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 		kfree(blkcg);
 unlock:
 	mutex_unlock(&blkcg_pol_mutex);
-	return ret;
+	return ERR_PTR(-ENOMEM);
 }
 
 static int blkcg_css_online(struct cgroup_subsys_state *css)

From patchwork Fri Nov 4 18:20:49 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 15742
From: Waiman Long
To: Tejun Heo, Jens Axboe
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, Ming Lei, Andy Shevchenko,
 Andrew Morton, Michal Koutný, Hillf Danton, Waiman Long
Subject: [PATCH v9 2/3] blk-cgroup: Optimize blkcg_rstat_flush()
Date: Fri, 4 Nov 2022 14:20:49 -0400
Message-Id: <20221104182050.342908-3-longman@redhat.com>
In-Reply-To: <20221104182050.342908-1-longman@redhat.com>
References: <20221104182050.342908-1-longman@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

For a system with many CPUs and block devices, the time to do
blkcg_rstat_flush() from cgroup_rstat_flush() can be rather long. This
is especially problematic because interrupts are disabled during the
flush. It was reported that the flush might take seconds to complete in
some extreme cases, leading to hard lockup messages.

As it is likely that not all the percpu blkg_iostat_set's have been
updated since the last flush, those stale blkg_iostat_set's don't need
to be flushed in this case. This patch optimizes blkcg_rstat_flush()
by keeping a lockless list of recently updated blkg_iostat_set's in a
newly added percpu blkcg->lhead pointer.

The blkg_iostat_set is added to a lockless list on the update side
in blk_cgroup_bio_start(). It is removed from the lockless list when
flushed in blkcg_rstat_flush(). Due to racing, it is possible that
blkg_iostat_set's in the lockless list may have no new IO stats to be
flushed, but that is OK.

To protect against destruction of blkg, a percpu reference is taken
when a blkg_iostat_set is put onto the lockless list and dropped when
it is removed.

When booting up an instrumented test kernel with this patch on a
2-socket 96-thread system with cgroup v2, out of the 2051 calls to
cgroup_rstat_flush() after bootup, 1788 of the calls exited
immediately because the lockless list was empty. After an all-cpu
kernel build, the ratio became 6295424/6340513. That was more than 99%.

Signed-off-by: Waiman Long
Acked-by: Tejun Heo
---
 block/blk-cgroup.c | 76 ++++++++++++++++++++++++++++++++++++++++++----
 block/blk-cgroup.h | 10 ++++++
 2 files changed, 80 insertions(+), 6 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index af8a4d2d1fd1..3e03c0d13253 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -59,6 +59,37 @@ static struct workqueue_struct *blkcg_punt_bio_wq;
 
 #define BLKG_DESTROY_BATCH_SIZE 64
 
+/*
+ * Lockless lists for tracking IO stats update
+ *
+ * New IO stats are stored in the percpu iostat_cpu within blkcg_gq (blkg).
+ * There are multiple blkg's (one for each block device) attached to each
+ * blkcg. The rstat code keeps track of which cpu has IO stats updated,
+ * but it doesn't know which blkg has the updated stats. If there are many
+ * block devices in a system, the cost of iterating all the blkg's to flush
+ * out the IO stats can be high. To reduce such overhead, a set of percpu
+ * lockless lists (lhead) per blkcg are used to track the set of recently
+ * updated iostat_cpu's since the last flush. An iostat_cpu will be put
+ * onto the lockless list on the update side [blk_cgroup_bio_start()] if
+ * not there yet and then removed when being flushed [blkcg_rstat_flush()].
+ * References to blkg are gotten and then put back in the process to
+ * protect against blkg removal.
+ *
+ * Return: 0 if successful or -ENOMEM if allocation fails.
+ */
+static int init_blkcg_llists(struct blkcg *blkcg)
+{
+	int cpu;
+
+	blkcg->lhead = alloc_percpu_gfp(struct llist_head, GFP_KERNEL);
+	if (!blkcg->lhead)
+		return -ENOMEM;
+
+	for_each_possible_cpu(cpu)
+		init_llist_head(per_cpu_ptr(blkcg->lhead, cpu));
+	return 0;
+}
+
 /**
  * blkcg_css - find the current css
  *
@@ -236,8 +267,10 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct gendisk *disk,
 	blkg->blkcg = blkcg;
 
 	u64_stats_init(&blkg->iostat.sync);
-	for_each_possible_cpu(cpu)
+	for_each_possible_cpu(cpu) {
 		u64_stats_init(&per_cpu_ptr(blkg->iostat_cpu, cpu)->sync);
+		per_cpu_ptr(blkg->iostat_cpu, cpu)->blkg = blkg;
+	}
 
 	for (i = 0; i < BLKCG_MAX_POLS; i++) {
 		struct blkcg_policy *pol = blkcg_policy[i];
@@ -827,7 +860,9 @@ static void blkcg_iostat_update(struct blkcg_gq *blkg, struct blkg_iostat *cur,
 static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 {
 	struct blkcg *blkcg = css_to_blkcg(css);
-	struct blkcg_gq *blkg;
+	struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
+	struct llist_node *lnode;
+	struct blkg_iostat_set *bisc, *next_bisc;
 
 	/* Root-level stats are sourced from system-wide IO stats */
 	if (!cgroup_parent(css->cgroup))
@@ -835,12 +870,21 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 
 	rcu_read_lock();
 
-	hlist_for_each_entry_rcu(blkg, &blkcg->blkg_list, blkcg_node) {
+	lnode = llist_del_all(lhead);
+	if (!lnode)
+		goto out;
+
+	/*
+	 * Iterate only the iostat_cpu's queued in the lockless list.
+	 */
+	llist_for_each_entry_safe(bisc, next_bisc, lnode, lnode) {
+		struct blkcg_gq *blkg = bisc->blkg;
 		struct blkcg_gq *parent = blkg->parent;
-		struct blkg_iostat_set *bisc = per_cpu_ptr(blkg->iostat_cpu, cpu);
 		struct blkg_iostat cur;
 		unsigned int seq;
 
+		WRITE_ONCE(bisc->lqueued, false);
+
 		/* fetch the current per-cpu values */
 		do {
 			seq = u64_stats_fetch_begin(&bisc->sync);
@@ -853,8 +897,10 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 		if (parent && parent->parent)
 			blkcg_iostat_update(parent, &blkg->iostat.cur,
 					    &blkg->iostat.last);
+		percpu_ref_put(&blkg->refcnt);
 	}
 
+out:
 	rcu_read_unlock();
 }
 
@@ -1132,6 +1178,7 @@ static void blkcg_css_free(struct cgroup_subsys_state *css)
 
 	mutex_unlock(&blkcg_pol_mutex);
 
+	free_percpu(blkcg->lhead);
 	kfree(blkcg);
 }
 
@@ -1151,6 +1198,9 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 			goto unlock;
 	}
 
+	if (init_blkcg_llists(blkcg))
+		goto free_blkcg;
+
 	for (i = 0; i < BLKCG_MAX_POLS ; i++) {
 		struct blkcg_policy *pol = blkcg_policy[i];
 		struct blkcg_policy_data *cpd;
@@ -1191,7 +1241,8 @@ blkcg_css_alloc(struct cgroup_subsys_state *parent_css)
 	for (i--; i >= 0; i--)
 		if (blkcg->cpd[i])
 			blkcg_policy[i]->cpd_free_fn(blkcg->cpd[i]);
-
+	free_percpu(blkcg->lhead);
+free_blkcg:
 	if (blkcg != &blkcg_root)
 		kfree(blkcg);
 unlock:
@@ -1939,6 +1990,7 @@ static int blk_cgroup_io_type(struct bio *bio)
 
 void blk_cgroup_bio_start(struct bio *bio)
 {
+	struct blkcg *blkcg = bio->bi_blkg->blkcg;
 	int rwd = blk_cgroup_io_type(bio), cpu;
 	struct blkg_iostat_set *bis;
 	unsigned long flags;
@@ -1957,9 +2009,21 @@ void blk_cgroup_bio_start(struct bio *bio)
 	}
 	bis->cur.ios[rwd]++;
 
+	/*
+	 * If the iostat_cpu isn't in a lockless list, put it into the
+	 * list to indicate that a stat update is pending.
+	 */
+	if (!READ_ONCE(bis->lqueued)) {
+		struct llist_head *lhead = this_cpu_ptr(blkcg->lhead);
+
+		llist_add(&bis->lnode, lhead);
+		WRITE_ONCE(bis->lqueued, true);
+		percpu_ref_get(&bis->blkg->refcnt);
+	}
+
 	u64_stats_update_end_irqrestore(&bis->sync, flags);
 	if (cgroup_subsys_on_dfl(io_cgrp_subsys))
-		cgroup_rstat_updated(bio->bi_blkg->blkcg->css.cgroup, cpu);
+		cgroup_rstat_updated(blkcg->css.cgroup, cpu);
 	put_cpu();
 }
 
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index aa2b286bc825..1e94e404eaa8 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 
 struct blkcg_gq;
 struct blkg_policy_data;
@@ -43,6 +44,9 @@ struct blkg_iostat {
 
 struct blkg_iostat_set {
 	struct u64_stats_sync		sync;
+	struct blkcg_gq			*blkg;
+	struct llist_node		lnode;
+	int				lqueued;	/* queued in llist */
 	struct blkg_iostat		cur;
 	struct blkg_iostat		last;
 };
@@ -97,6 +101,12 @@ struct blkcg {
 	struct blkcg_policy_data	*cpd[BLKCG_MAX_POLS];
 
 	struct list_head		all_blkcgs_node;
+
+	/*
+	 * List of updated percpu blkg_iostat_set's since the last flush.
+	 */
+	struct llist_head __percpu	*lhead;
+
 #ifdef CONFIG_BLK_CGROUP_FC_APPID
 	char				fc_app_id[FC_APPID_LEN];
 #endif

From patchwork Fri Nov 4 18:20:50 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Waiman Long
X-Patchwork-Id: 15744
From: Waiman Long
To: Tejun Heo, Jens Axboe
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, Ming Lei, Andy Shevchenko,
 Andrew Morton, Michal Koutný, Hillf Danton, Waiman Long
Subject: [PATCH v9 3/3] blk-cgroup: Flush stats at blkgs destruction path
Date: Fri, 4 Nov 2022 14:20:50 -0400
Message-Id: <20221104182050.342908-4-longman@redhat.com>
In-Reply-To: <20221104182050.342908-1-longman@redhat.com>
References: <20221104182050.342908-1-longman@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

As noted by Michal, the blkg_iostat_set's in the lockless list hold
references to blkg's to protect against their removal.
Those blkg's in turn hold references to the blkcg. When a cgroup is
being destroyed, cgroup_rstat_flush() is only called from
css_release_work_fn(), which is called when the blkcg reference count
reaches 0. This circular dependency will prevent blkcg from being freed
until some other events cause cgroup_rstat_flush() to be called to
flush out the pending blkcg stats.

To prevent this delayed blkcg removal, add a new cgroup_rstat_css_flush()
function to flush stats for a given css and cpu and call it at the blkgs
destruction path, blkcg_destroy_blkgs(), whenever there are still some
pending stats to be flushed. This will ensure that the blkcg reference
count can reach 0 as soon as possible.

Signed-off-by: Waiman Long
---
 block/blk-cgroup.c     | 15 ++++++++++++++-
 include/linux/cgroup.h |  1 +
 kernel/cgroup/rstat.c  | 20 ++++++++++++++++++++
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 3e03c0d13253..fa0a366e3476 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1084,10 +1084,12 @@ struct list_head *blkcg_get_cgwb_list(struct cgroup_subsys_state *css)
  */
 static void blkcg_destroy_blkgs(struct blkcg *blkcg)
 {
+	int cpu;
+
 	might_sleep();
 
+	css_get(&blkcg->css);
 	spin_lock_irq(&blkcg->lock);
-
 	while (!hlist_empty(&blkcg->blkg_list)) {
 		struct blkcg_gq *blkg = hlist_entry(blkcg->blkg_list.first,
 						struct blkcg_gq, blkcg_node);
@@ -1110,6 +1112,17 @@ static void blkcg_destroy_blkgs(struct blkcg *blkcg)
 	}
 
 	spin_unlock_irq(&blkcg->lock);
+
+	/*
+	 * Flush all the non-empty percpu lockless lists.
+	 */
+	for_each_possible_cpu(cpu) {
+		struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
+
+		if (!llist_empty(lhead))
+			cgroup_rstat_css_flush(&blkcg->css, cpu);
+	}
+	css_put(&blkcg->css);
 }
 
 /**
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 528bd44b59e2..4a61cc5d1952 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -766,6 +766,7 @@ void cgroup_rstat_flush(struct cgroup *cgrp);
 void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp);
 void cgroup_rstat_flush_hold(struct cgroup *cgrp);
 void cgroup_rstat_flush_release(void);
+void cgroup_rstat_css_flush(struct cgroup_subsys_state *css, int cpu);
 
 /*
  * Basic resource stats.
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 793ecff29038..28033190fb29 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -281,6 +281,26 @@ void cgroup_rstat_flush_release(void)
 	spin_unlock_irq(&cgroup_rstat_lock);
 }
 
+/**
+ * cgroup_rstat_css_flush - flush stats for the given css and cpu
+ * @css: target css to be flushed
+ * @cpu: the cpu that holds the stats to be flushed
+ *
+ * A lightweight rstat flush operation for a given css and cpu.
+ * Only the cpu_lock is being held for mutual exclusion, the cgroup_rstat_lock
+ * isn't used.
+ */
+void cgroup_rstat_css_flush(struct cgroup_subsys_state *css, int cpu)
+{
+	raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
+
+	raw_spin_lock_irq(cpu_lock);
+	rcu_read_lock();
+	css->ss->css_rstat_flush(css, cpu);
+	rcu_read_unlock();
+	raw_spin_unlock_irq(cpu_lock);
+}
+
 int cgroup_rstat_init(struct cgroup *cgrp)
 {
 	int cpu;
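
Note for readers of the series: the queue-once / drain-all scheme that
patch 2/3 builds on the kernel llist API can be sketched in plain
userspace C. This is only an illustration under stated assumptions, not
the kernel code. Every name in it (struct node, struct ll_head,
struct stat_set, push, del_all, stat_update, stat_flush) is hypothetical,
C11 atomics stand in for llist_add()/llist_del_all() and
READ_ONCE()/WRITE_ONCE(), and the percpu placement, u64_stats sequence
counter and blkg reference counting are left out.

```c
#include <assert.h>	/* only needed by callers that check results */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's llist_node / llist_head. */
struct node {
	struct node *next;
};

struct ll_head {
	_Atomic(struct node *) first;
};

/* Hypothetical analogue of blkg_iostat_set: a counter plus list linkage. */
struct stat_set {
	struct node lnode;	/* kept first so a node pointer casts back */
	atomic_bool lqueued;	/* true while the set sits on the list */
	unsigned long cur;	/* running total, bumped on every update */
	unsigned long last;	/* value observed at the previous flush */
};

/* Lock-free push onto the head, in the spirit of llist_add(). */
static void push(struct ll_head *h, struct node *n)
{
	struct node *first = atomic_load(&h->first);

	do {
		n->next = first;
	} while (!atomic_compare_exchange_weak(&h->first, &first, n));
}

/* Detach the whole list in one step, in the spirit of llist_del_all(). */
static struct node *del_all(struct ll_head *h)
{
	return atomic_exchange(&h->first, (struct node *)NULL);
}

/* Update side: count the event and queue the set only if not yet queued. */
static void stat_update(struct ll_head *h, struct stat_set *s)
{
	s->cur++;
	if (!atomic_load(&s->lqueued)) {
		push(h, &s->lnode);
		atomic_store(&s->lqueued, true);
	}
}

/*
 * Flush side: visit only the queued sets. Returns the summed delta and
 * adds the number of visited sets to *visited.
 */
static unsigned long stat_flush(struct ll_head *h, unsigned long *visited)
{
	unsigned long delta = 0;
	struct node *n = del_all(h);

	while (n) {
		/* Read next first: once lqueued is cleared, n may be re-queued. */
		struct node *next = n->next;
		struct stat_set *s = (struct stat_set *)n;

		atomic_store(&s->lqueued, false);
		delta += s->cur - s->last;
		s->last = s->cur;
		(*visited)++;
		n = next;
	}
	return delta;
}
```

In this sketch, three stat_update() calls on one set followed by
stat_flush() visit a single entry and return a delta of 3. A second
flush with no update in between finds an empty list and returns at once,
which is the fast path the patch 2/3 commit message measures. Sets that
were never updated are never visited.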