From patchwork Wed Mar 29 12:55:53 2023
X-Patchwork-Submitter: Juri Lelli
X-Patchwork-Id: 76569
From: Juri Lelli
To: Peter Zijlstra, Ingo Molnar, Qais Yousef, Waiman Long, Tejun Heo, Zefan Li, Johannes Weiner, Hao Luo
Cc: Dietmar Eggemann, Steven Rostedt, linux-kernel@vger.kernel.org, luca.abeni@santannapisa.it, claudio@evidence.eu.com, tommaso.cucinotta@santannapisa.it, bristot@redhat.com, mathieu.poirier@linaro.org, cgroups@vger.kernel.org, Vincent Guittot, Wei Wang, Rick Yiu, Quentin Perret, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sudeep Holla, Juri Lelli
Subject: [PATCH 1/6] cgroup/cpuset: Rename functions dealing with DEADLINE accounting
Date: Wed, 29 Mar 2023 14:55:53 +0200
Message-Id: <20230329125558.255239-2-juri.lelli@redhat.com>
In-Reply-To: <20230329125558.255239-1-juri.lelli@redhat.com>
References: <20230329125558.255239-1-juri.lelli@redhat.com>

rebuild_root_domains() and update_tasks_root_domain() have neutral names,
but they actually deal with DEADLINE bandwidth accounting. Rename them to
use a 'dl_' prefix so that the intent is clearer.

No functional change.

Suggested-by: Qais Yousef
Signed-off-by: Juri Lelli
Reviewed-by: Qais Yousef
Tested-by: Qais Yousef
---
 kernel/cgroup/cpuset.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 636f1c682ac0..501913bc2805 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1066,7 +1066,7 @@ static int generate_sched_domains(cpumask_var_t **domains,
 	return ndoms;
 }
 
-static void update_tasks_root_domain(struct cpuset *cs)
+static void dl_update_tasks_root_domain(struct cpuset *cs)
 {
 	struct css_task_iter it;
 	struct task_struct *task;
@@ -1079,7 +1079,7 @@ static void update_tasks_root_domain(struct cpuset *cs)
 	css_task_iter_end(&it);
 }
 
-static void rebuild_root_domains(void)
+static void dl_rebuild_rd_accounting(void)
 {
 	struct cpuset *cs = NULL;
 	struct cgroup_subsys_state *pos_css;
@@ -1107,7 +1107,7 @@ static void rebuild_root_domains(void)
 
 		rcu_read_unlock();
 
-		update_tasks_root_domain(cs);
+		dl_update_tasks_root_domain(cs);
 
 		rcu_read_lock();
 		css_put(&cs->css);
@@ -1121,7 +1121,7 @@ partition_and_rebuild_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
 {
 	mutex_lock(&sched_domains_mutex);
 	partition_sched_domains_locked(ndoms_new, doms_new, dattr_new);
-	rebuild_root_domains();
+	dl_rebuild_rd_accounting();
 	mutex_unlock(&sched_domains_mutex);
 }

From patchwork Wed Mar 29 12:55:54 2023
X-Patchwork-Submitter: Juri Lelli
X-Patchwork-Id: 76568
From: Juri Lelli
To: Peter Zijlstra, Ingo Molnar, Qais Yousef, Waiman Long, Tejun Heo, Zefan Li, Johannes Weiner, Hao Luo
Cc: Dietmar Eggemann, Steven Rostedt, linux-kernel@vger.kernel.org, luca.abeni@santannapisa.it, claudio@evidence.eu.com, tommaso.cucinotta@santannapisa.it, bristot@redhat.com, mathieu.poirier@linaro.org, cgroups@vger.kernel.org, Vincent Guittot, Wei Wang, Rick Yiu, Quentin Perret, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sudeep Holla, Juri Lelli
Subject: [PATCH 2/6] sched/cpuset: Bring back cpuset_mutex
Date: Wed, 29 Mar 2023 14:55:54 +0200
Message-Id: <20230329125558.255239-3-juri.lelli@redhat.com>
In-Reply-To: <20230329125558.255239-1-juri.lelli@redhat.com>
References: <20230329125558.255239-1-juri.lelli@redhat.com>

Turns out percpu_cpuset_rwsem - commit 1243dc518c9d ("cgroup/cpuset:
Convert cpuset_mutex to percpu_rwsem") - wasn't such a brilliant idea:
it has been reported to cause slowdowns in workloads that need to change
cpuset configuration frequently, and it also does not implement priority
inheritance (which causes trouble with realtime workloads).

Convert percpu_cpuset_rwsem back to a regular cpuset_mutex. Also grab it
only for SCHED_DEADLINE tasks (other policies don't care about stable
cpusets anyway).
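The scheduler-side effect is easiest to see in isolation. Below is a
minimal user-space C sketch of the pattern this patch introduces in
__sched_setscheduler(): take the (now mutex-based, hence
priority-inheritance-friendly) cpuset lock only when the old or the new
policy is SCHED_DEADLINE. The cpuset_lock()/cpuset_unlock() and
dl_policy() names mirror the kernel's; the pthread mutex, the enum and
the surrounding function are illustrative stand-ins, not kernel code:

/*
 * Illustrative sketch only: pthread_mutex_t stands in for the kernel's
 * cpuset_mutex, and the policy enum is simplified.
 */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t cpuset_mutex = PTHREAD_MUTEX_INITIALIZER;

static void cpuset_lock(void)   { pthread_mutex_lock(&cpuset_mutex); }
static void cpuset_unlock(void) { pthread_mutex_unlock(&cpuset_mutex); }

enum sched_policy { POLICY_NORMAL, POLICY_FIFO, POLICY_DEADLINE };

static bool dl_policy(enum sched_policy policy)
{
	return policy == POLICY_DEADLINE;
}

int sched_setscheduler_sketch(enum sched_policy oldp, enum sched_policy newp)
{
	bool cpuset_locked = false;

	/*
	 * Only DEADLINE bandwidth accounting needs the cpuset topology to
	 * stay stable; every other policy skips the lock entirely.
	 */
	if (dl_policy(newp) || dl_policy(oldp)) {
		cpuset_locked = true;
		cpuset_lock();
	}

	/* ... the actual policy change and bandwidth checks go here ... */

	if (cpuset_locked)
		cpuset_unlock();
	return 0;
}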
Signed-off-by: Juri Lelli
Reviewed-by: Qais Yousef
Tested-by: Qais Yousef
---
 include/linux/cpuset.h |   8 +--
 kernel/cgroup/cpuset.c | 145 ++++++++++++++++++++---------------------
 kernel/sched/core.c    |  22 +++++--
 3 files changed, 91 insertions(+), 84 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index d58e0476ee8e..355f796c5f07 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -71,8 +71,8 @@ extern void cpuset_init_smp(void);
 extern void cpuset_force_rebuild(void);
 extern void cpuset_update_active_cpus(void);
 extern void cpuset_wait_for_hotplug(void);
-extern void cpuset_read_lock(void);
-extern void cpuset_read_unlock(void);
+extern void cpuset_lock(void);
+extern void cpuset_unlock(void);
 extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
 extern bool cpuset_cpus_allowed_fallback(struct task_struct *p);
 extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
@@ -196,8 +196,8 @@ static inline void cpuset_update_active_cpus(void)
 
 static inline void cpuset_wait_for_hotplug(void) { }
 
-static inline void cpuset_read_lock(void) { }
-static inline void cpuset_read_unlock(void) { }
+static inline void cpuset_lock(void) { }
+static inline void cpuset_unlock(void) { }
 
 static inline void cpuset_cpus_allowed(struct task_struct *p,
				       struct cpumask *mask)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 501913bc2805..fbc10b494292 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -366,22 +366,21 @@ static struct cpuset top_cpuset = {
	if (is_cpuset_online(((des_cs) = css_cs((pos_css)))))
 
 /*
- * There are two global locks guarding cpuset structures - cpuset_rwsem and
+ * There are two global locks guarding cpuset structures - cpuset_mutex and
  * callback_lock. We also require taking task_lock() when dereferencing a
  * task's cpuset pointer. See "The task_lock() exception", at the end of this
- * comment. The cpuset code uses only cpuset_rwsem write lock. Other
- * kernel subsystems can use cpuset_read_lock()/cpuset_read_unlock() to
- * prevent change to cpuset structures.
+ * comment. The cpuset code uses only cpuset_mutex. Other kernel subsystems
+ * can use cpuset_lock()/cpuset_unlock() to prevent change to cpuset
+ * structures.
  *
  * A task must hold both locks to modify cpusets. If a task holds
- * cpuset_rwsem, it blocks others wanting that rwsem, ensuring that it
- * is the only task able to also acquire callback_lock and be able to
- * modify cpusets. It can perform various checks on the cpuset structure
- * first, knowing nothing will change. It can also allocate memory while
- * just holding cpuset_rwsem. While it is performing these checks, various
- * callback routines can briefly acquire callback_lock to query cpusets.
- * Once it is ready to make the changes, it takes callback_lock, blocking
- * everyone else.
+ * cpuset_mutex, it blocks others, ensuring that it is the only task able to
+ * also acquire callback_lock and be able to modify cpusets. It can perform
+ * various checks on the cpuset structure first, knowing nothing will change.
+ * It can also allocate memory while just holding cpuset_mutex. While it is
+ * performing these checks, various callback routines can briefly acquire
+ * callback_lock to query cpusets. Once it is ready to make the changes, it
+ * takes callback_lock, blocking everyone else.
  *
  * Calls to the kernel memory allocator can not be made while holding
  * callback_lock, as that would risk double tripping on callback_lock
@@ -403,16 +402,16 @@ static struct cpuset top_cpuset = {
  * guidelines for accessing subsystem state in kernel/cgroup.c
  */
 
-DEFINE_STATIC_PERCPU_RWSEM(cpuset_rwsem);
+static DEFINE_MUTEX(cpuset_mutex);
 
-void cpuset_read_lock(void)
+void cpuset_lock(void)
 {
-	percpu_down_read(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 }
 
-void cpuset_read_unlock(void)
+void cpuset_unlock(void)
 {
-	percpu_up_read(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 }
 
 static DEFINE_SPINLOCK(callback_lock);
@@ -496,7 +495,7 @@ static inline bool partition_is_populated(struct cpuset *cs,
  * One way or another, we guarantee to return some non-empty subset
  * of cpu_online_mask.
  *
- * Call with callback_lock or cpuset_rwsem held.
+ * Call with callback_lock or cpuset_mutex held.
  */
 static void guarantee_online_cpus(struct task_struct *tsk,
				  struct cpumask *pmask)
@@ -538,7 +537,7 @@ static void guarantee_online_cpus(struct task_struct *tsk,
  * One way or another, we guarantee to return some non-empty subset
  * of node_states[N_MEMORY].
  *
- * Call with callback_lock or cpuset_rwsem held.
+ * Call with callback_lock or cpuset_mutex held.
  */
 static void guarantee_online_mems(struct cpuset *cs, nodemask_t *pmask)
 {
@@ -550,7 +549,7 @@ static void guarantee_online_mems(struct cpuset *cs, nodemask_t *pmask)
 /*
  * update task's spread flag if cpuset's page/slab spread flag is set
  *
- * Call with callback_lock or cpuset_rwsem held. The check can be skipped
+ * Call with callback_lock or cpuset_mutex held. The check can be skipped
  * if on default hierarchy.
  */
 static void cpuset_update_task_spread_flags(struct cpuset *cs,
@@ -575,7 +574,7 @@ static void cpuset_update_task_spread_flags(struct cpuset *cs,
  *
  * One cpuset is a subset of another if all its allowed CPUs and
  * Memory Nodes are a subset of the other, and its exclusive flags
- * are only set if the other's are set. Call holding cpuset_rwsem.
+ * are only set if the other's are set. Call holding cpuset_mutex.
  */
 static int is_cpuset_subset(const struct cpuset *p, const struct cpuset *q)
@@ -713,7 +712,7 @@ static int validate_change_legacy(struct cpuset *cur, struct cpuset *trial)
  * If we replaced the flag and mask values of the current cpuset
  * (cur) with those values in the trial cpuset (trial), would
  * our various subset and exclusive rules still be valid? Presumes
- * cpuset_rwsem held.
+ * cpuset_mutex held.
  *
  * 'cur' is the address of an actual, in-use cpuset. Operations
  * such as list traversal that depend on the actual address of the
@@ -829,7 +828,7 @@ static void update_domain_attr_tree(struct sched_domain_attr *dattr,
	rcu_read_unlock();
 }
 
-/* Must be called with cpuset_rwsem held. */
+/* Must be called with cpuset_mutex held. */
 static inline int nr_cpusets(void)
 {
	/* jump label reference count + the top-level cpuset */
@@ -855,7 +854,7 @@ static inline int nr_cpusets(void)
  * domains when operating in the severe memory shortage situations
  * that could cause allocation failures below.
  *
- * Must be called with cpuset_rwsem held.
+ * Must be called with cpuset_mutex held.
  *
  * The three key local variables below are:
  *	cp - cpuset pointer, used (together with pos_css) to perform a
@@ -1084,7 +1083,7 @@ static void dl_rebuild_rd_accounting(void)
	struct cpuset *cs = NULL;
	struct cgroup_subsys_state *pos_css;
 
-	percpu_rwsem_assert_held(&cpuset_rwsem);
+	lockdep_assert_held(&cpuset_mutex);
	lockdep_assert_cpus_held();
	lockdep_assert_held(&sched_domains_mutex);
@@ -1134,7 +1133,7 @@ partition_and_rebuild_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
  * 'cpus' is removed, then call this routine to rebuild the
  * scheduler's dynamic sched domains.
  *
- * Call with cpuset_rwsem held. Takes cpus_read_lock().
+ * Call with cpuset_mutex held. Takes cpus_read_lock().
  */
 static void rebuild_sched_domains_locked(void)
 {
@@ -1145,7 +1144,7 @@ static void rebuild_sched_domains_locked(void)
	int ndoms;
 
	lockdep_assert_cpus_held();
-	percpu_rwsem_assert_held(&cpuset_rwsem);
+	lockdep_assert_held(&cpuset_mutex);
 
	/*
	 * If we have raced with CPU hotplug, return early to avoid
@@ -1196,9 +1195,9 @@ static void rebuild_sched_domains_locked(void)
 void rebuild_sched_domains(void)
 {
	cpus_read_lock();
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
	rebuild_sched_domains_locked();
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
	cpus_read_unlock();
 }
@@ -1208,7 +1207,7 @@ void rebuild_sched_domains(void)
  * @new_cpus: the temp variable for the new effective_cpus mask
  *
  * Iterate through each task of @cs updating its cpus_allowed to the
- * effective cpuset's. As this function is called with cpuset_rwsem held,
+ * effective cpuset's. As this function is called with cpuset_mutex held,
  * cpuset membership stays stable.
  */
 static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
@@ -1317,7 +1316,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
	int old_prs, new_prs;
	int part_error = PERR_NONE;	/* Partition error? */
 
-	percpu_rwsem_assert_held(&cpuset_rwsem);
+	lockdep_assert_held(&cpuset_mutex);
 
	/*
	 * The parent must be a partition root.
@@ -1540,7 +1539,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
  *
  * On legacy hierarchy, effective_cpus will be the same with cpu_allowed.
  *
- * Called with cpuset_rwsem held
+ * Called with cpuset_mutex held
  */
 static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
				 bool force)
@@ -1700,7 +1699,7 @@ static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs,
	struct cpuset *sibling;
	struct cgroup_subsys_state *pos_css;
 
-	percpu_rwsem_assert_held(&cpuset_rwsem);
+	lockdep_assert_held(&cpuset_mutex);
 
	/*
	 * Check all its siblings and call update_cpumasks_hier()
@@ -1942,12 +1941,12 @@ static void *cpuset_being_rebound;
  * @cs: the cpuset in which each task's mems_allowed mask needs to be changed
  *
  * Iterate through each task of @cs updating its mems_allowed to the
- * effective cpuset's. As this function is called with cpuset_rwsem held,
+ * effective cpuset's. As this function is called with cpuset_mutex held,
  * cpuset membership stays stable.
  */
 static void update_tasks_nodemask(struct cpuset *cs)
 {
-	static nodemask_t newmems;	/* protected by cpuset_rwsem */
+	static nodemask_t newmems;	/* protected by cpuset_mutex */
	struct css_task_iter it;
	struct task_struct *task;
@@ -1960,7 +1959,7 @@ static void update_tasks_nodemask(struct cpuset *cs)
	 * take while holding tasklist_lock. Forks can happen - the
	 * mpol_dup() cpuset_being_rebound check will catch such forks,
	 * and rebind their vma mempolicies too. Because we still hold
-	 * the global cpuset_rwsem, we know that no other rebind effort
+	 * the global cpuset_mutex, we know that no other rebind effort
	 * will be contending for the global variable cpuset_being_rebound.
	 * It's ok if we rebind the same mm twice; mpol_rebind_mm()
	 * is idempotent. Also migrate pages in each mm to new nodes.
@@ -2006,7 +2005,7 @@ static void update_tasks_nodemask(struct cpuset *cs)
  *
  * On legacy hierarchy, effective_mems will be the same with mems_allowed.
  *
- * Called with cpuset_rwsem held
+ * Called with cpuset_mutex held
  */
 static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
 {
@@ -2059,7 +2058,7 @@ static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
  * mempolicies and if the cpuset is marked 'memory_migrate',
  * migrate the tasks pages to the new memory.
  *
- * Call with cpuset_rwsem held. May take callback_lock during call.
+ * Call with cpuset_mutex held. May take callback_lock during call.
  * Will take tasklist_lock, scan tasklist for tasks in cpuset cs,
  * lock each such tasks mm->mmap_lock, scan its vma's and rebind
  * their mempolicies to the cpusets new mems_allowed.
@@ -2151,7 +2150,7 @@ static int update_relax_domain_level(struct cpuset *cs, s64 val)
  * @cs: the cpuset in which each task's spread flags needs to be changed
  *
  * Iterate through each task of @cs updating its spread flags. As this
- * function is called with cpuset_rwsem held, cpuset membership stays
+ * function is called with cpuset_mutex held, cpuset membership stays
  * stable.
  */
 static void update_tasks_flags(struct cpuset *cs)
@@ -2171,7 +2170,7 @@ static void update_tasks_flags(struct cpuset *cs)
  * cs:		the cpuset to update
  * turning_on:	whether the flag is being set or cleared
  *
- * Call with cpuset_rwsem held.
+ * Call with cpuset_mutex held.
  */
 static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
@@ -2221,7 +2220,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
  * @new_prs: new partition root state
  * Return: 0 if successful, != 0 if error
  *
- * Call with cpuset_rwsem held.
+ * Call with cpuset_mutex held.
  */
 static int update_prstate(struct cpuset *cs, int new_prs)
 {
@@ -2445,7 +2444,7 @@ static int fmeter_getrate(struct fmeter *fmp)
 
 static struct cpuset *cpuset_attach_old_cs;
 
-/* Called by cgroups to determine if a cpuset is usable; cpuset_rwsem held */
+/* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */
 static int cpuset_can_attach(struct cgroup_taskset *tset)
 {
	struct cgroup_subsys_state *css;
@@ -2457,7 +2456,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
	cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css));
	cs = css_cs(css);
 
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 
	/* allow moving tasks into an empty cpuset if on default hierarchy */
	ret = -ENOSPC;
@@ -2487,7 +2486,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
	cs->attach_in_progress++;
	ret = 0;
 out_unlock:
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
	return ret;
 }
@@ -2497,13 +2496,13 @@ static void cpuset_cancel_attach(struct cgroup_taskset *tset)
 
	cgroup_taskset_first(tset, &css);
 
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
	css_cs(css)->attach_in_progress--;
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 }
 
 /*
- * Protected by cpuset_rwsem. cpus_attach is used only by cpuset_attach()
+ * Protected by cpuset_mutex. cpus_attach is used only by cpuset_attach()
  * but we can't allocate it dynamically there. Define it global and
  * allocate from cpuset_init().
  */
@@ -2511,7 +2510,7 @@ static cpumask_var_t cpus_attach;
 
 static void cpuset_attach(struct cgroup_taskset *tset)
 {
-	/* static buf protected by cpuset_rwsem */
+	/* static buf protected by cpuset_mutex */
	static nodemask_t cpuset_attach_nodemask_to;
	struct task_struct *task;
	struct task_struct *leader;
@@ -2524,7 +2523,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
	cs = css_cs(css);
	lockdep_assert_cpus_held();	/* see cgroup_attach_lock() */
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 
	cpus_updated = !cpumask_equal(cs->effective_cpus,
				      oldcs->effective_cpus);
	mems_updated = !nodes_equal(cs->effective_mems, oldcs->effective_mems);
@@ -2597,7 +2596,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
	if (!cs->attach_in_progress)
		wake_up(&cpuset_attach_wq);
 
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 }
 
 /* The various types of files and directories in a cpuset file system */
@@ -2629,7 +2628,7 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
	int retval = 0;
 
	cpus_read_lock();
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
	if (!is_cpuset_online(cs)) {
		retval = -ENODEV;
		goto out_unlock;
@@ -2665,7 +2664,7 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
		break;
	}
 out_unlock:
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
	cpus_read_unlock();
	return retval;
 }
@@ -2678,7 +2677,7 @@ static int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
	int retval = -ENODEV;
 
	cpus_read_lock();
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
	if (!is_cpuset_online(cs))
		goto out_unlock;
@@ -2691,7 +2690,7 @@ static int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
		break;
	}
 out_unlock:
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
	cpus_read_unlock();
	return retval;
 }
@@ -2724,7 +2723,7 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
	 * operation like this one can lead to a deadlock through kernfs
	 * active_ref protection.  Let's break the protection.  Losing the
	 * protection is okay as we check whether @cs is online after
-	 * grabbing cpuset_rwsem anyway.  This only happens on the legacy
+	 * grabbing cpuset_mutex anyway.  This only happens on the legacy
	 * hierarchies.
	 */
	css_get(&cs->css);
@@ -2732,7 +2731,7 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
	flush_work(&cpuset_hotplug_work);
 
	cpus_read_lock();
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
	if (!is_cpuset_online(cs))
		goto out_unlock;
@@ -2756,7 +2755,7 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
	free_cpuset(trialcs);
 out_unlock:
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
	cpus_read_unlock();
	kernfs_unbreak_active_protection(of->kn);
	css_put(&cs->css);
@@ -2904,13 +2903,13 @@ static ssize_t sched_partition_write(struct kernfs_open_file *of, char *buf,
	css_get(&cs->css);
	cpus_read_lock();
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
	if (!is_cpuset_online(cs))
		goto out_unlock;
 
	retval = update_prstate(cs, val);
 out_unlock:
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
	cpus_read_unlock();
	css_put(&cs->css);
	return retval ?: nbytes;
@@ -3127,7 +3126,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
		return 0;
 
	cpus_read_lock();
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 
	set_bit(CS_ONLINE, &cs->flags);
	if (is_spread_page(parent))
@@ -3178,7 +3177,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
	cpumask_copy(cs->effective_cpus, parent->cpus_allowed);
	spin_unlock_irq(&callback_lock);
 out_unlock:
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
	cpus_read_unlock();
	return 0;
 }
@@ -3199,7 +3198,7 @@ static void cpuset_css_offline(struct cgroup_subsys_state *css)
	struct cpuset *cs = css_cs(css);
 
	cpus_read_lock();
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 
	if (is_partition_valid(cs))
		update_prstate(cs, 0);
@@ -3218,7 +3217,7 @@ static void cpuset_css_offline(struct cgroup_subsys_state *css)
	cpuset_dec();
	clear_bit(CS_ONLINE, &cs->flags);
 
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
	cpus_read_unlock();
 }
@@ -3231,7 +3230,7 @@ static void cpuset_css_free(struct cgroup_subsys_state *css)
 
 static void cpuset_bind(struct cgroup_subsys_state *root_css)
 {
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
	spin_lock_irq(&callback_lock);
 
	if (is_in_v2_mode()) {
@@ -3244,7 +3243,7 @@ static void cpuset_bind(struct cgroup_subsys_state *root_css)
	}
 
	spin_unlock_irq(&callback_lock);
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 }
 
 /*
@@ -3357,7 +3356,7 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
	is_empty = cpumask_empty(cs->cpus_allowed) ||
		   nodes_empty(cs->mems_allowed);
 
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 
	/*
	 * Move tasks to the nearest ancestor with execution resources,
@@ -3367,7 +3366,7 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
	if (is_empty)
		remove_tasks_in_empty_cpuset(cs);
 
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 }
 
 static void
@@ -3418,14 +3417,14 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 retry:
	wait_event(cpuset_attach_wq, cs->attach_in_progress == 0);
 
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 
	/*
	 * We have raced with task attaching. We wait until attaching
	 * is finished, so we won't attach a task to an empty cpuset.
	 */
	if (cs->attach_in_progress) {
-		percpu_up_write(&cpuset_rwsem);
+		mutex_unlock(&cpuset_mutex);
		goto retry;
	}
@@ -3519,7 +3518,7 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
		hotplug_update_tasks_legacy(cs, &new_cpus, &new_mems,
					    cpus_updated, mems_updated);
 
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 }
 
 /**
@@ -3549,7 +3548,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
	if (on_dfl && !alloc_cpumasks(NULL, &tmp))
		ptmp = &tmp;
 
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 
	/* fetch the available cpus/mems and find out which changed how */
	cpumask_copy(&new_cpus, cpu_active_mask);
@@ -3606,7 +3605,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
		update_tasks_nodemask(&top_cpuset);
	}
 
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 
	/* if cpus or mems changed, we need to propagate to descendants */
	if (cpus_updated || mems_updated) {
@@ -4037,7 +4036,7 @@ void __cpuset_memory_pressure_bump(void)
  *  - Used for /proc/<pid>/cpuset.
  *  - No need to task_lock(tsk) on this tsk->cpuset reference, as it
  *    doesn't really matter if tsk->cpuset changes after we read it,
- *    and we take cpuset_rwsem, keeping cpuset_attach() from changing it
+ *    and we take cpuset_mutex, keeping cpuset_attach() from changing it
  *    anyway.
  */
 int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b9616f153946..179266ff653f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7565,6 +7565,7 @@ static int __sched_setscheduler(struct task_struct *p,
	int reset_on_fork;
	int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK;
	struct rq *rq;
+	bool cpuset_locked = false;
 
	/* The pi code expects interrupts enabled */
	BUG_ON(pi && in_interrupt());
@@ -7614,8 +7615,14 @@ static int __sched_setscheduler(struct task_struct *p,
			return retval;
	}
 
-	if (pi)
-		cpuset_read_lock();
+	/*
+	 * SCHED_DEADLINE bandwidth accounting relies on stable cpusets
+	 * information.
+	 */
+	if (dl_policy(policy) || dl_policy(p->policy)) {
+		cpuset_locked = true;
+		cpuset_lock();
+	}
 
	/*
	 * Make sure no PI-waiters arrive (or leave) while we are
@@ -7691,8 +7698,8 @@ static int __sched_setscheduler(struct task_struct *p,
	if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
		policy = oldpolicy = -1;
		task_rq_unlock(rq, p, &rf);
-		if (pi)
-			cpuset_read_unlock();
+		if (cpuset_locked)
+			cpuset_unlock();
		goto recheck;
	}
@@ -7759,7 +7766,8 @@ static int __sched_setscheduler(struct task_struct *p,
	task_rq_unlock(rq, p, &rf);
 
	if (pi) {
-		cpuset_read_unlock();
+		if (cpuset_locked)
+			cpuset_unlock();
		rt_mutex_adjust_pi(p);
	}
@@ -7771,8 +7779,8 @@ static int __sched_setscheduler(struct task_struct *p,
 unlock:
	task_rq_unlock(rq, p, &rf);
-	if (pi)
-		cpuset_read_unlock();
+	if (cpuset_locked)
+		cpuset_unlock();
	return retval;
 }

From patchwork Wed Mar 29 12:55:55 2023
X-Patchwork-Submitter: Juri Lelli
X-Patchwork-Id: 76567
From: Juri Lelli
To: Peter Zijlstra, Ingo Molnar, Qais Yousef, Waiman Long, Tejun Heo, Zefan Li, Johannes Weiner, Hao Luo
Cc: Dietmar Eggemann, Steven Rostedt, linux-kernel@vger.kernel.org, luca.abeni@santannapisa.it, claudio@evidence.eu.com, tommaso.cucinotta@santannapisa.it, bristot@redhat.com, mathieu.poirier@linaro.org, cgroups@vger.kernel.org, Vincent Guittot, Wei Wang, Rick Yiu, Quentin Perret, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sudeep Holla, Juri Lelli
Subject: [PATCH 3/6] sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
Date: Wed, 29 Mar 2023 14:55:55 +0200
Message-Id: <20230329125558.255239-4-juri.lelli@redhat.com>
In-Reply-To: <20230329125558.255239-1-juri.lelli@redhat.com>
References: <20230329125558.255239-1-juri.lelli@redhat.com>

Qais reported that iterating over all tasks when rebuilding root domains,
in order to find out which ones are DEADLINE and need their bandwidth
correctly restored on such root domains, can be a costly operation (10+ ms
delays on suspend-resume).

To fix the problem, keep track of the number of DEADLINE tasks belonging
to each cpuset and then use this information (in a followup patch) to
only perform the above iteration if DEADLINE tasks are actually present
in the cpuset for which a corresponding root domain is being rebuilt.
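The bookkeeping itself is tiny. Below is a self-contained C sketch of the
idea, with simplified signatures (in the kernel the inc/dec helpers take
a task_struct and look up its cpuset; the dl_rebuild_rd_accounting_sketch()
function here is a hypothetical stand-in that only shows how the counter
gates the expensive per-task walk):

#include <stddef.h>

struct cpuset_sketch {
	int nr_deadline_tasks;	/* DL tasks currently attached */
	/* ... cpumasks and the rest of the cpuset state ... */
};

/* Bump/drop the count when a task enters/leaves SCHED_DEADLINE. */
static void inc_dl_tasks_cs(struct cpuset_sketch *cs) { cs->nr_deadline_tasks++; }
static void dec_dl_tasks_cs(struct cpuset_sketch *cs) { cs->nr_deadline_tasks--; }

/* Followup patch: skip the per-task iteration when the count is zero. */
static void dl_rebuild_rd_accounting_sketch(struct cpuset_sketch *cs)
{
	if (cs == NULL || cs->nr_deadline_tasks == 0)
		return;	/* no DL tasks here, nothing to restore */

	/* ... iterate the cpuset's tasks and re-add their DL bandwidth ... */
}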
Reported-by: Qais Yousef
Link: https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@layalina.io/
Signed-off-by: Juri Lelli
Reviewed-by: Qais Yousef
Tested-by: Qais Yousef
---
 include/linux/cpuset.h  |  4 ++++
 kernel/cgroup/cgroup.c  |  4 ++++
 kernel/cgroup/cpuset.c  | 25 +++++++++++++++++++++++++
 kernel/sched/deadline.c | 14 ++++++++++++++
 4 files changed, 47 insertions(+)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 355f796c5f07..0348dba5680e 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -71,6 +71,8 @@ extern void cpuset_init_smp(void);
 extern void cpuset_force_rebuild(void);
 extern void cpuset_update_active_cpus(void);
 extern void cpuset_wait_for_hotplug(void);
+extern void inc_dl_tasks_cs(struct task_struct *task);
+extern void dec_dl_tasks_cs(struct task_struct *task);
 extern void cpuset_lock(void);
 extern void cpuset_unlock(void);
 extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
@@ -196,6 +198,8 @@ static inline void cpuset_update_active_cpus(void)
 
 static inline void cpuset_wait_for_hotplug(void) { }
 
+static inline void inc_dl_tasks_cs(struct task_struct *task) { }
+static inline void dec_dl_tasks_cs(struct task_struct *task) { }
 static inline void cpuset_lock(void) { }
 static inline void cpuset_unlock(void) { }
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 935e8121b21e..ff27b2d2bf0b 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -57,6 +57,7 @@
 #include
 #include
 #include
+#include <linux/sched/deadline.h>
 #include
 #include
@@ -6673,6 +6674,9 @@ void cgroup_exit(struct task_struct *tsk)
	list_add_tail(&tsk->cg_list, &cset->dying_tasks);
	cset->nr_tasks--;
 
+	if (dl_task(tsk))
+		dec_dl_tasks_cs(tsk);
+
	WARN_ON_ONCE(cgroup_task_frozen(tsk));
	if (unlikely(!(tsk->flags & PF_KTHREAD) &&
		     test_bit(CGRP_FREEZE, &task_dfl_cgroup(tsk)->flags)))
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index fbc10b494292..eb0854ef9757 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -193,6 +193,12 @@ struct cpuset {
	int use_parent_ecpus;
	int child_ecpus_count;
 
+	/*
+	 * number of SCHED_DEADLINE tasks attached to this cpuset, so that we
+	 * know when to rebuild associated root domain bandwidth information.
+	 */
+	int nr_deadline_tasks;
+
	/* Invalid partition error code, not lock protected */
	enum prs_errcode prs_err;
 
@@ -245,6 +251,20 @@ static inline struct cpuset *parent_cs(struct cpuset *cs)
	return css_cs(cs->css.parent);
 }
 
+void inc_dl_tasks_cs(struct task_struct *p)
+{
+	struct cpuset *cs = task_cs(p);
+
+	cs->nr_deadline_tasks++;
+}
+
+void dec_dl_tasks_cs(struct task_struct *p)
+{
+	struct cpuset *cs = task_cs(p);
+
+	cs->nr_deadline_tasks--;
+}
+
 /* bits in struct cpuset flags field */
 typedef enum {
	CS_ONLINE,
@@ -2477,6 +2497,11 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
		ret = security_task_setscheduler(task);
		if (ret)
			goto out_unlock;
+
+		if (dl_task(task)) {
+			cs->nr_deadline_tasks++;
+			cpuset_attach_old_cs->nr_deadline_tasks--;
+		}
	}
 
	/*
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 4cc7e1ca066d..8f92f0f87383 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -16,6 +16,8 @@
  *                    Fabio Checconi
  */
 
+#include <linux/cpuset.h>
+
 /*
  * Default limits for DL period; on the top end we guard against small util
  * tasks still getting ridiculously long effective runtimes, on the bottom end we
@@ -2595,6 +2597,12 @@ static void switched_from_dl(struct rq *rq, struct task_struct *p)
	if (task_on_rq_queued(p) && p->dl.dl_runtime)
		task_non_contending(p);
 
+	/*
+	 * In case a task is setscheduled out from SCHED_DEADLINE we need to
+	 * keep track of that on its cpuset (for correct bandwidth tracking).
+	 */
+	dec_dl_tasks_cs(p);
+
	if (!task_on_rq_queued(p)) {
		/*
		 * Inactive timer is armed. However, p is leaving DEADLINE and
@@ -2635,6 +2643,12 @@ static void switched_to_dl(struct rq *rq, struct task_struct *p)
	if (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1)
		put_task_struct(p);
 
+	/*
+	 * In case a task is setscheduled to SCHED_DEADLINE we need to keep
+	 * track of that on its cpuset (for correct bandwidth tracking).
+	 */
+	inc_dl_tasks_cs(p);
+
	/*
	 * If p is not queued we will update its parameters at next wakeup.
	 */
	if (!task_on_rq_queued(p)) {
		add_rq_bw(&p->dl, &rq->dl);

From patchwork Wed Mar 29 12:55:56 2023
X-Patchwork-Submitter: Juri Lelli
X-Patchwork-Id: 76565
From: Juri Lelli
To: Peter Zijlstra, Ingo Molnar, Qais Yousef, Waiman Long, Tejun Heo, Zefan Li, Johannes Weiner, Hao Luo
Cc: Dietmar Eggemann, Steven Rostedt, linux-kernel@vger.kernel.org, luca.abeni@santannapisa.it, claudio@evidence.eu.com, tommaso.cucinotta@santannapisa.it, bristot@redhat.com, mathieu.poirier@linaro.org, cgroups@vger.kernel.org, Vincent Guittot, Wei Wang, Rick Yiu, Quentin Perret, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sudeep Holla, Juri Lelli
Subject: [PATCH 4/6] sched/deadline: Create DL BW alloc, free & check overflow interface
Date: Wed, 29 Mar 2023 14:55:56 +0200
Message-Id: <20230329125558.255239-5-juri.lelli@redhat.com>
In-Reply-To: <20230329125558.255239-1-juri.lelli@redhat.com>
References: <20230329125558.255239-1-juri.lelli@redhat.com>

From: Dietmar Eggemann

Rework the existing dl_cpu_busy() interface which offers DL BW overflow
checking and per-task DL BW allocation.

Add dl_bw_free() as an interface to be able to free DL BW. It will be
used to allow freeing of the DL BW request done during
cpuset_can_attach() in case multiple controllers are attached to the
cgroup next to the cpuset controller and one of the non-cpuset
can_attach() fails.

dl_bw_alloc() (and dl_bw_free()) now take a `u64 dl_bw` parameter
instead of the `struct task_struct *p` used in dl_cpu_busy(). This
allows allocating DL BW for a set of tasks as well, rather than only
for a single task.
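In other words, callers reserve an absolute amount of DL bandwidth up
front and hand it back on failure paths. A rough, self-contained C
sketch of the intended usage (the -EBUSY convention, the u64 bandwidth
parameter and the alloc/free pairing follow the patch; the single global
capacity variable, the helper bodies and attach_tasks_sketch() are
illustrative stand-ins for the per-root-domain accounting):

#include <stdint.h>

#define EBUSY_SKETCH 16	/* stand-in for the kernel's -EBUSY */

static uint64_t dl_total;			/* bandwidth currently reserved */
static const uint64_t dl_cap = 1u << 20;	/* stand-in root-domain capacity */

static int dl_bw_manage(int alloc, uint64_t dl_bw)
{
	if (dl_total + dl_bw > dl_cap)
		return -EBUSY_SKETCH;	/* request would overflow capacity */
	if (alloc)
		dl_total += dl_bw;	/* reserve now; cannot fail afterwards */
	return 0;
}

int dl_bw_alloc(int cpu, uint64_t dl_bw) { (void)cpu; return dl_bw_manage(1, dl_bw); }
int dl_bw_check_overflow(int cpu)        { (void)cpu; return dl_bw_manage(0, 0); }
void dl_bw_free(int cpu, uint64_t dl_bw) { (void)cpu; dl_total -= dl_bw; }

/* Usage mirroring the cpuset_can_attach()/cpuset_cancel_attach() pairing: */
int attach_tasks_sketch(int cpu, const uint64_t *task_bw, int n)
{
	uint64_t sum = 0;
	for (int i = 0; i < n; i++)
		sum += task_bw[i];

	if (dl_bw_alloc(cpu, sum))	/* one reservation for the whole set */
		return -EBUSY_SKETCH;

	/* ... if a later can_attach() step fails: dl_bw_free(cpu, sum); ... */
	return 0;
}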
Signed-off-by: Dietmar Eggemann
Signed-off-by: Juri Lelli
---
 include/linux/sched.h   |  2 ++
 kernel/sched/core.c     |  4 ++--
 kernel/sched/deadline.c | 53 +++++++++++++++++++++++++++++++----------
 kernel/sched/sched.h    |  2 +-
 4 files changed, 45 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6d654eb4cabd..6f3d84e0ed08 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1848,6 +1848,8 @@ current_restore_flags(unsigned long orig_flags, unsigned long flags)
 extern int cpuset_cpumask_can_shrink(const struct cpumask *cur,
				     const struct cpumask *trial);
 extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_effective_cpus);
+extern int dl_bw_alloc(int cpu, u64 dl_bw);
+extern void dl_bw_free(int cpu, u64 dl_bw);
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask);
 extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 179266ff653f..c83dae6b8586 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9294,7 +9294,7 @@ int task_can_attach(struct task_struct *p,
 		if (unlikely(cpu >= nr_cpu_ids))
 			return -EINVAL;

-		ret = dl_cpu_busy(cpu, p);
+		ret = dl_bw_alloc(cpu, p->dl.dl_bw);
 	}

 out:
@@ -9579,7 +9579,7 @@ static void cpuset_cpu_active(void)
 static int cpuset_cpu_inactive(unsigned int cpu)
 {
 	if (!cpuhp_tasks_frozen) {
-		int ret = dl_cpu_busy(cpu, NULL);
+		int ret = dl_bw_check_overflow(cpu);

 		if (ret)
 			return ret;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 8f92f0f87383..5b6965e0e537 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -3057,26 +3057,38 @@ int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur,
 	return ret;
 }

-int dl_cpu_busy(int cpu, struct task_struct *p)
+enum dl_bw_request {
+	dl_bw_req_check_overflow = 0,
+	dl_bw_req_alloc,
+	dl_bw_req_free
+};
+
+static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
 {
-	unsigned long flags, cap;
+	unsigned long flags;
 	struct dl_bw *dl_b;
-	bool overflow;
+	bool overflow = 0;

 	rcu_read_lock_sched();
 	dl_b = dl_bw_of(cpu);
 	raw_spin_lock_irqsave(&dl_b->lock, flags);
-	cap = dl_bw_capacity(cpu);
-	overflow = __dl_overflow(dl_b, cap, 0, p ? p->dl.dl_bw : 0);

-	if (!overflow && p) {
-		/*
-		 * We reserve space for this task in the destination
-		 * root_domain, as we can't fail after this point.
-		 * We will free resources in the source root_domain
-		 * later on (see set_cpus_allowed_dl()).
-		 */
-		__dl_add(dl_b, p->dl.dl_bw, dl_bw_cpus(cpu));
+	if (req == dl_bw_req_free) {
+		__dl_sub(dl_b, dl_bw, dl_bw_cpus(cpu));
+	} else {
+		unsigned long cap = dl_bw_capacity(cpu);
+
+		overflow = __dl_overflow(dl_b, cap, 0, dl_bw);
+
+		if (req == dl_bw_req_alloc && !overflow) {
+			/*
+			 * We reserve space in the destination
+			 * root_domain, as we can't fail after this point.
+			 * We will free resources in the source root_domain
+			 * later on (see set_cpus_allowed_dl()).
+			 */
+			__dl_add(dl_b, dl_bw, dl_bw_cpus(cpu));
+		}
 	}

 	raw_spin_unlock_irqrestore(&dl_b->lock, flags);
@@ -3084,6 +3096,21 @@ int dl_cpu_busy(int cpu, struct task_struct *p)
 	return overflow ? -EBUSY : 0;
 }
+
+int dl_bw_check_overflow(int cpu)
+{
+	return dl_bw_manage(dl_bw_req_check_overflow, cpu, 0);
+}
+
+int dl_bw_alloc(int cpu, u64 dl_bw)
+{
+	return dl_bw_manage(dl_bw_req_alloc, cpu, dl_bw);
+}
+
+void dl_bw_free(int cpu, u64 dl_bw)
+{
+	dl_bw_manage(dl_bw_req_free, cpu, dl_bw);
+}
 #endif

 #ifdef CONFIG_SCHED_DEBUG
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 060616944d7a..81ecfd1a1a48 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -330,7 +330,7 @@ extern void __getparam_dl(struct task_struct *p, struct sched_attr *attr);
 extern bool __checkparam_dl(const struct sched_attr *attr);
 extern bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr);
 extern int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur,
					const struct cpumask *trial);
-extern int dl_cpu_busy(int cpu, struct task_struct *p);
+extern int dl_bw_check_overflow(int cpu);

 #ifdef CONFIG_CGROUP_SCHED
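The shape of the resulting interface is easiest to see in isolation. The
following stand-alone model (plain C, compilable in user space, not part
of the patch) mimics the dl_bw_manage() dispatch; the pool arithmetic
stands in for __dl_overflow()/__dl_add()/__dl_sub(), -16 plays the role
of -EBUSY, and all names are invented for the illustration:

#include <stdbool.h>
#include <stdio.h>

enum bw_request { BW_REQ_CHECK_OVERFLOW, BW_REQ_ALLOC, BW_REQ_FREE };

static long used, cap = 100;	/* toy bandwidth pool */

static int bw_manage(enum bw_request req, long bw)
{
	bool overflow = false;

	if (req == BW_REQ_FREE) {
		used -= bw;			/* freeing never fails */
	} else {
		overflow = used + bw > cap;	/* shared check/alloc test */
		if (req == BW_REQ_ALLOC && !overflow)
			used += bw;		/* reserve; can't fail later */
	}
	return overflow ? -16 : 0;
}

int main(void)
{
	printf("check: %d\n", bw_manage(BW_REQ_CHECK_OVERFLOW, 0));	/* 0 */
	printf("alloc: %d\n", bw_manage(BW_REQ_ALLOC, 60));		/* 0 */
	printf("alloc: %d\n", bw_manage(BW_REQ_ALLOC, 60));		/* -16 */
	bw_manage(BW_REQ_FREE, 60);
	return 0;
}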
From patchwork Wed Mar 29 12:55:57 2023
X-Patchwork-Submitter: Juri Lelli
X-Patchwork-Id: 76566

From: Juri Lelli
To: Peter Zijlstra, Ingo Molnar, Qais Yousef, Waiman Long, Tejun Heo,
    Zefan Li, Johannes Weiner, Hao Luo
Cc: Dietmar Eggemann, Steven Rostedt, linux-kernel@vger.kernel.org,
    luca.abeni@santannapisa.it, claudio@evidence.eu.com,
    tommaso.cucinotta@santannapisa.it, bristot@redhat.com,
    mathieu.poirier@linaro.org, cgroups@vger.kernel.org, Vincent Guittot,
    Wei Wang, Rick Yiu, Quentin Perret, Heiko Carstens, Vasily Gorbik,
    Alexander Gordeev, Sudeep Holla, Juri Lelli
Subject: [PATCH 5/6] cgroup/cpuset: Free DL BW in case can_attach() fails
Date: Wed, 29 Mar 2023 14:55:57 +0200
Message-Id: <20230329125558.255239-6-juri.lelli@redhat.com>
In-Reply-To: <20230329125558.255239-1-juri.lelli@redhat.com>
References: <20230329125558.255239-1-juri.lelli@redhat.com>

From: Dietmar Eggemann

cpuset_can_attach() can fail. Postpone DL BW allocation until all tasks
have been checked. DL BW is not allocated per-task but as a sum over all
DL tasks migrating.

If multiple controllers are attached to the cgroup next to the cpuset
controller, a non-cpuset can_attach() can fail. In this case, free DL BW
in cpuset_cancel_attach().

Finally, update the cpuset DL task count (nr_deadline_tasks) only in
cpuset_attach().

Suggested-by: Waiman Long
Signed-off-by: Dietmar Eggemann
Signed-off-by: Juri Lelli
---
 include/linux/sched.h  |  2 +-
 kernel/cgroup/cpuset.c | 55 ++++++++++++++++++++++++++++++++++++++----
 kernel/sched/core.c    | 17 ++-----------
 3 files changed, 53 insertions(+), 21 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6f3d84e0ed08..50cbbfefbe11 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1847,7 +1847,7 @@ current_restore_flags(unsigned long orig_flags, unsigned long flags)
 }
 extern int cpuset_cpumask_can_shrink(const struct cpumask *cur,
				     const struct cpumask *trial);
-extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_effective_cpus);
+extern int task_can_attach(struct task_struct *p);
 extern int dl_bw_alloc(int cpu, u64 dl_bw);
 extern void dl_bw_free(int cpu, u64 dl_bw);
 #ifdef CONFIG_SMP
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index eb0854ef9757..f8ebec66da51 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -198,6 +198,8 @@ struct cpuset {
	 * know when to rebuild associated root domain bandwidth information.
	 */
	int nr_deadline_tasks;
+	int nr_migrate_dl_tasks;
+	u64 sum_migrate_dl_bw;

	/* Invalid partition error code, not lock protected */
	enum prs_errcode prs_err;
@@ -2464,16 +2466,23 @@ static int fmeter_getrate(struct fmeter *fmp)

 static struct cpuset *cpuset_attach_old_cs;

+static void reset_migrate_dl_data(struct cpuset *cs)
+{
+	cs->nr_migrate_dl_tasks = 0;
+	cs->sum_migrate_dl_bw = 0;
+}
+
 /* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */
 static int cpuset_can_attach(struct cgroup_taskset *tset)
 {
 	struct cgroup_subsys_state *css;
-	struct cpuset *cs;
+	struct cpuset *cs, *oldcs;
 	struct task_struct *task;
 	int ret;

 	/* used later by cpuset_attach() */
 	cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css));
+	oldcs = cpuset_attach_old_cs;
 	cs = css_cs(css);

 	mutex_lock(&cpuset_mutex);
@@ -2491,7 +2500,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 		goto out_unlock;

 	cgroup_taskset_for_each(task, css, tset) {
-		ret = task_can_attach(task, cs->effective_cpus);
+		ret = task_can_attach(task);
 		if (ret)
 			goto out_unlock;
 		ret = security_task_setscheduler(task);
@@ -2499,11 +2508,31 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 			goto out_unlock;

 		if (dl_task(task)) {
-			cs->nr_deadline_tasks++;
-			cpuset_attach_old_cs->nr_deadline_tasks--;
+			cs->nr_migrate_dl_tasks++;
+			cs->sum_migrate_dl_bw += task->dl.dl_bw;
		}
	}

+	if (!cs->nr_migrate_dl_tasks)
+		goto out_success;
+
+	if (!cpumask_intersects(oldcs->effective_cpus, cs->effective_cpus)) {
+		int cpu = cpumask_any_and(cpu_active_mask, cs->effective_cpus);
+
+		if (unlikely(cpu >= nr_cpu_ids)) {
+			reset_migrate_dl_data(cs);
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
+		if (ret) {
+			reset_migrate_dl_data(cs);
+			goto out_unlock;
+		}
+	}
+
+out_success:
	/*
	 * Mark attach is in progress. This makes validate_change() fail
	 * changes which zero cpus/mems_allowed.
@@ -2518,11 +2547,21 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 static void cpuset_cancel_attach(struct cgroup_taskset *tset)
 {
 	struct cgroup_subsys_state *css;
+	struct cpuset *cs;

 	cgroup_taskset_first(tset, &css);
+	cs = css_cs(css);

 	mutex_lock(&cpuset_mutex);
-	css_cs(css)->attach_in_progress--;
+	cs->attach_in_progress--;
+
+	if (cs->nr_migrate_dl_tasks) {
+		int cpu = cpumask_any(cs->effective_cpus);
+
+		dl_bw_free(cpu, cs->sum_migrate_dl_bw);
+		reset_migrate_dl_data(cs);
+	}
+
 	mutex_unlock(&cpuset_mutex);
 }
@@ -2617,6 +2656,12 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 out:
 	cs->old_mems_allowed = cpuset_attach_nodemask_to;

+	if (cs->nr_migrate_dl_tasks) {
+		cs->nr_deadline_tasks += cs->nr_migrate_dl_tasks;
+		oldcs->nr_deadline_tasks -= cs->nr_migrate_dl_tasks;
+		reset_migrate_dl_data(cs);
+	}
+
 	cs->attach_in_progress--;
 	if (!cs->attach_in_progress)
 		wake_up(&cpuset_attach_wq);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c83dae6b8586..10454980e830 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9269,8 +9269,7 @@ int cpuset_cpumask_can_shrink(const struct cpumask *cur,
 	return ret;
 }

-int task_can_attach(struct task_struct *p,
-		    const struct cpumask *cs_effective_cpus)
+int task_can_attach(struct task_struct *p)
 {
 	int ret = 0;

@@ -9283,21 +9282,9 @@ int task_can_attach(struct task_struct *p,
	 * success of set_cpus_allowed_ptr() on all attached tasks
	 * before cpus_mask may be changed.
	 */
-	if (p->flags & PF_NO_SETAFFINITY) {
+	if (p->flags & PF_NO_SETAFFINITY)
 		ret = -EINVAL;
-		goto out;
-	}
-
-	if (dl_task(p) && !cpumask_intersects(task_rq(p)->rd->span,
-					      cs_effective_cpus)) {
-		int cpu = cpumask_any_and(cpu_active_mask, cs_effective_cpus);
-
-		if (unlikely(cpu >= nr_cpu_ids))
-			return -EINVAL;
-
-		ret = dl_bw_alloc(cpu, p->dl.dl_bw);
-	}
-
-out:
 	return ret;
 }
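To trace the three-phase protocol this patch sets up, here is a small
stand-alone model (plain C, compilable in user space, not part of the
patch). It is deliberately simplified: all names are invented, there is
no locking, and a single global pool stands in for the per-root-domain
accounting:

#include <stdio.h>

static long pool_used, pool_cap = 100;	/* toy DL bandwidth pool */

static int bw_alloc(long bw)		/* models dl_bw_alloc() */
{
	if (pool_used + bw > pool_cap)
		return -1;		/* models -EBUSY */
	pool_used += bw;
	return 0;
}

static void bw_free(long bw)		/* models dl_bw_free() */
{
	pool_used -= bw;
}

struct toy_cpuset {
	int nr_deadline_tasks;
	int nr_migrate_dl_tasks;	/* staged, not yet committed */
	long sum_migrate_dl_bw;
};

/* Phase 1 (can_attach): stage the request; reserve the *sum* once. */
static int can_attach(struct toy_cpuset *cs, const long *task_bw, int n)
{
	for (int i = 0; i < n; i++) {
		cs->nr_migrate_dl_tasks++;
		cs->sum_migrate_dl_bw += task_bw[i];
	}
	if (cs->nr_migrate_dl_tasks && bw_alloc(cs->sum_migrate_dl_bw)) {
		cs->nr_migrate_dl_tasks = 0;
		cs->sum_migrate_dl_bw = 0;
		return -1;
	}
	return 0;
}

/* Phase 2a (cancel_attach): another controller failed; roll back. */
static void cancel_attach(struct toy_cpuset *cs)
{
	if (cs->nr_migrate_dl_tasks) {
		bw_free(cs->sum_migrate_dl_bw);
		cs->nr_migrate_dl_tasks = 0;
		cs->sum_migrate_dl_bw = 0;
	}
}

/* Phase 2b (attach): everyone agreed; commit the task counts. */
static void attach(struct toy_cpuset *cs, struct toy_cpuset *oldcs)
{
	cs->nr_deadline_tasks += cs->nr_migrate_dl_tasks;
	oldcs->nr_deadline_tasks -= cs->nr_migrate_dl_tasks;
	cs->nr_migrate_dl_tasks = 0;
	cs->sum_migrate_dl_bw = 0;
}

int main(void)
{
	struct toy_cpuset src = { .nr_deadline_tasks = 2 }, dst = { 0 };
	long bw[2] = { 30, 40 };

	if (can_attach(&dst, bw, 2) == 0)
		attach(&dst, &src);	/* or cancel_attach(&dst) on failure */
	printf("used=%ld dst=%d src=%d\n", pool_used,
	       dst.nr_deadline_tasks, src.nr_deadline_tasks);
	return 0;			/* prints: used=70 dst=2 src=0 */
}

(void)cancel_attach is exercised on the failure path only; the point of
the model is that the reservation is taken exactly once, as a sum, and
either committed or rolled back as a unit.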
From patchwork Wed Mar 29 12:55:58 2023
X-Patchwork-Submitter: Juri Lelli
X-Patchwork-Id: 76570

From: Juri Lelli
To: Peter Zijlstra, Ingo Molnar, Qais Yousef, Waiman Long, Tejun Heo,
    Zefan Li, Johannes Weiner, Hao Luo
Cc: Dietmar Eggemann, Steven Rostedt, linux-kernel@vger.kernel.org,
    luca.abeni@santannapisa.it, claudio@evidence.eu.com,
    tommaso.cucinotta@santannapisa.it, bristot@redhat.com,
    mathieu.poirier@linaro.org, cgroups@vger.kernel.org, Vincent Guittot,
    Wei Wang, Rick Yiu, Quentin Perret, Heiko Carstens, Vasily Gorbik,
    Alexander Gordeev, Sudeep Holla, Juri Lelli
Subject: [PATCH 6/6] cgroup/cpuset: Iterate only if DEADLINE tasks are present
Date: Wed, 29 Mar 2023 14:55:58 +0200
Message-Id: <20230329125558.255239-7-juri.lelli@redhat.com>
In-Reply-To: <20230329125558.255239-1-juri.lelli@redhat.com>
References: <20230329125558.255239-1-juri.lelli@redhat.com>

update_tasks_root_domain() currently iterates over all tasks even if no
DEADLINE task is present in the cpuset/root domain for which bandwidth
accounting is being rebuilt. This has been reported to introduce 10+ ms
delays on suspend-resume operations.

Skip the costly iteration for cpusets that don't contain DEADLINE tasks.

Reported-by: Qais Yousef
Link: https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@layalina.io/
Signed-off-by: Juri Lelli
Reviewed-by: Qais Yousef
Tested-by: Qais Yousef
---
 kernel/cgroup/cpuset.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index f8ebec66da51..05c0a1255218 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1092,6 +1092,9 @@ static void dl_update_tasks_root_domain(struct cpuset *cs)
 	struct css_task_iter it;
 	struct task_struct *task;

+	if (cs->nr_deadline_tasks == 0)
+		return;
+
 	css_task_iter_start(&cs->css, 0, &it);
 	while ((task = css_task_iter_next(&it)))