From patchwork Wed May 3 07:22:23 2023
X-Patchwork-Submitter: Juri Lelli
X-Patchwork-Id: 89597
From: Juri Lelli
To: Peter Zijlstra, Ingo Molnar, Qais Yousef, Waiman Long, Tejun Heo,
    Zefan Li, Johannes Weiner, Hao Luo
Cc: Dietmar Eggemann, Steven Rostedt, linux-kernel@vger.kernel.org,
    luca.abeni@santannapisa.it, claudio@evidence.eu.com,
    tommaso.cucinotta@santannapisa.it, bristot@redhat.com,
    mathieu.poirier@linaro.org, cgroups@vger.kernel.org, Vincent Guittot,
    Wei Wang, Rick Yiu, Quentin Perret, Heiko Carstens, Vasily Gorbik,
    Alexander Gordeev, Sudeep Holla, Juri Lelli
Subject: [PATCH v2 1/6] cgroup/cpuset: Rename functions dealing with DEADLINE accounting
Date: Wed, 3 May 2023 09:22:23 +0200
Message-Id: <20230503072228.115707-2-juri.lelli@redhat.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230503072228.115707-1-juri.lelli@redhat.com>
References: <20230503072228.115707-1-juri.lelli@redhat.com>

rebuild_root_domains() and update_tasks_root_domain() have neutral
names, but actually deal with DEADLINE bandwidth accounting. Rename
them to use a 'dl_' prefix so that their intent is clearer.

No functional change.
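For quick reference, the renames are (a summary of the diff below, not
part of the patch itself):

	update_tasks_root_domain()  ->  dl_update_tasks_root_domain()
	rebuild_root_domains()      ->  dl_rebuild_rd_accounting()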
Suggested-by: Qais Yousef
Signed-off-by: Juri Lelli
Reviewed-by: Waiman Long
---
 kernel/cgroup/cpuset.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index e4ca2dd2b764..428ab46291e2 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1066,7 +1066,7 @@ static int generate_sched_domains(cpumask_var_t **domains,
 	return ndoms;
 }
 
-static void update_tasks_root_domain(struct cpuset *cs)
+static void dl_update_tasks_root_domain(struct cpuset *cs)
 {
 	struct css_task_iter it;
 	struct task_struct *task;
@@ -1079,7 +1079,7 @@ static void update_tasks_root_domain(struct cpuset *cs)
 	css_task_iter_end(&it);
 }
 
-static void rebuild_root_domains(void)
+static void dl_rebuild_rd_accounting(void)
 {
 	struct cpuset *cs = NULL;
 	struct cgroup_subsys_state *pos_css;
@@ -1107,7 +1107,7 @@ static void rebuild_root_domains(void)
 
 	rcu_read_unlock();
 
-	update_tasks_root_domain(cs);
+	dl_update_tasks_root_domain(cs);
 
 	rcu_read_lock();
 	css_put(&cs->css);
@@ -1121,7 +1121,7 @@ partition_and_rebuild_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
 {
 	mutex_lock(&sched_domains_mutex);
 	partition_sched_domains_locked(ndoms_new, doms_new, dattr_new);
-	rebuild_root_domains();
+	dl_rebuild_rd_accounting();
 	mutex_unlock(&sched_domains_mutex);
 }

From patchwork Wed May 3 07:22:24 2023
X-Patchwork-Submitter: Juri Lelli
X-Patchwork-Id: 89602
From: Juri Lelli
To: Peter Zijlstra, Ingo Molnar, Qais Yousef, Waiman Long, Tejun Heo,
    Zefan Li, Johannes Weiner, Hao Luo
Cc: Dietmar Eggemann, Steven Rostedt, linux-kernel@vger.kernel.org,
    luca.abeni@santannapisa.it, claudio@evidence.eu.com,
    tommaso.cucinotta@santannapisa.it, bristot@redhat.com,
    mathieu.poirier@linaro.org, cgroups@vger.kernel.org, Vincent Guittot,
    Wei Wang, Rick Yiu, Quentin Perret, Heiko Carstens, Vasily Gorbik,
    Alexander Gordeev, Sudeep Holla, Juri Lelli
Subject: [PATCH v2 2/6] sched/cpuset: Bring back cpuset_mutex
Date: Wed, 3 May 2023 09:22:24 +0200
Message-Id: <20230503072228.115707-3-juri.lelli@redhat.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230503072228.115707-1-juri.lelli@redhat.com>
References: <20230503072228.115707-1-juri.lelli@redhat.com>

Turns out percpu_cpuset_rwsem - commit 1243dc518c9d ("cgroup/cpuset:
Convert cpuset_mutex to percpu_rwsem") - wasn't such a brilliant idea:
it has been reported to cause slowdowns in workloads that need to
change cpuset configuration frequently, and it also does not implement
priority inheritance (which causes trouble for realtime workloads).

Convert percpu_cpuset_rwsem back to a regular cpuset_mutex. Also grab
it only for SCHED_DEADLINE tasks (other policies don't care about
stable cpusets anyway).
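As a condensed sketch of the locking rule this adopts (not part of the
patch; simplified from the kernel/sched/core.c hunks below, with the
policy-change section under task_rq_lock() elided):

	bool cpuset_locked = false;

	/*
	 * Only SCHED_DEADLINE needs stable cpuset information for its
	 * bandwidth accounting, so only DL transitions pay for the lock.
	 */
	if (dl_policy(policy) || dl_policy(p->policy)) {
		cpuset_locked = true;
		cpuset_lock();	/* now a plain mutex, so PI applies */
	}

	/* ... perform the policy change ... */

	if (cpuset_locked)
		cpuset_unlock();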
Signed-off-by: Juri Lelli
Reviewed-by: Waiman Long
---
 include/linux/cpuset.h |   8 +--
 kernel/cgroup/cpuset.c | 157 ++++++++++++++++++++---------------------
 kernel/sched/core.c    |  22 ++++--
 3 files changed, 97 insertions(+), 90 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 980b76a1237e..f90e6325d707 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -71,8 +71,8 @@ extern void cpuset_init_smp(void);
 extern void cpuset_force_rebuild(void);
 extern void cpuset_update_active_cpus(void);
 extern void cpuset_wait_for_hotplug(void);
-extern void cpuset_read_lock(void);
-extern void cpuset_read_unlock(void);
+extern void cpuset_lock(void);
+extern void cpuset_unlock(void);
 extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
 extern bool cpuset_cpus_allowed_fallback(struct task_struct *p);
 extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
@@ -189,8 +189,8 @@ static inline void cpuset_update_active_cpus(void)
 
 static inline void cpuset_wait_for_hotplug(void) { }
 
-static inline void cpuset_read_lock(void) { }
-static inline void cpuset_read_unlock(void) { }
+static inline void cpuset_lock(void) { }
+static inline void cpuset_unlock(void) { }
 
 static inline void cpuset_cpus_allowed(struct task_struct *p,
 				       struct cpumask *mask)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 428ab46291e2..ee66be215fb9 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -366,22 +366,21 @@ static struct cpuset top_cpuset = {
 		if (is_cpuset_online(((des_cs) = css_cs((pos_css)))))
 
 /*
- * There are two global locks guarding cpuset structures - cpuset_rwsem and
+ * There are two global locks guarding cpuset structures - cpuset_mutex and
  * callback_lock. We also require taking task_lock() when dereferencing a
  * task's cpuset pointer. See "The task_lock() exception", at the end of this
- * comment. The cpuset code uses only cpuset_rwsem write lock. Other
- * kernel subsystems can use cpuset_read_lock()/cpuset_read_unlock() to
- * prevent change to cpuset structures.
+ * comment. The cpuset code uses only cpuset_mutex. Other kernel subsystems
+ * can use cpuset_lock()/cpuset_unlock() to prevent change to cpuset
+ * structures.
  *
  * A task must hold both locks to modify cpusets. If a task holds
- * cpuset_rwsem, it blocks others wanting that rwsem, ensuring that it
- * is the only task able to also acquire callback_lock and be able to
- * modify cpusets. It can perform various checks on the cpuset structure
- * first, knowing nothing will change. It can also allocate memory while
- * just holding cpuset_rwsem. While it is performing these checks, various
- * callback routines can briefly acquire callback_lock to query cpusets.
- * Once it is ready to make the changes, it takes callback_lock, blocking
- * everyone else.
+ * cpuset_mutex, it blocks others, ensuring that it is the only task able to
+ * also acquire callback_lock and be able to modify cpusets. It can perform
+ * various checks on the cpuset structure first, knowing nothing will change.
+ * It can also allocate memory while just holding cpuset_mutex. While it is
+ * performing these checks, various callback routines can briefly acquire
+ * callback_lock to query cpusets. Once it is ready to make the changes, it
+ * takes callback_lock, blocking everyone else.
 *
 * Calls to the kernel memory allocator can not be made while holding
 * callback_lock, as that would risk double tripping on callback_lock
@@ -403,16 +402,16 @@ static struct cpuset top_cpuset = {
 * guidelines for accessing subsystem state in kernel/cgroup.c
 */
 
-DEFINE_STATIC_PERCPU_RWSEM(cpuset_rwsem);
+static DEFINE_MUTEX(cpuset_mutex);
 
-void cpuset_read_lock(void)
+void cpuset_lock(void)
 {
-	percpu_down_read(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 }
 
-void cpuset_read_unlock(void)
+void cpuset_unlock(void)
 {
-	percpu_up_read(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 }
 
 static DEFINE_SPINLOCK(callback_lock);
@@ -496,7 +495,7 @@ static inline bool partition_is_populated(struct cpuset *cs,
 * One way or another, we guarantee to return some non-empty subset
 * of cpu_online_mask.
 *
- * Call with callback_lock or cpuset_rwsem held.
+ * Call with callback_lock or cpuset_mutex held.
 */
 static void guarantee_online_cpus(struct task_struct *tsk,
				  struct cpumask *pmask)
@@ -538,7 +537,7 @@ static void guarantee_online_cpus(struct task_struct *tsk,
 * One way or another, we guarantee to return some non-empty subset
 * of node_states[N_MEMORY].
 *
- * Call with callback_lock or cpuset_rwsem held.
+ * Call with callback_lock or cpuset_mutex held.
 */
 static void guarantee_online_mems(struct cpuset *cs, nodemask_t *pmask)
 {
@@ -550,7 +549,7 @@ static void guarantee_online_mems(struct cpuset *cs, nodemask_t *pmask)
 /*
 * update task's spread flag if cpuset's page/slab spread flag is set
 *
- * Call with callback_lock or cpuset_rwsem held. The check can be skipped
+ * Call with callback_lock or cpuset_mutex held. The check can be skipped
 * if on default hierarchy.
 */
 static void cpuset_update_task_spread_flags(struct cpuset *cs,
@@ -575,7 +574,7 @@ static void cpuset_update_task_spread_flags(struct cpuset *cs,
 *
 * One cpuset is a subset of another if all its allowed CPUs and
 * Memory Nodes are a subset of the other, and its exclusive flags
- * are only set if the other's are set. Call holding cpuset_rwsem.
+ * are only set if the other's are set. Call holding cpuset_mutex.
 */
 
 static int is_cpuset_subset(const struct cpuset *p, const struct cpuset *q)
@@ -713,7 +712,7 @@ static int validate_change_legacy(struct cpuset *cur, struct cpuset *trial)
 * If we replaced the flag and mask values of the current cpuset
 * (cur) with those values in the trial cpuset (trial), would
 * our various subset and exclusive rules still be valid? Presumes
- * cpuset_rwsem held.
+ * cpuset_mutex held.
 *
 * 'cur' is the address of an actual, in-use cpuset. Operations
 * such as list traversal that depend on the actual address of the
@@ -829,7 +828,7 @@ static void update_domain_attr_tree(struct sched_domain_attr *dattr,
 	rcu_read_unlock();
 }
 
-/* Must be called with cpuset_rwsem held. */
+/* Must be called with cpuset_mutex held. */
 static inline int nr_cpusets(void)
 {
 	/* jump label reference count + the top-level cpuset */
@@ -855,7 +854,7 @@ static inline int nr_cpusets(void)
 * domains when operating in the severe memory shortage situations
 * that could cause allocation failures below.
 *
- * Must be called with cpuset_rwsem held.
+ * Must be called with cpuset_mutex held.
 *
 * The three key local variables below are:
 *  cp - cpuset pointer, used (together with pos_css) to perform a
@@ -1084,7 +1083,7 @@ static void dl_rebuild_rd_accounting(void)
 	struct cpuset *cs = NULL;
 	struct cgroup_subsys_state *pos_css;
 
-	percpu_rwsem_assert_held(&cpuset_rwsem);
+	lockdep_assert_held(&cpuset_mutex);
 	lockdep_assert_cpus_held();
 	lockdep_assert_held(&sched_domains_mutex);
 
@@ -1134,7 +1133,7 @@ partition_and_rebuild_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
 * 'cpus' is removed, then call this routine to rebuild the
 * scheduler's dynamic sched domains.
 *
- * Call with cpuset_rwsem held. Takes cpus_read_lock().
+ * Call with cpuset_mutex held. Takes cpus_read_lock().
 */
 static void rebuild_sched_domains_locked(void)
 {
@@ -1145,7 +1144,7 @@ static void rebuild_sched_domains_locked(void)
 	int ndoms;
 
 	lockdep_assert_cpus_held();
-	percpu_rwsem_assert_held(&cpuset_rwsem);
+	lockdep_assert_held(&cpuset_mutex);
 
 	/*
	 * If we have raced with CPU hotplug, return early to avoid
@@ -1196,9 +1195,9 @@ static void rebuild_sched_domains_locked(void)
 void rebuild_sched_domains(void)
 {
 	cpus_read_lock();
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 	rebuild_sched_domains_locked();
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 	cpus_read_unlock();
 }
 
@@ -1208,7 +1207,7 @@ void rebuild_sched_domains(void)
 * @new_cpus: the temp variable for the new effective_cpus mask
 *
 * Iterate through each task of @cs updating its cpus_allowed to the
- * effective cpuset's. As this function is called with cpuset_rwsem held,
+ * effective cpuset's. As this function is called with cpuset_mutex held,
 * cpuset membership stays stable. For top_cpuset, task_cpu_possible_mask()
 * is used instead of effective_cpus to make sure all offline CPUs are also
 * included as hotplug code won't update cpumasks for tasks in top_cpuset.
@@ -1322,7 +1321,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
 	int old_prs, new_prs;
 	int part_error = PERR_NONE;	/* Partition error? */
 
-	percpu_rwsem_assert_held(&cpuset_rwsem);
+	lockdep_assert_held(&cpuset_mutex);
 
 	/*
	 * The parent must be a partition root.
@@ -1545,7 +1544,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
 *
 * On legacy hierarchy, effective_cpus will be the same with cpu_allowed.
 *
- * Called with cpuset_rwsem held
+ * Called with cpuset_mutex held
 */
 static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
				 bool force)
@@ -1705,7 +1704,7 @@ static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs,
 	struct cpuset *sibling;
 	struct cgroup_subsys_state *pos_css;
 
-	percpu_rwsem_assert_held(&cpuset_rwsem);
+	lockdep_assert_held(&cpuset_mutex);
 
 	/*
	 * Check all its siblings and call update_cpumasks_hier()
@@ -1955,12 +1954,12 @@ static void *cpuset_being_rebound;
 * @cs: the cpuset in which each task's mems_allowed mask needs to be changed
 *
 * Iterate through each task of @cs updating its mems_allowed to the
- * effective cpuset's. As this function is called with cpuset_rwsem held,
+ * effective cpuset's. As this function is called with cpuset_mutex held,
 * cpuset membership stays stable.
 */
 static void update_tasks_nodemask(struct cpuset *cs)
 {
-	static nodemask_t newmems;	/* protected by cpuset_rwsem */
+	static nodemask_t newmems;	/* protected by cpuset_mutex */
 	struct css_task_iter it;
 	struct task_struct *task;
 
@@ -1973,7 +1972,7 @@ static void update_tasks_nodemask(struct cpuset *cs)
	 * take while holding tasklist_lock. Forks can happen - the
	 * mpol_dup() cpuset_being_rebound check will catch such forks,
	 * and rebind their vma mempolicies too. Because we still hold
-	 * the global cpuset_rwsem, we know that no other rebind effort
+	 * the global cpuset_mutex, we know that no other rebind effort
	 * will be contending for the global variable cpuset_being_rebound.
	 * It's ok if we rebind the same mm twice; mpol_rebind_mm()
	 * is idempotent. Also migrate pages in each mm to new nodes.
@@ -2019,7 +2018,7 @@ static void update_tasks_nodemask(struct cpuset *cs)
 *
 * On legacy hierarchy, effective_mems will be the same with mems_allowed.
 *
- * Called with cpuset_rwsem held
+ * Called with cpuset_mutex held
 */
 static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
 {
@@ -2072,7 +2071,7 @@ static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
 * mempolicies and if the cpuset is marked 'memory_migrate',
 * migrate the tasks pages to the new memory.
 *
- * Call with cpuset_rwsem held. May take callback_lock during call.
+ * Call with cpuset_mutex held. May take callback_lock during call.
 * Will take tasklist_lock, scan tasklist for tasks in cpuset cs,
 * lock each such tasks mm->mmap_lock, scan its vma's and rebind
 * their mempolicies to the cpusets new mems_allowed.
@@ -2164,7 +2163,7 @@ static int update_relax_domain_level(struct cpuset *cs, s64 val)
 * @cs: the cpuset in which each task's spread flags needs to be changed
 *
 * Iterate through each task of @cs updating its spread flags. As this
- * function is called with cpuset_rwsem held, cpuset membership stays
+ * function is called with cpuset_mutex held, cpuset membership stays
 * stable.
 */
 static void update_tasks_flags(struct cpuset *cs)
@@ -2184,7 +2183,7 @@ static void update_tasks_flags(struct cpuset *cs)
 * cs:		the cpuset to update
 * turning_on:	whether the flag is being set or cleared
 *
- * Call with cpuset_rwsem held.
+ * Call with cpuset_mutex held.
 */
 
 static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
@@ -2234,7 +2233,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 * @new_prs: new partition root state
 * Return: 0 if successful, != 0 if error
 *
- * Call with cpuset_rwsem held.
+ * Call with cpuset_mutex held.
 */
 static int update_prstate(struct cpuset *cs, int new_prs)
 {
@@ -2472,7 +2471,7 @@ static int cpuset_can_attach_check(struct cpuset *cs)
 	return 0;
 }
 
-/* Called by cgroups to determine if a cpuset is usable; cpuset_rwsem held */
+/* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */
 static int cpuset_can_attach(struct cgroup_taskset *tset)
 {
 	struct cgroup_subsys_state *css;
@@ -2484,7 +2483,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 	cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css));
 	cs = css_cs(css);
 
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 
 	/* Check to see if task is allowed in the cpuset */
 	ret = cpuset_can_attach_check(cs);
@@ -2506,7 +2505,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
	 */
 	cs->attach_in_progress++;
 out_unlock:
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 	return ret;
 }
 
@@ -2518,15 +2517,15 @@ static void cpuset_cancel_attach(struct cgroup_taskset *tset)
 	cgroup_taskset_first(tset, &css);
 	cs = css_cs(css);
 
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 	cs->attach_in_progress--;
 	if (!cs->attach_in_progress)
 		wake_up(&cpuset_attach_wq);
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 }
 
 /*
- * Protected by cpuset_rwsem. cpus_attach is used only by cpuset_attach_task()
+ * Protected by cpuset_mutex. cpus_attach is used only by cpuset_attach_task()
 * but we can't allocate it dynamically there. Define it global and
 * allocate from cpuset_init().
 */
@@ -2535,7 +2534,7 @@ static nodemask_t cpuset_attach_nodemask_to;
 
 static void cpuset_attach_task(struct cpuset *cs, struct task_struct *task)
 {
-	percpu_rwsem_assert_held(&cpuset_rwsem);
+	lockdep_assert_held(&cpuset_mutex);
 
 	if (cs != &top_cpuset)
 		guarantee_online_cpus(task, cpus_attach);
@@ -2565,7 +2564,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 	cs = css_cs(css);
 
 	lockdep_assert_cpus_held();	/* see cgroup_attach_lock() */
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 	cpus_updated = !cpumask_equal(cs->effective_cpus,
				      oldcs->effective_cpus);
 	mems_updated = !nodes_equal(cs->effective_mems, oldcs->effective_mems);
@@ -2626,7 +2625,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 	if (!cs->attach_in_progress)
 		wake_up(&cpuset_attach_wq);
 
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 }
 
 /* The various types of files and directories in a cpuset file system */
@@ -2658,7 +2657,7 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
 	int retval = 0;
 
 	cpus_read_lock();
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 	if (!is_cpuset_online(cs)) {
 		retval = -ENODEV;
 		goto out_unlock;
@@ -2694,7 +2693,7 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
 		break;
 	}
 out_unlock:
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 	cpus_read_unlock();
 	return retval;
 }
@@ -2707,7 +2706,7 @@ static int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
 	int retval = -ENODEV;
 
 	cpus_read_lock();
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 	if (!is_cpuset_online(cs))
 		goto out_unlock;
 
@@ -2720,7 +2719,7 @@ static int cpuset_write_s64(struct cgroup_subsys_state *css, struct cftype *cft,
 		break;
 	}
 out_unlock:
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 	cpus_read_unlock();
 	return retval;
 }
@@ -2753,7 +2752,7 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
	 * operation like this one can lead to a deadlock through kernfs
	 * active_ref protection. Let's break the protection. Losing the
	 * protection is okay as we check whether @cs is online after
-	 * grabbing cpuset_rwsem anyway. This only happens on the legacy
+	 * grabbing cpuset_mutex anyway. This only happens on the legacy
	 * hierarchies.
	 */
 	css_get(&cs->css);
@@ -2761,7 +2760,7 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
 	flush_work(&cpuset_hotplug_work);
 
 	cpus_read_lock();
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 	if (!is_cpuset_online(cs))
 		goto out_unlock;
 
@@ -2785,7 +2784,7 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
 	free_cpuset(trialcs);
 out_unlock:
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 	cpus_read_unlock();
 	kernfs_unbreak_active_protection(of->kn);
 	css_put(&cs->css);
@@ -2933,13 +2932,13 @@ static ssize_t sched_partition_write(struct kernfs_open_file *of, char *buf,
 	css_get(&cs->css);
 	cpus_read_lock();
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 	if (!is_cpuset_online(cs))
 		goto out_unlock;
 
 	retval = update_prstate(cs, val);
 out_unlock:
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 	cpus_read_unlock();
 	css_put(&cs->css);
 	return retval ?: nbytes;
@@ -3156,7 +3155,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
 		return 0;
 
 	cpus_read_lock();
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 
 	set_bit(CS_ONLINE, &cs->flags);
 	if (is_spread_page(parent))
@@ -3207,7 +3206,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
 	cpumask_copy(cs->effective_cpus, parent->cpus_allowed);
 	spin_unlock_irq(&callback_lock);
 out_unlock:
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 	cpus_read_unlock();
 	return 0;
 }
@@ -3228,7 +3227,7 @@ static void cpuset_css_offline(struct cgroup_subsys_state *css)
 	struct cpuset *cs = css_cs(css);
 
 	cpus_read_lock();
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 
 	if (is_partition_valid(cs))
 		update_prstate(cs, 0);
@@ -3247,7 +3246,7 @@ static void cpuset_css_offline(struct cgroup_subsys_state *css)
 	cpuset_dec();
 	clear_bit(CS_ONLINE, &cs->flags);
 
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 	cpus_read_unlock();
 }
 
@@ -3260,7 +3259,7 @@ static void cpuset_css_free(struct cgroup_subsys_state *css)
 
 static void cpuset_bind(struct cgroup_subsys_state *root_css)
 {
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 	spin_lock_irq(&callback_lock);
 
 	if (is_in_v2_mode()) {
@@ -3273,7 +3272,7 @@ static void cpuset_bind(struct cgroup_subsys_state *root_css)
 	}
 
 	spin_unlock_irq(&callback_lock);
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 }
 
 /*
@@ -3294,7 +3293,7 @@ static int cpuset_can_fork(struct task_struct *task, struct css_set *cset)
 		return 0;
 
 	lockdep_assert_held(&cgroup_mutex);
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 
 	/* Check to see if task is allowed in the cpuset */
 	ret = cpuset_can_attach_check(cs);
@@ -3315,7 +3314,7 @@ static int cpuset_can_fork(struct task_struct *task, struct css_set *cset)
	 */
 	cs->attach_in_progress++;
 out_unlock:
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 	return ret;
 }
 
@@ -3331,11 +3330,11 @@ static void cpuset_cancel_fork(struct task_struct *task, struct css_set *cset)
 	if (same_cs)
 		return;
 
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 	cs->attach_in_progress--;
 	if (!cs->attach_in_progress)
 		wake_up(&cpuset_attach_wq);
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 }
 
 /*
@@ -3363,7 +3362,7 @@ static void cpuset_fork(struct task_struct *task)
 	}
 
 	/* CLONE_INTO_CGROUP */
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 	guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
 	cpuset_attach_task(cs, task);
 
@@ -3371,7 +3370,7 @@ static void cpuset_fork(struct task_struct *task)
 	if (!cs->attach_in_progress)
 		wake_up(&cpuset_attach_wq);
 
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 }
 
 struct cgroup_subsys cpuset_cgrp_subsys = {
@@ -3472,7 +3471,7 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
 	is_empty = cpumask_empty(cs->cpus_allowed) ||
		   nodes_empty(cs->mems_allowed);
 
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 
 	/*
	 * Move tasks to the nearest ancestor with execution resources,
@@ -3482,7 +3481,7 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
 	if (is_empty)
 		remove_tasks_in_empty_cpuset(cs);
 
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 }
 
 static void
@@ -3533,14 +3532,14 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 retry:
 	wait_event(cpuset_attach_wq, cs->attach_in_progress == 0);
 
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 
 	/*
	 * We have raced with task attaching. We wait until attaching
	 * is finished, so we won't attach a task to an empty cpuset.
	 */
 	if (cs->attach_in_progress) {
-		percpu_up_write(&cpuset_rwsem);
+		mutex_unlock(&cpuset_mutex);
 		goto retry;
 	}
 
@@ -3637,7 +3636,7 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
				    cpus_updated, mems_updated);
 
 unlock:
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 }
 
 /**
@@ -3667,7 +3666,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	if (on_dfl && !alloc_cpumasks(NULL, &tmp))
 		ptmp = &tmp;
 
-	percpu_down_write(&cpuset_rwsem);
+	mutex_lock(&cpuset_mutex);
 
 	/* fetch the available cpus/mems and find out which changed how */
 	cpumask_copy(&new_cpus, cpu_active_mask);
@@ -3724,7 +3723,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 		update_tasks_nodemask(&top_cpuset);
 	}
 
-	percpu_up_write(&cpuset_rwsem);
+	mutex_unlock(&cpuset_mutex);
 
 	/* if cpus or mems changed, we need to propagate to descendants */
 	if (cpus_updated || mems_updated) {
@@ -4155,7 +4154,7 @@ void __cpuset_memory_pressure_bump(void)
 *  - Used for /proc//cpuset.
 *  - No need to task_lock(tsk) on this tsk->cpuset reference, as it
 *    doesn't really matter if tsk->cpuset changes after we read it,
- *    and we take cpuset_rwsem, keeping cpuset_attach() from changing it
+ *    and we take cpuset_mutex, keeping cpuset_attach() from changing it
 *    anyway.
 */
 int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 944c3ae39861..d826bec1c522 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7590,6 +7590,7 @@ static int __sched_setscheduler(struct task_struct *p,
 	int reset_on_fork;
 	int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK;
 	struct rq *rq;
+	bool cpuset_locked = false;
 
 	/* The pi code expects interrupts enabled */
 	BUG_ON(pi && in_interrupt());
@@ -7639,8 +7640,14 @@ static int __sched_setscheduler(struct task_struct *p,
			return retval;
 	}
 
-	if (pi)
-		cpuset_read_lock();
+	/*
+	 * SCHED_DEADLINE bandwidth accounting relies on stable cpusets
+	 * information.
+	 */
+	if (dl_policy(policy) || dl_policy(p->policy)) {
+		cpuset_locked = true;
+		cpuset_lock();
+	}
 
 	/*
	 * Make sure no PI-waiters arrive (or leave) while we are
@@ -7716,8 +7723,8 @@ static int __sched_setscheduler(struct task_struct *p,
 	if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
 		policy = oldpolicy = -1;
 		task_rq_unlock(rq, p, &rf);
-		if (pi)
-			cpuset_read_unlock();
+		if (cpuset_locked)
+			cpuset_unlock();
 		goto recheck;
 	}
 
@@ -7784,7 +7791,8 @@ static int __sched_setscheduler(struct task_struct *p,
 	task_rq_unlock(rq, p, &rf);
 
 	if (pi) {
-		cpuset_read_unlock();
+		if (cpuset_locked)
+			cpuset_unlock();
 		rt_mutex_adjust_pi(p);
 	}
 
@@ -7796,8 +7804,8 @@ static int __sched_setscheduler(struct task_struct *p,
 unlock:
 	task_rq_unlock(rq, p, &rf);
-	if (pi)
-		cpuset_read_unlock();
+	if (cpuset_locked)
+		cpuset_unlock();
 	return retval;
 }

From patchwork Wed May 3 07:22:25 2023
X-Patchwork-Submitter: Juri Lelli
X-Patchwork-Id: 89598
From: Juri Lelli
To: Peter Zijlstra, Ingo Molnar, Qais Yousef, Waiman Long, Tejun Heo,
    Zefan Li, Johannes Weiner, Hao Luo
Cc: Dietmar Eggemann, Steven Rostedt, linux-kernel@vger.kernel.org,
    luca.abeni@santannapisa.it, claudio@evidence.eu.com,
    tommaso.cucinotta@santannapisa.it, bristot@redhat.com,
    mathieu.poirier@linaro.org, cgroups@vger.kernel.org, Vincent Guittot,
    Wei Wang, Rick Yiu, Quentin Perret, Heiko Carstens, Vasily Gorbik,
    Alexander Gordeev, Sudeep Holla, Juri Lelli
Subject: [PATCH v2 3/6] sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
Date: Wed, 3 May 2023 09:22:25 +0200
Message-Id: <20230503072228.115707-4-juri.lelli@redhat.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230503072228.115707-1-juri.lelli@redhat.com>
References: <20230503072228.115707-1-juri.lelli@redhat.com>

Qais reported that iterating over all tasks when rebuilding root
domains - to find out which ones are DEADLINE and need their bandwidth
restored on those root domains - can be a costly operation (10+ ms
delays on suspend-resume).

To fix the problem, keep track of the number of DEADLINE tasks
belonging to each cpuset and then use this information (in a follow-up
patch) to only perform the iteration if DEADLINE tasks are actually
present in the cpuset for which a corresponding root domain is being
rebuilt.
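For reviewers, a condensed map of where the new counter is maintained
(summarized from the hunks below, not part of the patch; locking
context elided):

	/* kernel/cgroup/cpuset.c */
	void inc_dl_tasks_cs(struct task_struct *p)
	{
		task_cs(p)->nr_deadline_tasks++;
	}

	void dec_dl_tasks_cs(struct task_struct *p)
	{
		task_cs(p)->nr_deadline_tasks--;
	}

	/*
	 * Callers: switched_to_dl()/switched_from_dl() on policy changes,
	 * cgroup_exit() on task exit, and cpuset_can_attach(), which moves
	 * the count from the old cpuset to the destination cpuset.
	 */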
Reported-by: Qais Yousef
Link: https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@layalina.io/
Signed-off-by: Juri Lelli
Reviewed-by: Waiman Long
---
 include/linux/cpuset.h  |  4 ++++
 kernel/cgroup/cgroup.c  |  4 ++++
 kernel/cgroup/cpuset.c  | 25 +++++++++++++++++++++++++
 kernel/sched/deadline.c | 14 ++++++++++++++
 4 files changed, 47 insertions(+)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index f90e6325d707..d629094fac6e 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -71,6 +71,8 @@ extern void cpuset_init_smp(void);
 extern void cpuset_force_rebuild(void);
 extern void cpuset_update_active_cpus(void);
 extern void cpuset_wait_for_hotplug(void);
+extern void inc_dl_tasks_cs(struct task_struct *task);
+extern void dec_dl_tasks_cs(struct task_struct *task);
 extern void cpuset_lock(void);
 extern void cpuset_unlock(void);
 extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
@@ -189,6 +191,8 @@ static inline void cpuset_update_active_cpus(void)
 
 static inline void cpuset_wait_for_hotplug(void) { }
 
+static inline void inc_dl_tasks_cs(struct task_struct *task) { }
+static inline void dec_dl_tasks_cs(struct task_struct *task) { }
 static inline void cpuset_lock(void) { }
 static inline void cpuset_unlock(void) { }
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 625d7483951c..9d809191a54f 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -57,6 +57,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -6683,6 +6684,9 @@ void cgroup_exit(struct task_struct *tsk)
 	list_add_tail(&tsk->cg_list, &cset->dying_tasks);
 	cset->nr_tasks--;
 
+	if (dl_task(tsk))
+		dec_dl_tasks_cs(tsk);
+
 	WARN_ON_ONCE(cgroup_task_frozen(tsk));
 	if (unlikely(!(tsk->flags & PF_KTHREAD) &&
		     test_bit(CGRP_FREEZE, &task_dfl_cgroup(tsk)->flags)))
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index ee66be215fb9..b9f4d5602517 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -193,6 +193,12 @@ struct cpuset {
 	int use_parent_ecpus;
 	int child_ecpus_count;
 
+	/*
+	 * number of SCHED_DEADLINE tasks attached to this cpuset, so that we
+	 * know when to rebuild associated root domain bandwidth information.
+	 */
+	int nr_deadline_tasks;
+
 	/* Invalid partition error code, not lock protected */
 	enum prs_errcode prs_err;
 
@@ -245,6 +251,20 @@ static inline struct cpuset *parent_cs(struct cpuset *cs)
 	return css_cs(cs->css.parent);
 }
 
+void inc_dl_tasks_cs(struct task_struct *p)
+{
+	struct cpuset *cs = task_cs(p);
+
+	cs->nr_deadline_tasks++;
+}
+
+void dec_dl_tasks_cs(struct task_struct *p)
+{
+	struct cpuset *cs = task_cs(p);
+
+	cs->nr_deadline_tasks--;
+}
+
 /* bits in struct cpuset flags field */
 typedef enum {
 	CS_ONLINE,
@@ -2497,6 +2517,11 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 		ret = security_task_setscheduler(task);
 		if (ret)
			goto out_unlock;
+
+		if (dl_task(task)) {
+			cs->nr_deadline_tasks++;
+			cpuset_attach_old_cs->nr_deadline_tasks--;
+		}
 	}
 
 	/*
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 5a9a4b81c972..e11de074a6fd 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -16,6 +16,8 @@
 *                    Fabio Checconi
 */
 
+#include
+
 /*
 * Default limits for DL period; on the top end we guard against small util
 * tasks still getting ridiculously long effective runtimes, on the bottom end we
@@ -2596,6 +2598,12 @@ static void switched_from_dl(struct rq *rq, struct task_struct *p)
 	if (task_on_rq_queued(p) && p->dl.dl_runtime)
 		task_non_contending(p);
 
+	/*
+	 * In case a task is setscheduled out from SCHED_DEADLINE we need to
+	 * keep track of that on its cpuset (for correct bandwidth tracking).
+	 */
+	dec_dl_tasks_cs(p);
+
 	if (!task_on_rq_queued(p)) {
 		/*
		 * Inactive timer is armed. However, p is leaving DEADLINE and
@@ -2636,6 +2644,12 @@ static void switched_to_dl(struct rq *rq, struct task_struct *p)
 	if (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1)
 		put_task_struct(p);
 
+	/*
+	 * In case a task is setscheduled to SCHED_DEADLINE we need to keep
+	 * track of that on its cpuset (for correct bandwidth tracking).
+	 */
+	inc_dl_tasks_cs(p);
+
 	/* If p is not queued we will update its parameters at next wakeup. */
 	if (!task_on_rq_queued(p)) {
 		add_rq_bw(&p->dl, &rq->dl);

From patchwork Wed May 3 07:22:26 2023
X-Patchwork-Submitter: Juri Lelli
X-Patchwork-Id: 89596
From: Juri Lelli
To: Peter Zijlstra, Ingo Molnar, Qais Yousef, Waiman Long, Tejun Heo, Zefan Li, Johannes Weiner, Hao Luo
Cc: Dietmar Eggemann, Steven Rostedt, linux-kernel@vger.kernel.org, luca.abeni@santannapisa.it, claudio@evidence.eu.com, tommaso.cucinotta@santannapisa.it, bristot@redhat.com, mathieu.poirier@linaro.org, cgroups@vger.kernel.org, Vincent Guittot, Wei Wang, Rick Yiu, Quentin Perret, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sudeep Holla, Juri Lelli
Subject: [PATCH v2 4/6] cgroup/cpuset: Iterate only if DEADLINE tasks are present
Date: Wed, 3 May 2023 09:22:26 +0200
Message-Id: <20230503072228.115707-5-juri.lelli@redhat.com>
In-Reply-To: <20230503072228.115707-1-juri.lelli@redhat.com>
References: <20230503072228.115707-1-juri.lelli@redhat.com>

update_tasks_root_domain currently iterates over all tasks even if no
DEADLINE task is present on the cpuset/root domain for which bandwidth
accounting is being rebuilt. This has been reported to introduce 10+ ms
delays on suspend-resume operations. Skip the costly iteration for
cpusets that don't contain DEADLINE tasks.
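For a sense of what the guard buys, here is a minimal userspace model of
the change (illustration only; the struct and function names below are
simplified stand-ins, not the kernel symbols):

	/*
	 * Toy model: dl_update_tasks_root_domain() walks every task in a
	 * cpuset, so bailing out when nr_deadline_tasks == 0 avoids the
	 * whole walk. All names here are invented for the example.
	 */
	#include <stdio.h>

	struct cpuset_model {
		int nr_deadline_tasks;	/* mirrors cpuset::nr_deadline_tasks */
		int nr_tasks;		/* tasks that would be iterated */
	};

	static void update_tasks_root_domain_model(struct cpuset_model *cs)
	{
		/* The guard added by this patch: skip empty cpusets. */
		if (cs->nr_deadline_tasks == 0)
			return;

		/* Stand-in for the css_task_iter walk over all tasks. */
		for (int i = 0; i < cs->nr_tasks; i++)
			; /* per-task DL bandwidth accounting goes here */
	}

	int main(void)
	{
		struct cpuset_model no_dl = { .nr_deadline_tasks = 0,
					      .nr_tasks = 10000 };

		update_tasks_root_domain_model(&no_dl); /* returns at once */
		printf("skipped iterating %d tasks\n", no_dl.nr_tasks);
		return 0;
	}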
Reported-by: Qais Yousef
Link: https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@layalina.io/
Signed-off-by: Juri Lelli
Reviewed-by: Waiman Long
---
 kernel/cgroup/cpuset.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index b9f4d5602517..6587df42cb61 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1090,6 +1090,9 @@ static void dl_update_tasks_root_domain(struct cpuset *cs)
 	struct css_task_iter it;
 	struct task_struct *task;
 
+	if (cs->nr_deadline_tasks == 0)
+		return;
+
 	css_task_iter_start(&cs->css, 0, &it);
 
 	while ((task = css_task_iter_next(&it)))

From patchwork Wed May 3 07:22:27 2023
X-Patchwork-Submitter: Juri Lelli
X-Patchwork-Id: 89599
From: Juri Lelli
To: Peter Zijlstra, Ingo Molnar, Qais Yousef, Waiman Long, Tejun Heo, Zefan Li, Johannes Weiner, Hao Luo
Cc: Dietmar Eggemann, Steven Rostedt, linux-kernel@vger.kernel.org, luca.abeni@santannapisa.it, claudio@evidence.eu.com, tommaso.cucinotta@santannapisa.it, bristot@redhat.com, mathieu.poirier@linaro.org, cgroups@vger.kernel.org, Vincent Guittot, Wei Wang, Rick Yiu, Quentin Perret, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sudeep Holla, Juri Lelli
Subject: [PATCH v2 5/6] sched/deadline: Create DL BW alloc, free & check overflow interface
Date: Wed, 3 May 2023 09:22:27 +0200
Message-Id: <20230503072228.115707-6-juri.lelli@redhat.com>
In-Reply-To: <20230503072228.115707-1-juri.lelli@redhat.com>
References: <20230503072228.115707-1-juri.lelli@redhat.com>

From: Dietmar Eggemann

Rework the existing dl_cpu_busy() interface, which offers DL BW overflow
checking and per-task DL BW allocation.

Add dl_bw_free() as an interface to be able to free DL BW. It will be
used to allow freeing of the DL BW request done during
cpuset_can_attach() in case multiple controllers are attached to the
cgroup next to the cpuset controller and one of the non-cpuset
can_attach() callbacks fails.

dl_bw_alloc() (and dl_bw_free()) now take a `u64 dl_bw` parameter
instead of the `struct task_struct *p` used in dl_cpu_busy(). This
allows DL BW to be allocated for a set of tasks rather than only for a
single task.
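The shape of the new interface can be sketched in plain userspace C
(illustration only; the pool struct and every name below are invented
stand-ins for struct dl_bw, __dl_overflow(), __dl_add() and __dl_sub(),
with locking and RCU deliberately left out):

	#include <stdint.h>
	#include <stdio.h>

	/* One pool, standing in for a root domain's struct dl_bw. */
	struct bw_pool {
		uint64_t cap;		/* dl_bw_capacity() stand-in */
		uint64_t total_bw;	/* sum of allocated bandwidth */
	};

	enum bw_request { REQ_CHECK_OVERFLOW, REQ_ALLOC, REQ_FREE };

	/* Mirrors the shape of dl_bw_manage(): one small dispatcher. */
	static int bw_manage(enum bw_request req, struct bw_pool *p,
			     uint64_t bw)
	{
		int overflow = 0;

		if (req == REQ_FREE) {
			p->total_bw -= bw;		/* __dl_sub() */
		} else {
			overflow = p->total_bw + bw > p->cap;
			if (req == REQ_ALLOC && !overflow)
				p->total_bw += bw;	/* __dl_add() */
		}
		return overflow ? -1 : 0;	/* -1 models -EBUSY */
	}

	static int bw_alloc(struct bw_pool *p, uint64_t bw)
	{
		return bw_manage(REQ_ALLOC, p, bw);
	}

	static void bw_free(struct bw_pool *p, uint64_t bw)
	{
		bw_manage(REQ_FREE, p, bw);
	}

	int main(void)
	{
		struct bw_pool pool = { .cap = 100 };
		uint64_t set_bw = 30 + 40; /* e.g. two migrating DL tasks */

		/* One call reserves bandwidth for the whole set... */
		if (bw_alloc(&pool, set_bw) == 0)
			printf("reserved %llu\n", (unsigned long long)set_bw);

		/* ...and one call gives it back if a later step fails. */
		bw_free(&pool, set_bw);
		return 0;
	}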
Signed-off-by: Dietmar Eggemann
Signed-off-by: Juri Lelli
---
 include/linux/sched.h   |  2 ++
 kernel/sched/core.c     |  4 ++--
 kernel/sched/deadline.c | 53 +++++++++++++++++++++++++++++++----------
 kernel/sched/sched.h    |  2 +-
 4 files changed, 45 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index eed5d65b8d1f..0bee06542450 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1853,6 +1853,8 @@ current_restore_flags(unsigned long orig_flags, unsigned long flags)
 extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
 extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_effective_cpus);
+extern int dl_bw_alloc(int cpu, u64 dl_bw);
+extern void dl_bw_free(int cpu, u64 dl_bw);
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask);
 extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d826bec1c522..df659892d7d5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9319,7 +9319,7 @@ int task_can_attach(struct task_struct *p,
 		if (unlikely(cpu >= nr_cpu_ids))
 			return -EINVAL;
 
-		ret = dl_cpu_busy(cpu, p);
+		ret = dl_bw_alloc(cpu, p->dl.dl_bw);
 	}
 
 out:
@@ -9604,7 +9604,7 @@ static void cpuset_cpu_active(void)
 static int cpuset_cpu_inactive(unsigned int cpu)
 {
 	if (!cpuhp_tasks_frozen) {
-		int ret = dl_cpu_busy(cpu, NULL);
+		int ret = dl_bw_check_overflow(cpu);
 
 		if (ret)
 			return ret;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index e11de074a6fd..166c3e6eae61 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -3058,26 +3058,38 @@ int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur,
 	return ret;
 }
 
-int dl_cpu_busy(int cpu, struct task_struct *p)
+enum dl_bw_request {
+	dl_bw_req_check_overflow = 0,
+	dl_bw_req_alloc,
+	dl_bw_req_free
+};
+
+static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
 {
-	unsigned long flags, cap;
+	unsigned long flags;
 	struct dl_bw *dl_b;
-	bool overflow;
+	bool overflow = 0;
 
 	rcu_read_lock_sched();
 	dl_b = dl_bw_of(cpu);
 	raw_spin_lock_irqsave(&dl_b->lock, flags);
-	cap = dl_bw_capacity(cpu);
-	overflow = __dl_overflow(dl_b, cap, 0, p ? p->dl.dl_bw : 0);
 
-	if (!overflow && p) {
-		/*
-		 * We reserve space for this task in the destination
-		 * root_domain, as we can't fail after this point.
-		 * We will free resources in the source root_domain
-		 * later on (see set_cpus_allowed_dl()).
-		 */
-		__dl_add(dl_b, p->dl.dl_bw, dl_bw_cpus(cpu));
+	if (req == dl_bw_req_free) {
+		__dl_sub(dl_b, dl_bw, dl_bw_cpus(cpu));
+	} else {
+		unsigned long cap = dl_bw_capacity(cpu);
+
+		overflow = __dl_overflow(dl_b, cap, 0, dl_bw);
+
+		if (req == dl_bw_req_alloc && !overflow) {
+			/*
+			 * We reserve space in the destination
+			 * root_domain, as we can't fail after this point.
+			 * We will free resources in the source root_domain
+			 * later on (see set_cpus_allowed_dl()).
+			 */
+			__dl_add(dl_b, dl_bw, dl_bw_cpus(cpu));
+		}
 	}
 
 	raw_spin_unlock_irqrestore(&dl_b->lock, flags);
@@ -3085,6 +3097,21 @@ int dl_cpu_busy(int cpu, struct task_struct *p)
 	return overflow ? -EBUSY : 0;
 }
 
+int dl_bw_check_overflow(int cpu)
+{
+	return dl_bw_manage(dl_bw_req_check_overflow, cpu, 0);
+}
+
+int dl_bw_alloc(int cpu, u64 dl_bw)
+{
+	return dl_bw_manage(dl_bw_req_alloc, cpu, dl_bw);
+}
+
+void dl_bw_free(int cpu, u64 dl_bw)
+{
+	dl_bw_manage(dl_bw_req_free, cpu, dl_bw);
+}
 #endif
 
 #ifdef CONFIG_SCHED_DEBUG
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec7b3e0a2b20..0ad712811e35 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -330,7 +330,7 @@ extern void __getparam_dl(struct task_struct *p, struct sched_attr *attr);
 extern bool __checkparam_dl(const struct sched_attr *attr);
 extern bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr);
 extern int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
-extern int dl_cpu_busy(int cpu, struct task_struct *p);
+extern int dl_bw_check_overflow(int cpu);
 #ifdef CONFIG_CGROUP_SCHED

From patchwork Wed May 3 07:22:28 2023
X-Patchwork-Submitter: Juri Lelli
X-Patchwork-Id: 89593
From: Juri Lelli
To: Peter Zijlstra, Ingo Molnar, Qais Yousef, Waiman Long, Tejun Heo, Zefan Li, Johannes Weiner, Hao Luo
Cc: Dietmar Eggemann, Steven Rostedt, linux-kernel@vger.kernel.org, luca.abeni@santannapisa.it, claudio@evidence.eu.com, tommaso.cucinotta@santannapisa.it, bristot@redhat.com, mathieu.poirier@linaro.org, cgroups@vger.kernel.org, Vincent Guittot, Wei Wang, Rick Yiu, Quentin Perret, Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Sudeep Holla, Juri Lelli
Subject: [PATCH v2 6/6] cgroup/cpuset: Free DL BW in case can_attach() fails
Date: Wed, 3 May 2023 09:22:28 +0200
Message-Id: <20230503072228.115707-7-juri.lelli@redhat.com>
In-Reply-To: <20230503072228.115707-1-juri.lelli@redhat.com>
References: <20230503072228.115707-1-juri.lelli@redhat.com>

From: Dietmar Eggemann

cpuset_can_attach() can fail. Postpone DL BW allocation until all tasks
have been checked. DL BW is not allocated per-task but as a sum over
all DL tasks migrating.

If multiple controllers are attached to the cgroup next to the cpuset
controller, a non-cpuset can_attach() can fail. In this case free DL BW
in cpuset_cancel_attach().

Finally, update cpuset DL task count (nr_deadline_tasks) only in
cpuset_attach().

Suggested-by: Waiman Long
Signed-off-by: Dietmar Eggemann
Signed-off-by: Juri Lelli
Reviewed-by: Waiman Long
---
 include/linux/sched.h  |  2 +-
 kernel/cgroup/cpuset.c | 53 ++++++++++++++++++++++++++++++++++++++----
 kernel/sched/core.c    | 17 ++------------
 3 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0bee06542450..2553918f0b61 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1852,7 +1852,7 @@ current_restore_flags(unsigned long orig_flags, unsigned long flags)
 }
 
 extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
-extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_effective_cpus);
+extern int task_can_attach(struct task_struct *p);
 extern int dl_bw_alloc(int cpu, u64 dl_bw);
 extern void dl_bw_free(int cpu, u64 dl_bw);
 #ifdef CONFIG_SMP
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 6587df42cb61..d1073603c96c 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -198,6 +198,8 @@ struct cpuset {
 	 * know when to rebuild associated root domain bandwidth information.
 	 */
 	int nr_deadline_tasks;
+	int nr_migrate_dl_tasks;
+	u64 sum_migrate_dl_bw;
 
 	/* Invalid partition error code, not lock protected */
 	enum prs_errcode prs_err;
@@ -2494,16 +2496,23 @@ static int cpuset_can_attach_check(struct cpuset *cs)
 	return 0;
 }
 
+static void reset_migrate_dl_data(struct cpuset *cs)
+{
+	cs->nr_migrate_dl_tasks = 0;
+	cs->sum_migrate_dl_bw = 0;
+}
+
 /* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */
 static int cpuset_can_attach(struct cgroup_taskset *tset)
 {
 	struct cgroup_subsys_state *css;
-	struct cpuset *cs;
+	struct cpuset *cs, *oldcs;
 	struct task_struct *task;
 	int ret;
 
 	/* used later by cpuset_attach() */
 	cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css));
+	oldcs = cpuset_attach_old_cs;
 	cs = css_cs(css);
 
 	mutex_lock(&cpuset_mutex);
@@ -2514,7 +2523,7 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 		goto out_unlock;
 
 	cgroup_taskset_for_each(task, css, tset) {
-		ret = task_can_attach(task, cs->effective_cpus);
+		ret = task_can_attach(task);
 		if (ret)
 			goto out_unlock;
 		ret = security_task_setscheduler(task);
@@ -2522,11 +2531,31 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
 			goto out_unlock;
 
 		if (dl_task(task)) {
-			cs->nr_deadline_tasks++;
-			cpuset_attach_old_cs->nr_deadline_tasks--;
+			cs->nr_migrate_dl_tasks++;
+			cs->sum_migrate_dl_bw += task->dl.dl_bw;
 		}
 	}
 
+	if (!cs->nr_migrate_dl_tasks)
+		goto out_success;
+
+	if (!cpumask_intersects(oldcs->effective_cpus, cs->effective_cpus)) {
+		int cpu = cpumask_any_and(cpu_active_mask, cs->effective_cpus);
+
+		if (unlikely(cpu >= nr_cpu_ids)) {
+			reset_migrate_dl_data(cs);
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+
+		ret = dl_bw_alloc(cpu, cs->sum_migrate_dl_bw);
+		if (ret) {
+			reset_migrate_dl_data(cs);
+			goto out_unlock;
+		}
+	}
+
+out_success:
 	/*
 	 * Mark attach is in progress. This makes validate_change() fail
 	 * changes which zero cpus/mems_allowed.
@@ -2549,6 +2578,14 @@ static void cpuset_cancel_attach(struct cgroup_taskset *tset)
 	cs->attach_in_progress--;
 	if (!cs->attach_in_progress)
 		wake_up(&cpuset_attach_wq);
+
+	if (cs->nr_migrate_dl_tasks) {
+		int cpu = cpumask_any(cs->effective_cpus);
+
+		dl_bw_free(cpu, cs->sum_migrate_dl_bw);
+		reset_migrate_dl_data(cs);
+	}
+
 	mutex_unlock(&cpuset_mutex);
 }
@@ -2649,6 +2686,12 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 out:
 	cs->old_mems_allowed = cpuset_attach_nodemask_to;
 
+	if (cs->nr_migrate_dl_tasks) {
+		cs->nr_deadline_tasks += cs->nr_migrate_dl_tasks;
+		oldcs->nr_deadline_tasks -= cs->nr_migrate_dl_tasks;
+		reset_migrate_dl_data(cs);
+	}
+
 	cs->attach_in_progress--;
 	if (!cs->attach_in_progress)
 		wake_up(&cpuset_attach_wq);
@@ -3328,7 +3371,7 @@ static int cpuset_can_fork(struct task_struct *task, struct css_set *cset)
 	if (ret)
 		goto out_unlock;
 
-	ret = task_can_attach(task, cs->effective_cpus);
+	ret = task_can_attach(task);
 	if (ret)
 		goto out_unlock;
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index df659892d7d5..ed0d7381b2ec 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9294,8 +9294,7 @@ int cpuset_cpumask_can_shrink(const struct cpumask *cur,
 	return ret;
 }
 
-int task_can_attach(struct task_struct *p,
-		    const struct cpumask *cs_effective_cpus)
+int task_can_attach(struct task_struct *p)
 {
 	int ret = 0;
 
@@ -9308,21 +9307,9 @@ int task_can_attach(struct task_struct *p)
 	 * success of set_cpus_allowed_ptr() on all attached tasks
 	 * before cpus_mask may be changed.
 	 */
-	if (p->flags & PF_NO_SETAFFINITY) {
+	if (p->flags & PF_NO_SETAFFINITY)
 		ret = -EINVAL;
-		goto out;
-	}
-
-	if (dl_task(p) && !cpumask_intersects(task_rq(p)->rd->span,
-					      cs_effective_cpus)) {
-		int cpu = cpumask_any_and(cpu_active_mask, cs_effective_cpus);
-
-		if (unlikely(cpu >= nr_cpu_ids))
-			return -EINVAL;
-
-		ret = dl_bw_alloc(cpu, p->dl.dl_bw);
-	}
-
-out:
 	return ret;
 }
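Taken together with patches 4 and 5, the resulting attach flow can be
modelled as below (illustration only; every identifier is a simplified
userspace stand-in for the kernel code in the diffs above, and the
single pool models one root domain's DL bandwidth):

	#include <stdint.h>
	#include <stdio.h>

	struct cs_model {
		int nr_deadline_tasks;		/* committed DL task count */
		int nr_migrate_dl_tasks;	/* in-flight migration state */
		uint64_t sum_migrate_dl_bw;
	};

	/* Stand-ins for dl_bw_alloc()/dl_bw_free() against one pool. */
	static uint64_t pool_cap = 100, pool_used;

	static int pool_alloc(uint64_t bw)
	{
		if (pool_used + bw > pool_cap)
			return -1;		/* models -EBUSY */
		pool_used += bw;
		return 0;
	}

	static void pool_free(uint64_t bw) { pool_used -= bw; }

	static void reset_migrate(struct cs_model *cs)
	{
		cs->nr_migrate_dl_tasks = 0;
		cs->sum_migrate_dl_bw = 0;
	}

	/* can_attach: per-task checks, then one alloc for the whole set. */
	static int can_attach_model(struct cs_model *cs,
				    const uint64_t *bw, int n)
	{
		for (int i = 0; i < n; i++) {
			cs->nr_migrate_dl_tasks++;
			cs->sum_migrate_dl_bw += bw[i];
		}
		if (cs->nr_migrate_dl_tasks &&
		    pool_alloc(cs->sum_migrate_dl_bw)) {
			reset_migrate(cs);
			return -1;
		}
		return 0;
	}

	/* cancel_attach: another controller failed, return the BW. */
	static void cancel_attach_model(struct cs_model *cs)
	{
		if (cs->nr_migrate_dl_tasks) {
			pool_free(cs->sum_migrate_dl_bw);
			reset_migrate(cs);
		}
	}

	/* attach: the migration is committed, counters move only now. */
	static void attach_model(struct cs_model *cs, struct cs_model *oldcs)
	{
		cs->nr_deadline_tasks += cs->nr_migrate_dl_tasks;
		oldcs->nr_deadline_tasks -= cs->nr_migrate_dl_tasks;
		reset_migrate(cs);
	}

	int main(void)
	{
		struct cs_model src = { .nr_deadline_tasks = 2 };
		struct cs_model dst = { 0 };
		uint64_t bw[] = { 30, 40 };	/* two migrating DL tasks */

		/* First migration succeeds end to end. */
		if (can_attach_model(&dst, bw, 2) == 0)
			attach_model(&dst, &src);

		/* Second one passes can_attach, then another controller
		 * fails, so the reservation is handed back. */
		if (can_attach_model(&dst, bw, 2) == 0)
			cancel_attach_model(&dst);

		printf("dst DL tasks: %d, pool used: %llu\n",
		       dst.nr_deadline_tasks, (unsigned long long)pool_used);
		return 0;
	}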