From patchwork Tue Dec 27 02:29:00 2022
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 36733
From: Ming Lei
To: Thomas Gleixner, Jens Axboe
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
    Christoph Hellwig, John Garry, Ming Lei
Subject: [PATCH V4 1/6] genirq/affinity: Remove the 'firstvec' parameter from irq_build_affinity_masks
Date: Tue, 27 Dec 2022 10:29:00 +0800
Message-Id: <20221227022905.352674-2-ming.lei@redhat.com>
In-Reply-To: <20221227022905.352674-1-ming.lei@redhat.com>
References: <20221227022905.352674-1-ming.lei@redhat.com>
The 'firstvec' parameter is always the same as the 'startvec' parameter, so
use 'startvec' directly inside irq_build_affinity_masks().

Reviewed-by: Christoph Hellwig
Signed-off-by: Ming Lei
Reviewed-by: John Garry
---
 kernel/irq/affinity.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index d9a5c1d65a79..3361e36ebaa1 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -337,10 +337,10 @@ static int __irq_build_affinity_masks(unsigned int startvec,
  * 2) spread other possible CPUs on these vectors
  */
 static int irq_build_affinity_masks(unsigned int startvec, unsigned int numvecs,
-				     unsigned int firstvec,
 				     struct irq_affinity_desc *masks)
 {
 	unsigned int curvec = startvec, nr_present = 0, nr_others = 0;
+	unsigned int firstvec = startvec;
 	cpumask_var_t *node_to_cpumask;
 	cpumask_var_t nmsk, npresmsk;
 	int ret = -ENOMEM;
@@ -463,8 +463,7 @@ irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
 		unsigned int this_vecs = affd->set_size[i];
 		int ret;
 
-		ret = irq_build_affinity_masks(curvec, this_vecs,
-					       curvec, masks);
+		ret = irq_build_affinity_masks(curvec, this_vecs, masks);
 		if (ret) {
 			kfree(masks);
 			return NULL;

From patchwork Tue Dec 27 02:29:01 2022
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 36732
From: Ming Lei
To: Thomas Gleixner, Jens Axboe
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
    Christoph Hellwig, John Garry, Ming Lei
Subject: [PATCH V4 2/6] genirq/affinity: Pass affinity managed mask array to irq_build_affinity_masks
Date: Tue, 27 Dec 2022 10:29:01 +0800
Message-Id: <20221227022905.352674-3-ming.lei@redhat.com>
In-Reply-To: <20221227022905.352674-1-ming.lei@redhat.com>
References: <20221227022905.352674-1-ming.lei@redhat.com>
Pass the affinity managed mask array to irq_build_affinity_masks() so that
the index of the first affinity managed vector is always zero. This allows
the implementation to be simplified a bit.

Reviewed-by: Christoph Hellwig
Signed-off-by: Ming Lei
Reviewed-by: John Garry
---
 kernel/irq/affinity.c | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 3361e36ebaa1..da6379cd27fd 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -246,14 +246,13 @@ static void alloc_nodes_vectors(unsigned int numvecs,
 
 static int __irq_build_affinity_masks(unsigned int startvec,
 				      unsigned int numvecs,
-				      unsigned int firstvec,
 				      cpumask_var_t *node_to_cpumask,
 				      const struct cpumask *cpu_mask,
 				      struct cpumask *nmsk,
 				      struct irq_affinity_desc *masks)
 {
 	unsigned int i, n, nodes, cpus_per_vec, extra_vecs, done = 0;
-	unsigned int last_affv = firstvec + numvecs;
+	unsigned int last_affv = numvecs;
 	unsigned int curvec = startvec;
 	nodemask_t nodemsk = NODE_MASK_NONE;
 	struct node_vectors *node_vectors;
@@ -273,7 +272,7 @@ static int __irq_build_affinity_masks(unsigned int startvec,
 			cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
 			cpumask_or(&masks[curvec].mask, &masks[curvec].mask, nmsk);
 			if (++curvec == last_affv)
-				curvec = firstvec;
+				curvec = 0;
 		}
 		return numvecs;
 	}
@@ -321,7 +320,7 @@ static int __irq_build_affinity_masks(unsigned int startvec,
 			 * may start anywhere
 			 */
 			if (curvec >= last_affv)
-				curvec = firstvec;
+				curvec = 0;
 			irq_spread_init_one(&masks[curvec].mask, nmsk,
 					cpus_per_vec);
 		}
@@ -336,11 +335,10 @@ static int __irq_build_affinity_masks(unsigned int startvec,
  * 1) spread present CPU on these vectors
  * 2) spread other possible CPUs on these vectors
  */
-static int irq_build_affinity_masks(unsigned int startvec, unsigned int numvecs,
+static int irq_build_affinity_masks(unsigned int numvecs,
 				     struct irq_affinity_desc *masks)
 {
-	unsigned int curvec = startvec, nr_present = 0, nr_others = 0;
-	unsigned int firstvec = startvec;
+	unsigned int curvec = 0, nr_present = 0, nr_others = 0;
 	cpumask_var_t *node_to_cpumask;
 	cpumask_var_t nmsk, npresmsk;
 	int ret = -ENOMEM;
@@ -360,9 +358,8 @@ static int irq_build_affinity_masks(unsigned int startvec, unsigned int numvecs,
 	build_node_to_cpumask(node_to_cpumask);
 
 	/* Spread on present CPUs starting from affd->pre_vectors */
-	ret = __irq_build_affinity_masks(curvec, numvecs, firstvec,
-					 node_to_cpumask, cpu_present_mask,
-					 nmsk, masks);
+	ret = __irq_build_affinity_masks(curvec, numvecs, node_to_cpumask,
+					 cpu_present_mask, nmsk, masks);
 	if (ret < 0)
 		goto fail_build_affinity;
 	nr_present = ret;
@@ -374,13 +371,12 @@ static int irq_build_affinity_masks(unsigned int startvec, unsigned int numvecs,
	 * out vectors.
	 */
 	if (nr_present >= numvecs)
-		curvec = firstvec;
+		curvec = 0;
 	else
-		curvec = firstvec + nr_present;
+		curvec = nr_present;
 	cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
-	ret = __irq_build_affinity_masks(curvec, numvecs, firstvec,
-					 node_to_cpumask, npresmsk, nmsk,
-					 masks);
+	ret = __irq_build_affinity_masks(curvec, numvecs, node_to_cpumask,
+					 npresmsk, nmsk, masks);
 	if (ret >= 0)
 		nr_others = ret;
 
@@ -463,7 +459,7 @@ irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
 		unsigned int this_vecs = affd->set_size[i];
 		int ret;
 
-		ret = irq_build_affinity_masks(curvec, this_vecs, masks);
+		ret = irq_build_affinity_masks(this_vecs, &masks[curvec]);
 		if (ret) {
 			kfree(masks);
 			return NULL;

From patchwork Tue Dec 27 02:29:02 2022
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 36736
From: Ming Lei
To: Thomas Gleixner, Jens Axboe
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
    Christoph Hellwig, John Garry, Ming Lei
Subject: [PATCH V4 3/6] genirq/affinity: Don't pass irq_affinity_desc array to irq_build_affinity_masks
Date: Tue, 27 Dec 2022 10:29:02 +0800
Message-Id: <20221227022905.352674-4-ming.lei@redhat.com>
In-Reply-To: <20221227022905.352674-1-ming.lei@redhat.com>
References: <20221227022905.352674-1-ming.lei@redhat.com>
Prepare for abstracting irq_build_affinity_masks() into a public helper
that assigns all CPUs evenly into several groups.

Don't pass the irq_affinity_desc array to irq_build_affinity_masks();
instead return a cpumask array, storing each assigned group in one element
of that array.

This helps to provide a generic interface for grouping all CPUs evenly from
a NUMA and CPU locality viewpoint. The cost is one extra allocation in
irq_build_affinity_masks(), which should be fine since the allocation is
done via GFP_KERNEL and irq_build_affinity_masks() is called infrequently.

Reviewed-by: Christoph Hellwig
Signed-off-by: Ming Lei
Reviewed-by: John Garry
---
 kernel/irq/affinity.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index da6379cd27fd..00bba1020ecb 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -249,7 +249,7 @@ static int __irq_build_affinity_masks(unsigned int startvec,
 			       cpumask_var_t *node_to_cpumask,
 			       const struct cpumask *cpu_mask,
 			       struct cpumask *nmsk,
-			       struct irq_affinity_desc *masks)
+			       struct cpumask *masks)
 {
 	unsigned int i, n, nodes, cpus_per_vec, extra_vecs, done = 0;
 	unsigned int last_affv = numvecs;
@@ -270,7 +270,7 @@ static int __irq_build_affinity_masks(unsigned int startvec,
 	for_each_node_mask(n, nodemsk) {
 		/* Ensure that only CPUs which are in both masks are set */
 		cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]);
-		cpumask_or(&masks[curvec].mask, &masks[curvec].mask, nmsk);
+		cpumask_or(&masks[curvec], &masks[curvec], nmsk);
 		if (++curvec == last_affv)
 			curvec = 0;
 	}
@@ -321,7 +321,7 @@ static int __irq_build_affinity_masks(unsigned int startvec,
 		 */
 		if (curvec >= last_affv)
 			curvec = 0;
-		irq_spread_init_one(&masks[curvec].mask, nmsk,
+		irq_spread_init_one(&masks[curvec], nmsk,
 				cpus_per_vec);
 	}
 	done += nv->nvectors;
@@ -335,16 +335,16 @@ static int __irq_build_affinity_masks(unsigned int startvec,
  * 1) spread present CPU on these vectors
  * 2) spread other possible CPUs on these vectors
  */
-static int irq_build_affinity_masks(unsigned int numvecs,
-				     struct irq_affinity_desc *masks)
+static struct cpumask *irq_build_affinity_masks(unsigned int numvecs)
 {
 	unsigned int curvec = 0, nr_present = 0, nr_others = 0;
 	cpumask_var_t *node_to_cpumask;
 	cpumask_var_t nmsk, npresmsk;
 	int ret = -ENOMEM;
+	struct cpumask *masks = NULL;
 
 	if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL))
-		return ret;
+		return NULL;
 
 	if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL))
 		goto fail_nmsk;
@@ -353,6 +353,10 @@ static int irq_build_affinity_masks(unsigned int numvecs,
 	if (!node_to_cpumask)
 		goto fail_npresmsk;
 
+	masks = kcalloc(numvecs, sizeof(*masks), GFP_KERNEL);
+	if (!masks)
+		goto fail_node_to_cpumask;
+
 	/* Stabilize the cpumasks */
 	cpus_read_lock();
 	build_node_to_cpumask(node_to_cpumask);
@@ -386,6 +390,7 @@ static int irq_build_affinity_masks(unsigned int numvecs,
 	if (ret >= 0)
 		WARN_ON(nr_present + nr_others < numvecs);
 
+ fail_node_to_cpumask:
 	free_node_to_cpumask(node_to_cpumask);
 
  fail_npresmsk:
@@ -393,7 +398,11 @@ static int irq_build_affinity_masks(unsigned int numvecs,
 
  fail_nmsk:
 	free_cpumask_var(nmsk);
-	return ret < 0 ? ret : 0;
+	if (ret < 0) {
+		kfree(masks);
+		return NULL;
+	}
+	return masks;
 }
 
 static void default_calc_sets(struct irq_affinity *affd, unsigned int affvecs)
@@ -457,13 +466,18 @@ irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
 	 */
 	for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
 		unsigned int this_vecs = affd->set_size[i];
-		int ret;
+		int j;
+		struct cpumask *result = irq_build_affinity_masks(this_vecs);
 
-		ret = irq_build_affinity_masks(this_vecs, &masks[curvec]);
-		if (ret) {
+		if (!result) {
 			kfree(masks);
 			return NULL;
 		}
+
+		for (j = 0; j < this_vecs; j++)
+			cpumask_copy(&masks[curvec + j].mask, &result[j]);
+		kfree(result);
+
 		curvec += this_vecs;
 		usedvecs += this_vecs;
 	}
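The net effect on the caller is a small ownership contract: the builder now
allocates and returns one cpumask per group (or NULL on failure), and the
caller copies what it needs into its own descriptors and frees the temporary
array. The sketch below condenses the irq_create_affinity_masks() hunk above
into a standalone helper; it is illustrative only, and the helper name
example_apply_masks() is made up for this note rather than taken from the
patch.

#include <linux/cpumask.h>
#include <linux/interrupt.h>
#include <linux/slab.h>

/*
 * Illustrative condensation of the new calling convention: the builder
 * returns a kcalloc'ed array with one struct cpumask per group (or NULL
 * on failure); the caller copies each element into its own descriptors
 * and then frees the temporary array.
 */
static int example_apply_masks(struct irq_affinity_desc *masks,
			       unsigned int curvec, unsigned int this_vecs)
{
	struct cpumask *result = irq_build_affinity_masks(this_vecs);
	unsigned int j;

	if (!result)
		return -ENOMEM;

	for (j = 0; j < this_vecs; j++)
		cpumask_copy(&masks[curvec + j].mask, &result[j]);

	kfree(result);
	return 0;
}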
From patchwork Tue Dec 27 02:29:03 2022
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 36735
From: Ming Lei
To: Thomas Gleixner, Jens Axboe
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
    Christoph Hellwig, John Garry, Ming Lei
Subject: [PATCH V4 4/6] genirq/affinity: Rename irq_build_affinity_masks as group_cpus_evenly
Date: Tue, 27 Dec 2022 10:29:03 +0800
Message-Id: <20221227022905.352674-5-ming.lei@redhat.com>
In-Reply-To: <20221227022905.352674-1-ming.lei@redhat.com>
References: <20221227022905.352674-1-ming.lei@redhat.com>
X-GMAIL-MSGID: =?utf-8?q?1753332674185610380?= Map irq vector into group, so we can abstract the algorithm for generic use case. Rename irq_build_affinity_masks as group_cpus_evenly, so we can reuse the API for blk-mq to make default queue mapping even though irq vectors aren't involved. No functional change, just rename vector as group. Reviewed-by: Christoph Hellwig Signed-off-by: Ming Lei --- kernel/irq/affinity.c | 242 +++++++++++++++++++++--------------------- 1 file changed, 121 insertions(+), 121 deletions(-) diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c index 00bba1020ecb..54083331f1bc 100644 --- a/kernel/irq/affinity.c +++ b/kernel/irq/affinity.c @@ -9,13 +9,13 @@ #include #include -static void irq_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk, - unsigned int cpus_per_vec) +static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk, + unsigned int cpus_per_grp) { const struct cpumask *siblmsk; int cpu, sibl; - for ( ; cpus_per_vec > 0; ) { + for ( ; cpus_per_grp > 0; ) { cpu = cpumask_first(nmsk); /* Should not happen, but I'm too lazy to think about it */ @@ -24,18 +24,18 @@ static void irq_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk, cpumask_clear_cpu(cpu, nmsk); cpumask_set_cpu(cpu, irqmsk); - cpus_per_vec--; + cpus_per_grp--; /* If the cpu has siblings, use them first */ siblmsk = topology_sibling_cpumask(cpu); - for (sibl = -1; cpus_per_vec > 0; ) { + for (sibl = -1; cpus_per_grp > 0; ) { sibl = cpumask_next(sibl, siblmsk); if (sibl >= nr_cpu_ids) break; if (!cpumask_test_and_clear_cpu(sibl, nmsk)) continue; cpumask_set_cpu(sibl, irqmsk); - cpus_per_vec--; + cpus_per_grp--; } } } @@ -95,48 +95,48 @@ static int get_nodes_in_cpumask(cpumask_var_t *node_to_cpumask, return nodes; } -struct node_vectors { +struct node_groups { unsigned id; union { - unsigned nvectors; + unsigned ngroups; unsigned ncpus; }; }; static int ncpus_cmp_func(const void *l, const void *r) { - const struct node_vectors *ln = l; - const struct node_vectors *rn = r; + const struct node_groups *ln = l; + const struct node_groups *rn = r; return ln->ncpus - rn->ncpus; } /* - * Allocate vector number for each node, so that for each node: + * Allocate group number for each node, so that for each node: * * 1) the allocated number is >= 1 * - * 2) the allocated numbver is <= active CPU number of this node + * 2) the allocated number is <= active CPU number of this node * - * The actual allocated total vectors may be less than @numvecs when - * active total CPU number is less than @numvecs. + * The actual allocated total groups may be less than @numgrps when + * active total CPU number is less than @numgrps. * * Active CPUs means the CPUs in '@cpu_mask AND @node_to_cpumask[]' * for each node. 
*/ -static void alloc_nodes_vectors(unsigned int numvecs, - cpumask_var_t *node_to_cpumask, - const struct cpumask *cpu_mask, - const nodemask_t nodemsk, - struct cpumask *nmsk, - struct node_vectors *node_vectors) +static void alloc_nodes_groups(unsigned int numgrps, + cpumask_var_t *node_to_cpumask, + const struct cpumask *cpu_mask, + const nodemask_t nodemsk, + struct cpumask *nmsk, + struct node_groups *node_groups) { unsigned n, remaining_ncpus = 0; for (n = 0; n < nr_node_ids; n++) { - node_vectors[n].id = n; - node_vectors[n].ncpus = UINT_MAX; + node_groups[n].id = n; + node_groups[n].ncpus = UINT_MAX; } for_each_node_mask(n, nodemsk) { @@ -148,61 +148,61 @@ static void alloc_nodes_vectors(unsigned int numvecs, if (!ncpus) continue; remaining_ncpus += ncpus; - node_vectors[n].ncpus = ncpus; + node_groups[n].ncpus = ncpus; } - numvecs = min_t(unsigned, remaining_ncpus, numvecs); + numgrps = min_t(unsigned, remaining_ncpus, numgrps); - sort(node_vectors, nr_node_ids, sizeof(node_vectors[0]), + sort(node_groups, nr_node_ids, sizeof(node_groups[0]), ncpus_cmp_func, NULL); /* - * Allocate vectors for each node according to the ratio of this - * node's nr_cpus to remaining un-assigned ncpus. 'numvecs' is + * Allocate groups for each node according to the ratio of this + * node's nr_cpus to remaining un-assigned ncpus. 'numgrps' is * bigger than number of active numa nodes. Always start the * allocation from the node with minimized nr_cpus. * * This way guarantees that each active node gets allocated at - * least one vector, and the theory is simple: over-allocation - * is only done when this node is assigned by one vector, so - * other nodes will be allocated >= 1 vector, since 'numvecs' is + * least one group, and the theory is simple: over-allocation + * is only done when this node is assigned by one group, so + * other nodes will be allocated >= 1 groups, since 'numgrps' is * bigger than number of numa nodes. 
* - * One perfect invariant is that number of allocated vectors for + * One perfect invariant is that number of allocated groups for * each node is <= CPU count of this node: * * 1) suppose there are two nodes: A and B * ncpu(X) is CPU count of node X - * vecs(X) is the vector count allocated to node X via this + * grps(X) is the group count allocated to node X via this * algorithm * * ncpu(A) <= ncpu(B) * ncpu(A) + ncpu(B) = N - * vecs(A) + vecs(B) = V + * grps(A) + grps(B) = G * - * vecs(A) = max(1, round_down(V * ncpu(A) / N)) - * vecs(B) = V - vecs(A) + * grps(A) = max(1, round_down(G * ncpu(A) / N)) + * grps(B) = G - grps(A) * - * both N and V are integer, and 2 <= V <= N, suppose - * V = N - delta, and 0 <= delta <= N - 2 + * both N and G are integer, and 2 <= G <= N, suppose + * G = N - delta, and 0 <= delta <= N - 2 * - * 2) obviously vecs(A) <= ncpu(A) because: + * 2) obviously grps(A) <= ncpu(A) because: * - * if vecs(A) is 1, then vecs(A) <= ncpu(A) given + * if grps(A) is 1, then grps(A) <= ncpu(A) given * ncpu(A) >= 1 * * otherwise, - * vecs(A) <= V * ncpu(A) / N <= ncpu(A), given V <= N + * grps(A) <= G * ncpu(A) / N <= ncpu(A), given G <= N * - * 3) prove how vecs(B) <= ncpu(B): + * 3) prove how grps(B) <= ncpu(B): * - * if round_down(V * ncpu(A) / N) == 0, vecs(B) won't be - * over-allocated, so vecs(B) <= ncpu(B), + * if round_down(G * ncpu(A) / N) == 0, vecs(B) won't be + * over-allocated, so grps(B) <= ncpu(B), * * otherwise: * - * vecs(A) = - * round_down(V * ncpu(A) / N) = + * grps(A) = + * round_down(G * ncpu(A) / N) = * round_down((N - delta) * ncpu(A) / N) = * round_down((N * ncpu(A) - delta * ncpu(A)) / N) >= * round_down((N * ncpu(A) - delta * N) / N) = @@ -210,52 +210,50 @@ static void alloc_nodes_vectors(unsigned int numvecs, * * then: * - * vecs(A) - V >= ncpu(A) - delta - V + * grps(A) - G >= ncpu(A) - delta - G * => - * V - vecs(A) <= V + delta - ncpu(A) + * G - grps(A) <= G + delta - ncpu(A) * => - * vecs(B) <= N - ncpu(A) + * grps(B) <= N - ncpu(A) * => - * vecs(B) <= cpu(B) + * grps(B) <= cpu(B) * * For nodes >= 3, it can be thought as one node and another big * node given that is exactly what this algorithm is implemented, - * and we always re-calculate 'remaining_ncpus' & 'numvecs', and - * finally for each node X: vecs(X) <= ncpu(X). + * and we always re-calculate 'remaining_ncpus' & 'numgrps', and + * finally for each node X: grps(X) <= ncpu(X). 
* */ for (n = 0; n < nr_node_ids; n++) { - unsigned nvectors, ncpus; + unsigned ngroups, ncpus; - if (node_vectors[n].ncpus == UINT_MAX) + if (node_groups[n].ncpus == UINT_MAX) continue; - WARN_ON_ONCE(numvecs == 0); + WARN_ON_ONCE(numgrps == 0); - ncpus = node_vectors[n].ncpus; - nvectors = max_t(unsigned, 1, - numvecs * ncpus / remaining_ncpus); - WARN_ON_ONCE(nvectors > ncpus); + ncpus = node_groups[n].ncpus; + ngroups = max_t(unsigned, 1, + numgrps * ncpus / remaining_ncpus); + WARN_ON_ONCE(ngroups > ncpus); - node_vectors[n].nvectors = nvectors; + node_groups[n].ngroups = ngroups; remaining_ncpus -= ncpus; - numvecs -= nvectors; + numgrps -= ngroups; } } -static int __irq_build_affinity_masks(unsigned int startvec, - unsigned int numvecs, - cpumask_var_t *node_to_cpumask, - const struct cpumask *cpu_mask, - struct cpumask *nmsk, - struct cpumask *masks) +static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps, + cpumask_var_t *node_to_cpumask, + const struct cpumask *cpu_mask, + struct cpumask *nmsk, struct cpumask *masks) { - unsigned int i, n, nodes, cpus_per_vec, extra_vecs, done = 0; - unsigned int last_affv = numvecs; - unsigned int curvec = startvec; + unsigned int i, n, nodes, cpus_per_grp, extra_grps, done = 0; + unsigned int last_grp = numgrps; + unsigned int curgrp = startgrp; nodemask_t nodemsk = NODE_MASK_NONE; - struct node_vectors *node_vectors; + struct node_groups *node_groups; if (cpumask_empty(cpu_mask)) return 0; @@ -264,34 +262,33 @@ static int __irq_build_affinity_masks(unsigned int startvec, /* * If the number of nodes in the mask is greater than or equal the - * number of vectors we just spread the vectors across the nodes. + * number of groups we just spread the groups across the nodes. */ - if (numvecs <= nodes) { + if (numgrps <= nodes) { for_each_node_mask(n, nodemsk) { /* Ensure that only CPUs which are in both masks are set */ cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]); - cpumask_or(&masks[curvec], &masks[curvec], nmsk); - if (++curvec == last_affv) - curvec = 0; + cpumask_or(&masks[curgrp], &masks[curgrp], nmsk); + if (++curgrp == last_grp) + curgrp = 0; } - return numvecs; + return numgrps; } - node_vectors = kcalloc(nr_node_ids, - sizeof(struct node_vectors), + node_groups = kcalloc(nr_node_ids, + sizeof(struct node_groups), GFP_KERNEL); - if (!node_vectors) + if (!node_groups) return -ENOMEM; - /* allocate vector number for each node */ - alloc_nodes_vectors(numvecs, node_to_cpumask, cpu_mask, - nodemsk, nmsk, node_vectors); - + /* allocate group number for each node */ + alloc_nodes_groups(numgrps, node_to_cpumask, cpu_mask, + nodemsk, nmsk, node_groups); for (i = 0; i < nr_node_ids; i++) { unsigned int ncpus, v; - struct node_vectors *nv = &node_vectors[i]; + struct node_groups *nv = &node_groups[i]; - if (nv->nvectors == UINT_MAX) + if (nv->ngroups == UINT_MAX) continue; /* Get the cpus on this node which are in the mask */ @@ -300,44 +297,47 @@ static int __irq_build_affinity_masks(unsigned int startvec, if (!ncpus) continue; - WARN_ON_ONCE(nv->nvectors > ncpus); + WARN_ON_ONCE(nv->ngroups > ncpus); /* Account for rounding errors */ - extra_vecs = ncpus - nv->nvectors * (ncpus / nv->nvectors); + extra_grps = ncpus - nv->ngroups * (ncpus / nv->ngroups); - /* Spread allocated vectors on CPUs of the current node */ - for (v = 0; v < nv->nvectors; v++, curvec++) { - cpus_per_vec = ncpus / nv->nvectors; + /* Spread allocated groups on CPUs of the current node */ + for (v = 0; v < nv->ngroups; v++, curgrp++) { + cpus_per_grp = 
ncpus / nv->ngroups; - /* Account for extra vectors to compensate rounding errors */ - if (extra_vecs) { - cpus_per_vec++; - --extra_vecs; + /* Account for extra groups to compensate rounding errors */ + if (extra_grps) { + cpus_per_grp++; + --extra_grps; } /* - * wrapping has to be considered given 'startvec' + * wrapping has to be considered given 'startgrp' * may start anywhere */ - if (curvec >= last_affv) - curvec = 0; - irq_spread_init_one(&masks[curvec], nmsk, - cpus_per_vec); + if (curgrp >= last_grp) + curgrp = 0; + grp_spread_init_one(&masks[curgrp], nmsk, + cpus_per_grp); } - done += nv->nvectors; + done += nv->ngroups; } - kfree(node_vectors); + kfree(node_groups); return done; } /* - * build affinity in two stages: - * 1) spread present CPU on these vectors - * 2) spread other possible CPUs on these vectors + * build affinity in two stages for each group, and try to put close CPUs + * in viewpoint of CPU and NUMA locality into same group, and we run + * two-stage grouping: + * + * 1) allocate present CPUs on these groups evenly first + * 2) allocate other possible CPUs on these groups evenly */ -static struct cpumask *irq_build_affinity_masks(unsigned int numvecs) +static struct cpumask *group_cpus_evenly(unsigned int numgrps) { - unsigned int curvec = 0, nr_present = 0, nr_others = 0; + unsigned int curgrp = 0, nr_present = 0, nr_others = 0; cpumask_var_t *node_to_cpumask; cpumask_var_t nmsk, npresmsk; int ret = -ENOMEM; @@ -353,7 +353,7 @@ static struct cpumask *irq_build_affinity_masks(unsigned int numvecs) if (!node_to_cpumask) goto fail_npresmsk; - masks = kcalloc(numvecs, sizeof(*masks), GFP_KERNEL); + masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL); if (!masks) goto fail_node_to_cpumask; @@ -361,26 +361,26 @@ static struct cpumask *irq_build_affinity_masks(unsigned int numvecs) cpus_read_lock(); build_node_to_cpumask(node_to_cpumask); - /* Spread on present CPUs starting from affd->pre_vectors */ - ret = __irq_build_affinity_masks(curvec, numvecs, node_to_cpumask, - cpu_present_mask, nmsk, masks); + /* grouping present CPUs first */ + ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask, + cpu_present_mask, nmsk, masks); if (ret < 0) goto fail_build_affinity; nr_present = ret; /* - * Spread on non present CPUs starting from the next vector to be - * handled. If the spreading of present CPUs already exhausted the - * vector space, assign the non present CPUs to the already spread - * out vectors. + * Allocate non present CPUs starting from the next group to be + * handled. If the grouping of present CPUs already exhausted the + * group space, assign the non present CPUs to the already + * allocated out groups. 
	 */
-	if (nr_present >= numvecs)
-		curvec = 0;
+	if (nr_present >= numgrps)
+		curgrp = 0;
 	else
-		curvec = nr_present;
+		curgrp = nr_present;
 	cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
-	ret = __irq_build_affinity_masks(curvec, numvecs, node_to_cpumask,
-					 npresmsk, nmsk, masks);
+	ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
+				  npresmsk, nmsk, masks);
 	if (ret >= 0)
 		nr_others = ret;
 
@@ -388,7 +388,7 @@ static struct cpumask *irq_build_affinity_masks(unsigned int numvecs)
 	cpus_read_unlock();
 
 	if (ret >= 0)
-		WARN_ON(nr_present + nr_others < numvecs);
+		WARN_ON(nr_present + nr_others < numgrps);
 
  fail_node_to_cpumask:
 	free_node_to_cpumask(node_to_cpumask);
@@ -467,7 +467,7 @@ irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
 	for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
 		unsigned int this_vecs = affd->set_size[i];
 		int j;
-		struct cpumask *result = irq_build_affinity_masks(this_vecs);
+		struct cpumask *result = group_cpus_evenly(this_vecs);
 
 		if (!result) {
 			kfree(masks);
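At this point the helper still lives in kernel/irq/affinity.c, but nothing in
its signature is IRQ specific any more. Once the next patch in the series
exports it through include/linux/group_cpus.h, a non-IRQ user could consume
it roughly as sketched below. This is a hypothetical example, not code from
the series: group_cpus_evenly() and the header match patch 5, while
example_map_queues() and its parameters are invented for illustration.

#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/group_cpus.h>
#include <linux/slab.h>

/*
 * Hypothetical user: build a CPU -> queue mapping by grouping all CPUs
 * evenly into nr_queues groups, taking NUMA and CPU topology locality
 * into account.
 */
static int example_map_queues(unsigned int *cpu_to_queue, unsigned int nr_queues)
{
	struct cpumask *masks = group_cpus_evenly(nr_queues);
	unsigned int grp, cpu;

	if (!masks)
		return -ENOMEM;

	for (grp = 0; grp < nr_queues; grp++)
		for_each_cpu(cpu, &masks[grp])
			cpu_to_queue[cpu] = grp;

	/* group_cpus_evenly() hands back a kcalloc'ed array the caller owns */
	kfree(masks);
	return 0;
}

This is essentially the shape of the blk-mq style default queue mapping the
changelog mentions as the motivating reuse case: one group per queue, with
CPUs grouped evenly by locality.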
From patchwork Tue Dec 27 02:29:04 2022
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 36738
From: Ming Lei
To: Thomas Gleixner, Jens Axboe
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
    Christoph Hellwig, John Garry, Ming Lei
Subject: [PATCH V4 5/6] genirq/affinity: Move group_cpus_evenly() into lib/
Date: Tue, 27 Dec 2022 10:29:04 +0800
Message-Id: <20221227022905.352674-6-ming.lei@redhat.com>
In-Reply-To: <20221227022905.352674-1-ming.lei@redhat.com>
References: <20221227022905.352674-1-ming.lei@redhat.com>
=?utf-8?q?1753332805253818171?= group_cpus_evenly() has become one generic helper which can be used for other subsystems, so move it into lib/. Reviewed-by: Christoph Hellwig Signed-off-by: Ming Lei --- MAINTAINERS | 2 + include/linux/group_cpus.h | 14 ++ kernel/irq/affinity.c | 398 +--------------------------------- lib/Makefile | 2 + lib/group_cpus.c | 427 +++++++++++++++++++++++++++++++++++++ 5 files changed, 446 insertions(+), 397 deletions(-) create mode 100644 include/linux/group_cpus.h create mode 100644 lib/group_cpus.c diff --git a/MAINTAINERS b/MAINTAINERS index bb77a3ed9d54..2b6ba935f4bd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -10881,6 +10881,8 @@ L: linux-kernel@vger.kernel.org S: Maintained T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq/core F: kernel/irq/ +F: include/linux/group_cpus.h +F: lib/group_cpus.c IRQCHIP DRIVERS M: Thomas Gleixner diff --git a/include/linux/group_cpus.h b/include/linux/group_cpus.h new file mode 100644 index 000000000000..e42807ec61f6 --- /dev/null +++ b/include/linux/group_cpus.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2016 Thomas Gleixner. + * Copyright (C) 2016-2017 Christoph Hellwig. + */ + +#ifndef __LINUX_GROUP_CPUS_H +#define __LINUX_GROUP_CPUS_H +#include +#include + +struct cpumask *group_cpus_evenly(unsigned int numgrps); + +#endif diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c index 54083331f1bc..44a4eba80315 100644 --- a/kernel/irq/affinity.c +++ b/kernel/irq/affinity.c @@ -7,403 +7,7 @@ #include #include #include -#include - -static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk, - unsigned int cpus_per_grp) -{ - const struct cpumask *siblmsk; - int cpu, sibl; - - for ( ; cpus_per_grp > 0; ) { - cpu = cpumask_first(nmsk); - - /* Should not happen, but I'm too lazy to think about it */ - if (cpu >= nr_cpu_ids) - return; - - cpumask_clear_cpu(cpu, nmsk); - cpumask_set_cpu(cpu, irqmsk); - cpus_per_grp--; - - /* If the cpu has siblings, use them first */ - siblmsk = topology_sibling_cpumask(cpu); - for (sibl = -1; cpus_per_grp > 0; ) { - sibl = cpumask_next(sibl, siblmsk); - if (sibl >= nr_cpu_ids) - break; - if (!cpumask_test_and_clear_cpu(sibl, nmsk)) - continue; - cpumask_set_cpu(sibl, irqmsk); - cpus_per_grp--; - } - } -} - -static cpumask_var_t *alloc_node_to_cpumask(void) -{ - cpumask_var_t *masks; - int node; - - masks = kcalloc(nr_node_ids, sizeof(cpumask_var_t), GFP_KERNEL); - if (!masks) - return NULL; - - for (node = 0; node < nr_node_ids; node++) { - if (!zalloc_cpumask_var(&masks[node], GFP_KERNEL)) - goto out_unwind; - } - - return masks; - -out_unwind: - while (--node >= 0) - free_cpumask_var(masks[node]); - kfree(masks); - return NULL; -} - -static void free_node_to_cpumask(cpumask_var_t *masks) -{ - int node; - - for (node = 0; node < nr_node_ids; node++) - free_cpumask_var(masks[node]); - kfree(masks); -} - -static void build_node_to_cpumask(cpumask_var_t *masks) -{ - int cpu; - - for_each_possible_cpu(cpu) - cpumask_set_cpu(cpu, masks[cpu_to_node(cpu)]); -} - -static int get_nodes_in_cpumask(cpumask_var_t *node_to_cpumask, - const struct cpumask *mask, nodemask_t *nodemsk) -{ - int n, nodes = 0; - - /* Calculate the number of nodes in the supplied affinity mask */ - for_each_node(n) { - if (cpumask_intersects(mask, node_to_cpumask[n])) { - node_set(n, *nodemsk); - nodes++; - } - } - return nodes; -} - -struct node_groups { - unsigned id; - - union { - unsigned ngroups; - unsigned ncpus; - }; -}; - -static int 
ncpus_cmp_func(const void *l, const void *r) -{ - const struct node_groups *ln = l; - const struct node_groups *rn = r; - - return ln->ncpus - rn->ncpus; -} - -/* - * Allocate group number for each node, so that for each node: - * - * 1) the allocated number is >= 1 - * - * 2) the allocated number is <= active CPU number of this node - * - * The actual allocated total groups may be less than @numgrps when - * active total CPU number is less than @numgrps. - * - * Active CPUs means the CPUs in '@cpu_mask AND @node_to_cpumask[]' - * for each node. - */ -static void alloc_nodes_groups(unsigned int numgrps, - cpumask_var_t *node_to_cpumask, - const struct cpumask *cpu_mask, - const nodemask_t nodemsk, - struct cpumask *nmsk, - struct node_groups *node_groups) -{ - unsigned n, remaining_ncpus = 0; - - for (n = 0; n < nr_node_ids; n++) { - node_groups[n].id = n; - node_groups[n].ncpus = UINT_MAX; - } - - for_each_node_mask(n, nodemsk) { - unsigned ncpus; - - cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]); - ncpus = cpumask_weight(nmsk); - - if (!ncpus) - continue; - remaining_ncpus += ncpus; - node_groups[n].ncpus = ncpus; - } - - numgrps = min_t(unsigned, remaining_ncpus, numgrps); - - sort(node_groups, nr_node_ids, sizeof(node_groups[0]), - ncpus_cmp_func, NULL); - - /* - * Allocate groups for each node according to the ratio of this - * node's nr_cpus to remaining un-assigned ncpus. 'numgrps' is - * bigger than number of active numa nodes. Always start the - * allocation from the node with minimized nr_cpus. - * - * This way guarantees that each active node gets allocated at - * least one group, and the theory is simple: over-allocation - * is only done when this node is assigned by one group, so - * other nodes will be allocated >= 1 groups, since 'numgrps' is - * bigger than number of numa nodes. - * - * One perfect invariant is that number of allocated groups for - * each node is <= CPU count of this node: - * - * 1) suppose there are two nodes: A and B - * ncpu(X) is CPU count of node X - * grps(X) is the group count allocated to node X via this - * algorithm - * - * ncpu(A) <= ncpu(B) - * ncpu(A) + ncpu(B) = N - * grps(A) + grps(B) = G - * - * grps(A) = max(1, round_down(G * ncpu(A) / N)) - * grps(B) = G - grps(A) - * - * both N and G are integer, and 2 <= G <= N, suppose - * G = N - delta, and 0 <= delta <= N - 2 - * - * 2) obviously grps(A) <= ncpu(A) because: - * - * if grps(A) is 1, then grps(A) <= ncpu(A) given - * ncpu(A) >= 1 - * - * otherwise, - * grps(A) <= G * ncpu(A) / N <= ncpu(A), given G <= N - * - * 3) prove how grps(B) <= ncpu(B): - * - * if round_down(G * ncpu(A) / N) == 0, vecs(B) won't be - * over-allocated, so grps(B) <= ncpu(B), - * - * otherwise: - * - * grps(A) = - * round_down(G * ncpu(A) / N) = - * round_down((N - delta) * ncpu(A) / N) = - * round_down((N * ncpu(A) - delta * ncpu(A)) / N) >= - * round_down((N * ncpu(A) - delta * N) / N) = - * cpu(A) - delta - * - * then: - * - * grps(A) - G >= ncpu(A) - delta - G - * => - * G - grps(A) <= G + delta - ncpu(A) - * => - * grps(B) <= N - ncpu(A) - * => - * grps(B) <= cpu(B) - * - * For nodes >= 3, it can be thought as one node and another big - * node given that is exactly what this algorithm is implemented, - * and we always re-calculate 'remaining_ncpus' & 'numgrps', and - * finally for each node X: grps(X) <= ncpu(X). 
- * - */ - for (n = 0; n < nr_node_ids; n++) { - unsigned ngroups, ncpus; - - if (node_groups[n].ncpus == UINT_MAX) - continue; - - WARN_ON_ONCE(numgrps == 0); - - ncpus = node_groups[n].ncpus; - ngroups = max_t(unsigned, 1, - numgrps * ncpus / remaining_ncpus); - WARN_ON_ONCE(ngroups > ncpus); - - node_groups[n].ngroups = ngroups; - - remaining_ncpus -= ncpus; - numgrps -= ngroups; - } -} - -static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps, - cpumask_var_t *node_to_cpumask, - const struct cpumask *cpu_mask, - struct cpumask *nmsk, struct cpumask *masks) -{ - unsigned int i, n, nodes, cpus_per_grp, extra_grps, done = 0; - unsigned int last_grp = numgrps; - unsigned int curgrp = startgrp; - nodemask_t nodemsk = NODE_MASK_NONE; - struct node_groups *node_groups; - - if (cpumask_empty(cpu_mask)) - return 0; - - nodes = get_nodes_in_cpumask(node_to_cpumask, cpu_mask, &nodemsk); - - /* - * If the number of nodes in the mask is greater than or equal the - * number of groups we just spread the groups across the nodes. - */ - if (numgrps <= nodes) { - for_each_node_mask(n, nodemsk) { - /* Ensure that only CPUs which are in both masks are set */ - cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]); - cpumask_or(&masks[curgrp], &masks[curgrp], nmsk); - if (++curgrp == last_grp) - curgrp = 0; - } - return numgrps; - } - - node_groups = kcalloc(nr_node_ids, - sizeof(struct node_groups), - GFP_KERNEL); - if (!node_groups) - return -ENOMEM; - - /* allocate group number for each node */ - alloc_nodes_groups(numgrps, node_to_cpumask, cpu_mask, - nodemsk, nmsk, node_groups); - for (i = 0; i < nr_node_ids; i++) { - unsigned int ncpus, v; - struct node_groups *nv = &node_groups[i]; - - if (nv->ngroups == UINT_MAX) - continue; - - /* Get the cpus on this node which are in the mask */ - cpumask_and(nmsk, cpu_mask, node_to_cpumask[nv->id]); - ncpus = cpumask_weight(nmsk); - if (!ncpus) - continue; - - WARN_ON_ONCE(nv->ngroups > ncpus); - - /* Account for rounding errors */ - extra_grps = ncpus - nv->ngroups * (ncpus / nv->ngroups); - - /* Spread allocated groups on CPUs of the current node */ - for (v = 0; v < nv->ngroups; v++, curgrp++) { - cpus_per_grp = ncpus / nv->ngroups; - - /* Account for extra groups to compensate rounding errors */ - if (extra_grps) { - cpus_per_grp++; - --extra_grps; - } - - /* - * wrapping has to be considered given 'startgrp' - * may start anywhere - */ - if (curgrp >= last_grp) - curgrp = 0; - grp_spread_init_one(&masks[curgrp], nmsk, - cpus_per_grp); - } - done += nv->ngroups; - } - kfree(node_groups); - return done; -} - -/* - * build affinity in two stages for each group, and try to put close CPUs - * in viewpoint of CPU and NUMA locality into same group, and we run - * two-stage grouping: - * - * 1) allocate present CPUs on these groups evenly first - * 2) allocate other possible CPUs on these groups evenly - */ -static struct cpumask *group_cpus_evenly(unsigned int numgrps) -{ - unsigned int curgrp = 0, nr_present = 0, nr_others = 0; - cpumask_var_t *node_to_cpumask; - cpumask_var_t nmsk, npresmsk; - int ret = -ENOMEM; - struct cpumask *masks = NULL; - - if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL)) - return NULL; - - if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL)) - goto fail_nmsk; - - node_to_cpumask = alloc_node_to_cpumask(); - if (!node_to_cpumask) - goto fail_npresmsk; - - masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL); - if (!masks) - goto fail_node_to_cpumask; - - /* Stabilize the cpumasks */ - cpus_read_lock(); - 
build_node_to_cpumask(node_to_cpumask); - - /* grouping present CPUs first */ - ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask, - cpu_present_mask, nmsk, masks); - if (ret < 0) - goto fail_build_affinity; - nr_present = ret; - - /* - * Allocate non present CPUs starting from the next group to be - * handled. If the grouping of present CPUs already exhausted the - * group space, assign the non present CPUs to the already - * allocated out groups. - */ - if (nr_present >= numgrps) - curgrp = 0; - else - curgrp = nr_present; - cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask); - ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask, - npresmsk, nmsk, masks); - if (ret >= 0) - nr_others = ret; - - fail_build_affinity: - cpus_read_unlock(); - - if (ret >= 0) - WARN_ON(nr_present + nr_others < numgrps); - - fail_node_to_cpumask: - free_node_to_cpumask(node_to_cpumask); - - fail_npresmsk: - free_cpumask_var(npresmsk); - - fail_nmsk: - free_cpumask_var(nmsk); - if (ret < 0) { - kfree(masks); - return NULL; - } - return masks; -} +#include static void default_calc_sets(struct irq_affinity *affd, unsigned int affvecs) { diff --git a/lib/Makefile b/lib/Makefile index 59bd7c2f793a..bea177e7b21d 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -355,6 +355,8 @@ obj-$(CONFIG_SBITMAP) += sbitmap.o obj-$(CONFIG_PARMAN) += parman.o +obj-y += group_cpus.o + # GCC library routines obj-$(CONFIG_GENERIC_LIB_ASHLDI3) += ashldi3.o obj-$(CONFIG_GENERIC_LIB_ASHRDI3) += ashrdi3.o diff --git a/lib/group_cpus.c b/lib/group_cpus.c new file mode 100644 index 000000000000..99f08c6cb9d9 --- /dev/null +++ b/lib/group_cpus.c @@ -0,0 +1,427 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2016 Thomas Gleixner. + * Copyright (C) 2016-2017 Christoph Hellwig. 
+ */ +#include +#include +#include +#include +#include + +static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk, + unsigned int cpus_per_grp) +{ + const struct cpumask *siblmsk; + int cpu, sibl; + + for ( ; cpus_per_grp > 0; ) { + cpu = cpumask_first(nmsk); + + /* Should not happen, but I'm too lazy to think about it */ + if (cpu >= nr_cpu_ids) + return; + + cpumask_clear_cpu(cpu, nmsk); + cpumask_set_cpu(cpu, irqmsk); + cpus_per_grp--; + + /* If the cpu has siblings, use them first */ + siblmsk = topology_sibling_cpumask(cpu); + for (sibl = -1; cpus_per_grp > 0; ) { + sibl = cpumask_next(sibl, siblmsk); + if (sibl >= nr_cpu_ids) + break; + if (!cpumask_test_and_clear_cpu(sibl, nmsk)) + continue; + cpumask_set_cpu(sibl, irqmsk); + cpus_per_grp--; + } + } +} + +static cpumask_var_t *alloc_node_to_cpumask(void) +{ + cpumask_var_t *masks; + int node; + + masks = kcalloc(nr_node_ids, sizeof(cpumask_var_t), GFP_KERNEL); + if (!masks) + return NULL; + + for (node = 0; node < nr_node_ids; node++) { + if (!zalloc_cpumask_var(&masks[node], GFP_KERNEL)) + goto out_unwind; + } + + return masks; + +out_unwind: + while (--node >= 0) + free_cpumask_var(masks[node]); + kfree(masks); + return NULL; +} + +static void free_node_to_cpumask(cpumask_var_t *masks) +{ + int node; + + for (node = 0; node < nr_node_ids; node++) + free_cpumask_var(masks[node]); + kfree(masks); +} + +static void build_node_to_cpumask(cpumask_var_t *masks) +{ + int cpu; + + for_each_possible_cpu(cpu) + cpumask_set_cpu(cpu, masks[cpu_to_node(cpu)]); +} + +static int get_nodes_in_cpumask(cpumask_var_t *node_to_cpumask, + const struct cpumask *mask, nodemask_t *nodemsk) +{ + int n, nodes = 0; + + /* Calculate the number of nodes in the supplied affinity mask */ + for_each_node(n) { + if (cpumask_intersects(mask, node_to_cpumask[n])) { + node_set(n, *nodemsk); + nodes++; + } + } + return nodes; +} + +struct node_groups { + unsigned id; + + union { + unsigned ngroups; + unsigned ncpus; + }; +}; + +static int ncpus_cmp_func(const void *l, const void *r) +{ + const struct node_groups *ln = l; + const struct node_groups *rn = r; + + return ln->ncpus - rn->ncpus; +} + +/* + * Allocate group number for each node, so that for each node: + * + * 1) the allocated number is >= 1 + * + * 2) the allocated number is <= active CPU number of this node + * + * The actual allocated total groups may be less than @numgrps when + * active total CPU number is less than @numgrps. + * + * Active CPUs means the CPUs in '@cpu_mask AND @node_to_cpumask[]' + * for each node. + */ +static void alloc_nodes_groups(unsigned int numgrps, + cpumask_var_t *node_to_cpumask, + const struct cpumask *cpu_mask, + const nodemask_t nodemsk, + struct cpumask *nmsk, + struct node_groups *node_groups) +{ + unsigned n, remaining_ncpus = 0; + + for (n = 0; n < nr_node_ids; n++) { + node_groups[n].id = n; + node_groups[n].ncpus = UINT_MAX; + } + + for_each_node_mask(n, nodemsk) { + unsigned ncpus; + + cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]); + ncpus = cpumask_weight(nmsk); + + if (!ncpus) + continue; + remaining_ncpus += ncpus; + node_groups[n].ncpus = ncpus; + } + + numgrps = min_t(unsigned, remaining_ncpus, numgrps); + + sort(node_groups, nr_node_ids, sizeof(node_groups[0]), + ncpus_cmp_func, NULL); + + /* + * Allocate groups for each node according to the ratio of this + * node's nr_cpus to remaining un-assigned ncpus. 'numgrps' is + * bigger than number of active numa nodes. Always start the + * allocation from the node with minimized nr_cpus. 
+ * + * This way guarantees that each active node gets allocated at + * least one group, and the theory is simple: over-allocation + * is only done when this node is assigned by one group, so + * other nodes will be allocated >= 1 groups, since 'numgrps' is + * bigger than number of numa nodes. + * + * One perfect invariant is that number of allocated groups for + * each node is <= CPU count of this node: + * + * 1) suppose there are two nodes: A and B + * ncpu(X) is CPU count of node X + * grps(X) is the group count allocated to node X via this + * algorithm + * + * ncpu(A) <= ncpu(B) + * ncpu(A) + ncpu(B) = N + * grps(A) + grps(B) = G + * + * grps(A) = max(1, round_down(G * ncpu(A) / N)) + * grps(B) = G - grps(A) + * + * both N and G are integer, and 2 <= G <= N, suppose + * G = N - delta, and 0 <= delta <= N - 2 + * + * 2) obviously grps(A) <= ncpu(A) because: + * + * if grps(A) is 1, then grps(A) <= ncpu(A) given + * ncpu(A) >= 1 + * + * otherwise, + * grps(A) <= G * ncpu(A) / N <= ncpu(A), given G <= N + * + * 3) prove how grps(B) <= ncpu(B): + * + * if round_down(G * ncpu(A) / N) == 0, vecs(B) won't be + * over-allocated, so grps(B) <= ncpu(B), + * + * otherwise: + * + * grps(A) = + * round_down(G * ncpu(A) / N) = + * round_down((N - delta) * ncpu(A) / N) = + * round_down((N * ncpu(A) - delta * ncpu(A)) / N) >= + * round_down((N * ncpu(A) - delta * N) / N) = + * cpu(A) - delta + * + * then: + * + * grps(A) - G >= ncpu(A) - delta - G + * => + * G - grps(A) <= G + delta - ncpu(A) + * => + * grps(B) <= N - ncpu(A) + * => + * grps(B) <= cpu(B) + * + * For nodes >= 3, it can be thought as one node and another big + * node given that is exactly what this algorithm is implemented, + * and we always re-calculate 'remaining_ncpus' & 'numgrps', and + * finally for each node X: grps(X) <= ncpu(X). + * + */ + for (n = 0; n < nr_node_ids; n++) { + unsigned ngroups, ncpus; + + if (node_groups[n].ncpus == UINT_MAX) + continue; + + WARN_ON_ONCE(numgrps == 0); + + ncpus = node_groups[n].ncpus; + ngroups = max_t(unsigned, 1, + numgrps * ncpus / remaining_ncpus); + WARN_ON_ONCE(ngroups > ncpus); + + node_groups[n].ngroups = ngroups; + + remaining_ncpus -= ncpus; + numgrps -= ngroups; + } +} + +static int __group_cpus_evenly(unsigned int startgrp, unsigned int numgrps, + cpumask_var_t *node_to_cpumask, + const struct cpumask *cpu_mask, + struct cpumask *nmsk, struct cpumask *masks) +{ + unsigned int i, n, nodes, cpus_per_grp, extra_grps, done = 0; + unsigned int last_grp = numgrps; + unsigned int curgrp = startgrp; + nodemask_t nodemsk = NODE_MASK_NONE; + struct node_groups *node_groups; + + if (cpumask_empty(cpu_mask)) + return 0; + + nodes = get_nodes_in_cpumask(node_to_cpumask, cpu_mask, &nodemsk); + + /* + * If the number of nodes in the mask is greater than or equal the + * number of groups we just spread the groups across the nodes. 
+ */ + if (numgrps <= nodes) { + for_each_node_mask(n, nodemsk) { + /* Ensure that only CPUs which are in both masks are set */ + cpumask_and(nmsk, cpu_mask, node_to_cpumask[n]); + cpumask_or(&masks[curgrp], &masks[curgrp], nmsk); + if (++curgrp == last_grp) + curgrp = 0; + } + return numgrps; + } + + node_groups = kcalloc(nr_node_ids, + sizeof(struct node_groups), + GFP_KERNEL); + if (!node_groups) + return -ENOMEM; + + /* allocate group number for each node */ + alloc_nodes_groups(numgrps, node_to_cpumask, cpu_mask, + nodemsk, nmsk, node_groups); + for (i = 0; i < nr_node_ids; i++) { + unsigned int ncpus, v; + struct node_groups *nv = &node_groups[i]; + + if (nv->ngroups == UINT_MAX) + continue; + + /* Get the cpus on this node which are in the mask */ + cpumask_and(nmsk, cpu_mask, node_to_cpumask[nv->id]); + ncpus = cpumask_weight(nmsk); + if (!ncpus) + continue; + + WARN_ON_ONCE(nv->ngroups > ncpus); + + /* Account for rounding errors */ + extra_grps = ncpus - nv->ngroups * (ncpus / nv->ngroups); + + /* Spread allocated groups on CPUs of the current node */ + for (v = 0; v < nv->ngroups; v++, curgrp++) { + cpus_per_grp = ncpus / nv->ngroups; + + /* Account for extra groups to compensate rounding errors */ + if (extra_grps) { + cpus_per_grp++; + --extra_grps; + } + + /* + * wrapping has to be considered given 'startgrp' + * may start anywhere + */ + if (curgrp >= last_grp) + curgrp = 0; + grp_spread_init_one(&masks[curgrp], nmsk, + cpus_per_grp); + } + done += nv->ngroups; + } + kfree(node_groups); + return done; +} + +#ifdef CONFIG_SMP +/** + * group_cpus_evenly - Group all CPUs evenly per NUMA/CPU locality + * @numgrps: number of groups + * + * Return: cpumask array if successful, NULL otherwise. And each element + * includes CPUs assigned to this group + * + * Try to put close CPUs from viewpoint of CPU and NUMA locality into + * same group, and run two-stage grouping: + * 1) allocate present CPUs on these groups evenly first + * 2) allocate other possible CPUs on these groups evenly + * + * We guarantee in the resulted grouping that all CPUs are covered, and + * no same CPU is assigned to multiple groups + */ +struct cpumask *group_cpus_evenly(unsigned int numgrps) +{ + unsigned int curgrp = 0, nr_present = 0, nr_others = 0; + cpumask_var_t *node_to_cpumask; + cpumask_var_t nmsk, npresmsk; + int ret = -ENOMEM; + struct cpumask *masks = NULL; + + if (!zalloc_cpumask_var(&nmsk, GFP_KERNEL)) + return NULL; + + if (!zalloc_cpumask_var(&npresmsk, GFP_KERNEL)) + goto fail_nmsk; + + node_to_cpumask = alloc_node_to_cpumask(); + if (!node_to_cpumask) + goto fail_npresmsk; + + masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL); + if (!masks) + goto fail_node_to_cpumask; + + /* Stabilize the cpumasks */ + cpus_read_lock(); + build_node_to_cpumask(node_to_cpumask); + + /* grouping present CPUs first */ + ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask, + cpu_present_mask, nmsk, masks); + if (ret < 0) + goto fail_build_affinity; + nr_present = ret; + + /* + * Allocate non present CPUs starting from the next group to be + * handled. If the grouping of present CPUs already exhausted the + * group space, assign the non present CPUs to the already + * allocated out groups. 
+ */ + if (nr_present >= numgrps) + curgrp = 0; + else + curgrp = nr_present; + cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask); + ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask, + npresmsk, nmsk, masks); + if (ret >= 0) + nr_others = ret; + + fail_build_affinity: + cpus_read_unlock(); + + if (ret >= 0) + WARN_ON(nr_present + nr_others < numgrps); + + fail_node_to_cpumask: + free_node_to_cpumask(node_to_cpumask); + + fail_npresmsk: + free_cpumask_var(npresmsk); + + fail_nmsk: + free_cpumask_var(nmsk); + if (ret < 0) { + kfree(masks); + return NULL; + } + return masks; +} +#else +struct cpumask *group_cpus_evenly(unsigned int numgrps) +{ + struct cpumask *masks = kcalloc(numgrps, sizeof(*masks), GFP_KERNEL); + + if (!masks) + return NULL; + + /* assign all CPUs(cpu 0) to the 1st group only */ + cpumask_copy(&masks[0], cpu_possible_mask); + return masks; +} +#endif
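As a usage sketch (not part of the patch): a subsystem that wants to spread its queues or vectors across CPUs could call the now-shared helper roughly as below. The names demo_spread_queues and demo_nr_queues are hypothetical; the only things taken from the code above are the group_cpus_evenly() signature and the fact that it returns a single kcalloc()'d array of numgrps cpumasks owned by the caller.

/*
 * Illustrative sketch only: consume the cpumask groups produced by
 * group_cpus_evenly() and release them. demo_* names are made up.
 */
#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/group_cpus.h>
#include <linux/kernel.h>
#include <linux/slab.h>

static int demo_spread_queues(unsigned int demo_nr_queues)
{
	struct cpumask *masks;
	unsigned int grp, cpu;

	masks = group_cpus_evenly(demo_nr_queues);
	if (!masks)
		return -ENOMEM;

	for (grp = 0; grp < demo_nr_queues; grp++) {
		/* each element holds the CPUs assigned to this group */
		for_each_cpu(cpu, &masks[grp])
			pr_info("queue %u <- cpu %u\n", grp, cpu);
	}

	/* one flat allocation: a single kfree() releases all groups */
	kfree(masks);
	return 0;
}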
From patchwork Tue Dec 27 02:29:05 2022 X-Patchwork-Submitter: Ming Lei X-Patchwork-Id: 36734 From: Ming Lei To: Thomas Gleixner , Jens Axboe Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, Christoph Hellwig , John Garry , Ming Lei Subject: [PATCH V4 6/6] blk-mq: Build default queue map via group_cpus_evenly() Date: Tue, 27 Dec 2022 10:29:05 +0800 Message-Id: <20221227022905.352674-7-ming.lei@redhat.com> In-Reply-To: <20221227022905.352674-1-ming.lei@redhat.com> References: <20221227022905.352674-1-ming.lei@redhat.com> MIME-Version: 1.0
The default queue mapping builder of blk_mq_map_queues() does not take NUMA topology into account, so the resulting mapping can be quite poor: CPUs belonging to different NUMA nodes may be assigned to the same queue. IOPS is observed to drop by ~30% when two jobs run on the same null_blk hctx from CPUs on two different NUMA nodes, compared with running both jobs from the same node. Address the issue by reusing group_cpus_evenly() to build the queue mapping, since group_cpus_evenly() groups CPUs according to CPU/NUMA locality. Performance also becomes more stable with this patchset because the queue mapping now respects NUMA locality. For example, on a two-node arm64 machine with 160 CPUs, node 0 (cpu 0~79) and node 1 (cpu 80~159): 1) modprobe null_blk nr_devices=1 submit_queues=2 2) run fio's 't/io_uring -p 0 -n 4 -r 20 /dev/nullb0' and observe that IOPS becomes much more stable across repeated runs: - without the patch: IOPS is 2.5M ~ 4.5M - with the patch: IOPS is 4.3M ~ 5M Many drivers may benefit from the change, such as nvme pci poll, nvme tcp, ... Reviewed-by: Christoph Hellwig Signed-off-by: Ming Lei Reviewed-by: John Garry --- block/blk-mq-cpumap.c | 63 +++++++++---------------------------------- 1 file changed, 13 insertions(+), 50 deletions(-) diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c index 9c2fce1a7b50..0c612c19feb8 100644 --- a/block/blk-mq-cpumap.c +++ b/block/blk-mq-cpumap.c @@ -10,66 +10,29 @@ #include #include #include +#include #include #include "blk.h" #include "blk-mq.h" -static int queue_index(struct blk_mq_queue_map *qmap, - unsigned int nr_queues, const int q) -{ - return qmap->queue_offset + (q % nr_queues); -} - -static int get_first_sibling(unsigned int cpu) -{ - unsigned int ret; - - ret = cpumask_first(topology_sibling_cpumask(cpu)); - if (ret < nr_cpu_ids) - return ret; - - return cpu; -} - void blk_mq_map_queues(struct blk_mq_queue_map *qmap) { - unsigned int *map = qmap->mq_map; - unsigned int nr_queues = qmap->nr_queues; - unsigned int cpu, first_sibling, q = 0; - - for_each_possible_cpu(cpu) - map[cpu] = -1; - - /* - * Spread queues among present CPUs first for minimizing - * count of dead queues which are mapped by all un-present CPUs - */ - for_each_present_cpu(cpu) { - if (q >= nr_queues) - break; - map[cpu] = queue_index(qmap, nr_queues, q++); + const struct cpumask *masks; + unsigned int queue, cpu; + + masks = group_cpus_evenly(qmap->nr_queues); + if (!masks) { + for_each_possible_cpu(cpu) + qmap->mq_map[cpu] = qmap->queue_offset; + return; } - for_each_possible_cpu(cpu) { - if (map[cpu] != -1) - continue; - /* - * First do sequential mapping between CPUs and queues. - * In case we still have CPUs to map, and we have some number of - * threads per cores then map sibling threads to the same queue - * for performance optimizations. - */ - if (q < nr_queues) { - map[cpu] = queue_index(qmap, nr_queues, q++); - } else { - first_sibling = get_first_sibling(cpu); - if (first_sibling == cpu) - map[cpu] = queue_index(qmap, nr_queues, q++); - else - map[cpu] = map[first_sibling]; - } + for (queue = 0; queue < qmap->nr_queues; queue++) { + for_each_cpu(cpu, &masks[queue]) + qmap->mq_map[cpu] = qmap->queue_offset + queue; } + kfree(masks); } EXPORT_SYMBOL_GPL(blk_mq_map_queues);
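To make the coverage guarantee concrete (a hedged sketch, not taken from the patch): group_cpus_evenly() documents that every CPU ends up in exactly one group, so after the new blk_mq_map_queues() runs, each possible CPU should carry a queue index within [queue_offset, queue_offset + nr_queues). A minimal check along those lines, with the hypothetical name demo_check_qmap, could look like:

/*
 * Illustrative sanity check only: every possible CPU must be mapped to a
 * queue index inside the map's configured range. demo_check_qmap is a
 * made-up name, not an existing kernel helper.
 */
#include <linux/blk-mq.h>
#include <linux/cpumask.h>

static bool demo_check_qmap(const struct blk_mq_queue_map *qmap)
{
	unsigned int cpu;

	for_each_possible_cpu(cpu) {
		unsigned int q = qmap->mq_map[cpu];

		/* reject any CPU left outside [offset, offset + nr_queues) */
		if (q < qmap->queue_offset ||
		    q >= qmap->queue_offset + qmap->nr_queues)
			return false;
	}
	return true;
}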