Message ID | 20230121042436.2661843-1-yury.norov@gmail.com |
---|---|
From | Yury Norov <yury.norov@gmail.com> |
To | linux-kernel@vger.kernel.org, "David S. Miller" <davem@davemloft.net>, Andy Shevchenko <andriy.shevchenko@linux.intel.com>, Barry Song <baohua@kernel.org>, Ben Segall <bsegall@google.com>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Gal Pressman <gal@nvidia.com>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Daniel Bristot de Oliveira <bristot@redhat.com>, Heiko Carstens <hca@linux.ibm.com>, Ingo Molnar <mingo@redhat.com>, Jacob Keller <jacob.e.keller@intel.com>, Jakub Kicinski <kuba@kernel.org>, Jason Gunthorpe <jgg@nvidia.com>, Jesse Brandeburg <jesse.brandeburg@intel.com>, Jonathan Cameron <Jonathan.Cameron@huawei.com>, Juri Lelli <juri.lelli@redhat.com>, Leon Romanovsky <leonro@nvidia.com>, Linus Torvalds <torvalds@linux-foundation.org>, Mel Gorman <mgorman@suse.de>, Peter Lafreniere <peter@n8pjl.ca>, Peter Zijlstra <peterz@infradead.org>, Rasmus Villemoes <linux@rasmusvillemoes.dk>, Saeed Mahameed <saeedm@nvidia.com>, Steven Rostedt <rostedt@goodmis.org>, Tariq Toukan <tariqt@nvidia.com>, Tariq Toukan <ttoukan.linux@gmail.com>, Tony Luck <tony.luck@intel.com>, Valentin Schneider <vschneid@redhat.com>, Vincent Guittot <vincent.guittot@linaro.org> |
Cc | Yury Norov <yury.norov@gmail.com>, linux-crypto@vger.kernel.org, netdev@vger.kernel.org, linux-rdma@vger.kernel.org |
Subject | [PATCH RESEND 0/9] sched: cpumask: improve on cpumask_local_spread() locality |
Date | Fri, 20 Jan 2023 20:24:27 -0800 |
Series | sched: cpumask: improve on cpumask_local_spread() locality |
Message
Yury Norov
Jan. 21, 2023, 4:24 a.m. UTC
cpumask_local_spread() currently checks the local node for presence of the
i'th CPU, and then, if it finds nothing, makes a flat search among all
non-local CPUs. We can do better by checking CPUs per NUMA hop.

This has significant performance implications on NUMA machines, for example
when using NUMA-aware allocated memory together with NUMA-aware IRQ
affinity hints.

Performance tests from patch 8 of this series for the Mellanox network
driver show:

TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121

+-------------------------+-----------+------------------+------------------+
|                         | BW (Gbps) | TX side CPU util | RX side CPU util |
+-------------------------+-----------+------------------+------------------+
| Baseline                | 52.3      | 6.4 %            | 17.9 %           |
+-------------------------+-----------+------------------+------------------+
| Applied on TX side only | 52.6      | 5.2 %            | 18.5 %           |
+-------------------------+-----------+------------------+------------------+
| Applied on RX side only | 94.9      | 11.9 %           | 27.2 %           |
+-------------------------+-----------+------------------+------------------+
| Applied on both sides   | 95.1      | 8.4 %            | 27.3 %           |
+-------------------------+-----------+------------------+------------------+

The bottleneck on the RX side is released and line rate is reached (~1.8x
speedup), with ~30% less CPU utilization on the TX side.

This series was supposed to be included in v6.2, but that didn't happen. It
spent enough time in -next without any issues, so I hope we'll finally see
it in v6.3.

I believe the best way would be moving it with the scheduler patches, but
I'm OK with trying again via the bitmap branch as well.

Tariq Toukan (1):
  net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity
    hints

Valentin Schneider (2):
  sched/topology: Introduce sched_numa_hop_mask()
  sched/topology: Introduce for_each_numa_hop_mask()

Yury Norov (6):
  lib/find: introduce find_nth_and_andnot_bit
  cpumask: introduce cpumask_nth_and_andnot
  sched: add sched_numa_find_nth_cpu()
  cpumask: improve on cpumask_local_spread() locality
  lib/cpumask: reorganize cpumask_local_spread() logic
  lib/cpumask: update comment for cpumask_local_spread()

 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 +++-
 include/linux/cpumask.h                      | 20 +++++
 include/linux/find.h                         | 33 +++++++
 include/linux/topology.h                     | 33 +++++++
 kernel/sched/topology.c                      | 90 ++++++++++++++++++++
 lib/cpumask.c                                | 52 ++++++-----
 lib/find_bit.c                               |  9 ++
 7 files changed, 230 insertions(+), 25 deletions(-)
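To make "checking CPUs per NUMA hop" concrete, below is a minimal sketch of
the usage pattern patch 8 applies in the mlx5 driver: spreading interrupt
vectors over CPUs in order of increasing NUMA distance from the device's
home node. for_each_numa_hop_mask() and for_each_cpu_andnot() are the real
kernel APIs (the former is introduced by this series); the helper name, its
parameters, and the surrounding driver setup are hypothetical.

```c
#include <linux/cpumask.h>
#include <linux/rcupdate.h>
#include <linux/topology.h>

/*
 * Hypothetical helper: fill @cpus with @nvec CPU ids, visiting CPUs in
 * order of increasing NUMA distance from @node. This mirrors the
 * spreading loop that patch 8 adds to mlx5's IRQ request path.
 */
static void spread_irq_cpus(unsigned int *cpus, int nvec, int node)
{
	const struct cpumask *prev = cpu_none_mask;
	const struct cpumask *mask;
	int cpu, i = 0;

	rcu_read_lock();	/* sched_numa_hop_mask() is RCU-protected */
	for_each_numa_hop_mask(mask, node) {
		/* Only visit CPUs not already seen at a smaller hop. */
		for_each_cpu_andnot(cpu, mask, prev) {
			cpus[i] = cpu;
			if (++i == nvec)
				goto done;
		}
		prev = mask;
	}
done:
	rcu_read_unlock();
}
```

Each hop's mask is a cumulative superset of the previous one, which is why
the inner loop subtracts `prev`: every CPU is visited exactly once, closest
first.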
Comments
On 21/01/2023 6:24, Yury Norov wrote:
> cpumask_local_spread() currently checks the local node for presence of the
> i'th CPU, and then, if it finds nothing, makes a flat search among all
> non-local CPUs. We can do better by checking CPUs per NUMA hop.
>
> This has significant performance implications on NUMA machines, for example
> when using NUMA-aware allocated memory together with NUMA-aware IRQ
> affinity hints.
>
> Performance tests from patch 8 of this series for the Mellanox network
> driver show:
>
> TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on).
> Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121
>
> +-------------------------+-----------+------------------+------------------+
> |                         | BW (Gbps) | TX side CPU util | RX side CPU util |
> +-------------------------+-----------+------------------+------------------+
> | Baseline                | 52.3      | 6.4 %            | 17.9 %           |
> +-------------------------+-----------+------------------+------------------+
> | Applied on TX side only | 52.6      | 5.2 %            | 18.5 %           |
> +-------------------------+-----------+------------------+------------------+
> | Applied on RX side only | 94.9      | 11.9 %           | 27.2 %           |
> +-------------------------+-----------+------------------+------------------+
> | Applied on both sides   | 95.1      | 8.4 %            | 27.3 %           |
> +-------------------------+-----------+------------------+------------------+
>
> The bottleneck on the RX side is released and line rate is reached (~1.8x
> speedup), with ~30% less CPU utilization on the TX side.
>
> This series was supposed to be included in v6.2, but that didn't happen. It
> spent enough time in -next without any issues, so I hope we'll finally see
> it in v6.3.
>
> I believe the best way would be moving it with the scheduler patches, but
> I'm OK with trying again via the bitmap branch as well.

Now that Yury dropped several controversial bitmap patches from the PR, the
rest are mostly in sched, or new API that's used by sched.

Valentin, what do you think? Can you take it to your sched branch?

> Tariq Toukan (1):
>   net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity
>     hints
>
> Valentin Schneider (2):
>   sched/topology: Introduce sched_numa_hop_mask()
>   sched/topology: Introduce for_each_numa_hop_mask()
>
> Yury Norov (6):
>   lib/find: introduce find_nth_and_andnot_bit
>   cpumask: introduce cpumask_nth_and_andnot
>   sched: add sched_numa_find_nth_cpu()
>   cpumask: improve on cpumask_local_spread() locality
>   lib/cpumask: reorganize cpumask_local_spread() logic
>   lib/cpumask: update comment for cpumask_local_spread()
>
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 +++-
>  include/linux/cpumask.h                      | 20 +++++
>  include/linux/find.h                         | 33 +++++++
>  include/linux/topology.h                     | 33 +++++++
>  kernel/sched/topology.c                      | 90 ++++++++++++++++++++
>  lib/cpumask.c                                | 52 ++++++-----
>  lib/find_bit.c                               |  9 ++
>  7 files changed, 230 insertions(+), 25 deletions(-)
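The consumer-facing entry point for the locality improvement quoted above is
cpumask_local_spread() itself: after this series, it hands out the i'th CPU
in NUMA-hop order instead of via the old flat search. A hedged sketch of
typical driver-side usage follows; cpumask_local_spread() and
irq_set_affinity_hint() are real kernel APIs, while the helper name and the
irq array are illustrative.

```c
#include <linux/cpumask.h>
#include <linux/interrupt.h>

/*
 * Illustrative only: hint the i'th vector onto the CPU that
 * cpumask_local_spread() considers i'th-closest to @node. With this
 * series, "closest" follows NUMA hop distance rather than a flat scan.
 */
static void hint_affinity(unsigned int *irqs, int nvec, int node)
{
	int i;

	for (i = 0; i < nvec; i++) {
		unsigned int cpu = cpumask_local_spread(i, node);

		irq_set_affinity_hint(irqs[i], cpumask_of(cpu));
	}
}
```

Callers don't change: the same (i, node) interface now yields better
locality once i exceeds the number of CPUs on the local node.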
On 22/01/23 14:57, Tariq Toukan wrote:
> On 21/01/2023 6:24, Yury Norov wrote:
>>
>> This series was supposed to be included in v6.2, but that didn't happen.
>> It spent enough time in -next without any issues, so I hope we'll finally
>> see it in v6.3.
>>
>> I believe the best way would be moving it with the scheduler patches, but
>> I'm OK with trying again via the bitmap branch as well.
>
> Now that Yury dropped several controversial bitmap patches from the PR,
> the rest are mostly in sched, or new API that's used by sched.
>
> Valentin, what do you think? Can you take it to your sched branch?
>

I would if I had one :-)

Peter/Ingo, any objections to stashing this in tip/sched/core?
On 23/01/2023 11:57, Valentin Schneider wrote:
> On 22/01/23 14:57, Tariq Toukan wrote:
>> On 21/01/2023 6:24, Yury Norov wrote:
>>>
>>> This series was supposed to be included in v6.2, but that didn't happen.
>>> It spent enough time in -next without any issues, so I hope we'll
>>> finally see it in v6.3.
>>>
>>> I believe the best way would be moving it with the scheduler patches,
>>> but I'm OK with trying again via the bitmap branch as well.
>>
>> Now that Yury dropped several controversial bitmap patches from the PR,
>> the rest are mostly in sched, or new API that's used by sched.
>>
>> Valentin, what do you think? Can you take it to your sched branch?
>>
>
> I would if I had one :-)
>

Oh I see :)

> Peter/Ingo, any objections to stashing this in tip/sched/core?
>

Hi Peter and Ingo,

Can you please look into it, so we'll have enough time to act (in case...)
during this kernel cycle? We already missed one kernel...

Thanks,
Tariq
On Sun, 29 Jan 2023 10:07:58 +0200 Tariq Toukan wrote:
> > Peter/Ingo, any objections to stashing this in tip/sched/core?
>
> Can you please look into it, so we'll have enough time to act (in
> case...) during this kernel cycle?
>
> We already missed one kernel...

We really need this in linux-next by the end of the week. PTAL.
On Mon, 30 Jan 2023 12:22:06 -0800 Jakub Kicinski wrote:
> On Sun, 29 Jan 2023 10:07:58 +0200 Tariq Toukan wrote:
> > > Peter/Ingo, any objections to stashing this in tip/sched/core?
> >
> > Can you please look into it, so we'll have enough time to act (in
> > case...) during this kernel cycle?
> >
> > We already missed one kernel...
>
> We really need this in linux-next by the end of the week. PTAL.

Peter, could you please take a look? Linux doesn't have an API for basic,
common-sense IRQ distribution on AMD systems. It's important :(
On Thu, Feb 2, 2023 at 9:33 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 30 Jan 2023 12:22:06 -0800 Jakub Kicinski wrote:
> > On Sun, 29 Jan 2023 10:07:58 +0200 Tariq Toukan wrote:
> > > > Peter/Ingo, any objections to stashing this in tip/sched/core?
> > >
> > > Can you please look into it, so we'll have enough time to act (in
> > > case...) during this kernel cycle?
> > >
> > > We already missed one kernel...
> >
> > We really need this in linux-next by the end of the week. PTAL.
>
> Peter, could you please take a look? Linux doesn't have an API for basic,
> common-sense IRQ distribution on AMD systems. It's important :(

FWIW, it's already been in linux-next since mid-December through the bitmap
branch, and no issues have been reported so far.

Thanks,
Yury
On Mon, 23 Jan 2023 09:57:43 +0000 Valentin Schneider wrote:
> On 22/01/23 14:57, Tariq Toukan wrote:
> > On 21/01/2023 6:24, Yury Norov wrote:
> >>
> >> This series was supposed to be included in v6.2, but that didn't happen.
> >> It spent enough time in -next without any issues, so I hope we'll
> >> finally see it in v6.3.
> >>
> >> I believe the best way would be moving it with the scheduler patches,
> >> but I'm OK with trying again via the bitmap branch as well.
> >
> > Now that Yury dropped several controversial bitmap patches from the PR,
> > the rest are mostly in sched, or new API that's used by sched.
> >
> > Valentin, what do you think? Can you take it to your sched branch?
>
> I would if I had one :-)
>
> Peter/Ingo, any objections to stashing this in tip/sched/core?

No replies... so let me take it via networking.
Hello:

This series was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 20 Jan 2023 20:24:27 -0800 you wrote:
> cpumask_local_spread() currently checks the local node for presence of the
> i'th CPU, and then, if it finds nothing, makes a flat search among all
> non-local CPUs. We can do better by checking CPUs per NUMA hop.
>
> This has significant performance implications on NUMA machines, for example
> when using NUMA-aware allocated memory together with NUMA-aware IRQ
> affinity hints.
>
> [...]

Here is the summary with links:
  - [1/9] lib/find: introduce find_nth_and_andnot_bit
    https://git.kernel.org/netdev/net-next/c/43245117806f
  - [2/9] cpumask: introduce cpumask_nth_and_andnot
    https://git.kernel.org/netdev/net-next/c/62f4386e564d
  - [3/9] sched: add sched_numa_find_nth_cpu()
    https://git.kernel.org/netdev/net-next/c/cd7f55359c90
  - [4/9] cpumask: improve on cpumask_local_spread() locality
    https://git.kernel.org/netdev/net-next/c/406d394abfcd
  - [5/9] lib/cpumask: reorganize cpumask_local_spread() logic
    https://git.kernel.org/netdev/net-next/c/b1beed72b8b7
  - [6/9] sched/topology: Introduce sched_numa_hop_mask()
    https://git.kernel.org/netdev/net-next/c/9feae65845f7
  - [7/9] sched/topology: Introduce for_each_numa_hop_mask()
    https://git.kernel.org/netdev/net-next/c/06ac01721f7d
  - [8/9] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints
    https://git.kernel.org/netdev/net-next/c/2acda57736de
  - [9/9] lib/cpumask: update comment for cpumask_local_spread()
    https://git.kernel.org/netdev/net-next/c/2ac4980c57f5

You are awesome, thank you!