Message ID: Y5uprmSmSfYechX2@yury-laptop
State: New
Headers:
Date: Thu, 15 Dec 2022 15:11:42 -0800
From: Yury Norov <yury.norov@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: Yury Norov <yury.norov@gmail.com>, Andy Shevchenko <andriy.shevchenko@linux.intel.com>, Rasmus Villemoes <linux@rasmusvillemoes.dk>, Tariq Toukan <tariqt@nvidia.com>, Valentin Schneider <vschneid@redhat.com>, linux-rdma@vger.kernel.org, netdev@vger.kernel.org
Subject: [GIT PULL] bitmap changes for v6.2-rc1
Message-ID: <Y5uprmSmSfYechX2@yury-laptop>
Series: [GIT,PULL] bitmap changes for v6.2-rc1

Pull-request:
git@github.com:/norov/linux.git tags/bitmap-6.2-rc1

Message
Yury Norov
Dec. 15, 2022, 11:11 p.m. UTC
The following changes since commit 76dcd734eca23168cb008912c0f69ff408905235:

  Linux 6.1-rc8 (2022-12-04 14:48:12 -0800)

are available in the Git repository at:

  git@github.com:/norov/linux.git tags/bitmap-6.2-rc1

for you to fetch changes up to 2386459394d2a46964829a00c48a08a23ead94ed:

  lib/cpumask: update comment for cpumask_local_spread() (2022-12-15 14:44:43 -0800)

----------------------------------------------------------------
Hi Linus,

Please pull bitmap patches for v6.2. They have spent more than a week
in -next without any issues. The branch consists of:

- optimize small_const path for find_next_bit() and friends (me):

  Introduces small_const_nbits_off() and uses it in find_next_bit()-like
  functions to allow static optimization when all bits to search are
  within a word boundary, even if it is not the first word.

- cpumask: improve on cpumask_local_spread() locality (me):

  The series makes cpumask_local_spread() pick the Nth CPU based on NUMA
  hop distances, which is better than the current behavior of picking
  CPUs from a given node and, when that node has fewer than N CPUs,
  from a random node.

- sched, net: NUMA-aware CPU spreading interface (Valentin Schneider,
  Tariq Toukan):

  Iterators for NUMA-aware CPU traversal. A better alternative to
  cpumask_local_spread() when all CPUs need to be enumerated.
Thanks,
Yury

----------------------------------------------------------------
Tariq Toukan (1):
      net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints

Valentin Schneider (2):
      sched/topology: Introduce sched_numa_hop_mask()
      sched/topology: Introduce for_each_numa_hop_mask()

Yury Norov (9):
      bitmap: switch from inline to __always_inline
      bitmap: improve small_const case for find_next() functions
      bitmap: add tests for find_next_bit()
      lib/find: introduce find_nth_and_andnot_bit
      cpumask: introduce cpumask_nth_and_andnot
      sched: add sched_numa_find_nth_cpu()
      cpumask: improve on cpumask_local_spread() locality
      lib/cpumask: reorganize cpumask_local_spread() logic
      lib/cpumask: update comment for cpumask_local_spread()

 drivers/net/ethernet/mellanox/mlx5/core/eq.c |  18 ++-
 include/asm-generic/bitsperlong.h            |  12 ++
 include/linux/bitmap.h                       |  46 ++++----
 include/linux/cpumask.h                      | 164 +++++++++++++------------
 include/linux/find.h                         | 118 +++++++++++--------
 include/linux/nodemask.h                     |  86 +++++++-------
 include/linux/topology.h                     |  33 ++++++
 kernel/sched/topology.c                      |  90 +++++++++++++++
 lib/cpumask.c                                |  52 +++++----
 lib/find_bit.c                               |   9 ++
 lib/test_bitmap.c                            |  23 +++-
 11 files changed, 438 insertions(+), 213 deletions(-)
Comments
On Thu, Dec 15, 2022 at 3:11 PM Yury Norov <yury.norov@gmail.com> wrote:
>
> Please pull bitmap patches for v6.2. They spent in -next for more than
> a week without any issues. The branch consists of:

So I've been holding off on this because these bitmap pulls have always
scared me, and I wanted to have the time to actually look through them
in detail before pulling.

I'm back home, over the travel chaos, and while I have other pulls
pending, they seem to be benign fixes, so I started looking at this.

And when looking at it, I did indeed find what I think is a fundamental
arithmetic bug.

That small_const_nbits_off() is simply buggy. Try this:

    small_const_nbits_off(64, -1);

and see it return true.

The thing is, doing divides in C is something you need to be very
careful about, because they do *not* behave nicely with signed values.
They do the "mathematically intuitive" thing of truncating towards zero,
but when you are talking about bit ranges like this, that is *NOT* what
you want.

The comment is fairly good, but it just doesn't match the code.

If you want to check that 'nbits-1' and 'off' are in the same word, you
simply should not use division at all. Sure, it would work if you verify
that they are both non-negative, but the fact is, even then it's just
wrong, wrong, wrong. You want a shift operation or a masking operation.

Honestly, in this case, I think the logical thing to do is "check that
the upper bits are the same". The way you do that is probably something
like

    !((off) ^ ((nbits)-1) & ~(BITS_PER_LONG-1))

which doesn't have the fundamental bug that the divide version has.

So anyway, I won't be pulling this. I appreciate that you are trying to
micro-optimize the bitop functions, but this is not the first time your
pull requests have made me worried and caused me to not pull. This is
such basic functionality that I really need these pull requests to be
less worrisome.
In other words, not only do I not want to see these kinds of bugs -
please make sure that there isn't anything that looks even *remotely*
scary. The micro-optimizations had better be *so* obvious and so well
thought through that I not only don't find bugs in them, I don't even
get nervous about finding bugs.

Side note: that whole small_const_nbits_off() thing could possibly be
improved in other ways. It's actually unnecessarily strict (if it was
correct, that is): we don't really require 'off' to be constant. What we
*do* want is:

 - "nbits > off" needs to be a compile-time true constant

 - that "off is in the same word as nbits-1" also needs to be a
   compile-time true constant.

and the compiler could determine both of those to be the case even if
'off' itself isn't a constant, if there is code around it that verifies
it.

For example, if you have code like this:

    off = a & 7;
    n = find_next_bit(bitmap, 64, off);

then the compiler could still see that it's that "last word only" case,
even though 'off' isn't actually a constant value.

So you could try using a helper macro like

    #define COMPILE_TIME_TRUE(x) (__builtin_constant_p(x) && (x))

to handle those kinds of things.

But again - the code needs to be OBVIOUSLY CORRECT. Make me feel those
warm and fuzzies about it because it's so obviously safe and correct.

That said, I see your test cases for your micro-optimizations, but I'd
also like to see references to where it actually improves real code. I
know for a fact that 'small_const_nbits()' actually triggers in real
life. I don't see that your 'small_const_nbits_off()' would trigger in
real life in any situation that isn't already covered by
'small_const_nbits()'.

So convince me not only that the optimizations are obviously correct,
but also that they actually matter.

Linus
On Fri, Dec 23, 2022 at 10:44 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Honestly, in this case, I think the logical thing to do is "check that
> the upper bits are the same". The way you do that is probably
> something like
>
>     !((off) ^ ((nbits)-1) & ~(BITS_PER_LONG-1))

Note that while the above is probably correct (but you always need to
double-check my emailed "something like this" code - I literally write
it in the MUA, and I make mistakes too), I'd never want to see that as
part of one big complex macro.

In fact, I think I am missing a set of parentheses, because '&' has a
higher precedence than '^', so the above is actually buggy.

So I'd much rather see something like this

    #define COMPILE_TIME_TRUE(x) (__builtin_constant_p(x) && (x))

    #define bits_in_same_word(x,y) \
        (!(((x)^(y))&~(BITS_PER_LONG-1)))

    #define bitmap_off_in_last_word(nbits,off) \
        bits_in_same_word((nbits)-1,off)

    #define small_const_nbits_off(nbits, off) \
        (__builtin_constant_p(nbits) && (nbits) > 0 && \
         COMPILE_TIME_TRUE(bitmap_off_in_last_word(nbits,off)))

where each step does one thing and one thing only, and you don't have
one complicated thing that is hard to read.

And again, don't take my word blindly for the above. I *think* the
above may be correct, but there's a "think" and a "may" there.

Plus I'd still like to hear about where the above would actually matter
and make a code generation difference in real life (compared to just
the simple "optimize the single-word bitmap" case).

Linus
On Fri, Dec 23, 2022 at 10:44:07AM -0800, Linus Torvalds wrote:
> On Thu, Dec 15, 2022 at 3:11 PM Yury Norov <yury.norov@gmail.com> wrote:
> >
> > Please pull bitmap patches for v6.2. They spent in -next for more than
> > a week without any issues. The branch consists of:
>
> So I've been holding off on this because these bitmap pulls have
> always scared me, and I wanted to have the time to actually look
> through them in detail before pulling.
>
> And when looking at it, I did indeed find what I think is a
> fundamental arithmetic bug.
>
> That small_const_nbits_off() is simply buggy.
>
> Try this:
>
>     small_const_nbits_off(64, -1);
>
> and see it return true.

Hi Linus,

Sorry for the delayed reply.

small_const_nbits{_off} is used only for bitmap and find_bit functions,
where both offset and size are unsigned types. -1 there will turn into
UINT_MAX or ULONG_MAX, and small_const_nbits_off(64, -1) returns false.
The bitops.h functions (set_bit et al.) also use an unsigned type for
nr. Negative offsets are not used in bit operations at the most basic
level.

Notice that the '(nbits) > 0' part in small_const_nbits() is there to
exclude 0, not negative numbers. So support for negative offsets/sizes
looks irrelevant for all existing users of small_const(). I doubt
there'll be new non-bitmap users for the macro anytime soon.

small_const_nbits() and the proposed small_const_nbits_off() are in
fact very bitmap-related macros. 'Small' in this context refers to a
single-word bitmap. 0 is definitely a small and constant number, but
small_const_nbits(0) will return false - only because inline versions
of bitmap functions don't handle 0 correctly.

I think small_const_nbits() confuses because it sounds too generic and
is located in a very generic place. There's a historical reason for
that.

Originally, the macro was hosted in include/linux/bitmap.h, and at that
time find.h was in include/asm-generic/bitops/. In commit 586eaebea5988
("lib: extend the scope of small_const_nbits() macro") I moved the
macro to include/asm-generic/bitsperlong.h to optimize find_bit
functions too. After that, working on other parts, I found that having
bitmap.h and find.h in different include paths is a permanent headache
due to things like circular dependencies, and moved find.h to
include/linux, where it should be. I even made it an internal header
for bitmap.h, but didn't move small_const_nbits(). Looks like I have to
move it to somewhere in include/linux/bitops.h.

[...]

> So convince me not only that the optimizations are obviously correct,
> but also that they actually matter.

There are no existing users for small_const_nbits_off(). I've been
reworking bitmap_find_free_region(), added a pile of tests, and found
that some quite trivial cases are not inlined, for example
find_next_bit(addr, 128, 124). Let's ignore this patch unless we have
real users.

Regarding the rest of the series: can you please take a look? It
includes an optimization for CPU allocation. With quite a simple rework
of cpumask_local_spread() we gain a measurable and significant
improvement for many subsystems on NUMA machines. Tariq measured the
impact of NUMA-based locality on networking in his environment, and
from his commit message:

  Performance tests:
  TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with
  aRFS on).
  Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121

  +-------------------------+-----------+------------------+------------------+
  |                         | BW (Gbps) | TX side CPU util | RX side CPU util |
  +-------------------------+-----------+------------------+------------------+
  | Baseline                | 52.3      | 6.4 %            | 17.9 %           |
  +-------------------------+-----------+------------------+------------------+
  | Applied on TX side only | 52.6      | 5.2 %            | 18.5 %           |
  +-------------------------+-----------+------------------+------------------+
  | Applied on RX side only | 94.9      | 11.9 %           | 27.2 %           |
  +-------------------------+-----------+------------------+------------------+
  | Applied on both sides   | 95.1      | 8.4 %            | 27.3 %           |
  +-------------------------+-----------+------------------+------------------+

  The bottleneck on the RX side is released, reaching line rate (~1.8x
  speedup), with ~30% less CPU utilization on TX.

We'd really like to have this work in the next kernel release.

Thanks,
Yury