Message ID | 20240120025053.684838-2-yury.norov@gmail.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-31684-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:2bc4:b0:101:a8e8:374 with SMTP id hx4csp1409599dyb; Fri, 19 Jan 2024 18:51:30 -0800 (PST) X-Google-Smtp-Source: AGHT+IF4WyefzN1iv9Kg7RbqnxQ25g8hSm2JF8+D6CzJMwZQl9I1f9foqFgDd7a8CN7ba1qGD2Hi X-Received: by 2002:ac8:5a4b:0:b0:42a:279a:1fa1 with SMTP id o11-20020ac85a4b000000b0042a279a1fa1mr906474qta.116.1705719090630; Fri, 19 Jan 2024 18:51:30 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1705719090; cv=pass; d=google.com; s=arc-20160816; b=mpOxuTZi+kPrTcKzdB1SRre72b8T0dIyZS++dMmyhhsHswzybssW0W8Za5Yr8b2peJ fT8Ox7CyuGGj9NkkUDb8Kqa9FxNKI6N2aUfcnFRkjPQMCL28Vz5ddTVZSfqlTNVOekiF N5KrDk/8JzD/xjGjKI3V7ZkO2G14yd4dGW4i8N/Fcc8QkksKQORWVXb0C2i9BSPFXZWP JqtUycxC4JDkVQdBnm5p7uadlHCYuwpIIIk0uilyrJfHYpzRe/ZcLs3aefItWuthd1uP 1fm5GQICBaOZ1CaMgGY1GUHRRLd6Qx1810gS5CHQxTP4n/02FGM9dZ0RnuIaO+EfSYYY WJoQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=t+ncE/Ke8W/9eSicUFGfjAMk7XfHt6TzmaC+XB67O8c=; fh=XBn6QOxXmd+a21rfOzAv6TsGVH//lL1yHr3TAnHgMoo=; b=L7f/uTorVyYNZefnOn8vuhXl1nuvHjhV1ahTQvh8GrWieqiwfiDC51XOxyDl6+d7ZM HGeSlo1TsXYLhLw0oQh7Zgc0Ymyodrwx0v+yXienNo1QHRkOnLvQ+xs9LHseloN0I2eU /20LGmZn5VbTTow0di7jjIWU0YP4gJojVEu+ZCRv5iHkvo3rQvaqVDlvy1bPbkC1x5gL j8FPwdJxE2V8VkjAo76wEdiD4RzFO5A0uRGZZnN1EszTfTX4nfAJn6o1acjuseRG5ewc chJaz+ZFLtJxS46nZHjM5GGKFG7dVOpFLGxWPNHY99LlAXyJ9tmFpFwHXQacVLoLCq5P BPww== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=cZBtEHOe; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com dmarc=pass fromdomain=gmail.com); spf=pass (google.com: domain of linux-kernel+bounces-31684-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-31684-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id h11-20020a05622a170b00b00429b47e9942si661087qtk.710.2024.01.19.18.51.30 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 19 Jan 2024 18:51:30 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-31684-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=cZBtEHOe; arc=pass (i=1 spf=pass spfdomain=gmail.com dkim=pass dkdomain=gmail.com dmarc=pass fromdomain=gmail.com); spf=pass (google.com: domain of linux-kernel+bounces-31684-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-31684-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 6DEC51C212B1 for <ouuuleilei@gmail.com>; Sat, 20 Jan 2024 02:51:30 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id BC73023B1; Sat, 20 Jan 2024 02:51:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="cZBtEHOe" Received: from mail-yw1-f176.google.com (mail-yw1-f176.google.com [209.85.128.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9E998ECD for <linux-kernel@vger.kernel.org>; Sat, 20 Jan 2024 02:50:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.176 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705719060; cv=none; b=TyLJ5aEXP0TRGPTykgxG0xaecSdYEOB92v3ZuDQ37O6kPVz0Oo1LweJxbytiCdzoLkaphKef6njDOQxYqTwdKLiq8iFO4v/vzD5l9F96cnXElLjdIoXcEUK9306RSKaQs4ehxkqv9sPDLmYK9kqPeVFe24t1eE5+d2AV4DWC2aU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705719060; c=relaxed/simple; bh=SVwOa745y/p7doJwcbG2VSsSKIR/sIktsJ8KB1vHWPM=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Z2owwEFmf3ocAnKkzA3b4eLl2qqxfWT9EXsPko9RAJvIqzaHbfKOkl3pg+HBRLBbjCBHAqipVFiyZRfvbnlWE+0irM0tMwb7A1Wd9Wq3foTgAVaHbnQZr91LraWZut0slrZ9Cq93cFrPw+8591g27cyuaNAZlWZznun2TbWKAgQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=cZBtEHOe; arc=none smtp.client-ip=209.85.128.176 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-yw1-f176.google.com with SMTP id 00721157ae682-5ebca94cf74so13947847b3.0 for <linux-kernel@vger.kernel.org>; Fri, 19 Jan 2024 18:50:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1705719057; x=1706323857; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=t+ncE/Ke8W/9eSicUFGfjAMk7XfHt6TzmaC+XB67O8c=; b=cZBtEHOeuUjLoazd/HsZGtCtBfAy7fB5HFRlygF6yNS7WO46pqoen4IkeqB4b/5VV/ QmH/Wm0FQmSqcawq9G4riJbsPGm73YjmYfyQpELavf1+Sp5WvGZEBGtnoJKsbkflAcFm Er4J+38+3JUcQNpRSW3E/Z8htroQouIssP9+Kob9hz7RAHjpeke7RJBxlplcZpISHyML 2C91epABEvwST5Co9Azz5lsQ7Eq+WL3SG4+M7W3s0bWkZNyPEaeheFQoP/vKwlnvuXCz 5hZyb0yREa7+caL8kihNmYtbVD36jDB4HyuDUE+P12NB6NqrYNbCsj/c/vVpRPKz1abk x5OQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1705719057; x=1706323857; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=t+ncE/Ke8W/9eSicUFGfjAMk7XfHt6TzmaC+XB67O8c=; b=ZZRp2/WT6YOnN0To871NwYbkeM4yUoCGudlHKvVXkJ0Xd6j1w52/rSjV4MdbVXq1v2 x6jCVDrE3DMZDG2idoQdg5/6aFneQ9FK8aAvSqUMgqRtZXvJmoCCO7CKGTwc3GAv1kXN 187GOzjAQHjryPQgbn1y6DIeCm1+0aw9Q0vGZfFaIOTySNfbXNJMBUTBlaQtJMS+rhEs i86Ihc4d5OqWrgHvNQA3SLCE+D2PRtLacRX18vi5t8NI47HcjfSY3iY1OrPFIC4yeeze CoWIG/r3hDI7sFzSRL9COn0gTzY4xzltVGW9SWXSil132wRn686KabP4MHV6tkOOKEHI pwuw== X-Gm-Message-State: AOJu0Yx/KyjZ6AhURcef6L8JR8gCTaUkVKQfsOZhGvJwxdqK3Pz4RXfr zYGIyr2hb6qeIBSvU3c5KKzi66tId4asc758GP3WELMMkSf6y0LX X-Received: by 2002:a81:7bc5:0:b0:5ff:9b17:8cd7 with SMTP id w188-20020a817bc5000000b005ff9b178cd7mr735066ywc.71.1705719057453; Fri, 19 Jan 2024 18:50:57 -0800 (PST) Received: from localhost ([2601:344:8301:57f0:2288:782e:a717:678d]) by smtp.gmail.com with ESMTPSA id s4-20020a0dd004000000b005fb0d1afc7csm7925684ywd.120.2024.01.19.18.50.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 19 Jan 2024 18:50:56 -0800 (PST) From: Yury Norov <yury.norov@gmail.com> To: Andrew Morton <akpm@linux-foundation.org>, Thomas Gleixner <tglx@linutronix.de>, Ming Lei <ming.lei@redhat.com>, linux-kernel@vger.kernel.org Cc: Yury Norov <yury.norov@gmail.com>, Andy Shevchenko <andriy.shevchenko@linux.intel.com>, Breno Leitao <leitao@debian.org>, Nathan Chancellor <nathan@kernel.org>, Rasmus Villemoes <linux@rasmusvillemoes.dk>, Zi Yan <ziy@nvidia.com> Subject: [PATCH 1/9] cpumask: introduce for_each_cpu_and_from() Date: Fri, 19 Jan 2024 18:50:45 -0800 Message-Id: <20240120025053.684838-2-yury.norov@gmail.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20240120025053.684838-1-yury.norov@gmail.com> References: <20240120025053.684838-1-yury.norov@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1788576100953746769 X-GMAIL-MSGID: 1788576100953746769 |
Series |
lib/group_cpus: rework grp_spread_init_one() and make it O(1)
|
|
Commit Message
Yury Norov
Jan. 20, 2024, 2:50 a.m. UTC
Similarly to for_each_cpu_and(), introduce a for_each_cpu_and_from(),
which is handy when it's needed to traverse 2 cpumasks or bitmaps,
starting from a given position.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
include/linux/cpumask.h | 11 +++++++++++
include/linux/find.h | 3 +++
2 files changed, 14 insertions(+)
Comments
On Fri, Jan 19, 2024 at 06:50:45PM -0800, Yury Norov wrote: > Similarly to for_each_cpu_and(), introduce a for_each_cpu_and_from(), > which is handy when it's needed to traverse 2 cpumasks or bitmaps, > starting from a given position. The new helper is useless, see https://lore.kernel.org/lkml/ZZNgDb6bzOscrNmk@fedora/ Thanks, Ming
On Sat, Jan 20, 2024 at 11:03:37AM +0800, Ming Lei wrote: > On Fri, Jan 19, 2024 at 06:50:45PM -0800, Yury Norov wrote: > > Similarly to for_each_cpu_and(), introduce a for_each_cpu_and_from(), > > which is handy when it's needed to traverse 2 cpumasks or bitmaps, > > starting from a given position. > > The new helper is useless, see > > https://lore.kernel.org/lkml/ZZNgDb6bzOscrNmk@fedora/ Let's consider the following configuration. CPUs: 0b1111 Sibling groups: 0b0011 and 0b1100 nmsk: 0b1111 As the complexity measure we take the number of accesses to nmsk in the outer loop, and to (nmsk & sibl) in the inner loop in search routines, so that cpumask_first(1111) requires 1 access to find the first set bit, and cpumask_first(1000) requires 4 such accesses. Actual find_bit() ops work better than this by using __ffs(), but on long bitmaps the performance will be as described above. Now, look at the code. This is yours: static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk, unsigned int cpus_per_grp) { const struct cpumask *siblmsk; int cpu, sibl; for ( ; cpus_per_grp > 0; ) { cpu = cpumask_first(nmsk); /* Should not happen, but I'm too lazy to think about it */ if (cpu >= nr_cpu_ids) return; cpumask_clear_cpu(cpu, nmsk); cpumask_set_cpu(cpu, irqmsk); cpus_per_grp--; /* If the cpu has siblings, use them first */ siblmsk = topology_sibling_cpumask(cpu); for (sibl = -1; cpus_per_grp > 0; ) { sibl = cpumask_next(sibl, siblmsk); if (sibl >= nr_cpu_ids) break; if (!cpumask_test_and_clear_cpu(sibl, nmsk)) continue; cpumask_set_cpu(sibl, irqmsk); cpus_per_grp--; } } } This is your code step-by-step: # loop cpu match siblmsk nmsk irqmsk 0 outer 0 yes 1110 0001 1 inner 0 no 1110 0001 2 inner 1 yes 0011 1100 0011 3 inner 2 no 1100 0011 4 inner 3 no 1100 0011 5 outer 0 no 1100 0011 6 outer 1 no 1100 0011 7 outer 2 yes 1000 0111 8 inner 0 no 1100 1000 0111 9 inner 1 no 1100 1000 0111 10 inner 2 no 1100 1000 0111 11 inner 3 yes 1100 0000 1111 12 outer 0 no 0000 1111 13 outer 1 no 0000 1111 14 outer 2 no 0000 1111 15 outer 3 no 0000 1111 This is mine: static void grp_spread_init_one(struct cpumask *irqmsk, struct cpumask *nmsk, unsigned int cpus_per_grp) { const struct cpumask *siblmsk; int cpu, sibl; for_each_cpu(cpu, nmsk) { if (cpus_per_grp-- == 0) return; /* * If a caller wants to spread IRQa on offline CPUs, we need to * take care of it explicitly because those offline CPUS are not * included in siblings cpumask. */ __cpumask_clear_cpu(cpu, nmsk); __cpumask_set_cpu(cpu, irqmsk); /* If the cpu has siblings, use them first */ siblmsk = topology_sibling_cpumask(cpu); sibl = cpu + 1; for_each_cpu_and_from(sibl, siblmsk, nmsk) { if (cpus_per_grp-- == 0) return; __cpumask_clear_cpu(sibl, nmsk); __cpumask_set_cpu(sibl, irqmsk); cpu = sibl + 1; } } } Step-by-step: # loop cpu match siblmsk nmsk irqmsk 0 outer 0 yes 1110 0001 1 inner 1 yes 0011 1100 0011 2 inner 2 no 0011 1100 0011 3 inner 3 no 0011 1100 0011 4 outer 2 yes 1000 0111 5 inner 3 yes 1100 0000 1111 Your code works worse because it's a Schlemiel the Painter's algorithm. I mentioned it twice in the commit messages and at least 3 times in replies to your comments. Here I'll stop and will not reply to your emails, including the rest of that Friday's night mailbombing, unless you at least admit you're wrong in this case and for_each_cpu_and_from() is useful here. I'd also recommend you to learn more about atomic operations basics and revoke your NAK from the patch #3. Thanks, Yury PS: There's a typo in the series name, I meant that the series makes the function O(N) of course. But even that is overly optimistic. It's O(N*S), where S is the number of sibling groups. A couple more patches needed to make it a true O(N). Still, much better.
On Sun, Jan 21, 2024 at 11:50:02AM -0800, Yury Norov wrote: > On Sat, Jan 20, 2024 at 11:03:37AM +0800, Ming Lei wrote: > > On Fri, Jan 19, 2024 at 06:50:45PM -0800, Yury Norov wrote: > > > Similarly to for_each_cpu_and(), introduce a for_each_cpu_and_from(), > > > which is handy when it's needed to traverse 2 cpumasks or bitmaps, > > > starting from a given position. > > > > The new helper is useless, see > > > > https://lore.kernel.org/lkml/ZZNgDb6bzOscrNmk@fedora/ > > Let's consider the following configuration. > Step-by-step: .. > > # loop cpu match siblmsk nmsk irqmsk > 0 outer 0 yes 1110 0001 > 1 inner 1 yes 0011 1100 0011 > 2 inner 2 no 0011 1100 0011 > 3 inner 3 no 0011 1100 0011 > 4 outer 2 yes 1000 0111 > 5 inner 3 yes 1100 0000 1111 > > Your code works worse because it's a Schlemiel the Painter's algorithm. > I mentioned it twice in the commit messages and at least 3 times in > replies to your comments. Does it really matter here in reality? Which kind of user visible improvements can be observed? I have mentioned several times, for control/management code path, we care more on maintainability, correctness instead of efficiency. You are _wasting_ resources in wrong place, if you are really interested in optimization, please do in fast code path, such as, related and not not limited, irq handling, io handling, memory allocation, .... Unfortunately, your V5 still have obvious bug, and as you mentioned, the patchset title is wrong too. > > Here I'll stop and will not reply to your emails, including the rest of > that Friday's night mailbombing, unless you at least admit you're wrong > in this case and for_each_cpu_and_from() is useful here. It is easy to get same result without adding for_each_cpu_and_from(), see the patch I sent: https://lore.kernel.org/lkml/20240120065543.739203-1-ming.lei@redhat.com/ in which we needn't to update iterator variable inside loop, and fix the bug in your patch 4 of v5, and it is still O(N). Meantime it is simpler and easier to get proved. Here your use of for_each_cpu_and_from() is tricky too, cause the loop condition variable(part of iterator variable) of cpu mask is being updated inside the loop. And we can get same result by cpumask_next_and() without playing the trick. > > I'd also recommend you to learn more about atomic operations basics and > revoke your NAK from the patch #3. If you think my comment on the NAK is wrong, please reply on the comment directly. > > Thanks, > Yury > > PS: There's a typo in the series name, I meant that the series makes the > function O(N) of course. But even that is overly optimistic. It's O(N*S), > where S is the number of sibling groups. A couple more patches needed to > make it a true O(N). Still, much better. Either O(1) or O(N) isn't one big deal here, cause it is oneshot slow code path, and nr_cpu_ids is not big enough in reality. Even you can't make real O(N) because your patch 4 has logic mistake, see my comment: https://lore.kernel.org/lkml/ZatlggW%2F8SH6od9O@fedora/ Thanks, Ming
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index cfb545841a2c..73ff2e0ef090 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -332,6 +332,17 @@ unsigned int __pure cpumask_next_wrap(int n, const struct cpumask *mask, int sta #define for_each_cpu_and(cpu, mask1, mask2) \ for_each_and_bit(cpu, cpumask_bits(mask1), cpumask_bits(mask2), small_cpumask_bits) +/** + * for_each_cpu_and_from - iterate over every cpu in both masks starting from a given cpu + * @cpu: the (optionally unsigned) integer iterator + * @mask1: the first cpumask pointer + * @mask2: the second cpumask pointer + * + * After the loop, cpu is >= nr_cpu_ids. + */ +#define for_each_cpu_and_from(cpu, mask1, mask2) \ + for_each_and_bit_from(cpu, cpumask_bits(mask1), cpumask_bits(mask2), small_cpumask_bits) + /** * for_each_cpu_andnot - iterate over every cpu present in one mask, excluding * those present in another. diff --git a/include/linux/find.h b/include/linux/find.h index 5e4f39ef2e72..dfd3d51ff590 100644 --- a/include/linux/find.h +++ b/include/linux/find.h @@ -563,6 +563,9 @@ unsigned long find_next_bit_le(const void *addr, unsigned (bit) = find_next_and_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\ (bit)++) +#define for_each_and_bit_from(bit, addr1, addr2, size) \ + for (; (bit) = find_next_and_bit((addr1), (addr2), (size), (bit)), (bit) < (size); (bit)++) + #define for_each_andnot_bit(bit, addr1, addr2, size) \ for ((bit) = 0; \ (bit) = find_next_andnot_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\