[V2,1/2] net: Fixup netif_attrmask_next_and warning

Message ID 20221014030459.3272206-2-guoren@kernel.org
State New
Series net: Fixup cpu_mask usage

Commit Message

Guo Ren Oct. 14, 2022, 3:04 a.m. UTC
  From: Guo Ren <guoren@linux.alibaba.com>

Don't pass nr_bits as arg1. Since commit 854701ba4c39 ("net: fix
cpu_max_bits_warn() usage in netif_attrmask_next{,_and}"),
cpu_max_bits_warn() triggers a warning for it:

------------[ cut here ]------------
WARNING: CPU: 2 PID: 1 at include/linux/cpumask.h:110 __netif_set_xps_queue+0x14e/0x770
Modules linked in:
CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.0.0-rc4-00018-g854701ba4c39 #324
Hardware name: riscv-virtio,qemu (DT)
epc : __netif_set_xps_queue+0x14e/0x770
 ra : __netif_set_xps_queue+0x552/0x770
epc : ffffffff806fe448 ra : ffffffff806fe84c sp : ff600000023279d0
 gp : ffffffff815fff88 tp : ff600000023a0000 t0 : ff6000000308ab40
 t1 : 0000000000000003 t2 : 0000000000000000 s0 : ff60000002327a90
 s1 : 0000000000000000 a0 : ff6000000308ab00 a1 : ff6000000308ab00
 a2 : ff6000000308a8e8 a3 : 0000000000000004 a4 : 0000000000000000
 a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000000000
 s2 : 0000000000000000 s3 : 0000000000000000 s4 : ff60000002327aa0
 s5 : ffffffff816031c8 s6 : 0000000000000000 s7 : 0000000000000001
 s8 : 0000000000000000 s9 : 0000000000000004 s10: ff6000000308a8c0
 s11: 0000000000000004 t3 : 0000000000000000 t4 : 0000000000000014
 t5 : 0000000000000000 t6 : 0000000000000000
status: 0000000200000120 badaddr: 0000000000000000 cause: 0000000000000003
[<ffffffff805e5824>] virtnet_set_affinity+0x14a/0x1c0
[<ffffffff805e7b04>] virtnet_probe+0x7fc/0xee2
[<ffffffff8050e120>] virtio_dev_probe+0x164/0x2de
[<ffffffff8055b69e>] really_probe+0x82/0x224
[<ffffffff8055b89a>] __driver_probe_device+0x5a/0xaa
[<ffffffff8055b916>] driver_probe_device+0x2c/0xb8
[<ffffffff8055bf34>] __driver_attach+0x76/0x108
[<ffffffff805597c0>] bus_for_each_dev+0x4a/0x8e
[<ffffffff8055b072>] driver_attach+0x1a/0x28
[<ffffffff8055ab8c>] bus_add_driver+0x13c/0x1a6
[<ffffffff8055c722>] driver_register+0x4a/0xfc
[<ffffffff8050dc34>] register_virtio_driver+0x1c/0x2c
[<ffffffff80a2bae4>] virtio_net_driver_init+0x7a/0xb0
[<ffffffff80002840>] do_one_initcall+0x66/0x2e4
[<ffffffff80a01212>] kernel_init_freeable+0x28a/0x304
[<ffffffff808b21e2>] kernel_init+0x1e/0x110
[<ffffffff80003c46>] ret_from_exception+0x0/0x10
---[ end trace 0000000000000000 ]---

Fixes: 80d19669ecd3 ("net: Refactor XPS for CPUs and Rx queues")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
---
 net/core/dev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

Jakub Kicinski Oct. 14, 2022, 3:35 a.m. UTC | #1
On Thu, 13 Oct 2022 23:04:58 -0400 guoren@kernel.org wrote:
> -	for (j = -1; j = netif_attrmask_next_and(j, online_mask, mask, nr_ids),
> -	     j < nr_ids;) {
> +	for (j = -1; j < nr_ids;
> +	     j = netif_attrmask_next_and(j, online_mask, mask, nr_ids)) {

This does not look equivalent, have you tested it?

nr_ids is unsigned, doesn't it mean we'll never enter the loop?

Can we instead revert 854701ba4c and take the larger rework Yury 
has posted a week ago into net-next?
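
To make the equivalence question concrete, here is a minimal userspace
sketch, not kernel code. next_set_id() is a hypothetical stand-in for
netif_attrmask_next_and() that walks a single unsigned long mask, and the
three loop helpers are illustrative names. It compares the original
advance-then-test loop with the test-then-advance shape from the patch:

```c
#include <assert.h>

/* Hypothetical stand-in for netif_attrmask_next_and(): returns the index of
 * the next set bit after n in mask, or nr_bits if there is none.
 * Assumes nr_bits does not exceed the width of unsigned long. */
static unsigned int next_set_id(int n, unsigned long mask, unsigned int nr_bits)
{
	unsigned int i;

	assert(n >= -1);	/* -1 means "start from the beginning" */
	for (i = (unsigned int)(n + 1); i < nr_bits; i++)
		if (mask & (1UL << i))
			return i;
	return nr_bits;
}

/* Original shape: advance first, then test. */
static int count_original(unsigned long mask, unsigned int nr_ids)
{
	int j, iterations = 0;

	for (j = -1; j = next_set_id(j, mask, nr_ids), j < nr_ids;)
		iterations++;
	return iterations;
}

/* Shape from the posted patch: the first test sees j == -1, which is
 * converted to unsigned for the comparison against nr_ids. */
static int count_patched(unsigned long mask, unsigned int nr_ids)
{
	int j, iterations = 0;

	for (j = -1; j < nr_ids; j = next_set_id(j, mask, nr_ids))
		iterations++;
	return iterations;
}

/* Same shape with an (int) cast on the bound; records the first j that
 * reaches the loop body. */
static int count_patched_cast(unsigned long mask, unsigned int nr_ids,
			      int *first_j)
{
	int j, iterations = 0;

	for (j = -1; j < (int)nr_ids; j = next_set_id(j, mask, nr_ids)) {
		if (iterations == 0)
			*first_j = j;
		iterations++;
	}
	return iterations;
}
```

With mask 0x5 and nr_ids 4, the original shape visits two ids. The patched
shape runs zero times, and the cast variant runs the body once with j == -1
before visiting the two real ids.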
  
Jakub Kicinski Oct. 14, 2022, 3:39 a.m. UTC | #2
On Thu, 13 Oct 2022 20:35:44 -0700 Jakub Kicinski wrote:
> Can we instead revert 854701ba4c and take the larger rework Yury 
> has posted a week ago into net-next?

Oh, it was reposted today:

https://lore.kernel.org/all/20221013234349.1165689-2-yury.norov@gmail.com/

But we need a revert of 854701ba4c as well to cover the issue back up
for 6.1, AFAIU.
  
Yury Norov Oct. 14, 2022, 4:42 a.m. UTC | #3
On Thu, Oct 13, 2022 at 08:39:11PM -0700, Jakub Kicinski wrote:
> On Thu, 13 Oct 2022 20:35:44 -0700 Jakub Kicinski wrote:
> > Can we instead revert 854701ba4c and take the larger rework Yury 
> > has posted a week ago into net-next?
> 
> Oh, it was reposted today:
> 
> https://lore.kernel.org/all/20221013234349.1165689-2-yury.norov@gmail.com/
> 
> But we need a revert of 854701ba4c as well to cover the issue back up
> for 6.1, AFAIU.

The patch 854701ba4c is technically correct. I fixed most of the warnings in
advance, but nobody can foresee everything, right? I expected some noise,
and now we have just a few things to fix. This is what -rc releases
exist for, isn't it?

I suggest keeping the patch, because this is the only way to make
cpumask_check()-related issues visible to people. If things keep going as
they are now, I expect that -rc3 will be clean of cpumask_check()
warnings.

Thanks,
Yury
  
Guo Ren Oct. 14, 2022, 6:38 a.m. UTC | #4
On Fri, Oct 14, 2022 at 11:35 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 13 Oct 2022 23:04:58 -0400 guoren@kernel.org wrote:
> > -     for (j = -1; j = netif_attrmask_next_and(j, online_mask, mask, nr_ids),
> > -          j < nr_ids;) {
> > +     for (j = -1; j < nr_ids;
> > +          j = netif_attrmask_next_and(j, online_mask, mask, nr_ids)) {
>
> This does not look equivalent, have you tested it?
>
> nr_ids is unsigned, doesn't it mean we'll never enter the loop?
Yes, you are right. Any unsigned int would break the result.
(gdb) p (int)-1 < (int)2
$1 = 1
(gdb) p (int)-1 < (unsigned int)2
$2 = 0
(gdb) p (unsigned int)-1 < (int)2
$4 = 0

So it should be:
 -     for (j = -1; j = netif_attrmask_next_and(j, online_mask, mask, nr_ids),
 -          j < nr_ids;) {
 +     for (j = -1; j < (int)nr_ids;
 +          j = netif_attrmask_next_and(j, online_mask, mask, nr_ids)) {

Right? Of course, nr_ids couldn't be 0xffffffff (-1).
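
The gdb results follow from C's usual arithmetic conversions: when an int
is compared with an unsigned int, the int operand is converted to unsigned
first. A small sketch reproduces the three comparisons plus the cast-back
variant. The helper names are illustrative and not from the kernel, and the
assert documents the assumption that the bound fits in an int:

```c
#include <assert.h>
#include <limits.h>

/* Both operands signed: compared as ints. */
static int signed_vs_signed(int a, int b)
{
	return a < b;
}

/* Mixed operands: a is converted to unsigned int before the comparison. */
static int signed_vs_unsigned(int a, unsigned int b)
{
	return a < b;
}

/* Mixed operands the other way round: b is converted to unsigned int. */
static int unsigned_vs_signed(unsigned int a, int b)
{
	return a < b;
}

/* Casting the bound back to int restores a signed comparison. The cast is
 * only value-preserving while b fits in an int. */
static int cast_bound(int a, unsigned int b)
{
	assert(b <= INT_MAX);
	return a < (int)b;
}

/* Shows what -1 becomes once it is converted to unsigned int. */
static unsigned int as_unsigned(int a)
{
	return (unsigned int)a;
}
```

Here signed_vs_unsigned(-1, 2u) evaluates to 0 because as_unsigned(-1) is
UINT_MAX, while cast_bound(-1, 2u) evaluates to 1.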

>
> Can we instead revert 854701ba4c and take the larger rework Yury
> has posted a week ago into net-next?
  
Guo Ren Oct. 14, 2022, 6:42 a.m. UTC | #5
On Fri, Oct 14, 2022 at 12:42 PM Yury Norov <yury.norov@gmail.com> wrote:
>
> On Thu, Oct 13, 2022 at 08:39:11PM -0700, Jakub Kicinski wrote:
> > On Thu, 13 Oct 2022 20:35:44 -0700 Jakub Kicinski wrote:
> > > Can we instead revert 854701ba4c and take the larger rework Yury
> > > has posted a week ago into net-next?
> >
> > Oh, it was reposted today:
> >
> > https://lore.kernel.org/all/20221013234349.1165689-2-yury.norov@gmail.com/
> >
> > But we need a revert of 854701ba4c as well to cover the issue back up
> > for 6.1, AFAIU.
>
> The patch 854701ba4c is technically correct. I fixed most of warnings in
> advance, but nobody can foresee everything, right? I expected some noise,
> and now we have just a few things to fix. This is what for -rc releases
> exist, didn't they?
Your work is great; I just want to help with some fixes. Fixing them in
-rc would be a good idea.

>
> I suggest to keep the patch, because this is the only way to make
> cpumask_check()-related issues visible to people. If things will go as
> they go now, I expect that -rc3 will be clean from cpumask_check()
> warnings.
>
> Thanks,
> Yury
  
Andy Shevchenko Oct. 14, 2022, 10 a.m. UTC | #6
On Thu, Oct 13, 2022 at 11:04:58PM -0400, guoren@kernel.org wrote:
> From: Guo Ren <guoren@linux.alibaba.com>
> 
> Don't pass nr_bits as arg1, cpu_max_bits_warn would cause warning
> now 854701ba4c39 ("net: fix cpu_max_bits_warn() usage in
> netif_attrmask_next{,_and}").
> 
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 1 at include/linux/cpumask.h:110 __netif_set_xps_queue+0x14e/0x770
> Modules linked in:

The Submitting Patches documentation suggests cutting this down to only
what is relevant to the report.
  
Guo Ren Oct. 14, 2022, 10:04 a.m. UTC | #7
On Fri, Oct 14, 2022 at 6:01 PM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Thu, Oct 13, 2022 at 11:04:58PM -0400, guoren@kernel.org wrote:
> > From: Guo Ren <guoren@linux.alibaba.com>
> >
> > Don't pass nr_bits as arg1, cpu_max_bits_warn would cause warning
> > now 854701ba4c39 ("net: fix cpu_max_bits_warn() usage in
> > netif_attrmask_next{,_and}").
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 2 PID: 1 at include/linux/cpumask.h:110 __netif_set_xps_queue+0x14e/0x770
> > Modules linked in:
>
> Submitting Patches documentation suggests to cut this to only what makes sense
> for the report.
Right, thanks for mentioning it.

>
> --
> With Best Regards,
> Andy Shevchenko
>
>
  
Jakub Kicinski Oct. 14, 2022, 3:52 p.m. UTC | #8
On Fri, 14 Oct 2022 14:38:56 +0800 Guo Ren wrote:
> > This does not look equivalent, have you tested it?
> >
> > nr_ids is unsigned, doesn't it mean we'll never enter the loop?  
> 
> Yes, you are right. Any unsigned int would break the result.
> (gdb) p (int)-1 < (int)2
> $1 = 1
> (gdb) p (int)-1 < (unsigned int)2
> $2 = 0
> (gdb) p (unsigned int)-1 < (int)2
> $4 = 0
> 
> So it should be:
>  -     for (j = -1; j = netif_attrmask_next_and(j, online_mask, mask, nr_ids),
>  -          j < nr_ids;) {
>  +     for (j = -1; j < (int)nr_ids;
>  +          j = netif_attrmask_next_and(j, online_mask, mask, nr_ids)) {
> 
> Right? Of cause, nr_ids couldn't be 0xffffffff (-1).

No. You can't enter the loop with -1 as the iterator either. 
Let's move on.
  
Jakub Kicinski Oct. 14, 2022, 4:03 p.m. UTC | #9
On Thu, 13 Oct 2022 21:42:41 -0700 Yury Norov wrote:
> > Oh, it was reposted today:
> > 
> > https://lore.kernel.org/all/20221013234349.1165689-2-yury.norov@gmail.com/
> > 
> > But we need a revert of 854701ba4c as well to cover the issue back up
> > for 6.1, AFAIU.  
> 
> The patch 854701ba4c is technically correct. I fixed most of warnings in
> advance, but nobody can foresee everything, right? I expected some noise,
> and now we have just a few things to fix.

I got 6 warnings booting my machine after pulling back from Linus
(which included your patches in net for the first time).
And that's not including the XPS and the virtio warning.

> This is what for -rc releases exist, didn't they?
> 
> I suggest to keep the patch, because this is the only way to make
> cpumask_check()-related issues visible to people. If things will go as
> they go now, I expect that -rc3 will be clean from cpumask_check()
> warnings.

This sounds too close to saying that "it's okay for -rc1 to be broken".
Why were your changes not in linux-next for a month before the merge
window? :(

We will not be merging a refactoring series into net to silence an
arguably over-eager warning. We need a minimal fix, Guo Ren's patches
seem to miss the mark so I reckon the best use of everyone's time is 
to just drop the exposing patch and retry in -next 🤷
  
Yury Norov Oct. 14, 2022, 4:16 p.m. UTC | #10
On Fri, Oct 14, 2022 at 9:03 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 13 Oct 2022 21:42:41 -0700 Yury Norov wrote:
> > > Oh, it was reposted today:
> > >
> > > https://lore.kernel.org/all/20221013234349.1165689-2-yury.norov@gmail.com/
> > >
> > > But we need a revert of 854701ba4c as well to cover the issue back up
> > > for 6.1, AFAIU.
> >
> > The patch 854701ba4c is technically correct. I fixed most of warnings in
> > advance, but nobody can foresee everything, right? I expected some noise,
> > and now we have just a few things to fix.
>
> I got 6 warnings booting my machine after pulling back from Linus
> (which included your patches in net for the first time).
> And that's not including the XPS and the virtio warning.
>
> > This is what for -rc releases exist, didn't they?
> >
> > I suggest to keep the patch, because this is the only way to make
> > cpumask_check()-related issues visible to people. If things will go as
> > they go now, I expect that -rc3 will be clean from cpumask_check()
> > warnings.
>
> This sounds too close to saying that "it's okay for -rc1 to be broken".
> Why were your changes not in linux-next for a month before the merge
> window? :(

They spent about a month in -next. Nobody cared.

> We will not be merging a refactoring series into net to silence an
> arguably over-eager warning. We need a minimal fix, Guo Ren's patches
> seem to miss the mark so I reckon the best use of everyone's time is
> to just drop the exposing patch and retry in -next 🤷

If you prefer treating symptoms rather than the disease - I have nothing
to add.
  
Jakub Kicinski Oct. 14, 2022, 6:03 p.m. UTC | #11
On Fri, 14 Oct 2022 09:16:01 -0700 Yury Norov wrote:
> > We will not be merging a refactoring series into net to silence an
> > arguably over-eager warning. We need a minimal fix, Guo Ren's patches
> > seem to miss the mark so I reckon the best use of everyone's time is
> > to just drop the exposing patch and retry in -next 🤷  
> 
> If you prefer treating symptoms rather than the disease - I have nothing
> to add.

I don't, but we may consider different things to be "the disease".
Please do not insinuate that I don't care about fixing bugs.

What I can grok from the history and your commit messages is that 
you want to catch people who pass what you consider invalid inputs 
to the helpers, but nothing will crash/OOB access here, because 
the helper double checks that the input is < nr_bits.

So it's a nice cleanup and refactoring, sure, but not an urgent fix
that needs to go to Linus ASAP.

If that's not what you're fixing please explain, I believe I already
asked you to clarify before. And the commit messages aren't exactly
informative either.
  
Guo Ren Oct. 15, 2022, 1:38 a.m. UTC | #12
On Fri, Oct 14, 2022 at 11:52 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 14 Oct 2022 14:38:56 +0800 Guo Ren wrote:
> > > This does not look equivalent, have you tested it?
> > >
> > > nr_ids is unsigned, doesn't it mean we'll never enter the loop?
> >
> > Yes, you are right. Any unsigned int would break the result.
> > (gdb) p (int)-1 < (int)2
> > $1 = 1
> > (gdb) p (int)-1 < (unsigned int)2
> > $2 = 0
> > (gdb) p (unsigned int)-1 < (int)2
> > $4 = 0
> >
> > So it should be:
> >  -     for (j = -1; j = netif_attrmask_next_and(j, online_mask, mask, nr_ids),
> >  -          j < nr_ids;) {
> >  +     for (j = -1; j < (int)nr_ids;
> >  +          j = netif_attrmask_next_and(j, online_mask, mask, nr_ids)) {
> >
> > Right? Of cause, nr_ids couldn't be 0xffffffff (-1).
>
> No. You can't enter the loop with -1 as the iterator either.
> Let's move on.
Oops, how about the below:
     for (j = netif_attrmask_next_and(-1, online_mask, mask, nr_ids);
          j < (int)nr_ids;
          j = netif_attrmask_next_and(j, online_mask, mask, nr_ids)) {
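
As a rough check of this third form, the following userspace sketch records
which ids each loop visits. It is not kernel code: next_set_id() is a
hypothetical stand-in for netif_attrmask_next_and() over a single unsigned
long mask, and the visit_* helpers are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for netif_attrmask_next_and(): returns the index of
 * the next set bit after n in mask, or nr_bits if there is none.
 * Assumes nr_bits does not exceed the width of unsigned long. */
static unsigned int next_set_id(int n, unsigned long mask, unsigned int nr_bits)
{
	unsigned int i;

	assert(n >= -1);	/* -1 means "start from the beginning" */
	for (i = (unsigned int)(n + 1); i < nr_bits; i++)
		if (mask & (1UL << i))
			return i;
	return nr_bits;
}

/* Original comma-operator form: advance, then test.
 * out must have room for nr_ids entries. */
static size_t visit_comma_form(unsigned long mask, unsigned int nr_ids, int *out)
{
	size_t n = 0;
	int j;

	for (j = -1; j = next_set_id(j, mask, nr_ids), j < nr_ids;)
		out[n++] = j;
	return n;
}

/* Proposed form: the first lookup moves into the init clause, so the body
 * never sees j == -1. */
static size_t visit_init_form(unsigned long mask, unsigned int nr_ids, int *out)
{
	size_t n = 0;
	int j;

	for (j = next_set_id(-1, mask, nr_ids); j < (int)nr_ids;
	     j = next_set_id(j, mask, nr_ids))
		out[n++] = j;
	return n;
}
```

For mask 0xA and nr_ids 4 both helpers record ids 1 and 3, and for an empty
mask neither enters the loop body. This only covers the iteration order in
the stand-in, not the warning behaviour of the real helper.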
  
Guo Ren Oct. 15, 2022, 1:41 a.m. UTC | #13
On Sat, Oct 15, 2022 at 12:03 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 13 Oct 2022 21:42:41 -0700 Yury Norov wrote:
> > > Oh, it was reposted today:
> > >
> > > https://lore.kernel.org/all/20221013234349.1165689-2-yury.norov@gmail.com/
> > >
> > > But we need a revert of 854701ba4c as well to cover the issue back up
> > > for 6.1, AFAIU.
> >
> > The patch 854701ba4c is technically correct. I fixed most of warnings in
> > advance, but nobody can foresee everything, right? I expected some noise,
> > and now we have just a few things to fix.
>
> I got 6 warnings booting my machine after pulling back from Linus
> (which included your patches in net for the first time).
> And that's not including the XPS and the virtio warning.
Oh, that's a wider effect than we thought.

>
> > This is what for -rc releases exist, didn't they?
> >
> > I suggest to keep the patch, because this is the only way to make
> > cpumask_check()-related issues visible to people. If things will go as
> > they go now, I expect that -rc3 will be clean from cpumask_check()
> > warnings.
>
> This sounds too close to saying that "it's okay for -rc1 to be broken".
> Why were your changes not in linux-next for a month before the merge
> window? :(
>
> We will not be merging a refactoring series into net to silence an
> arguably over-eager warning. We need a minimal fix, Guo Ren's patches
> seem to miss the mark so I reckon the best use of everyone's time is
> to just drop the exposing patch and retry in -next 🤷
  

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index fa53830d0683..9ec8b10ae329 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2589,8 +2589,8 @@  int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 		copy = true;
 
 	/* allocate memory for queue storage */
-	for (j = -1; j = netif_attrmask_next_and(j, online_mask, mask, nr_ids),
-	     j < nr_ids;) {
+	for (j = -1; j < nr_ids;
+	     j = netif_attrmask_next_and(j, online_mask, mask, nr_ids)) {
 		if (!new_dev_maps) {
 			new_dev_maps = kzalloc(maps_sz, GFP_KERNEL);
 			if (!new_dev_maps) {