[net-next,2/2] bonding: fix link recovery in mode 2 when updelay is nonzero

Message ID cb89b92af89973ee049a696c362b4a2abfdd9b82.1668800711.git.jtoppins@redhat.com
State New
Series [net-next,1/2] selftests: bonding: up/down delay w/ slave link flapping

Commit Message

Jonathan Toppins Nov. 18, 2022, 8:30 p.m. UTC
Before this change, when a bond in mode 2 (balance-xor) lost link on all
of its slaves, the bonding device would never recover, even after the
expiration of updelay. This change ignores the updelay when the bond
currently has no usable links, conforming to bonding.txt section 13.1
paragraph 4.

Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
---
 drivers/net/bonding/bond_main.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
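
For readers unfamiliar with updelay, the behaviour described in the commit
message can be sketched as a small userspace model. This is an illustrative
simplification, not the driver code: the names link_state, model_slave,
updelay_ticks and model_tick are hypothetical, downdelay is not modelled, and
the sketch assumes the delay is counted in whole miimon intervals.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified link states (the driver uses BOND_LINK_* values). */
enum link_state { LINK_DOWN, LINK_BACK, LINK_UP };

struct model_slave {
	enum link_state state;
	int delay;	/* monitoring ticks left before the slave may go up */
};

/* Convert an updelay in milliseconds to whole miimon ticks (assumption). */
static int updelay_ticks(int updelay_ms, int miimon_ms)
{
	assert(miimon_ms > 0);
	return updelay_ms / miimon_ms;
}

/* Advance one slave by one monitoring tick. */
static void model_tick(struct model_slave *s, bool carrier,
		       int updelay, bool ignore_updelay)
{
	if (!carrier) {
		/* Losing carrier cancels any pending recovery. */
		s->state = LINK_DOWN;
		return;
	}
	if (s->state == LINK_UP)
		return;
	if (s->state == LINK_DOWN) {
		/* Carrier just came back: start the updelay countdown. */
		s->state = LINK_BACK;
		s->delay = updelay;
	}
	if (ignore_updelay)
		s->delay = 0;	/* no usable link: skip the wait */
	if (s->delay <= 0)
		s->state = LINK_UP;
	else
		s->delay--;
}
```

With updelay=200 and miimon=100 the model needs three ticks with carrier
before the slave is up, whereas with ignore_updelay set it comes up on the
first tick.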
  

Comments

Paolo Abeni Nov. 22, 2022, 10:59 a.m. UTC | #1
Hello,

On Fri, 2022-11-18 at 15:30 -0500, Jonathan Toppins wrote:
> Before this change when a bond in mode 2 lost link, all of its slaves
> lost link, the bonding device would never recover even after the
> expiration of updelay. This change removes the updelay when the bond
> currently has no usable links. Conforming to bonding.txt section 13.1
> paragraph 4.
> 
> Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>

Why are you targeting net-next? This looks like something suitable to
the -net tree to me. If so, could you please include a Fixes tag?

Note that we can add new self-tests even via the -net tree.

Thanks,

Paolo
  
Jonathan Toppins Nov. 22, 2022, 1:36 p.m. UTC | #2
On 11/22/22 05:59, Paolo Abeni wrote:
> Hello,
> 
> On Fri, 2022-11-18 at 15:30 -0500, Jonathan Toppins wrote:
>> Before this change when a bond in mode 2 lost link, all of its slaves
>> lost link, the bonding device would never recover even after the
>> expiration of updelay. This change removes the updelay when the bond
>> currently has no usable links. Conforming to bonding.txt section 13.1
>> paragraph 4.
>>
>> Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
> 
> Why are you targeting net-next? This looks like something suitable to
> the -net tree to me. If so, could you please include a Fixes tag?
> 
> Note that we can add new self-tests even via the -net tree.
> 

I could not find a reasonable fixes tag for this, hence why I targeted 
the net-next tree.

-Jon
  
Paolo Abeni Nov. 22, 2022, 2:45 p.m. UTC | #3
On Tue, 2022-11-22 at 08:36 -0500, Jonathan Toppins wrote:
> On 11/22/22 05:59, Paolo Abeni wrote:
> > Hello,
> > 
> > On Fri, 2022-11-18 at 15:30 -0500, Jonathan Toppins wrote:
> > > Before this change when a bond in mode 2 lost link, all of its slaves
> > > lost link, the bonding device would never recover even after the
> > > expiration of updelay. This change removes the updelay when the bond
> > > currently has no usable links. Conforming to bonding.txt section 13.1
> > > paragraph 4.
> > > 
> > > Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
> > 
> > Why are you targeting net-next? This looks like something suitable to
> > the -net tree to me. If so, could you please include a Fixes tag?
> > 
> > Note that we can add new self-tests even via the -net tree.
> > 
> 
> I could not find a reasonable fixes tag for this, hence why I targeted 
> the net-next tree.

When in doubt I think it's preferable to point out a commit surely
affected by the issue - even if that is possibly not the one
introducing the issue - than no Fixes at all. The lack of tag will make
the work more difficult for stable teams.

In this specific case I think that:

Fixes: 41f891004063 ("bonding: ignore updelay param when there is no active slave")

should be ok, WDYT? If you agree, would you mind reposting for -net?

Thanks,

Paolo
  
Jonathan Toppins Nov. 22, 2022, 3:37 p.m. UTC | #4
On 11/22/22 09:45, Paolo Abeni wrote:
> On Tue, 2022-11-22 at 08:36 -0500, Jonathan Toppins wrote:
>> On 11/22/22 05:59, Paolo Abeni wrote:
>>> Hello,
>>>
>>> On Fri, 2022-11-18 at 15:30 -0500, Jonathan Toppins wrote:
>>>> Before this change when a bond in mode 2 lost link, all of its slaves
>>>> lost link, the bonding device would never recover even after the
>>>> expiration of updelay. This change removes the updelay when the bond
>>>> currently has no usable links. Conforming to bonding.txt section 13.1
>>>> paragraph 4.
>>>>
>>>> Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
>>>
>>> Why are you targeting net-next? This looks like something suitable to
>>> the -net tree to me. If so, could you please include a Fixes tag?
>>>
>>> Note that we can add new self-tests even via the -net tree.
>>>
>>
>> I could not find a reasonable fixes tag for this, hence why I targeted
>> the net-next tree.
> 
> When in doubt I think it's preferable to point out a commit surely
> affected by the issue - even if that is possibly not the one
> introducing the issue - than no Fixes at all. The lack of tag will make
> the work more difficult for stable teams.
> 
> In this specific case I think that:
> 
> Fixes: 41f891004063 ("bonding: ignore updelay param when there is no active slave")
> 
> should be ok, WDYT? If you agree, would you mind reposting for -net?
> 
> Thanks,
> 
> Paolo
> 

Yes that looks like a good one. I will repost to -net a v2 that includes 
changes to reduce the number of icmp echos sent before failing the test.

Thanks,
-Jon
  
Nikolay Aleksandrov Nov. 22, 2022, 9:12 p.m. UTC | #5
On 22/11/2022 17:37, Jonathan Toppins wrote:
> On 11/22/22 09:45, Paolo Abeni wrote:
>> On Tue, 2022-11-22 at 08:36 -0500, Jonathan Toppins wrote:
>>> On 11/22/22 05:59, Paolo Abeni wrote:
>>>> Hello,
>>>>
>>>> On Fri, 2022-11-18 at 15:30 -0500, Jonathan Toppins wrote:
>>>>> Before this change when a bond in mode 2 lost link, all of its slaves
>>>>> lost link, the bonding device would never recover even after the
>>>>> expiration of updelay. This change removes the updelay when the bond
>>>>> currently has no usable links. Conforming to bonding.txt section 13.1
>>>>> paragraph 4.
>>>>>
>>>>> Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
>>>>
>>>> Why are you targeting net-next? This looks like something suitable to
>>>> the -net tree to me. If so, could you please include a Fixes tag?
>>>>
>>>> Note that we can add new self-tests even via the -net tree.
>>>>
>>>
>>> I could not find a reasonable fixes tag for this, hence why I targeted
>>> the net-next tree.
>>
>> When in doubt I think it's preferable to point out a commit surely
>> affected by the issue - even if that is possibly not the one
>> introducing the issue - than no Fixes at all. The lack of tag will make
>> the work more difficult for stable teams.
>>
>> In this specific case I think that:
>>
>> Fixes: 41f891004063 ("bonding: ignore updelay param when there is no active slave")
>>
>> should be ok, WDYT? If you agree, would you mind reposting for -net?
>>
>> Thanks,
>>
>> Paolo
>>
> 
> Yes that looks like a good one. I will repost to -net a v2 that includes changes to reduce the number of icmp echos sent before failing the test.
> 
> Thanks,
> -Jon
> 

One minor nit - could you please change "mode 2" to "mode balance-xor" ?
It saves reviewers some grepping around the code to see what is mode 2.
Obviously one has to dig in the code to see how it's affected, but still
it is a bit more understandable. It'd be nice to add more as to why the link is not recovered,
I get it after reading the code, but it would be nice to include a more detailed explanation in the
commit message as well.

Thanks,
 Nik
  
Nikolay Aleksandrov Nov. 22, 2022, 9:15 p.m. UTC | #6
On 22/11/2022 23:12, Nikolay Aleksandrov wrote:
> On 22/11/2022 17:37, Jonathan Toppins wrote:
>> On 11/22/22 09:45, Paolo Abeni wrote:
>>> On Tue, 2022-11-22 at 08:36 -0500, Jonathan Toppins wrote:
>>>> On 11/22/22 05:59, Paolo Abeni wrote:
>>>>> Hello,
>>>>>
>>>>> On Fri, 2022-11-18 at 15:30 -0500, Jonathan Toppins wrote:
>>>>>> Before this change when a bond in mode 2 lost link, all of its slaves
>>>>>> lost link, the bonding device would never recover even after the
>>>>>> expiration of updelay. This change removes the updelay when the bond
>>>>>> currently has no usable links. Conforming to bonding.txt section 13.1
>>>>>> paragraph 4.
>>>>>>
>>>>>> Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
>>>>>
>>>>> Why are you targeting net-next? This looks like something suitable to
>>>>> the -net tree to me. If so, could you please include a Fixes tag?
>>>>>
>>>>> Note that we can add new self-tests even via the -net tree.
>>>>>
>>>>
>>>> I could not find a reasonable fixes tag for this, hence why I targeted
>>>> the net-next tree.
>>>
>>> When in doubt I think it's preferable to point out a commit surely
>>> affected by the issue - even if that is possibly not the one
>>> introducing the issue - than no Fixes at all. The lack of tag will make
>>> the work more difficult for stable teams.
>>>
>>> In this specific case I think that:
>>>
>>> Fixes: 41f891004063 ("bonding: ignore updelay param when there is no active slave")
>>>
>>> should be ok, WDYT? If you agree, would you mind reposting for -net?
>>>
>>> Thanks,
>>>
>>> Paolo
>>>
>>
>> Yes that looks like a good one. I will repost to -net a v2 that includes changes to reduce the number of icmp echos sent before failing the test.
>>
>> Thanks,
>> -Jon
>>
> 
> One minor nit - could you please change "mode 2" to "mode balance-xor" ?
> It saves reviewers some grepping around the code to see what is mode 2.
> Obviously one has to dig in the code to see how it's affected, but still
> it is a bit more understandable. It'd be nice to add more as to why the link is not recovered,
> I get it after reading the code, but it would be nice to include a more detailed explanation in the
> commit message as well.
> 
> Thanks,
>  Nik
> 

Ah, I just noticed I'm late to the party. :)
Never mind my comments, no need for a v3.
  
Jonathan Toppins Nov. 22, 2022, 9:17 p.m. UTC | #7
On 11/22/22 16:15, Nikolay Aleksandrov wrote:
> On 22/11/2022 23:12, Nikolay Aleksandrov wrote:
>> On 22/11/2022 17:37, Jonathan Toppins wrote:
>>> On 11/22/22 09:45, Paolo Abeni wrote:
>>>> On Tue, 2022-11-22 at 08:36 -0500, Jonathan Toppins wrote:
>>>>> On 11/22/22 05:59, Paolo Abeni wrote:
>>>>>> Hello,
>>>>>>
>>>>>> On Fri, 2022-11-18 at 15:30 -0500, Jonathan Toppins wrote:
>>>>>>> Before this change when a bond in mode 2 lost link, all of its slaves
>>>>>>> lost link, the bonding device would never recover even after the
>>>>>>> expiration of updelay. This change removes the updelay when the bond
>>>>>>> currently has no usable links. Conforming to bonding.txt section 13.1
>>>>>>> paragraph 4.
>>>>>>>
>>>>>>> Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
>>>>>>
>>>>>> Why are you targeting net-next? This looks like something suitable to
>>>>>> the -net tree to me. If so, could you please include a Fixes tag?
>>>>>>
>>>>>> Note that we can add new self-tests even via the -net tree.
>>>>>>
>>>>>
>>>>> I could not find a reasonable fixes tag for this, hence why I targeted
>>>>> the net-next tree.
>>>>
>>>> When in doubt I think it's preferable to point out a commit surely
>>>> affected by the issue - even if that is possibly not the one
>>>> introducing the issue - than no Fixes at all. The lack of tag will make
>>>> the work more difficult for stable teams.
>>>>
>>>> In this specific case I think that:
>>>>
>>>> Fixes: 41f891004063 ("bonding: ignore updelay param when there is no active slave")
>>>>
>>>> should be ok, WDYT? If you agree, would you mind reposting for -net?
>>>>
>>>> Thanks,
>>>>
>>>> Paolo
>>>>
>>>
>>> Yes that looks like a good one. I will repost to -net a v2 that includes changes to reduce the number of icmp echos sent before failing the test.
>>>
>>> Thanks,
>>> -Jon
>>>
>>
>> One minor nit - could you please change "mode 2" to "mode balance-xor" ?
>> It saves reviewers some grepping around the code to see what is mode 2.
>> Obviously one has to dig in the code to see how it's affected, but still
>> it is a bit more understandable. It'd be nice to add more as to why the link is not recovered,
>> I get it after reading the code, but it would be nice to include a more detailed explanation in the
>> commit message as well.
>>
>> Thanks,
>>   Nik
>>
> 
> Ah, I just noticed I'm late to the party. :)
> Never mind my comments, no need for a v3.
> 

If there are other issues with v2, I will gladly include these comments 
in a v3.

Thanks,
-Jon
  

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 1cd4e71916f8..6c4348245d1f 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2529,7 +2529,16 @@  static int bond_miimon_inspect(struct bonding *bond)
 	struct slave *slave;
 	bool ignore_updelay;
 
-	ignore_updelay = !rcu_dereference(bond->curr_active_slave);
+	if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP) {
+		ignore_updelay = !rcu_dereference(bond->curr_active_slave);
+	} else {
+		struct bond_up_slave *usable_slaves;
+
+		usable_slaves = rcu_dereference(bond->usable_slaves);
+
+		ignore_updelay = usable_slaves &&
+				 usable_slaves->count == 0;
+	}
 
 	bond_for_each_slave_rcu(bond, slave, iter) {
 		bond_propose_link_state(slave, BOND_LINK_NOCHANGE);
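
The decision this hunk makes can be summarised in a standalone sketch. It is
a userspace illustration under stated assumptions, not the driver code: the
types model_mode, model_up_slaves and model_bond and the function
model_ignore_updelay are hypothetical, MODE_OTHER stands for every mode other
than active-backup, and a missing usable-slave array is treated as "do not
ignore updelay".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the bonding mode and slave bookkeeping. */
enum model_mode { MODE_ACTIVEBACKUP, MODE_OTHER };

struct model_up_slaves {
	unsigned int count;	/* number of slaves currently usable for tx */
};

struct model_bond {
	enum model_mode mode;
	const void *curr_active_slave;			/* NULL: none active */
	const struct model_up_slaves *usable_slaves;	/* may be NULL */
};

/* Return true when updelay should be skipped for the next slave coming up. */
static bool model_ignore_updelay(const struct model_bond *bond)
{
	assert(bond != NULL);

	/* active-backup: skip the delay only when no slave is active. */
	if (bond->mode == MODE_ACTIVEBACKUP)
		return bond->curr_active_slave == NULL;

	/* Other modes: skip the delay only when the usable set is empty. */
	return bond->usable_slaves != NULL && bond->usable_slaves->count == 0;
}
```

Returning a value on every path keeps the result well defined for all modes,
including the case where the usable-slave array has not been set up yet.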