[v2,0/3] selftests/net: A couple of typos fixes in key-management/rst tests

Message ID 20240130-tcp-ao-test-key-mgmt-v2-0-d190430a6c60@arista.com
Headers
Series selftests/net: A couple of typos fixes in key-management/rst tests |

Message

Dmitry Safonov Jan. 30, 2024, 3:51 a.m. UTC
  Changes in v2:
- Dropped "selftests/net: Clean-up double assignment", going to send it
  to net-next with other changes (Simon)
- Added a patch to rectify RST selftests.
- Link to v1: https://lore.kernel.org/r/20240118-tcp-ao-test-key-mgmt-v1-0-3583ca147113@arista.com

Two typo fixes, noticed by Mohammad's review.
And a fix for an issue that got uncovered.

Signed-off-by: Dmitry Safonov <dima@arista.com>
---
Dmitry Safonov (2):
      selftests/net: Rectify key counters checks
      selftests/net: Repair RST passive reset selftest

Mohammad Nassiri (1):
      selftests/net: Argument value mismatch when calling verify_counters()

 .../testing/selftests/net/tcp_ao/key-management.c  |  46 ++++---
 tools/testing/selftests/net/tcp_ao/lib/sock.c      |  12 +-
 tools/testing/selftests/net/tcp_ao/rst.c           | 138 ++++++++++++++-------
 3 files changed, 124 insertions(+), 72 deletions(-)
---
base-commit: ecb1b8288dc7ccbdcb3b9df005fa1c0e0c0388a7
change-id: 20240118-tcp-ao-test-key-mgmt-bb51a5fe15a2

Best regards,
  

Comments

Jakub Kicinski Feb. 1, 2024, 12:36 a.m. UTC | #1
On Tue, 30 Jan 2024 03:51:51 +0000 Dmitry Safonov wrote:
> Two typo fixes, noticed by Mohammad's review.
> And a fix for an issue that got uncovered.

I can confirm that all tests pass now :)
Thank you!
  
patchwork-bot+netdevbpf@kernel.org Feb. 1, 2024, 12:40 a.m. UTC | #2
Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 30 Jan 2024 03:51:51 +0000 you wrote:
> Changes in v2:
> - Dropped "selftests/net: Clean-up double assignment", going to send it
>   to net-next with other changes (Simon)
> - Added a patch to rectify RST selftests.
> - Link to v1: https://lore.kernel.org/r/20240118-tcp-ao-test-key-mgmt-v1-0-3583ca147113@arista.com
> 
> Two typo fixes, noticed by Mohammad's review.
> And a fix for an issue that got uncovered.
> 
> [...]

Here is the summary with links:
  - [v2,1/3] selftests/net: Argument value mismatch when calling verify_counters()
    https://git.kernel.org/netdev/net/c/d8f5df1fcea5
  - [v2,2/3] selftests/net: Rectify key counters checks
    https://git.kernel.org/netdev/net/c/384aa16d3776
  - [v2,3/3] selftests/net: Repair RST passive reset selftest
    https://git.kernel.org/netdev/net/c/6caf3adcc877

You are awesome, thank you!
  
Dmitry Safonov Feb. 1, 2024, 12:50 a.m. UTC | #3
On 2/1/24 00:36, Jakub Kicinski wrote:
> On Tue, 30 Jan 2024 03:51:51 +0000 Dmitry Safonov wrote:
>> Two typo fixes, noticed by Mohammad's review.
>> And a fix for an issue that got uncovered.
> 
> I can confirm that all tests pass now :)
> Thank you!

Thanks Jakub!

Please, let me know if there will be other issues with tcp-ao tests :)

Going to work on tracepoints and some other TCP-AO stuff for net-next.

Thanks,
             Dmitry
  
Jakub Kicinski Feb. 1, 2024, 9:21 p.m. UTC | #4
On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote:
> Please, let me know if there will be other issues with tcp-ao tests :)
> 
> Going to work on tracepoints and some other TCP-AO stuff for net-next.

Since you're being nice and helpful I figured I'll try testing TCP-AO
with debug options enabled :) (kernel/configs/debug.config and
kernel/configs/x86_debug.config included), that slows things down 
and causes a bit of flakiness in unsigned-md5-* tests:

https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao

This has links to outputs:
https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0

If it's a timing thing - FWIW we started exporting
KSFT_MACHINE_SLOW=yes on the slow runners.
  
Dmitry Safonov Feb. 1, 2024, 10:25 p.m. UTC | #5
Hi Jakub,

On 2/1/24 21:21, Jakub Kicinski wrote:
> On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote:
>> Please, let me know if there will be other issues with tcp-ao tests :)
>>
>> Going to work on tracepoints and some other TCP-AO stuff for net-next.
> 
> Since you're being nice and helpful I figured I'll try testing TCP-AO
> with debug options enabled :) (kernel/configs/debug.config and
> kernel/configs/x86_debug.config included),

Haha :)

> that slows things down 
> and causes a bit of flakiness in unsigned-md5-* tests:
> 
> https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao
> 
> This has links to outputs:
> https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0
> 
> If it's a timing thing - FWIW we started exporting
> KSFT_MACHINE_SLOW=yes on the slow runners.

I think, I know what happens here:

# ok 8 AO server (AO_REQUIRED): AO client: counter TCPAOGood increased 4
=> 6
# ok 9 AO server (AO_REQUIRED): unsigned client
# ok 10 AO server (AO_REQUIRED): unsigned client: counter TCPAORequired
increased 1 => 2
# not ok 11 AO server (AO_REQUIRED): unsigned client: Counter
netns_ao_good was not expected to increase 7 => 8

for each of tests the server listens at a new port, but re-uses the same
namespaces+veth. If the node/machine is quite slow, I guess a segment
might have been retransmitted and the test that initiated it had already
finished.
And as result, the per-namespace counters are incremented, which makes
the test fail (IOW, the test expects all segments in ns being dropped).

So, I should do one of the options:

1. relax per-namespace checks (the per-socket and per-key counters are
   checked)
2. unshare(net) + veth setup for each test
3. split the selftest on smaller ones (as they create new net-ns in
   initialization)

I'd probably prefer (2), albeit it slows down that slow machine even
more, but I don't think creating 2 net-ns + veth pair per each test
would add a lot more overhead even on some rpi board. But let's see,
maybe I'll just go with (1) as that's really easy.

I'll cook a patch this week.

Thanks,
             Dmitry
  
Dmitry Safonov Feb. 1, 2024, 11:37 p.m. UTC | #6
On 2/1/24 22:25, Dmitry Safonov wrote:
> Hi Jakub,
> 
> On 2/1/24 21:21, Jakub Kicinski wrote:
>> On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote:
>>> Please, let me know if there will be other issues with tcp-ao tests :)
>>>
>>> Going to work on tracepoints and some other TCP-AO stuff for net-next.
>>
>> Since you're being nice and helpful I figured I'll try testing TCP-AO
>> with debug options enabled :) (kernel/configs/debug.config and
>> kernel/configs/x86_debug.config included),
> 
> Haha :)
> 
>> that slows things down 
>> and causes a bit of flakiness in unsigned-md5-* tests:
>>
>> https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao
>>
>> This has links to outputs:
>> https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0
>>
>> If it's a timing thing - FWIW we started exporting
>> KSFT_MACHINE_SLOW=yes on the slow runners.
> 
> I think, I know what happens here:
> 
> # ok 8 AO server (AO_REQUIRED): AO client: counter TCPAOGood increased 4
> => 6
> # ok 9 AO server (AO_REQUIRED): unsigned client
> # ok 10 AO server (AO_REQUIRED): unsigned client: counter TCPAORequired
> increased 1 => 2
> # not ok 11 AO server (AO_REQUIRED): unsigned client: Counter
> netns_ao_good was not expected to increase 7 => 8
> 
> for each of tests the server listens at a new port, but re-uses the same
> namespaces+veth. If the node/machine is quite slow, I guess a segment
> might have been retransmitted and the test that initiated it had already
> finished.
> And as result, the per-namespace counters are incremented, which makes
> the test fail (IOW, the test expects all segments in ns being dropped).
> 
> So, I should do one of the options:
> 
> 1. relax per-namespace checks (the per-socket and per-key counters are
>    checked)
> 2. unshare(net) + veth setup for each test
> 3. split the selftest on smaller ones (as they create new net-ns in
>    initialization)

Actually, I think there may be an easier fix:

4. Make sure that client close()s TCP-AO first, making it twsk.
   And also make sure that net-ns counters read post server's close().

Will do this, let's see if this fixes the flakiness on the netdev bot :)

> I'd probably prefer (2), albeit it slows down that slow machine even
> more, but I don't think creating 2 net-ns + veth pair per each test
> would add a lot more overhead even on some rpi board. But let's see,
> maybe I'll just go with (1) as that's really easy.
> 
> I'll cook a patch this week.

Thanks,
            Dmitry
  
Dmitry Safonov Feb. 2, 2024, 2:30 a.m. UTC | #7
On 2/1/24 23:37, Dmitry Safonov wrote:
> On 2/1/24 22:25, Dmitry Safonov wrote:
>> Hi Jakub,
>>
>> On 2/1/24 21:21, Jakub Kicinski wrote:
>>> On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote:
>>>> Please, let me know if there will be other issues with tcp-ao tests :)
>>>>
>>>> Going to work on tracepoints and some other TCP-AO stuff for net-next.
>>>
>>> Since you're being nice and helpful I figured I'll try testing TCP-AO
>>> with debug options enabled :) (kernel/configs/debug.config and
>>> kernel/configs/x86_debug.config included),
>>
>> Haha :)
>>
>>> that slows things down 
>>> and causes a bit of flakiness in unsigned-md5-* tests:
>>>
>>> https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao
>>>
>>> This has links to outputs:
>>> https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0
>>>
>>> If it's a timing thing - FWIW we started exporting
>>> KSFT_MACHINE_SLOW=yes on the slow runners.
>>
>> I think, I know what happens here:
>>
>> # ok 8 AO server (AO_REQUIRED): AO client: counter TCPAOGood increased 4
>> => 6
>> # ok 9 AO server (AO_REQUIRED): unsigned client
>> # ok 10 AO server (AO_REQUIRED): unsigned client: counter TCPAORequired
>> increased 1 => 2
>> # not ok 11 AO server (AO_REQUIRED): unsigned client: Counter
>> netns_ao_good was not expected to increase 7 => 8
>>
>> for each of tests the server listens at a new port, but re-uses the same
>> namespaces+veth. If the node/machine is quite slow, I guess a segment
>> might have been retransmitted and the test that initiated it had already
>> finished.
>> And as result, the per-namespace counters are incremented, which makes
>> the test fail (IOW, the test expects all segments in ns being dropped).
>>
>> So, I should do one of the options:
>>
>> 1. relax per-namespace checks (the per-socket and per-key counters are
>>    checked)
>> 2. unshare(net) + veth setup for each test
>> 3. split the selftest on smaller ones (as they create new net-ns in
>>    initialization)
> 
> Actually, I think there may be an easier fix:
> 
> 4. Make sure that client close()s TCP-AO first, making it twsk.
>    And also make sure that net-ns counters read post server's close().
> 
> Will do this, let's see if this fixes the flakiness on the netdev bot :)

FWIW, I ended up with this:
https://lore.kernel.org/all/20240202-unsigned-md5-netns-counters-v1-1-8b90c37c0566@arista.com/

I reproduced the issue once, running unsigned-md5* in a loop, while in
another terminal building linux-next with all cores.
With the patch above, it survived 77 iterations of both ipv4/ipv6 tests
so far. So, there is a chance it fixes the issue :)

Thanks,
           Dmitry
  
Jakub Kicinski Feb. 2, 2024, 3:13 a.m. UTC | #8
On Fri, 2 Feb 2024 02:30:52 +0000 Dmitry Safonov wrote:
> > Actually, I think there may be an easier fix:
> > 
> > 4. Make sure that client close()s TCP-AO first, making it twsk.
> >    And also make sure that net-ns counters read post server's close().
> > 
> > Will do this, let's see if this fixes the flakiness on the netdev bot :)  
> 
> FWIW, I ended up with this:
> https://lore.kernel.org/all/20240202-unsigned-md5-netns-counters-v1-1-8b90c37c0566@arista.com/
> 
> I reproduced the issue once, running unsigned-md5* in a loop, while in
> another terminal building linux-next with all cores.
> With the patch above, it survived 77 iterations of both ipv4/ipv6 tests
> so far. So, there is a chance it fixes the issue :)

That was quick! Fingers crossed :)