[net] iavf: Do not restart Tx queues after reset task failure

Message ID 20221108102502.2147389-1-ivecera@redhat.com
State New
Headers
Series [net] iavf: Do not restart Tx queues after reset task failure |

Commit Message

Ivan Vecera Nov. 8, 2022, 10:25 a.m. UTC
  After commit aa626da947e9 ("iavf: Detach device during reset task")
the device is detached during reset task and re-attached at its end.
The problem occurs when reset task fails because Tx queues are
restarted during device re-attach and this leads later to a crash.

To resolve this issue properly close the net device in cause of
failure in reset task to avoid restarting of tx queues at the end.
Also replace the hacky manipulation with IFF_UP flag by device close
that clears properly both IFF_UP and __LINK_STATE_START flags.
In these case iavf_close() does not do anything because the adapter
state is already __IAVF_DOWN.

Reproducer:
1) Run some Tx traffic (e.g. iperf3) over iavf interface
2) Set VF trusted / untrusted in loop

[root@host ~]# cat repro.sh
#!/bin/sh

PF=enp65s0f0
IF=${PF}v0

ip link set up $IF
ip addr add 192.168.0.2/24 dev $IF
sleep 1

iperf3 -c 192.168.0.1 -t 600 --logfile /dev/null &
sleep 2

while :; do
        ip link set $PF vf 0 trust on
        ip link set $PF vf 0 trust off
done
[root@host ~]# ./repro.sh

Result:
[ 2006.650969] iavf 0000:41:01.0: Failed to init adminq: -53
[ 2006.675662] ice 0000:41:00.0: VF 0 is now trusted
[ 2006.689997] iavf 0000:41:01.0: Reset task did not complete, VF disabled
[ 2006.696611] iavf 0000:41:01.0: failed to allocate resources during reinit
[ 2006.703209] ice 0000:41:00.0: VF 0 is now untrusted
[ 2006.737011] ice 0000:41:00.0: VF 0 is now trusted
[ 2006.764536] ice 0000:41:00.0: VF 0 is now untrusted
[ 2006.768919] BUG: kernel NULL pointer dereference, address: 0000000000000b4a
[ 2006.776358] #PF: supervisor read access in kernel mode
[ 2006.781488] #PF: error_code(0x0000) - not-present page
[ 2006.786620] PGD 0 P4D 0
[ 2006.789152] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 2006.792903] ice 0000:41:00.0: VF 0 is now trusted
[ 2006.793501] CPU: 4 PID: 0 Comm: swapper/4 Kdump: loaded Not tainted 6.1.0-rc3+ #2
[ 2006.805668] Hardware name: Abacus electric, s.r.o. - servis@abacus.cz Super Server/H12SSW-iN, BIOS 2.4 04/13/2022
[ 2006.815915] RIP: 0010:iavf_xmit_frame_ring+0x96/0xf70 [iavf]
[ 2006.821028] ice 0000:41:00.0: VF 0 is now untrusted
[ 2006.821572] Code: 48 83 c1 04 48 c1 e1 04 48 01 f9 48 83 c0 10 6b 50 f8 55 c1 ea 14 45 8d 64 14 01 48 39 c8 75 eb 41 83 fc 07 0f 8f e9 08 00 00 <0f> b7 45 4a 0f b7 55 48 41 8d 74 24 05 31 c9 66 39 d0 0f 86 da 00
[ 2006.845181] RSP: 0018:ffffb253004bc9e8 EFLAGS: 00010293
[ 2006.850397] RAX: ffff9d154de45b00 RBX: ffff9d15497d52e8 RCX: ffff9d154de45b00
[ 2006.856327] ice 0000:41:00.0: VF 0 is now trusted
[ 2006.857523] RDX: 0000000000000000 RSI: 00000000000005a8 RDI: ffff9d154de45ac0
[ 2006.857525] RBP: 0000000000000b00 R08: ffff9d159cb010ac R09: 0000000000000001
[ 2006.857526] R10: ffff9d154de45940 R11: 0000000000000000 R12: 0000000000000002
[ 2006.883600] R13: ffff9d1770838dc0 R14: 0000000000000000 R15: ffffffffc07b8380
[ 2006.885840] ice 0000:41:00.0: VF 0 is now untrusted
[ 2006.890725] FS:  0000000000000000(0000) GS:ffff9d248e900000(0000) knlGS:0000000000000000
[ 2006.890727] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2006.909419] CR2: 0000000000000b4a CR3: 0000000c39c10002 CR4: 0000000000770ee0
[ 2006.916543] PKRU: 55555554
[ 2006.918254] ice 0000:41:00.0: VF 0 is now trusted
[ 2006.919248] Call Trace:
[ 2006.919250]  <IRQ>
[ 2006.919252]  dev_hard_start_xmit+0x9e/0x1f0
[ 2006.932587]  sch_direct_xmit+0xa0/0x370
[ 2006.936424]  __dev_queue_xmit+0x7af/0xd00
[ 2006.940429]  ip_finish_output2+0x26c/0x540
[ 2006.944519]  ip_output+0x71/0x110
[ 2006.947831]  ? __ip_finish_output+0x2b0/0x2b0
[ 2006.952180]  __ip_queue_xmit+0x16d/0x400
[ 2006.952721] ice 0000:41:00.0: VF 0 is now untrusted
[ 2006.956098]  __tcp_transmit_skb+0xa96/0xbf0
[ 2006.965148]  __tcp_retransmit_skb+0x174/0x860
[ 2006.969499]  ? cubictcp_cwnd_event+0x40/0x40
[ 2006.973769]  tcp_retransmit_skb+0x14/0xb0
...

Fixes: aa626da947e9 ("iavf: Detach device during reset task")
Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Patryk Piotrowski <patryk.piotrowski@intel.com>
Cc: SlawomirX Laba <slawomirx.laba@intel.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
---
 drivers/net/ethernet/intel/iavf/iavf_main.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)
  

Comments

Jacob Keller Nov. 8, 2022, 4:40 p.m. UTC | #1
On 11/8/2022 2:25 AM, Ivan Vecera wrote:
> After commit aa626da947e9 ("iavf: Detach device during reset task")
> the device is detached during reset task and re-attached at its end.
> The problem occurs when reset task fails because Tx queues are
> restarted during device re-attach and this leads later to a crash.
> 
> To resolve this issue properly close the net device in cause of
> failure in reset task to avoid restarting of tx queues at the end.
> Also replace the hacky manipulation with IFF_UP flag by device close
> that clears properly both IFF_UP and __LINK_STATE_START flags.
> In these case iavf_close() does not do anything because the adapter
> state is already __IAVF_DOWN.

Thanks for fixing this. I thought we'd removed the last such use of 
incorrect IFF_UP munging in 7d59706dbef8 ("Revert "iavf: Fix deadlock 
occurrence during resetting VF interface""), but apparently we had 
missed one...

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
  
Leon Romanovsky Nov. 9, 2022, 6:20 p.m. UTC | #2
On Tue, Nov 08, 2022 at 11:25:02AM +0100, Ivan Vecera wrote:
> After commit aa626da947e9 ("iavf: Detach device during reset task")
> the device is detached during reset task and re-attached at its end.
> The problem occurs when reset task fails because Tx queues are
> restarted during device re-attach and this leads later to a crash.

<...>

> +	if (netif_running(netdev)) {
> +		/* Close device to ensure that Tx queues will not be started
> +		 * during netif_device_attach() at the end of the reset task.
> +		 */
> +		rtnl_lock();
> +		dev_close(netdev);
> +		rtnl_unlock();
> +	}

Sorry for my naive question, I see this pattern a lot (including RDMA), 
so curious. Everyone checks netif_running() outside of rtnl_lock, while
dev_close() changes state bit __LINK_STATE_START. Shouldn't rtnl_lock()
placed before netif_running()?

Thanks

> +
>  	dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n");
>  reset_finish:
>  	rtnl_lock();
> -- 
> 2.37.4
>
  
Jacob Keller Nov. 9, 2022, 8:11 p.m. UTC | #3
> -----Original Message-----
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Wednesday, November 9, 2022 10:21 AM
> To: ivecera <ivecera@redhat.com>
> Cc: netdev@vger.kernel.org; sassmann@redhat.com; Keller, Jacob E
> <jacob.e.keller@intel.com>; Piotrowski, Patryk <patryk.piotrowski@intel.com>;
> SlawomirX Laba <slawomirx.laba@intel.com>; Brandeburg, Jesse
> <jesse.brandeburg@intel.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; David S. Miller <davem@davemloft.net>; Eric
> Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
> Abeni <pabeni@redhat.com>; moderated list:INTEL ETHERNET DRIVERS <intel-
> wired-lan@lists.osuosl.org>; open list <linux-kernel@vger.kernel.org>
> Subject: Re: [PATCH net] iavf: Do not restart Tx queues after reset task failure
> 
> On Tue, Nov 08, 2022 at 11:25:02AM +0100, Ivan Vecera wrote:
> > After commit aa626da947e9 ("iavf: Detach device during reset task")
> > the device is detached during reset task and re-attached at its end.
> > The problem occurs when reset task fails because Tx queues are
> > restarted during device re-attach and this leads later to a crash.
> 
> <...>
> 
> > +	if (netif_running(netdev)) {
> > +		/* Close device to ensure that Tx queues will not be started
> > +		 * during netif_device_attach() at the end of the reset task.
> > +		 */
> > +		rtnl_lock();
> > +		dev_close(netdev);
> > +		rtnl_unlock();
> > +	}
> 
> Sorry for my naive question, I see this pattern a lot (including RDMA),
> so curious. Everyone checks netif_running() outside of rtnl_lock, while
> dev_close() changes state bit __LINK_STATE_START. Shouldn't rtnl_lock()
> placed before netif_running()?

Yes I think you're right. A ton of people check it without the lock but I think thats not strictly safe. Is dev_close safe to call when netif_running is false? Why not just remove the check and always call dev_close then.

Thanks,
Jake

> 
> Thanks
> 
> > +
> >  	dev_err(&adapter->pdev->dev, "failed to allocate resources during
> reinit\n");
> >  reset_finish:
> >  	rtnl_lock();
> > --
> > 2.37.4
> >
  
Leon Romanovsky Nov. 10, 2022, 9:17 a.m. UTC | #4
On Wed, Nov 09, 2022 at 08:11:55PM +0000, Keller, Jacob E wrote:
> 
> 
> > -----Original Message-----
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Wednesday, November 9, 2022 10:21 AM
> > To: ivecera <ivecera@redhat.com>
> > Cc: netdev@vger.kernel.org; sassmann@redhat.com; Keller, Jacob E
> > <jacob.e.keller@intel.com>; Piotrowski, Patryk <patryk.piotrowski@intel.com>;
> > SlawomirX Laba <slawomirx.laba@intel.com>; Brandeburg, Jesse
> > <jesse.brandeburg@intel.com>; Nguyen, Anthony L
> > <anthony.l.nguyen@intel.com>; David S. Miller <davem@davemloft.net>; Eric
> > Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
> > Abeni <pabeni@redhat.com>; moderated list:INTEL ETHERNET DRIVERS <intel-
> > wired-lan@lists.osuosl.org>; open list <linux-kernel@vger.kernel.org>
> > Subject: Re: [PATCH net] iavf: Do not restart Tx queues after reset task failure
> > 
> > On Tue, Nov 08, 2022 at 11:25:02AM +0100, Ivan Vecera wrote:
> > > After commit aa626da947e9 ("iavf: Detach device during reset task")
> > > the device is detached during reset task and re-attached at its end.
> > > The problem occurs when reset task fails because Tx queues are
> > > restarted during device re-attach and this leads later to a crash.
> > 
> > <...>
> > 
> > > +	if (netif_running(netdev)) {
> > > +		/* Close device to ensure that Tx queues will not be started
> > > +		 * during netif_device_attach() at the end of the reset task.
> > > +		 */
> > > +		rtnl_lock();
> > > +		dev_close(netdev);
> > > +		rtnl_unlock();
> > > +	}
> > 
> > Sorry for my naive question, I see this pattern a lot (including RDMA),
> > so curious. Everyone checks netif_running() outside of rtnl_lock, while
> > dev_close() changes state bit __LINK_STATE_START. Shouldn't rtnl_lock()
> > placed before netif_running()?
> 
> Yes I think you're right. A ton of people check it without the lock but I think thats not strictly safe. Is dev_close safe to call when netif_running is false? Why not just remove the check and always call dev_close then.

I honestly don't know.

To remove any doubts, this patch is LGTM.

Thanks,
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
  
Ivan Vecera Nov. 10, 2022, 2:51 p.m. UTC | #5
On Wed, 9 Nov 2022 20:11:55 +0000
"Keller, Jacob E" <jacob.e.keller@intel.com> wrote:

> > Sorry for my naive question, I see this pattern a lot (including RDMA),
> > so curious. Everyone checks netif_running() outside of rtnl_lock, while
> > dev_close() changes state bit __LINK_STATE_START. Shouldn't rtnl_lock()
> > placed before netif_running()?  
> 
> Yes I think you're right. A ton of people check it without the lock but I think thats not strictly safe. Is dev_close safe to call when netif_running is false? Why not just remove the check and always call dev_close then.
> 
> Thanks,
> Jake

Check for a bit value (like netif_runnning()) is much cheaper than unconditionally
taking global lock like RTNL.

Ivan
  
Leon Romanovsky Nov. 10, 2022, 5:07 p.m. UTC | #6
On Thu, Nov 10, 2022 at 03:51:47PM +0100, Ivan Vecera wrote:
> On Wed, 9 Nov 2022 20:11:55 +0000
> "Keller, Jacob E" <jacob.e.keller@intel.com> wrote:
> 
> > > Sorry for my naive question, I see this pattern a lot (including RDMA),
> > > so curious. Everyone checks netif_running() outside of rtnl_lock, while
> > > dev_close() changes state bit __LINK_STATE_START. Shouldn't rtnl_lock()
> > > placed before netif_running()?  
> > 
> > Yes I think you're right. A ton of people check it without the lock but I think thats not strictly safe. Is dev_close safe to call when netif_running is false? Why not just remove the check and always call dev_close then.
> > 
> > Thanks,
> > Jake
> 
> Check for a bit value (like netif_runnning()) is much cheaper than unconditionally
> taking global lock like RTNL.

This cheap operation is racy and performed in non-performance critical path.

Thanks

> 
> Ivan
>
  
Leon Romanovsky Nov. 10, 2022, 9:07 p.m. UTC | #7
On Thu, Nov 10, 2022 at 12:24:18PM -0800, Jakub Kicinski wrote:
> On Thu, 10 Nov 2022 19:07:02 +0200 Leon Romanovsky wrote:
> > > > Yes I think you're right. A ton of people check it without the
> > > > lock but I think thats not strictly safe. Is dev_close safe to
> > > > call when netif_running is false? Why not just remove the check
> > > > and always call dev_close then.
> > > 
> > > Check for a bit value (like netif_runnning()) is much cheaper than
> > > unconditionally taking global lock like RTNL.  
> > 
> > This cheap operation is racy and performed in non-performance
> > critical path.
> 
> To be clear - the rtnl_lock around the entire if is still racy 
> in the grand scheme of things, no? What's stopping someone from
> bringing the device right back up after you drop the lock?

I want to believe what there is some sort of state machine that won't
allow simple toggling of dev_close/dev_open. If it doesn't, rtnl_lock
users should audit their code for possible toggling right after that
lock is dropped.

Anyway, this discussion reminds me our devl_lock debate where we had
completely opposite views if rtnl_lock model is the right one.
https://lore.kernel.org/netdev/20211101073259.33406da3@kicinski-fedora-PC1C0HJN/

Let's not start argue again, we had enough back then. :)

Thanks
  
Jacob Keller Nov. 10, 2022, 9:13 p.m. UTC | #8
> -----Original Message-----
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Thursday, November 10, 2022 1:07 PM
> To: Jakub Kicinski <kuba@kernel.org>
> Cc: ivecera <ivecera@redhat.com>; Keller, Jacob E <jacob.e.keller@intel.com>;
> netdev@vger.kernel.org; sassmann@redhat.com; Piotrowski, Patryk
> <patryk.piotrowski@intel.com>; SlawomirX Laba <slawomirx.laba@intel.com>;
> Brandeburg, Jesse <jesse.brandeburg@intel.com>; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; David S. Miller <davem@davemloft.net>; Eric
> Dumazet <edumazet@google.com>; Paolo Abeni <pabeni@redhat.com>; intel-
> wired-lan@lists.osuosl.org; open list <linux-kernel@vger.kernel.org>
> Subject: Re: [PATCH net] iavf: Do not restart Tx queues after reset task failure
> 
> On Thu, Nov 10, 2022 at 12:24:18PM -0800, Jakub Kicinski wrote:
> > On Thu, 10 Nov 2022 19:07:02 +0200 Leon Romanovsky wrote:
> > > > > Yes I think you're right. A ton of people check it without the
> > > > > lock but I think thats not strictly safe. Is dev_close safe to
> > > > > call when netif_running is false? Why not just remove the check
> > > > > and always call dev_close then.
> > > >
> > > > Check for a bit value (like netif_runnning()) is much cheaper than
> > > > unconditionally taking global lock like RTNL.
> > >
> > > This cheap operation is racy and performed in non-performance
> > > critical path.
> >
> > To be clear - the rtnl_lock around the entire if is still racy
> > in the grand scheme of things, no? What's stopping someone from
> > bringing the device right back up after you drop the lock?
> 

I think the reset flow uses netif_device_detach() to detach the device before reset. Is that enough to prevent other calls to dev_close outside the driver?

Also, perhaps we should avoid re-attaching the device if the reset fails...

> I want to believe what there is some sort of state machine that won't
> allow simple toggling of dev_close/dev_open. If it doesn't, rtnl_lock
> users should audit their code for possible toggling right after that
> lock is dropped.
> 

I think the key is that normally dev_open and dev_close are done by iproute2 netlink messages? so if we close it, its possible userspace could immediately open it.. though I think that isn't allowed while the device is detached, so we should stay closed until we re-attach, at which point dev_open can fail by noticing the VF is disabled...


> Anyway, this discussion reminds me our devl_lock debate where we had
> completely opposite views if rtnl_lock model is the right one.
> https://lore.kernel.org/netdev/20211101073259.33406da3@kicinski-fedora-
> PC1C0HJN/
> 
> Let's not start argue again, we had enough back then. :)
> 
> Thanks
  
Jankowski, Konrad0 Nov. 18, 2022, 2:30 p.m. UTC | #9
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Ivan
> Vecera
> Sent: Tuesday, November 8, 2022 11:25 AM
> To: netdev@vger.kernel.org
> Cc: SlawomirX Laba <slawomirx.laba@intel.com>; Eric Dumazet
> <edumazet@google.com>; moderated list:INTEL ETHERNET DRIVERS <intel-
> wired-lan@lists.osuosl.org>; open list <linux-kernel@vger.kernel.org>;
> Piotrowski, Patryk <patryk.piotrowski@intel.com>; Jakub Kicinski
> <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; David S. Miller
> <davem@davemloft.net>; sassmann@redhat.com
> Subject: [Intel-wired-lan] [PATCH net] iavf: Do not restart Tx queues after reset
> task failure
> 
> After commit aa626da947e9 ("iavf: Detach device during reset task") the device
> is detached during reset task and re-attached at its end.
> The problem occurs when reset task fails because Tx queues are restarted during
> device re-attach and this leads later to a crash.
> 
> To resolve this issue properly close the net device in cause of failure in reset task
> to avoid restarting of tx queues at the end.
> Also replace the hacky manipulation with IFF_UP flag by device close that clears
> properly both IFF_UP and __LINK_STATE_START flags.
> In these case iavf_close() does not do anything because the adapter state is
> already __IAVF_DOWN.
> 
> Reproducer:
> 1) Run some Tx traffic (e.g. iperf3) over iavf interface
> 2) Set VF trusted / untrusted in loop
> 
> [root@host ~]# cat repro.sh
> #!/bin/sh
> 
> PF=enp65s0f0
> IF=${PF}v0
> 
> ip link set up $IF
> ip addr add 192.168.0.2/24 dev $IF
> sleep 1
> 
> iperf3 -c 192.168.0.1 -t 600 --logfile /dev/null & sleep 2
> 
> while :; do
>         ip link set $PF vf 0 trust on
>         ip link set $PF vf 0 trust off
> done
> [root@host ~]# ./repro.sh
> 
> Result:
> [ 2006.650969] iavf 0000:41:01.0: Failed to init adminq: -53 [ 2006.675662] ice
> 0000:41:00.0: VF 0 is now trusted [ 2006.689997] iavf 0000:41:01.0: Reset task
> did not complete, VF disabled [ 2006.696611] iavf 0000:41:01.0: failed to allocate
> resources during reinit [ 2006.703209] ice 0000:41:00.0: VF 0 is now untrusted [
> 2006.737011] ice 0000:41:00.0: VF 0 is now trusted [ 2006.764536] ice
> 0000:41:00.0: VF 0 is now untrusted [ 2006.768919] BUG: kernel NULL pointer
> dereference, address: 0000000000000b4a [ 2006.776358] #PF: supervisor read
> access in kernel mode [ 2006.781488] #PF: error_code(0x0000) - not-present
> page [ 2006.786620] PGD 0 P4D 0 [ 2006.789152] Oops: 0000 [#1] PREEMPT SMP
> NOPTI [ 2006.792903] ice 0000:41:00.0: VF 0 is now trusted [ 2006.793501] CPU:
> 4 PID: 0 Comm: swapper/4 Kdump: loaded Not tainted 6.1.0-rc3+ #2 [
> 2006.805668] Hardware name: Abacus electric, s.r.o. - servis@abacus.cz Super
> Server/H12SSW-iN, BIOS 2.4 04/13/2022 [ 2006.815915] RIP:
> 0010:iavf_xmit_frame_ring+0x96/0xf70 [iavf] [ 2006.821028] ice 0000:41:00.0:
> VF 0 is now untrusted [ 2006.821572] Code: 48 83 c1 04 48 c1 e1 04 48 01 f9 48
> 83 c0 10 6b 50 f8 55 c1 ea 14 45 8d 64 14 01 48 39 c8 75 eb 41 83 fc 07 0f 8f e9 08
> 00 00 <0f> b7 45 4a 0f b7 55 48 41 8d 74 24 05 31 c9 66 39 d0 0f 86 da 00 [
> 2006.845181] RSP: 0018:ffffb253004bc9e8 EFLAGS: 00010293 [ 2006.850397]
> RAX: ffff9d154de45b00 RBX: ffff9d15497d52e8 RCX: ffff9d154de45b00 [
> 2006.856327] ice 0000:41:00.0: VF 0 is now trusted [ 2006.857523] RDX:
> 0000000000000000 RSI: 00000000000005a8 RDI: ffff9d154de45ac0 [
> 2006.857525] RBP: 0000000000000b00 R08: ffff9d159cb010ac R09:
> 0000000000000001 [ 2006.857526] R10: ffff9d154de45940 R11:
> 0000000000000000 R12: 0000000000000002 [ 2006.883600] R13:
> ffff9d1770838dc0 R14: 0000000000000000 R15: ffffffffc07b8380 [ 2006.885840]
> ice 0000:41:00.0: VF 0 is now untrusted [ 2006.890725] FS:
> 0000000000000000(0000) GS:ffff9d248e900000(0000) knlGS:0000000000000000
> [ 2006.890727] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [
> 2006.909419] CR2: 0000000000000b4a CR3: 0000000c39c10002 CR4:
> 0000000000770ee0 [ 2006.916543] PKRU: 55555554 [ 2006.918254] ice
> 0000:41:00.0: VF 0 is now trusted [ 2006.919248] Call Trace:
> [ 2006.919250]  <IRQ>
> [ 2006.919252]  dev_hard_start_xmit+0x9e/0x1f0 [ 2006.932587]
> sch_direct_xmit+0xa0/0x370 [ 2006.936424]  __dev_queue_xmit+0x7af/0xd00 [
> 2006.940429]  ip_finish_output2+0x26c/0x540 [ 2006.944519]
> ip_output+0x71/0x110 [ 2006.947831]  ? __ip_finish_output+0x2b0/0x2b0 [
> 2006.952180]  __ip_queue_xmit+0x16d/0x400 [ 2006.952721] ice 0000:41:00.0:
> VF 0 is now untrusted [ 2006.956098]  __tcp_transmit_skb+0xa96/0xbf0 [
> 2006.965148]  __tcp_retransmit_skb+0x174/0x860 [ 2006.969499]  ?
> cubictcp_cwnd_event+0x40/0x40 [ 2006.973769]
> tcp_retransmit_skb+0x14/0xb0 ...
> 
> Fixes: aa626da947e9 ("iavf: Detach device during reset task")
> Cc: Jacob Keller <jacob.e.keller@intel.com>
> Cc: Patryk Piotrowski <patryk.piotrowski@intel.com>
> Cc: SlawomirX Laba <slawomirx.laba@intel.com>
> Signed-off-by: Ivan Vecera <ivecera@redhat.com>
> ---
>  drivers/net/ethernet/intel/iavf/iavf_main.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c
> b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index 5abcd66e7c7a..b66f8fa1d83b 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c

Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
  
Jankowski, Konrad0 Nov. 18, 2022, 2:31 p.m. UTC | #10
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Ivan
> Vecera
> Sent: Tuesday, November 8, 2022 11:25 AM
> To: netdev@vger.kernel.org
> Cc: SlawomirX Laba <slawomirx.laba@intel.com>; Eric Dumazet
> <edumazet@google.com>; moderated list:INTEL ETHERNET DRIVERS <intel-
> wired-lan@lists.osuosl.org>; open list <linux-kernel@vger.kernel.org>;
> Piotrowski, Patryk <patryk.piotrowski@intel.com>; Jakub Kicinski
> <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; David S. Miller
> <davem@davemloft.net>; sassmann@redhat.com
> Subject: [Intel-wired-lan] [PATCH net] iavf: Do not restart Tx queues after reset
> task failure
> 
> After commit aa626da947e9 ("iavf: Detach device during reset task") the device
> is detached during reset task and re-attached at its end.
> The problem occurs when reset task fails because Tx queues are restarted during
> device re-attach and this leads later to a crash.
> 
> To resolve this issue properly close the net device in cause of failure in reset task
> to avoid restarting of tx queues at the end.
> Also replace the hacky manipulation with IFF_UP flag by device close that clears
> properly both IFF_UP and __LINK_STATE_START flags.
> In these case iavf_close() does not do anything because the adapter state is
> already __IAVF_DOWN.
> 
> Reproducer:
> 1) Run some Tx traffic (e.g. iperf3) over iavf interface
> 2) Set VF trusted / untrusted in loop
> 
> [root@host ~]# cat repro.sh
> #!/bin/sh
> 
> PF=enp65s0f0
> IF=${PF}v0
> 
> ip link set up $IF
> ip addr add 192.168.0.2/24 dev $IF
> sleep 1
> 
> iperf3 -c 192.168.0.1 -t 600 --logfile /dev/null & sleep 2
> 
> while :; do
>         ip link set $PF vf 0 trust on
>         ip link set $PF vf 0 trust off
> done
> [root@host ~]# ./repro.sh
> 
> Result:
> [ 2006.650969] iavf 0000:41:01.0: Failed to init adminq: -53 [ 2006.675662] ice
> 0000:41:00.0: VF 0 is now trusted [ 2006.689997] iavf 0000:41:01.0: Reset task
> did not complete, VF disabled [ 2006.696611] iavf 0000:41:01.0: failed to allocate
> resources during reinit [ 2006.703209] ice 0000:41:00.0: VF 0 is now untrusted [
> 2006.737011] ice 0000:41:00.0: VF 0 is now trusted [ 2006.764536] ice
> 0000:41:00.0: VF 0 is now untrusted [ 2006.768919] BUG: kernel NULL pointer
> dereference, address: 0000000000000b4a [ 2006.776358] #PF: supervisor read
> access in kernel mode [ 2006.781488] #PF: error_code(0x0000) - not-present
> page [ 2006.786620] PGD 0 P4D 0 [ 2006.789152] Oops: 0000 [#1] PREEMPT SMP
> NOPTI [ 2006.792903] ice 0000:41:00.0: VF 0 is now trusted [ 2006.793501] CPU:
> 4 PID: 0 Comm: swapper/4 Kdump: loaded Not tainted 6.1.0-rc3+ #2 [
> 2006.805668] Hardware name: Abacus electric, s.r.o. - servis@abacus.cz Super
> Server/H12SSW-iN, BIOS 2.4 04/13/2022 [ 2006.815915] RIP:
> 0010:iavf_xmit_frame_ring+0x96/0xf70 [iavf] [ 2006.821028] ice 0000:41:00.0:
> VF 0 is now untrusted [ 2006.821572] Code: 48 83 c1 04 48 c1 e1 04 48 01 f9 48
> 83 c0 10 6b 50 f8 55 c1 ea 14 45 8d 64 14 01 48 39 c8 75 eb 41 83 fc 07 0f 8f e9 08
> 00 00 <0f> b7 45 4a 0f b7 55 48 41 8d 74 24 05 31 c9 66 39 d0 0f 86 da 00 [
> 2006.845181] RSP: 0018:ffffb253004bc9e8 EFLAGS: 00010293 [ 2006.850397]
> RAX: ffff9d154de45b00 RBX: ffff9d15497d52e8 RCX: ffff9d154de45b00 [
> 2006.856327] ice 0000:41:00.0: VF 0 is now trusted [ 2006.857523] RDX:
> 0000000000000000 RSI: 00000000000005a8 RDI: ffff9d154de45ac0 [
> 2006.857525] RBP: 0000000000000b00 R08: ffff9d159cb010ac R09:
> 0000000000000001 [ 2006.857526] R10: ffff9d154de45940 R11:
> 0000000000000000 R12: 0000000000000002 [ 2006.883600] R13:
> ffff9d1770838dc0 R14: 0000000000000000 R15: ffffffffc07b8380 [ 2006.885840]
> ice 0000:41:00.0: VF 0 is now untrusted [ 2006.890725] FS:
> 0000000000000000(0000) GS:ffff9d248e900000(0000) knlGS:0000000000000000
> [ 2006.890727] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [
> 2006.909419] CR2: 0000000000000b4a CR3: 0000000c39c10002 CR4:
> 0000000000770ee0 [ 2006.916543] PKRU: 55555554 [ 2006.918254] ice
> 0000:41:00.0: VF 0 is now trusted [ 2006.919248] Call Trace:
> [ 2006.919250]  <IRQ>
> [ 2006.919252]  dev_hard_start_xmit+0x9e/0x1f0 [ 2006.932587]
> sch_direct_xmit+0xa0/0x370 [ 2006.936424]  __dev_queue_xmit+0x7af/0xd00 [
> 2006.940429]  ip_finish_output2+0x26c/0x540 [ 2006.944519]
> ip_output+0x71/0x110 [ 2006.947831]  ? __ip_finish_output+0x2b0/0x2b0 [
> 2006.952180]  __ip_queue_xmit+0x16d/0x400 [ 2006.952721] ice 0000:41:00.0:
> VF 0 is now untrusted [ 2006.956098]  __tcp_transmit_skb+0xa96/0xbf0 [
> 2006.965148]  __tcp_retransmit_skb+0x174/0x860 [ 2006.969499]  ?
> cubictcp_cwnd_event+0x40/0x40 [ 2006.973769]
> tcp_retransmit_skb+0x14/0xb0 ...
> 
> Fixes: aa626da947e9 ("iavf: Detach device during reset task")
> Cc: Jacob Keller <jacob.e.keller@intel.com>
> Cc: Patryk Piotrowski <patryk.piotrowski@intel.com>
> Cc: SlawomirX Laba <slawomirx.laba@intel.com>
> Signed-off-by: Ivan Vecera <ivecera@redhat.com>
> ---
>  drivers/net/ethernet/intel/iavf/iavf_main.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c
> b/drivers/net/ethernet/intel/iavf/iavf_main.c
> index 5abcd66e7c7a..b66f8fa1d83b 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_main.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
> @@ -2921,7 +2921,6 @@ static void iavf_disable_vf(struct iavf_adapter

Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
  

Patch

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 5abcd66e7c7a..b66f8fa1d83b 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2921,7 +2921,6 @@  static void iavf_disable_vf(struct iavf_adapter *adapter)
 	iavf_free_queues(adapter);
 	memset(adapter->vf_res, 0, IAVF_VIRTCHNL_VF_RESOURCE_SIZE);
 	iavf_shutdown_adminq(&adapter->hw);
-	adapter->netdev->flags &= ~IFF_UP;
 	adapter->flags &= ~IAVF_FLAG_RESET_PENDING;
 	iavf_change_state(adapter, __IAVF_DOWN);
 	wake_up(&adapter->down_waitqueue);
@@ -3021,6 +3020,11 @@  static void iavf_reset_task(struct work_struct *work)
 		iavf_disable_vf(adapter);
 		mutex_unlock(&adapter->client_lock);
 		mutex_unlock(&adapter->crit_lock);
+		if (netif_running(netdev)) {
+			rtnl_lock();
+			dev_close(netdev);
+			rtnl_unlock();
+		}
 		return; /* Do not attempt to reinit. It's dead, Jim. */
 	}
 
@@ -3173,6 +3177,16 @@  static void iavf_reset_task(struct work_struct *work)
 
 	mutex_unlock(&adapter->client_lock);
 	mutex_unlock(&adapter->crit_lock);
+
+	if (netif_running(netdev)) {
+		/* Close device to ensure that Tx queues will not be started
+		 * during netif_device_attach() at the end of the reset task.
+		 */
+		rtnl_lock();
+		dev_close(netdev);
+		rtnl_unlock();
+	}
+
 	dev_err(&adapter->pdev->dev, "failed to allocate resources during reinit\n");
 reset_finish:
 	rtnl_lock();