Message ID | 20221108093534.1957820-1-ivecera@redhat.com |
---|---|
State | New |
Headers | From: Ivan Vecera <ivecera@redhat.com>; To: netdev@vger.kernel.org; Cc: sassmann@redhat.com, Jacob Keller <jacob.e.keller@intel.com>, Patryk Piotrowski <patryk.piotrowski@intel.com>, SlawomirX Laba <slawomirx.laba@intel.com>, Jesse Brandeburg <jesse.brandeburg@intel.com>, Tony Nguyen <anthony.l.nguyen@intel.com>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, intel-wired-lan@lists.osuosl.org (moderated list:INTEL ETHERNET DRIVERS), linux-kernel@vger.kernel.org (open list); Subject: [PATCH net] iavf: Fix a crash during reset task; Date: Tue, 8 Nov 2022 10:35:34 +0100; Message-Id: <20221108093534.1957820-1-ivecera@redhat.com>; List-ID: <linux-kernel.vger.kernel.org> |
Series | [net] iavf: Fix a crash during reset task |
Commit Message
Ivan Vecera
Nov. 8, 2022, 9:35 a.m. UTC
Recent commit aa626da947e9 ("iavf: Detach device during reset task")
removed netif_tx_stop_all_queues() on the assumption that Tx queues
are already stopped by netif_device_detach() at the beginning of the
reset task. This assumption is incorrect because a link event during
the reset task can start the Tx queues again.
Revert this change to fix the issue.
Reproducer:
1. Run some Tx traffic (e.g. iperf3) over iavf interface
2. Switch MTU of this interface in a loop
[root@host ~]# cat repro.sh
#!/bin/sh

IF=enp2s0f0v0

iperf3 -c 192.168.0.1 -t 600 --logfile /dev/null &
sleep 2

while :; do
	for i in 1280 1500 2000 900 ; do
		ip link set $IF mtu $i
		sleep 2
	done
done
[root@host ~]# ./repro.sh
Result:
[ 306.199917] iavf 0000:02:02.0 enp2s0f0v0: NIC Link is Up Speed is 40 Gbps Full Duplex
[ 308.205944] iavf 0000:02:02.0 enp2s0f0v0: NIC Link is Up Speed is 40 Gbps Full Duplex
[ 310.103223] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 310.110179] #PF: supervisor write access in kernel mode
[ 310.115396] #PF: error_code(0x0002) - not-present page
[ 310.120526] PGD 0 P4D 0
[ 310.123057] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 310.127408] CPU: 24 PID: 183 Comm: kworker/u64:9 Kdump: loaded Not tainted 6.1.0-rc3+ #2
[ 310.135485] Hardware name: Abacus electric, s.r.o. - servis@abacus.cz Super Server/H12SSW-iN, BIOS 2.4 04/13/2022
[ 310.145728] Workqueue: iavf iavf_reset_task [iavf]
[ 310.150520] RIP: 0010:iavf_xmit_frame_ring+0xd1/0xf70 [iavf]
[ 310.156180] Code: d0 0f 86 da 00 00 00 83 e8 01 0f b7 fa 29 f8 01 c8 39 c6 0f 8f a0 08 00 00 48 8b 45 20 48 8d 14 92 bf 01 00 00 00 4c 8d 3c d0 <49> 89 5f 08 8b 43 70 66 41 89 7f 14 41 89 47 10 f6 83 82 00 00 00
[ 310.174918] RSP: 0018:ffffbb5f0082caa0 EFLAGS: 00010293
[ 310.180137] RAX: 0000000000000000 RBX: ffff92345471a6e8 RCX: 0000000000000200
[ 310.187259] RDX: 0000000000000000 RSI: 000000000000000d RDI: 0000000000000001
[ 310.194385] RBP: ffff92341d249000 R08: ffff92434987fcac R09: 0000000000000001
[ 310.201509] R10: 0000000011f683b9 R11: 0000000011f50641 R12: 0000000000000008
[ 310.208631] R13: ffff923447500000 R14: 0000000000000000 R15: 0000000000000000
[ 310.215756] FS: 0000000000000000(0000) GS:ffff92434ee00000(0000) knlGS:0000000000000000
[ 310.223835] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 310.229572] CR2: 0000000000000008 CR3: 0000000fbc210004 CR4: 0000000000770ee0
[ 310.236696] PKRU: 55555554
[ 310.239399] Call Trace:
[ 310.241844] <IRQ>
[ 310.243855] ? dst_alloc+0x5b/0xb0
[ 310.247260] dev_hard_start_xmit+0x9e/0x1f0
[ 310.251439] sch_direct_xmit+0xa0/0x370
[ 310.255276] __qdisc_run+0x13e/0x580
[ 310.258848] __dev_queue_xmit+0x431/0xd00
[ 310.262851] ? selinux_ip_postroute+0x147/0x3f0
[ 310.267377] ip_finish_output2+0x26c/0x540
Fixes: aa626da947e9 ("iavf: Detach device during reset task")
Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Patryk Piotrowski <patryk.piotrowski@intel.com>
Cc: SlawomirX Laba <slawomirx.laba@intel.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
---
drivers/net/ethernet/intel/iavf/iavf_main.c | 1 +
1 file changed, 1 insertion(+)
Comments
On 2022-11-08 10:35, Ivan Vecera wrote:
> Recent commit aa626da947e9 ("iavf: Detach device during reset task")
> removed netif_tx_stop_all_queues() with an assumption that Tx queues
> are already stopped by netif_device_detach() in the beginning of
> reset task. This assumption is incorrect because during reset
> task a potential link event can start Tx queues again.
> Revert this change to fix this issue.
>
> Reproducer:
> 1. Run some Tx traffic (e.g. iperf3) over iavf interface
> 2. Switch MTU of this interface in a loop
>
> [root@host ~]# cat repro.sh
> #!/bin/sh
>
> IF=enp2s0f0v0
>
> iperf3 -c 192.168.0.1 -t 600 --logfile /dev/null &
> sleep 2
>
> while :; do
> 	for i in 1280 1500 2000 900 ; do
> 		ip link set $IF mtu $i
> 		sleep 2
> 	done
> done

With this patch applied iavf doesn't crash anymore, but after a few
cycles with the reproducer Tx timeouts are observed.

[ 47.551151] iavf 0000:00:09.0 eth0: NIC Link is Up Speed is 10 Gbps Full Duplex
[ 54.035902] ------------[ cut here ]------------
[ 54.037397] NETDEV WATCHDOG: eth0 (iavf): transmit queue 3 timed out
[ 54.039264] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:526 dev_watchdog+0x20f/0x250
[ 54.041524] Modules linked in: 8021q intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass rapl pcspkr drm ramoops reed_solomon crct10dif_pclmul crc32_pclmul crc32c_intel ata_generic pata_acpi ghash_clmulni_intel ata_piix aesni_intel crypto_simd iavf libata be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
[ 54.049723] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.1.0-rc2+ #90
[ 54.051049] Hardware name: Red Hat KVM, BIOS 1.15.0-2.module+el8.6.0+14757+c25ee005 04/01/2014
[ 54.052898] RIP: 0010:dev_watchdog+0x20f/0x250
[ 54.053907] Code: 00 e9 4d ff ff ff 48 89 df c6 05 92 24 96 01 01 e8 c6 f2 f8 ff 44 89 e9 48 89 de 48 c7 c7 28 7f f6 a0 48 89 c2 e8 6e 65 23 00 <0f> 0b e9 2f ff ff ff e8 25 06 2a 00 85 c0 74 b5 80 3d 74 1b 96 01
[ 54.057282] RSP: 0018:ffffaf56c00e0e80 EFLAGS: 00010282
[ 54.058164] RAX: 0000000000000000 RBX: ffff993ed95b8000 RCX: 0000000000000103
[ 54.059345] RDX: 0000000000000103 RSI: 00000000000000f6 RDI: 00000000ffffffff
[ 54.060473] RBP: ffff993ed95b8508 R08: 0000000000000000 R09: c0000000fff7ffff
[ 54.061558] R10: 0000000000000001 R11: ffffaf56c00e0d18 R12: ffff993ed95b8420
[ 54.062640] R13: 0000000000000003 R14: ffff993ed95b8508 R15: ffff993ef74a06c0
[ 54.063681] FS: 0000000000000000(0000) GS:ffff993ef7480000(0000) knlGS:0000000000000000
[ 54.064867] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 54.065654] CR2: 00007f42309e1280 CR3: 0000000107f6a003 CR4: 0000000000170ee0
[ 54.066612] Call Trace:
[ 54.066985] <IRQ>
[ 54.067265] ? mq_change_real_num_tx+0xd0/0xd0
[ 54.067844] call_timer_fn+0xa1/0x2c0
[ 54.068330] ? mq_change_real_num_tx+0xd0/0xd0
[ 54.068916] run_timer_softirq+0x527/0x550
[ 54.069447] ? lock_is_held_type+0xd8/0x130
[ 54.069998] __do_softirq+0xc3/0x481
[ 54.070469] irq_exit_rcu+0xe4/0x120
[ 54.070963] sysvec_apic_timer_interrupt+0x9e/0xc0
[ 54.071604] </IRQ>
[ 54.071909] <TASK>
[ 54.072223] asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 54.072942] RIP: 0010:default_idle+0x10/0x20
[ 54.073533] Code: 89 df 31 f6 5b 5d e9 ff 1c a5 ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 eb 07 0f 00 2d f2 2a 42 00 fb f4 <c3> 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 65

This only occurs when the device is detached and reattached during reset.

Stefan
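
For readers unfamiliar with the "NETDEV WATCHDOG" warning in the report above: roughly, the core watchdog in net/sched/sch_generic.c only inspects a device that is present, running and has carrier, and flags a Tx queue that has been stopped for longer than the device's watchdog timeout. The following is a simplified user-space sketch of that condition, not the kernel code. All `toy_*` names and the plain `unsigned long` tick counter are assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a Tx queue as the watchdog sees it. */
struct toy_txq {
	bool stopped;              /* queue was stopped by the driver */
	unsigned long trans_start; /* tick of the last transmit on this queue */
};

/* Hypothetical stand-in for the device-level state the watchdog checks. */
struct toy_dev {
	bool present;                 /* false while the device is detached */
	bool running;                 /* interface is up */
	bool carrier_ok;              /* link carrier is reported as on */
	unsigned long watchdog_timeo; /* allowed stall, in ticks */
};

/* Wrap-safe "a is later than b", same idea as the kernel's time_after(). */
static bool toy_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* True if this queue would be reported as timed out at tick `now`. */
bool toy_queue_timed_out(const struct toy_dev *dev,
			 const struct toy_txq *q, unsigned long now)
{
	/* A detached, down or carrier-less device is not checked at all. */
	if (!dev->present || !dev->running || !dev->carrier_ok)
		return false;

	return q->stopped &&
	       toy_time_after(now, q->trans_start + dev->watchdog_timeo);
}
```

Under this model, a queue that stays stopped after the device is reattached and carrier comes back would eventually trip the check, which is consistent with the observation that the timeout only shows up when the device is detached and reattached during reset.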
On 11/8/2022 2:53 AM, Stefan Assmann wrote:
> On 2022-11-08 10:35, Ivan Vecera wrote:
>> Recent commit aa626da947e9 ("iavf: Detach device during reset task")
>> removed netif_tx_stop_all_queues() with an assumption that Tx queues
>> are already stopped by netif_device_detach() in the beginning of
>> reset task. This assumption is incorrect because during reset
>> task a potential link event can start Tx queues again.
>> Revert this change to fix this issue.
>
> [...]
>
> With this patch applied iavf doesn't crash anymore but after a few
> cycles with the reproducer tx timeouts are observed.
>
> [ 54.037397] NETDEV WATCHDOG: eth0 (iavf): transmit queue 3 timed out
>
> [...]
>
> This only occurs when the device is detached and reattached during reset.

Hi Ivan,

Was there going to be an update to the patch to resolve this? If not,
I'll take what there is now.

Thanks,
Tony
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Ivan Vecera
> Sent: Tuesday, November 8, 2022 10:36 AM
> To: netdev@vger.kernel.org
> Cc: SlawomirX Laba <slawomirx.laba@intel.com>; Eric Dumazet
> <edumazet@google.com>; moderated list:INTEL ETHERNET DRIVERS
> <intel-wired-lan@lists.osuosl.org>; open list <linux-kernel@vger.kernel.org>;
> Piotrowski, Patryk <patryk.piotrowski@intel.com>; Jakub Kicinski
> <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; David S. Miller
> <davem@davemloft.net>; sassmann@redhat.com
> Subject: [Intel-wired-lan] [PATCH net] iavf: Fix a crash during reset task
>
> [...]
>
> Fixes: aa626da947e9 ("iavf: Detach device during reset task")
> Cc: Jacob Keller <jacob.e.keller@intel.com>
> Cc: Patryk Piotrowski <patryk.piotrowski@intel.com>
> Cc: SlawomirX Laba <slawomirx.laba@intel.com>
> Signed-off-by: Ivan Vecera <ivecera@redhat.com>
> ---
>  drivers/net/ethernet/intel/iavf/iavf_main.c | 1 +
>  1 file changed, 1 insertion(+)

Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 3fc572341781..5abcd66e7c7a 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -3033,6 +3033,7 @@ static void iavf_reset_task(struct work_struct *work)
 	if (running) {
 		netif_carrier_off(netdev);
+		netif_tx_stop_all_queues(netdev);
 		adapter->link_up = false;
 		iavf_napi_disable_all(adapter);
 	}
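
The ordering problem described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not driver code: every `toy_*` name is hypothetical, and the model assumes that the Tx ring memory is unavailable while the reset is in progress, which is one plausible reading of the NULL dereference in iavf_xmit_frame_ring() shown in the oops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Tx ring; not a real kernel type. */
struct toy_ring {
	int next_to_use;
};

/* Hypothetical stand-in for the pieces of netdev state that matter here. */
struct toy_netdev {
	bool queues_stopped;      /* Tx queue state as the stack sees it */
	struct toy_ring *tx_ring; /* NULL while the reset has the ring torn down */
};

/* Models netif_device_detach(): queues are stopped as a side effect. */
static void toy_device_detach(struct toy_netdev *d)
{
	d->queues_stopped = true;
}

/* Models a link event during the reset that starts the Tx queues again. */
static void toy_link_event_up(struct toy_netdev *d)
{
	d->queues_stopped = false;
}

/* Models the netif_tx_stop_all_queues() call restored by the patch. */
static void toy_tx_stop_all_queues(struct toy_netdev *d)
{
	d->queues_stopped = true;
}

/* True if the stack would call into xmit while the ring is gone. */
static bool toy_xmit_would_crash(const struct toy_netdev *d)
{
	if (d->queues_stopped)
		return false; /* the stack does not transmit on stopped queues */
	return d->tx_ring == NULL;
}

/* Replays the reset sequence with or without the restored stop call. */
bool toy_reset_hits_null_ring(bool with_fix)
{
	struct toy_ring ring = { 0 };
	struct toy_netdev d = { .queues_stopped = false, .tx_ring = &ring };

	toy_device_detach(&d);  /* start of the reset task */
	assert(d.queues_stopped);

	toy_link_event_up(&d);  /* racing link event undoes the stop */

	if (with_fix)
		toy_tx_stop_all_queues(&d);

	d.tx_ring = NULL;       /* assumption: ring is unavailable mid-reset */
	return toy_xmit_would_crash(&d);
}
```

In this model `toy_reset_hits_null_ring(false)` reports the bad transmit and `toy_reset_hits_null_ring(true)` does not, which mirrors why relying on the detach alone is insufficient once a link event can restart the queues.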