[v3,1/1] wifi: mac80211: fortify the spinlock against deadlock by interrupt

Message ID 20230425093547.1131-1-mirsad.todorovac@alu.unizg.hr
State New

Commit Message

Mirsad Todorovac April 25, 2023, 9:35 a.m. UTC
  In the function ieee80211_tx_dequeue() there is a particular locking
sequence:

begin:
	spin_lock(&local->queue_stop_reason_lock);
	q_stopped = local->queue_stop_reasons[q];
	spin_unlock(&local->queue_stop_reason_lock);

However small the chance (it is increased by running ftracetest), an
asynchronous interrupt can arrive between spin_lock() and spin_unlock(),
and the interrupt routine will then attempt to take the same
&local->queue_stop_reason_lock again.

This will cause a costly reset of the CPU and the wifi device, or a
complete hang in the single-CPU, single-core scenario.

This is the probable trace of the deadlock:

Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:  Possible unsafe locking scenario:
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:        CPU0
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:        ----
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:   lock(&local->queue_stop_reason_lock);
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:   <Interrupt>
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:     lock(&local->queue_stop_reason_lock);
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:
                                                 *** DEADLOCK ***
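
The scenario above can be imitated in user space as a rough sketch. It is
only an analogy and not mac80211 code: a POSIX signal stands in for the
interrupt, a plain flag stands in for the spinlock, and sigprocmask() plays
the role of the irqsave/irqrestore pair. All names (fake_irq, read_plain,
read_irqsave, run_scenario) are hypothetical, and the sketch assumes a
single-threaded process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names exist in mac80211. */
static volatile sig_atomic_t lock_held;      /* models queue_stop_reason_lock */
static volatile sig_atomic_t would_deadlock; /* handler found the lock held */
static volatile sig_atomic_t handler_runs;   /* how often the handler ran */

/* Models an interrupt routine that wants the same lock. */
static void fake_irq(int signo)
{
	(void)signo;
	handler_runs++;
	if (lock_held)
		would_deadlock = 1; /* a real spinlock would spin forever here */
}

/* Models spin_lock()/spin_unlock(): the "interrupt" stays enabled. */
static void read_plain(void)
{
	lock_held = 1;
	raise(SIGUSR1); /* handler runs inside the critical section */
	lock_held = 0;
}

/* Models spin_lock_irqsave()/spin_unlock_irqrestore(). */
static void read_irqsave(void)
{
	sigset_t block, saved;

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	sigprocmask(SIG_BLOCK, &block, &saved); /* save state, mask "irq" */
	lock_held = 1;
	raise(SIGUSR1); /* stays pending while masked */
	lock_held = 0;
	sigprocmask(SIG_SETMASK, &saved, NULL); /* restore; handler runs now */
}

/* Runs one variant and reports whether the handler saw the lock held. */
static int run_scenario(void (*critical_section)(void))
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = fake_irq;
	sigemptyset(&sa.sa_mask);
	assert(sigaction(SIGUSR1, &sa, NULL) == 0);

	would_deadlock = 0;
	handler_runs = 0;
	critical_section();
	return would_deadlock;
}
```

In this sketch run_scenario(read_plain) reports that the handler ran while
the lock was held, whereas run_scenario(read_irqsave) defers the handler
until after the critical section, which is the effect the patch aims for
with spin_lock_irqsave().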

Fixes: 4444bc2116ae ("wifi: mac80211: Proper mark iTXQs for resumption")
Link: https://lore.kernel.org/all/1f58a0d1-d2b9-d851-73c3-93fcc607501c@alu.unizg.hr/
Reported-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Cc: Gregory Greenman <gregory.greenman@intel.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Link: https://lore.kernel.org/all/cdc80531-f25f-6f9d-b15f-25e16130b53a@alu.unizg.hr/
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Alexander Wetzel <alexander@wetzel-home.de>
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
---
v2 -> v3:
- Fix the Fixes: tag as advised.
- Change the net: prefix to wifi: to comply with the original patch that
  is being fixed.
v1 -> v2:
- Minor rewording and clarification.
- Cc:-ed people that replied to the original bug report (forgotten
  in v1 by omission).

 net/mac80211/tx.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
  

Comments

Leon Romanovsky April 25, 2023, 3:33 p.m. UTC | #1
On Tue, Apr 25, 2023 at 11:35:48AM +0200, Mirsad Goran Todorovac wrote:
> In the function ieee80211_tx_dequeue() there is a particular locking
> sequence:
> 
> begin:
> 	spin_lock(&local->queue_stop_reason_lock);
> 	q_stopped = local->queue_stop_reasons[q];
> 	spin_unlock(&local->queue_stop_reason_lock);
> 
> However small the chance (increased by ftracetest), an asynchronous
> interrupt can occur in between of spin_lock() and spin_unlock(),
> and the interrupt routine will attempt to lock the same
> &local->queue_stop_reason_lock again.
> 
> This will cause a costly reset of the CPU and the wifi device or an
> altogether hang in the single CPU and single core scenario.
> 
> This is the probable trace of the deadlock:
> 
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:  Possible unsafe locking scenario:
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:        CPU0
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:        ----
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:   lock(&local->queue_stop_reason_lock);
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:   <Interrupt>
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:     lock(&local->queue_stop_reason_lock);
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:
>                                                  *** DEADLOCK ***

Can you please add the whole lockdep trace to the commit message?

And please trim the "Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: " line
prefix; it doesn't add any value.

Thanks
  
Mirsad Todorovac April 25, 2023, 3:59 p.m. UTC | #2
On 25.4.2023. 17:33, Leon Romanovsky wrote:
> On Tue, Apr 25, 2023 at 11:35:48AM +0200, Mirsad Goran Todorovac wrote:
>> In the function ieee80211_tx_dequeue() there is a particular locking
>> sequence:
>>
>> begin:
>> 	spin_lock(&local->queue_stop_reason_lock);
>> 	q_stopped = local->queue_stop_reasons[q];
>> 	spin_unlock(&local->queue_stop_reason_lock);
>>
>> However small the chance (increased by ftracetest), an asynchronous
>> interrupt can occur in between of spin_lock() and spin_unlock(),
>> and the interrupt routine will attempt to lock the same
>> &local->queue_stop_reason_lock again.
>>
>> This will cause a costly reset of the CPU and the wifi device or an
>> altogether hang in the single CPU and single core scenario.
>>
>> This is the probable trace of the deadlock:
>>
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:  Possible unsafe locking scenario:
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:        CPU0
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:        ----
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:   lock(&local->queue_stop_reason_lock);
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:   <Interrupt>
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:     lock(&local->queue_stop_reason_lock);
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:
>>                                                   *** DEADLOCK ***
> 
> Can you please add to the commit message whole lockdep trace?
> 
> And please trim "Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: " line prefix,
> it doesn't add any value.

Sure. I will do this ASAP. I thought of it myself, but I reckoned it would
be overkill.

Will come in PATCH v4.

Best regards,
Mirsad
  

Patch

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 7699fb410670..45cb8e7bcc61 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3781,6 +3781,7 @@  struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
 	ieee80211_tx_result r;
 	struct ieee80211_vif *vif = txq->vif;
 	int q = vif->hw_queue[txq->ac];
+	unsigned long flags;
 	bool q_stopped;
 
 	WARN_ON_ONCE(softirq_count() == 0);
@@ -3789,9 +3790,9 @@  struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
 		return NULL;
 
 begin:
-	spin_lock(&local->queue_stop_reason_lock);
+	spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
 	q_stopped = local->queue_stop_reasons[q];
-	spin_unlock(&local->queue_stop_reason_lock);
+	spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);
 
 	if (unlikely(q_stopped)) {
 		/* mark for waking later */