[RFC,v1,1/1] net: mac80211: fortify the spinlock against deadlock in interrupt

Message ID 20230423082403.49143-1-mirsad.todorovac@alu.unizg.hr
State New

Commit Message

Mirsad Todorovac April 23, 2023, 8:24 a.m. UTC
  In the function ieee80211_tx_dequeue() there is a locking sequence:

begin:
	spin_lock(&local->queue_stop_reason_lock);
	q_stopped = local->queue_stop_reasons[q];
	spin_unlock(&local->queue_stop_reason_lock);

However small the chance (increased by ftracetest), an asynchronous
interrupt can occur between spin_lock() and spin_unlock(),
and the interrupt routine can then attempt to take the same
&local->queue_stop_reason_lock again.

This is the only remaining spin_lock() on local->queue_stop_reason_lock
that does not disable interrupts, so it could cause a deadlock
on the same CPU (core).

Such a deadlock causes a costly reset of the CPU and the wifi device, or a
complete hang in the single-CPU, single-core scenario.

This is the probable reproduction of the deadlock:

Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:  Possible unsafe locking scenario:
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:        CPU0
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:        ----
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:   lock(&local->queue_stop_reason_lock);
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:   <Interrupt>
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:     lock(&local->queue_stop_reason_lock);
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:
                                                 *** DEADLOCK ***

Fixes: 4444bc2116ae
Link: https://lore.kernel.org/all/1f58a0d1-d2b9-d851-73c3-93fcc607501c@alu.unizg.hr/
Cc: Alexander Wetzel <alexander@wetzel-home.de>
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
---
 net/mac80211/tx.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
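
The locking problem described above can be illustrated outside the kernel.
The following is only a userspace analogy, not mac80211 code: a POSIX signal
stands in for the hardware interrupt, an atomic_flag stands in for
queue_stop_reason_lock, and blocking the signal with sigprocmask() plays the
role of spin_lock_irqsave()/spin_unlock_irqrestore(). The names demo_lock,
demo_handler and run_critical_section are hypothetical and exist only for
this sketch. The handler tries the lock once instead of spinning, so that the
would-be deadlock is recorded rather than actually hanging the program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for local->queue_stop_reason_lock. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;
/* Set when the "interrupt" finds the lock already held. */
static volatile sig_atomic_t handler_saw_lock_held;

/*
 * "Interrupt handler". A real spin_lock() here would spin forever if the
 * interrupted code holds the lock, so try once and record the outcome.
 */
static void demo_handler(int sig)
{
	(void)sig;
	if (atomic_flag_test_and_set(&demo_lock))
		handler_saw_lock_held = 1;	/* would have deadlocked */
	else
		atomic_flag_clear(&demo_lock);	/* got the lock, release it */
}

/*
 * Take the lock, let the "interrupt" fire inside the critical section,
 * and report whether the handler ran into the held lock.
 */
static bool run_critical_section(bool mask_signals)
{
	struct sigaction sa;
	sigset_t block, saved;
	int rc;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = demo_handler;
	sigemptyset(&sa.sa_mask);
	rc = sigaction(SIGUSR1, &sa, NULL);
	assert(rc == 0);
	(void)rc;

	handler_saw_lock_held = 0;
	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	/* Setup: make sure the "interrupt" starts out enabled. */
	sigprocmask(SIG_UNBLOCK, &block, NULL);

	if (mask_signals)	/* analogous to spin_lock_irqsave() */
		sigprocmask(SIG_BLOCK, &block, &saved);

	while (atomic_flag_test_and_set(&demo_lock))
		;		/* acquire the lock */
	raise(SIGUSR1);		/* "interrupt" arrives while the lock is held */
	atomic_flag_clear(&demo_lock);

	if (mask_signals)	/* analogous to spin_unlock_irqrestore() */
		sigprocmask(SIG_SETMASK, &saved, NULL);

	return handler_saw_lock_held != 0;
}
```

Calling run_critical_section(false) models the plain spin_lock() case: the
handler runs while the lock is held. Calling run_critical_section(true)
models the patched case: the signal stays pending until the lock has been
released and the saved mask is restored, so the handler finds the lock free.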
  

Comments

Johannes Berg April 24, 2023, 5:27 p.m. UTC | #1
On Sun, 2023-04-23 at 10:24 +0200, Mirsad Goran Todorovac wrote:
> Fixes: 4444bc2116ae

That fixes tag is wrong, should be

Fixes: 4444bc2116ae ("wifi: mac80211: Proper mark iTXQs for resumption")

Otherwise seems fine to me, submit it properly?

johannes
  
Mirsad Todorovac April 25, 2023, 8:29 a.m. UTC | #2
On 24.4.2023. 19:27, Johannes Berg wrote:
> On Sun, 2023-04-23 at 10:24 +0200, Mirsad Goran Todorovac wrote:
>> Fixes: 4444bc2116ae
> 
> That fixes tag is wrong, should be
> 
> Fixes: 4444bc2116ae ("wifi: mac80211: Proper mark iTXQs for resumption")
> 
> Otherwise seems fine to me, submit it properly?
> 
> johannes

Will do, Sir. Do I have an Acked-by: ?

Thank you.

Mirsad
  

Patch

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 7699fb410670..45cb8e7bcc61 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3781,6 +3781,7 @@  struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
 	ieee80211_tx_result r;
 	struct ieee80211_vif *vif = txq->vif;
 	int q = vif->hw_queue[txq->ac];
+	unsigned long flags;
 	bool q_stopped;
 
 	WARN_ON_ONCE(softirq_count() == 0);
@@ -3789,9 +3790,9 @@  struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
 		return NULL;
 
 begin:
-	spin_lock(&local->queue_stop_reason_lock);
+	spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
 	q_stopped = local->queue_stop_reasons[q];
-	spin_unlock(&local->queue_stop_reason_lock);
+	spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);
 
 	if (unlikely(q_stopped)) {
 		/* mark for waking later */