ASoC: SOF: amd: Fix locking in ACP IRQ handler

Message ID 20240208234315.2182048-1-cristian.ciocaltea@collabora.com
State New
Headers
Series ASoC: SOF: amd: Fix locking in ACP IRQ handler |

Commit Message

Cristian Ciocaltea Feb. 8, 2024, 11:43 p.m. UTC
  A recent change in acp_irq_thread() was meant to address a potential race
condition while trying to acquire the hardware semaphore responsible for
the synchronization between firmware and host IPC interrupts.

This resulted in an improper use of the IPC spinlock, causing normal
kernel memory allocations (which may sleep) inside atomic contexts:

1707255557.133976 kernel: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:315
1707255557.134068 kernel: in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 502, name: irq/66-AudioDSP
1707255557.134089 kernel: preempt_count: 1, expected: 0
1707255557.134105 kernel: RCU nest depth: 0, expected: 0
1707255557.134120 kernel: 1 lock held by irq/66-AudioDSP/502:
1707255557.134157 kernel:  #0: ffff96514137d848 (&sdev->ipc_lock){....}-{2:2}, at: acp_irq_thread+0x3a/0x130 [snd_sof_amd_acp]
1707255557.134205 kernel: irq event stamp: 36
1707255557.134258 kernel: hardirqs last  enabled at (35): [<ffffffff9b1059ac>] finish_task_switch.isra.0+0xbc/0x340
1707255557.134290 kernel: hardirqs last disabled at (36): [<ffffffff9be19313>] _raw_spin_lock_irq+0x53/0x60
1707255557.134338 kernel: softirqs last  enabled at (0): [<ffffffff9b0b1a62>] copy_process+0xb42/0x2a30
1707255557.134412 kernel: softirqs last disabled at (0): [<0000000000000000>] 0x0
1707255557.134446 kernel: CPU: 5 PID: 502 Comm: irq/66-AudioDSP Not tainted 6.8.0-rc3-next-20240206-audio-next #9
1707255557.134478 kernel: Hardware name: Valve Galileo/Galileo, BIOS F7G0105 11/07/2023
1707255557.134512 kernel: Call Trace:
1707255557.134544 kernel:  <TASK>
1707255557.134575 kernel:  dump_stack_lvl+0x5b/0x90
1707255557.134613 kernel:  __might_resched+0x16f/0x280
1707255557.134645 kernel:  __kmalloc+0x2f3/0x460
1707255557.134675 kernel:  ? sof_ipc3_rx_msg+0x70/0x130 [snd_sof]
1707255557.134718 kernel:  ? srso_return_thunk+0x5/0x5f
1707255557.134757 kernel:  sof_ipc3_rx_msg+0x70/0x130 [snd_sof]
1707255557.134793 kernel:  acp_sof_ipc_irq_thread+0x1e0/0x550 [snd_sof_amd_acp]
1707255557.134855 kernel:  acp_irq_thread+0xa3/0x130 [snd_sof_amd_acp]
1707255557.134904 kernel:  ? irq_thread+0xb5/0x1e0
1707255557.134947 kernel:  ? __pfx_irq_thread_fn+0x10/0x10
1707255557.134985 kernel:  irq_thread_fn+0x23/0x60
1707255557.135020 kernel:  ? irq_thread+0xb5/0x1e0
1707255557.135051 kernel:  irq_thread+0x10f/0x1e0
1707255557.135083 kernel:  ? __pfx_irq_thread_dtor+0x10/0x10
1707255557.135114 kernel:  ? __pfx_irq_thread+0x10/0x10
1707255557.135136 kernel:  kthread+0xe3/0x110
1707255557.135161 kernel:  ? __pfx_kthread+0x10/0x10
1707255557.135176 kernel:  ret_from_fork+0x34/0x50
1707255557.134855 kernel:  acp_irq_thread+0xa3/0x130 [snd_sof_amd_acp]
1707255557.134904 kernel:  ? irq_thread+0xb5/0x1e0
1707255557.134947 kernel:  ? __pfx_irq_thread_fn+0x10/0x10
1707255557.134985 kernel:  irq_thread_fn+0x23/0x60
1707255557.135020 kernel:  ? irq_thread+0xb5/0x1e0
1707255557.135051 kernel:  irq_thread+0x10f/0x1e0
1707255557.135083 kernel:  ? __pfx_irq_thread_dtor+0x10/0x10
1707255557.135114 kernel:  ? __pfx_irq_thread+0x10/0x10
1707255557.135136 kernel:  kthread+0xe3/0x110
1707255557.135161 kernel:  ? __pfx_kthread+0x10/0x10
1707255557.135176 kernel:  ret_from_fork+0x34/0x50
1707255557.135197 kernel:  ? __pfx_kthread+0x10/0x10
1707255557.135212 kernel:  ret_from_fork_asm+0x1a/0x30
1707255557.135227 kernel:  </TASK>

Moreover, there are attempts to lock a mutex from the same atomic
context:

1707255557.136357 kernel: =============================
1707255557.136393 kernel: [ BUG: Invalid wait context ]
1707255557.136413 kernel: 6.8.0-rc3-next-20240206-audio-next #9 Tainted: G        W
1707255557.136432 kernel: -----------------------------
1707255557.136451 kernel: irq/66-AudioDSP/502 is trying to lock:
1707255557.136470 kernel: ffff965152f26af8 (&sb->s_type->i_mutex_key#2){+.+.}-{3:3}, at: start_creating.part.0+0x5f/0x180
1707255557.136490 kernel: other info that might help us debug this:
1707255557.137103 kernel: context-{4:4}
1707255557.137129 kernel: 1 lock held by irq/66-AudioDSP/502:
1707255557.137153 kernel:  #0: ffff96514137d848 (&sdev->ipc_lock){....}-{2:2}, at: acp_irq_thread+0x3a/0x130 [snd_sof_amd_acp]
1707255557.137172 kernel: stack backtrace:
1707255557.137194 kernel: CPU: 5 PID: 502 Comm: irq/66-AudioDSP Tainted: G        W          6.8.0-rc3-next-20240206-audio-next #9
1707255557.137217 kernel: Hardware name: Valve Galileo/Galileo, BIOS F7G0105 11/07/2023
1707255557.137239 kernel: Call Trace:
1707255557.137275 kernel:  <TASK>
1707255557.137294 kernel:  dump_stack_lvl+0x5b/0x90
1707255557.137313 kernel:  __lock_acquire+0x95c/0x2270
1707255557.137331 kernel:  lock_acquire+0xcb/0x2c0
1707255557.137354 kernel:  ? start_creating.part.0+0x5f/0x180
1707255557.137387 kernel:  down_write+0x3f/0xc0
1707255557.137410 kernel:  ? start_creating.part.0+0x5f/0x180
1707255557.137429 kernel:  start_creating.part.0+0x5f/0x180
1707255557.137457 kernel:  __debugfs_create_file+0x61/0x210
1707255557.137475 kernel:  snd_sof_debugfs_io_item+0x75/0xc0 [snd_sof]
1707255557.137494 kernel:  sof_ipc3_do_rx_work+0x7cf/0x9f0 [snd_sof]
1707255557.137513 kernel:  sof_ipc3_rx_msg+0xb3/0x130 [snd_sof]
1707255557.137532 kernel:  acp_sof_ipc_irq_thread+0x1e0/0x550 [snd_sof_amd_acp]
1707255557.137551 kernel:  acp_irq_thread+0xa3/0x130 [snd_sof_amd_acp]
1707255557.137574 kernel:  ? irq_thread+0xb5/0x1e0
1707255557.137593 kernel:  ? __pfx_irq_thread_fn+0x10/0x10
1707255557.137611 kernel:  irq_thread_fn+0x23/0x60
1707255557.137630 kernel:  ? irq_thread+0xb5/0x1e0
1707255557.137653 kernel:  irq_thread+0x10f/0x1e0
1707255557.137671 kernel:  ? __pfx_irq_thread_dtor+0x10/0x10
1707255557.137690 kernel:  ? __pfx_irq_thread+0x10/0x10
1707255557.137708 kernel:  kthread+0xe3/0x110
1707255557.137734 kernel:  ? __pfx_kthread+0x10/0x10
1707255557.137763 kernel:  ret_from_fork+0x34/0x50
1707255557.137791 kernel:  ? __pfx_kthread+0x10/0x10
1707255557.137813 kernel:  ret_from_fork_asm+0x1a/0x30
1707255557.137831 kernel:  </TASK>

Fix the issues by reducing the lock scope in acp_irq_thread(), so that
it guards only the hardware semaphore acquiring attempt.  Additionally,
restore the initial locking in acp_sof_ipc_irq_thread() to synchronize
the handling of immediate replies from DSP core.

Fixes: 802134c8c2c8 ("ASoC: SOF: amd: Refactor spinlock_irq(&sdev->ipc_lock) sequence in irq_handler")
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
---
 sound/soc/sof/amd/acp-ipc.c |  2 ++
 sound/soc/sof/amd/acp.c     | 17 ++++++++---------
 2 files changed, 10 insertions(+), 9 deletions(-)
  

Comments

Mark Brown Feb. 9, 2024, 10:09 a.m. UTC | #1
On Fri, Feb 09, 2024 at 01:43:14AM +0200, Cristian Ciocaltea wrote:

> 1707255557.133976 kernel: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:315
> 1707255557.134068 kernel: in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 502, name: irq/66-AudioDSP
> 1707255557.134089 kernel: preempt_count: 1, expected: 0

Please think hard before including complete backtraces in upstream
reports, they are very large and contain almost no useful information
relative to their size so often obscure the relevant content in your
message. If part of the backtrace is usefully illustrative (it often is
for search engines if nothing else) then it's usually better to pull out
the relevant sections.
  
Cristian Ciocaltea Feb. 9, 2024, 10:43 a.m. UTC | #2
On 2/9/24 12:09, Mark Brown wrote:
> On Fri, Feb 09, 2024 at 01:43:14AM +0200, Cristian Ciocaltea wrote:
> 
>> 1707255557.133976 kernel: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:315
>> 1707255557.134068 kernel: in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 502, name: irq/66-AudioDSP
>> 1707255557.134089 kernel: preempt_count: 1, expected: 0
> 
> Please think hard before including complete backtraces in upstream
> reports, they are very large and contain almost no useful information
> relative to their size so often obscure the relevant content in your
> message. If part of the backtrace is usefully illustrative (it often is
> for search engines if nothing else) then it's usually better to pull out
> the relevant sections.

The commit message got indeed larger than I initially expected as I
tried to document two different contexts.  Should I try to shrink this
up a bit and resend the patch?

Thanks,
Cristian
  
Mark Brown Feb. 12, 2024, 12:57 a.m. UTC | #3
On Fri, 09 Feb 2024 01:43:14 +0200, Cristian Ciocaltea wrote:
> A recent change in acp_irq_thread() was meant to address a potential race
> condition while trying to acquire the hardware semaphore responsible for
> the synchronization between firmware and host IPC interrupts.
> 
> This resulted in an improper use of the IPC spinlock, causing normal
> kernel memory allocations (which may sleep) inside atomic contexts:
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: SOF: amd: Fix locking in ACP IRQ handler
      commit: c4b603c6e2df3a17831731d8bbec5c16fa7bbdf8

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark
  

Patch

diff --git a/sound/soc/sof/amd/acp-ipc.c b/sound/soc/sof/amd/acp-ipc.c
index 2743f07a5e08..b44b1b1adb6e 100644
--- a/sound/soc/sof/amd/acp-ipc.c
+++ b/sound/soc/sof/amd/acp-ipc.c
@@ -188,11 +188,13 @@  irqreturn_t acp_sof_ipc_irq_thread(int irq, void *context)
 
 	dsp_ack = snd_sof_dsp_read(sdev, ACP_DSP_BAR, ACP_SCRATCH_REG_0 + dsp_ack_write);
 	if (dsp_ack) {
+		spin_lock_irq(&sdev->ipc_lock);
 		/* handle immediate reply from DSP core */
 		acp_dsp_ipc_get_reply(sdev);
 		snd_sof_ipc_reply(sdev, 0);
 		/* set the done bit */
 		acp_dsp_ipc_dsp_done(sdev);
+		spin_unlock_irq(&sdev->ipc_lock);
 		ipc_irq = true;
 	}
 
diff --git a/sound/soc/sof/amd/acp.c b/sound/soc/sof/amd/acp.c
index c7f86b46c426..e69abeece61f 100644
--- a/sound/soc/sof/amd/acp.c
+++ b/sound/soc/sof/amd/acp.c
@@ -343,21 +343,20 @@  static irqreturn_t acp_irq_thread(int irq, void *context)
 	unsigned int count = ACP_HW_SEM_RETRY_COUNT;
 
 	spin_lock_irq(&sdev->ipc_lock);
-	while (snd_sof_dsp_read(sdev, ACP_DSP_BAR, desc->hw_semaphore_offset)) {
-		/* Wait until acquired HW Semaphore lock or timeout */
-		count--;
-		if (!count) {
-			dev_err(sdev->dev, "%s: Failed to acquire HW lock\n", __func__);
-			spin_unlock_irq(&sdev->ipc_lock);
-			return IRQ_NONE;
-		}
+	/* Wait until acquired HW Semaphore lock or timeout */
+	while (snd_sof_dsp_read(sdev, ACP_DSP_BAR, desc->hw_semaphore_offset) && --count)
+		;
+	spin_unlock_irq(&sdev->ipc_lock);
+
+	if (!count) {
+		dev_err(sdev->dev, "%s: Failed to acquire HW lock\n", __func__);
+		return IRQ_NONE;
 	}
 
 	sof_ops(sdev)->irq_thread(irq, sdev);
 	/* Unlock or Release HW Semaphore */
 	snd_sof_dsp_write(sdev, ACP_DSP_BAR, desc->hw_semaphore_offset, 0x0);
 
-	spin_unlock_irq(&sdev->ipc_lock);
 	return IRQ_HANDLED;
 };