[v5] ASoC: SOF: Fix deadlock when shutdown a frozen userspace

Message ID 20221127-snd-freeze-v5-0-4ededeb08ba0@chromium.org
State New

Commit Message

Ricardo Ribalda Nov. 28, 2022, 1:51 p.m. UTC
During kexec(), userspace is frozen. Therefore we cannot wait for it to
close its open sound devices.

Avoid running snd_sof_machine_unregister during shutdown.

This fixes:

[   84.943749] Freezing user space processes ... (elapsed 0.111 seconds) done.
[  246.784446] INFO: task kexec-lite:5123 blocked for more than 122 seconds.
[  246.819035] Call Trace:
[  246.821782]  <TASK>
[  246.824186]  __schedule+0x5f9/0x1263
[  246.828231]  schedule+0x87/0xc5
[  246.831779]  snd_card_disconnect_sync+0xb5/0x127
...
[  246.889249]  snd_sof_device_shutdown+0xb4/0x150
[  246.899317]  pci_device_shutdown+0x37/0x61
[  246.903990]  device_shutdown+0x14c/0x1d6
[  246.908391]  kernel_kexec+0x45/0xb9

And:

[  246.893222] INFO: task kexec-lite:4891 blocked for more than 122 seconds.
[  246.927709] Call Trace:
[  246.930461]  <TASK>
[  246.932819]  __schedule+0x5f9/0x1263
[  246.936855]  ? fsnotify_grab_connector+0x5c/0x70
[  246.942045]  schedule+0x87/0xc5
[  246.945567]  schedule_timeout+0x49/0xf3
[  246.949877]  wait_for_completion+0x86/0xe8
[  246.954463]  snd_card_free+0x68/0x89
...
[  247.001080]  platform_device_unregister+0x12/0x35
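
For illustration only, the hang in the traces above can be modelled in
plain userspace C. This is a toy sketch, not kernel code: struct
model_card, model_userspace_tick() and model_card_free() are hypothetical
names, and the assumption that a running process closes one handle per
tick is a simplification. The wait seen in the trace
(wait_for_completion()) has no timeout; the max_ticks bound exists only
so that the model terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a sound card that userspace may hold open. */
struct model_card {
	int open_files;        /* file handles still held by userspace */
	bool userspace_frozen; /* true once tasks are frozen, as during kexec */
};

/*
 * One scheduling opportunity for userspace. A running process is assumed
 * to close one handle per tick; a frozen process never gets to run.
 */
static void model_userspace_tick(struct model_card *card)
{
	if (!card->userspace_frozen && card->open_files > 0)
		card->open_files--;
}

/*
 * Models the wait from the traces: the card can only be freed once every
 * handle has been closed. Returns true if that happened within max_ticks.
 */
static bool model_card_free(struct model_card *card, int max_ticks)
{
	assert(card != NULL);

	for (int tick = 0; tick < max_ticks; tick++) {
		if (card->open_files == 0)
			return true;
		model_userspace_tick(card);
	}
	return card->open_files == 0;
}
```

With userspace_frozen set, model_card_free() fails for any tick budget
because open_files never drops, which corresponds to the "blocked for more
than 122 seconds" reports.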

Cc: stable@vger.kernel.org
Fixes: 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown")
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
---
To: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
To: Liam Girdwood <lgirdwood@gmail.com>
To: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
To: Bard Liao <yung-chuan.liao@linux.intel.com>
To: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
To: Kai Vehmanen <kai.vehmanen@linux.intel.com>
To: Daniel Baluta <daniel.baluta@nxp.com>
To: Mark Brown <broonie@kernel.org>
To: Jaroslav Kysela <perex@perex.cz>
To: Takashi Iwai <tiwai@suse.com>
Cc: sound-open-firmware@alsa-project.org
Cc: alsa-devel@alsa-project.org
Cc: linux-kernel@vger.kernel.org
---
Changes in v5:
- Edit subject prefix
- Link to v4: https://lore.kernel.org/r/20221127-snd-freeze-v4-0-51ca64b7f2ab@chromium.org

Changes in v4:
- Do not call snd_sof_machine_unregister from shutdown.
- Link to v3: https://lore.kernel.org/r/20221127-snd-freeze-v3-0-a2eda731ca14@chromium.org

Changes in v3:
- Wrap pm_freezing in a function
- Link to v2: https://lore.kernel.org/r/20221127-snd-freeze-v2-0-d8a425ea9663@chromium.org

Changes in v2:
- Only use pm_freezing if CONFIG_FREEZER 
- Link to v1: https://lore.kernel.org/r/20221127-snd-freeze-v1-0-57461a366ec2@chromium.org
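
The earlier revisions are not shown in this thread, so the following is
only a guess at what "wrap pm_freezing in a function" and "only use
pm_freezing if CONFIG_FREEZER" could look like, written as a standalone
userspace sketch. MODEL_CONFIG_FREEZER and model_pm_freezing stand in for
the kernel's CONFIG_FREEZER option and pm_freezing flag; the helper names
model_userspace_frozen() and model_should_unregister_machine() are
hypothetical.

```c
#include <stdbool.h>

/* Stand-in for the kernel's CONFIG_FREEZER Kconfig option (assumption). */
#define MODEL_CONFIG_FREEZER 1

#if MODEL_CONFIG_FREEZER
/* Stand-in for the kernel's pm_freezing flag (assumption). */
static bool model_pm_freezing;
#endif

/*
 * Hypothetical wrapper: hides the config check so that callers do not
 * need their own #if around the flag.
 */
static bool model_userspace_frozen(void)
{
#if MODEL_CONFIG_FREEZER
	return model_pm_freezing;
#else
	return false; /* without the freezer, userspace is never frozen */
#endif
}

/*
 * Hypothetical shutdown-path decision: only unregister the machine driver
 * while userspace is still able to close its devices.
 */
static bool model_should_unregister_machine(void)
{
	return !model_userspace_frozen();
}
```

v4 and v5 drop this conditional approach and remove the
snd_sof_machine_unregister() call from the shutdown path altogether.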
---
 sound/soc/sof/core.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)


---
base-commit: 4312098baf37ee17a8350725e6e0d0e8590252d4
change-id: 20221127-snd-freeze-1ee143228326

Best regards,
Ricardo Ribalda

Comments

Kai Vehmanen Nov. 28, 2022, 2:41 p.m. UTC | #1
Hi,

On Mon, 28 Nov 2022, Ricardo Ribalda wrote:

> During kexec(), the userspace is frozen. Therefore we cannot wait for it
> to complete.
> 
> Avoid running snd_sof_machine_unregister during shutdown.
[...]
>  	/*
> -	 * make sure clients and machine driver(s) are unregistered to force
> -	 * all userspace devices to be closed prior to the DSP shutdown sequence
> +	 * make sure clients are unregistered prior to the DSP shutdown
> +	 * sequence.
>  	 */
>  	sof_unregister_clients(sdev);
>  
> -	snd_sof_machine_unregister(sdev, pdata);
> -
>  	if (sdev->fw_state == SOF_FW_BOOT_COMPLETE)

This is problematic, as removing that machine_unregister() call will (at
least) bring back an issue on Intel platforms (a rare problem hitting S5 on
Chromebooks).

I'm not sure how to solve this, but if that call has to be removed
because it is unsafe to call at shutdown, then we need to rework how SOF
does its cleanup.

Br, Kai
  
Ricardo Ribalda Nov. 28, 2022, 5:10 p.m. UTC | #2
Hi Kai,

Thanks for your review.

On Mon, 28 Nov 2022 at 15:41, Kai Vehmanen <kai.vehmanen@linux.intel.com> wrote:
>
> Hi,
>
> On Mon, 28 Nov 2022, Ricardo Ribalda wrote:
>
> > During kexec(), the userspace is frozen. Therefore we cannot wait for it
> > to complete.
> >
> > Avoid running snd_sof_machine_unregister during shutdown.
> [...]
> >       /*
> > -      * make sure clients and machine driver(s) are unregistered to force
> > -      * all userspace devices to be closed prior to the DSP shutdown sequence
> > +      * make sure clients are unregistered prior to the DSP shutdown
> > +      * sequence.
> >        */
> >       sof_unregister_clients(sdev);
> >
> > -     snd_sof_machine_unregister(sdev, pdata);
> > -
> >       if (sdev->fw_state == SOF_FW_BOOT_COMPLETE)
>
> this is problematic as removing that machine_unregister() call will (at
> least) bring back an issue on Intel platforms (rare problem hitting S5 on
> Chromebooks).

Do you know which devices were affected or how to trigger the issue?

I have access to the ChromeOS lab, so I can test on a wide variety of devices.

Thanks!


>
> Not sure how to solve this, but if that call needs to be removed
> (unsafe to call at shutdown), then we need to rework how SOF
> does the cleanup.
>
> Br, Kai
  

Patch

diff --git a/sound/soc/sof/core.c b/sound/soc/sof/core.c
index 3e6141d03770..9616ba607ded 100644
--- a/sound/soc/sof/core.c
+++ b/sound/soc/sof/core.c
@@ -475,19 +475,16 @@ EXPORT_SYMBOL(snd_sof_device_remove);
 int snd_sof_device_shutdown(struct device *dev)
 {
 	struct snd_sof_dev *sdev = dev_get_drvdata(dev);
-	struct snd_sof_pdata *pdata = sdev->pdata;
 
 	if (IS_ENABLED(CONFIG_SND_SOC_SOF_PROBE_WORK_QUEUE))
 		cancel_work_sync(&sdev->probe_work);
 
 	/*
-	 * make sure clients and machine driver(s) are unregistered to force
-	 * all userspace devices to be closed prior to the DSP shutdown sequence
+	 * make sure clients are unregistered prior to the DSP shutdown
+	 * sequence.
 	 */
 	sof_unregister_clients(sdev);
 
-	snd_sof_machine_unregister(sdev, pdata);
-
 	if (sdev->fw_state == SOF_FW_BOOT_COMPLETE)
 		return snd_sof_shutdown(sdev);