Message ID | 20240228-wave5-fix-abort-v1-1-7482b9316867@baylibre.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-85340-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:a81b:b0:108:e6aa:91d0 with SMTP id bq27csp3467315dyb; Wed, 28 Feb 2024 08:42:55 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCV5SxQRWvpXu2yTvBamRP6kbmaWeQO7H1TSt9hnDH8hIzK1HwfbIuwwPQuvtOVSbuqc7aODTubXf8JKhBZGT7nKIb9g+A== X-Google-Smtp-Source: AGHT+IEXkdSxgMGMXfBj5kb0mSa2EstacZLHxzNs8bzj5NyImVb47oytwvigoOJXG5vjKURg+Lz8 X-Received: by 2002:a05:6a00:bca:b0:6e5:7be5:79f8 with SMTP id x10-20020a056a000bca00b006e57be579f8mr137122pfu.5.1709138575141; Wed, 28 Feb 2024 08:42:55 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1709138575; cv=pass; d=google.com; s=arc-20160816; b=XszPMmzFZo1jen3oYpqNKCq+cN6GDvjcwcDJUHtBBHEpOJu4Xlr5/lE2bBAgojzg/3 xtIY7zQKDtOJ9CDs4qZI9aWJ6KxbgPhztWflr0gr6xWiBR2fAVcDgsZd4t1pJhmLBB0H wSDwmw0BknTn7WfcZXegDfEGt+8f1ZLdmQSAls4TJlKVfLC7WIzjG9UpbGkYiintikmR DhgFWgVBopr2N5rl3hIKs2PHu/eqSmOR31H1O2obmJM/JsoDpBNisg1HewwoCAZllcM+ hZt4qnPZ6ceSY9PypeaXdoQsiq63c/g8/kVIvOVm6bU5+tYkFDJILqpCatXVpt/92vLR 6+3A== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:message-id:content-transfer-encoding:mime-version :list-unsubscribe:list-subscribe:list-id:precedence:subject:date :from:dkim-signature; bh=Q/+mYwQG2fiJMD7iBIPPcaK1rFS1aZsi0yXF5dIj9kc=; fh=pey8+VkTUmeewqTubwo9P8Zgszl2aR3oR18LVyNJcrA=; b=udQ2whWusj7RkL17Tv0iZZzG/RZDGDyaBIfd+q2UPjOc+W1OEGacvFLtm9t16lp0qA q50x+T94qiRffFeUE+VNuEjlIykVA3q+18ZwUCnAaE8qWGr+i234b2qb17SFRVoOZv9l cHqexA3zPIPDDOMfvmsCAKDoz9p7OPRbpPTAu8sKvib2nJ1o21IYrCH+VwvoXKUv8P0+ jsBwrvjykXiUBshHbDct08GryBExjEYZENCfSj0TWc+IWuGussMIwVIZpujsnju6YH8u rilou1TxUW3TdZYJ3LlVE3VYHEGmytGsN2XawD1qEKgNjnDNH7RV/1HFXP/xcSufnS8B P1fw==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@baylibre-com.20230601.gappssmtp.com header.s=20230601 header.b=F1PDbmDH; arc=pass (i=1 spf=pass spfdomain=baylibre.com dkim=pass dkdomain=baylibre-com.20230601.gappssmtp.com); spf=pass (google.com: domain of linux-kernel+bounces-85340-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-85340-ouuuleilei=gmail.com@vger.kernel.org" Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [147.75.48.161]) by mx.google.com with ESMTPS id jo30-20020a056a00909e00b006e53435b435si5119005pfb.391.2024.02.28.08.42.54 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 28 Feb 2024 08:42:55 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-85340-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) client-ip=147.75.48.161; Authentication-Results: mx.google.com; dkim=pass header.i=@baylibre-com.20230601.gappssmtp.com header.s=20230601 header.b=F1PDbmDH; arc=pass (i=1 spf=pass spfdomain=baylibre.com dkim=pass dkdomain=baylibre-com.20230601.gappssmtp.com); spf=pass (google.com: domain of linux-kernel+bounces-85340-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-85340-ouuuleilei=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id 6B8A0B2AA8E for <ouuuleilei@gmail.com>; Wed, 28 Feb 2024 16:15:06 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id CBF5215E5CB; Wed, 28 Feb 2024 16:13:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=baylibre-com.20230601.gappssmtp.com header.i=@baylibre-com.20230601.gappssmtp.com header.b="F1PDbmDH" Received: from mail-lf1-f50.google.com (mail-lf1-f50.google.com [209.85.167.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4AE623FBB9 for <linux-kernel@vger.kernel.org>; Wed, 28 Feb 2024 16:12:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.167.50 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709136778; cv=none; b=Xclt4jU0OEyHTfgyAC9vZkad2FkoIkoBoe/NnO2YYtNkRSEsU5nMMspxK77EK+oXBUS2r7F6NP3reekFm4lvDp4mgNuErlGCRKMbkNCoGkvypOzSm2sKHuSozJBHM+0eel3IEZV+0m23o81fzdt/i7tY1iGhuGyGBlggDDJEdRo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709136778; c=relaxed/simple; bh=u7uFXnDrICcdk8xlJjFPen//WVVlF6lAfMWp3oCwzrY=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:To:Cc; b=HKqq+wlyaoQaci1BiBsbDdYh4NOQaac1NC7zjaur+65Pp1J0bLfbEJA0Miny7QglIqafIc8L5Bn0sCMLERX0l11EyI3J58zsvh+PadrfQtlGMAuLijZY39IrMOwGjHRAZYDcLtfflISqYjQQXJ5NCfBpCXVinDKABdKevnZBjtI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=baylibre.com; spf=pass smtp.mailfrom=baylibre.com; dkim=pass (2048-bit key) header.d=baylibre-com.20230601.gappssmtp.com header.i=@baylibre-com.20230601.gappssmtp.com header.b=F1PDbmDH; arc=none smtp.client-ip=209.85.167.50 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=baylibre.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=baylibre.com Received: by mail-lf1-f50.google.com with SMTP id 2adb3069b0e04-513181719easo1060501e87.3 for <linux-kernel@vger.kernel.org>; Wed, 28 Feb 2024 08:12:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=baylibre-com.20230601.gappssmtp.com; s=20230601; t=1709136772; x=1709741572; darn=vger.kernel.org; h=cc:to:message-id:content-transfer-encoding:mime-version:subject :date:from:from:to:cc:subject:date:message-id:reply-to; bh=Q/+mYwQG2fiJMD7iBIPPcaK1rFS1aZsi0yXF5dIj9kc=; b=F1PDbmDHzqrLbcTb+0mGZx05DLKpzy6pqdgaTu1MopxNeNm6MdqbGS5T53DuGQD9Mu 2m73vDI5VNVyVudf/ZgAF4NrOoDNVDemGbAcBlHzY+Mwq9+tA/H6IcIJnIZcXQG4xKgC aAqmb409HblK6wc2WTL/zU3rnxsZQtkrIQz/2XKLyVA98OSdrav83sMZ9YXCvMRWzqxw KROzuYkGvfTI3djoXx/I/HL+xplnLVns6wQIq4Ba7fx0247PW+/CXOuKs+DxzLFbsZFz mtymoWWxLHA0JvHMBZfSedh4pjQr+/7ZcwADDgxRON1qmdFHZJ3uUsWsauYJp0v0GpWx lM5g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709136772; x=1709741572; h=cc:to:message-id:content-transfer-encoding:mime-version:subject :date:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Q/+mYwQG2fiJMD7iBIPPcaK1rFS1aZsi0yXF5dIj9kc=; b=q3pFGjRgZrrnXUDmZVTO7KD3cCecWci30GYK6P+afYD/Zpi0qTZSSGUKyt6/VHYBLV QRVXh2Wc1VNgfnCZrAd94FRKWeGGA9EI1mlqGQiNh0oMKCwIzCo5hkywcpaqLjwWgGkX hmwcLIO/dWnpLruS5J0MctnG3Ealcz/GsavWNgOqDXBabGugno54IAfLGi6opvl8O0fJ fazcxqbI/2eu//lGMiPRyL4dAJPk06cyTUrqGsi8jnzruwXe7y9LE/m1fN7+Vaqu0IN2 hWh/pgWbSWq4rhUgWSzXUwU1oib6merxGaRpwULglhdDS1hJTWB/5wK8UIy7uC0byY3x SiXA== X-Forwarded-Encrypted: i=1; AJvYcCUVznDwRj8Kevcoxih47yc37sxp5Ow/TqJnV2rPjTymmsboxdrsV8FWCzz+2Meg86Z1XRcco1zspAVaAHFv2GojK76RJPHyGP2CMp5m X-Gm-Message-State: AOJu0YyI1LMxKGuzvyhhoKnZqfGt5Hw9nwMVXKpywkO1za5CaG66NHZ4 1kfLgtjaA1Vd1hRi2ztN8/DU70Higah99Rv5oOmmcGpCBbVeELefweofXBWDF+0= X-Received: by 2002:a05:6512:ba7:b0:512:bdd3:153c with SMTP id b39-20020a0565120ba700b00512bdd3153cmr144237lfv.18.1709136771948; Wed, 28 Feb 2024 08:12:51 -0800 (PST) Received: from [192.168.1.20] ([2a01:cb19:95ba:5000:d6dd:417f:52ac:335b]) by smtp.gmail.com with ESMTPSA id y16-20020a05600c365000b00412656ba919sm2444276wmq.20.2024.02.28.08.12.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 28 Feb 2024 08:12:50 -0800 (PST) From: Mattijs Korpershoek <mkorpershoek@baylibre.com> Date: Wed, 28 Feb 2024 17:12:41 +0100 Subject: [PATCH] media: chips-media: wave5: Call v4l2_m2m_job_finish() in job_abort() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <20240228-wave5-fix-abort-v1-1-7482b9316867@baylibre.com> X-B4-Tracking: v=1; b=H4sIAHhb32UC/x2MSQqAMAwAvyI5G2iDYvEr4qFL1FxUWlGh9O8Wj 8MwkyFxFE4wNhki35Lk2CvotgG/2X1llFAZSFGniAw+9uYeF3nRuiNeuAwUqDdGexegVmfkKv/ jNJfyAeappsRhAAAA To: Nas Chung <nas.chung@chipsnmedia.com>, Jackson Lee <jackson.lee@chipsnmedia.com>, Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Guillaume La Roque <glaroque@baylibre.com>, Brandon Brnich <b-brnich@ti.com>, Nicolas Dufresne <nicolas.dufresne@collabora.com>, Robert Beckett <bob.beckett@collabora.com>, Sebastian Fricke <sebastian.fricke@collabora.com>, Mattijs Korpershoek <mkorpershoek@baylibre.com>, linux-media@vger.kernel.org, linux-kernel@vger.kernel.org X-Mailer: b4 0.12.4-dev-6aa5d X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1792161690786048502 X-GMAIL-MSGID: 1792161690786048502 |
Series |
media: chips-media: wave5: Call v4l2_m2m_job_finish() in job_abort()
|
|
Commit Message
Mattijs Korpershoek
Feb. 28, 2024, 4:12 p.m. UTC
When aborting a stream, it's possible that v4l2_m2m_cancel_job()
remains stuck:
[ 3498.490038][ T1] sysrq: Show Blocked State
[ 3498.511754][ T1] task:V4L2DecodeCompo state:D stack:12480 pid:2387 ppid:1 flags:0x04000809
[ 3498.521153][ T1] Call trace:
[ 3498.524333][ T1] __switch_to+0x174/0x338
[ 3498.528611][ T1] __schedule+0x5ec/0x9cc
[ 3498.532795][ T1] schedule+0x84/0xf0
[ 3498.536630][ T1] v4l2_m2m_cancel_job+0x118/0x194
[ 3498.541595][ T1] v4l2_m2m_streamoff+0x34/0x13c
[ 3498.546384][ T1] v4l2_m2m_ioctl_streamoff+0x20/0x30
[ 3498.551607][ T1] v4l_streamoff+0x44/0x54
[ 3498.555877][ T1] __video_do_ioctl+0x388/0x4dc
[ 3498.560580][ T1] video_usercopy+0x618/0xa0c
[ 3498.565109][ T1] video_ioctl2+0x20/0x30
[ 3498.569292][ T1] v4l2_ioctl+0x74/0x8c
[ 3498.573300][ T1] __arm64_sys_ioctl+0xb0/0xec
[ 3498.577918][ T1] invoke_syscall+0x60/0x124
[ 3498.582368][ T1] el0_svc_common+0x90/0xfc
[ 3498.586724][ T1] do_el0_svc+0x34/0xb8
[ 3498.590733][ T1] el0_svc+0x2c/0xa4
[ 3498.594480][ T1] el0t_64_sync_handler+0x68/0xb4
[ 3498.599354][ T1] el0t_64_sync+0x1a4/0x1a8
[ 3498.603832][ T1] sysrq: Kill All Tasks
According to job_abort() documentation from v4l2_m2m_ops:
After the driver performs the necessary steps, it has to call
v4l2_m2m_job_finish() or v4l2_m2m_buf_done_and_job_finish() as
if the transaction ended normally.
This is not done in wave5_vpu_dec_job_abort(). Neither switch_state() nor
wave5_vpu_dec_set_eos_on_firmware() does this.
Add the missing call to fix the v4l2_m2m_cancel_job() hangs.
Fixes: 9707a6254a8a ("media: chips-media: wave5: Add the v4l2 layer")
Signed-off-by: Mattijs Korpershoek <mkorpershoek@baylibre.com>
---
This has been tested on the AM62Px SK EVM using Android 14.
See:
https://www.ti.com/tool/PROCESSOR-SDK-AM62P
Note: while this is has not been tested on an upstream linux tree, I
believe the fix is still valid for mainline and I would like to get
some feedback on it.
Thank you in advance for your time.
---
drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c | 3 +++
1 file changed, 3 insertions(+)
---
base-commit: 8c64f4cdf4e6cc5682c52523713af8c39c94e6d5
change-id: 20240228-wave5-fix-abort-f72d25881cbd
Best regards,
Comments
Hi, Le mercredi 28 février 2024 à 17:12 +0100, Mattijs Korpershoek a écrit : > When aborting a stream, it's possible that v4l2_m2m_cancel_job() > remains stuck: > > [ 3498.490038][ T1] sysrq: Show Blocked State > [ 3498.511754][ T1] task:V4L2DecodeCompo state:D stack:12480 pid:2387 ppid:1 flags:0x04000809 > [ 3498.521153][ T1] Call trace: > [ 3498.524333][ T1] __switch_to+0x174/0x338 > [ 3498.528611][ T1] __schedule+0x5ec/0x9cc > [ 3498.532795][ T1] schedule+0x84/0xf0 > [ 3498.536630][ T1] v4l2_m2m_cancel_job+0x118/0x194 > [ 3498.541595][ T1] v4l2_m2m_streamoff+0x34/0x13c > [ 3498.546384][ T1] v4l2_m2m_ioctl_streamoff+0x20/0x30 > [ 3498.551607][ T1] v4l_streamoff+0x44/0x54 > [ 3498.555877][ T1] __video_do_ioctl+0x388/0x4dc > [ 3498.560580][ T1] video_usercopy+0x618/0xa0c > [ 3498.565109][ T1] video_ioctl2+0x20/0x30 > [ 3498.569292][ T1] v4l2_ioctl+0x74/0x8c > [ 3498.573300][ T1] __arm64_sys_ioctl+0xb0/0xec > [ 3498.577918][ T1] invoke_syscall+0x60/0x124 > [ 3498.582368][ T1] el0_svc_common+0x90/0xfc > [ 3498.586724][ T1] do_el0_svc+0x34/0xb8 > [ 3498.590733][ T1] el0_svc+0x2c/0xa4 > [ 3498.594480][ T1] el0t_64_sync_handler+0x68/0xb4 > [ 3498.599354][ T1] el0t_64_sync+0x1a4/0x1a8 > [ 3498.603832][ T1] sysrq: Kill All Tasks > > According to job_abort() documentation from v4l2_m2m_ops: > > After the driver performs the necessary steps, it has to call > v4l2_m2m_job_finish() or v4l2_m2m_buf_done_and_job_finish() as > if the transaction ended normally. > > This is not done in wave5_vpu_dec_job_abort(). Neither switch_state() nor > wave5_vpu_dec_set_eos_on_firmware() does this. The doc said "the driver", not job_abort() specifically ... > > Add the missing call to fix the v4l2_m2m_cancel_job() hangs. > > Fixes: 9707a6254a8a ("media: chips-media: wave5: Add the v4l2 layer") > Signed-off-by: Mattijs Korpershoek <mkorpershoek@baylibre.com> > --- > This has been tested on the AM62Px SK EVM using Android 14. > See: > https://www.ti.com/tool/PROCESSOR-SDK-AM62P > > Note: while this is has not been tested on an upstream linux tree, I > believe the fix is still valid for mainline and I would like to get > some feedback on it. > > Thank you in advance for your time. > --- > drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c > index ef227af72348..b03e3633a1bc 100644 > --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c > +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c > @@ -1715,6 +1715,7 @@ static void wave5_vpu_dec_device_run(void *priv) > static void wave5_vpu_dec_job_abort(void *priv) > { > struct vpu_instance *inst = priv; > + struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx; > int ret; > > ret = switch_state(inst, VPU_INST_STATE_STOP); > @@ -1725,6 +1726,8 @@ static void wave5_vpu_dec_job_abort(void *priv) > if (ret) > dev_warn(inst->dev->dev, > "Setting EOS for the bitstream, fail: %d\n", ret); > + > + v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx); Wave5 firmware have no function to cancel pending jobs. By finishing the job without caring about the firmware state, wave5_vpu_dec_finish_decode() will be called concurrently to another job. This change will effectively breaks seeking, and will cause warning and possibly memory corruption. In principle, setting the EOS flag should be enough to ensure that the drain sequence is initiated, and that should allow finish_decoder() to be called, which is the only clean way to get finish_job() to be called. This driver implementation uses PIC_END operating mode of the IP, that ensure that each submitted job is assumed to be complete, which means each RUN will lead to a matching IRQ. We had a similar stall during development which was fixed with a firmware update, are you sure you have the very latest firmware ? Any chance you can use v4l2-tracer to share what your software have been doing ? What you can though though, to fortify this a little, is to introduce a watchdog here. You can trigger it from abort, and on timeout, you will have to fully reset and reboot the chip (which is not very fast, you don't want to do this at all time). When the reset have completed, you will have to carefully reset the driver state before you can safely continue. You'll need to add thread safe protection against spurious finish_decode() call, in case the watchdog timeout raced with the firmware finally coming back to life. regards, Nicolas p.s. you should perhaps trace the firmware job counters to try and understand more about your specific hang. > } > > static int wave5_vpu_dec_job_ready(void *priv) > > --- > base-commit: 8c64f4cdf4e6cc5682c52523713af8c39c94e6d5 > change-id: 20240228-wave5-fix-abort-f72d25881cbd > > Best regards,
Hi Nicolas, Thank you for your prompt reply. On ven., mars 01, 2024 at 09:01, Nicolas Dufresne <nicolas.dufresne@collabora.com> wrote: > Hi, > > Le mercredi 28 février 2024 à 17:12 +0100, Mattijs Korpershoek a écrit : >> When aborting a stream, it's possible that v4l2_m2m_cancel_job() >> remains stuck: >> >> [ 3498.490038][ T1] sysrq: Show Blocked State >> [ 3498.511754][ T1] task:V4L2DecodeCompo state:D stack:12480 pid:2387 ppid:1 flags:0x04000809 >> [ 3498.521153][ T1] Call trace: >> [ 3498.524333][ T1] __switch_to+0x174/0x338 >> [ 3498.528611][ T1] __schedule+0x5ec/0x9cc >> [ 3498.532795][ T1] schedule+0x84/0xf0 >> [ 3498.536630][ T1] v4l2_m2m_cancel_job+0x118/0x194 >> [ 3498.541595][ T1] v4l2_m2m_streamoff+0x34/0x13c >> [ 3498.546384][ T1] v4l2_m2m_ioctl_streamoff+0x20/0x30 >> [ 3498.551607][ T1] v4l_streamoff+0x44/0x54 >> [ 3498.555877][ T1] __video_do_ioctl+0x388/0x4dc >> [ 3498.560580][ T1] video_usercopy+0x618/0xa0c >> [ 3498.565109][ T1] video_ioctl2+0x20/0x30 >> [ 3498.569292][ T1] v4l2_ioctl+0x74/0x8c >> [ 3498.573300][ T1] __arm64_sys_ioctl+0xb0/0xec >> [ 3498.577918][ T1] invoke_syscall+0x60/0x124 >> [ 3498.582368][ T1] el0_svc_common+0x90/0xfc >> [ 3498.586724][ T1] do_el0_svc+0x34/0xb8 >> [ 3498.590733][ T1] el0_svc+0x2c/0xa4 >> [ 3498.594480][ T1] el0t_64_sync_handler+0x68/0xb4 >> [ 3498.599354][ T1] el0t_64_sync+0x1a4/0x1a8 >> [ 3498.603832][ T1] sysrq: Kill All Tasks >> >> According to job_abort() documentation from v4l2_m2m_ops: >> >> After the driver performs the necessary steps, it has to call >> v4l2_m2m_job_finish() or v4l2_m2m_buf_done_and_job_finish() as >> if the transaction ended normally. >> >> This is not done in wave5_vpu_dec_job_abort(). Neither switch_state() nor >> wave5_vpu_dec_set_eos_on_firmware() does this. > > The doc said "the driver", not job_abort() specifically ... Indeed. Seems I wanted to convince myself that this was job_abort() specific. Sorry about that. > >> >> Add the missing call to fix the v4l2_m2m_cancel_job() hangs. >> >> Fixes: 9707a6254a8a ("media: chips-media: wave5: Add the v4l2 layer") >> Signed-off-by: Mattijs Korpershoek <mkorpershoek@baylibre.com> >> --- >> This has been tested on the AM62Px SK EVM using Android 14. >> See: >> https://www.ti.com/tool/PROCESSOR-SDK-AM62P >> >> Note: while this is has not been tested on an upstream linux tree, I >> believe the fix is still valid for mainline and I would like to get >> some feedback on it. >> >> Thank you in advance for your time. >> --- >> drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c >> index ef227af72348..b03e3633a1bc 100644 >> --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c >> +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c >> @@ -1715,6 +1715,7 @@ static void wave5_vpu_dec_device_run(void *priv) >> static void wave5_vpu_dec_job_abort(void *priv) >> { >> struct vpu_instance *inst = priv; >> + struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx; >> int ret; >> >> ret = switch_state(inst, VPU_INST_STATE_STOP); >> @@ -1725,6 +1726,8 @@ static void wave5_vpu_dec_job_abort(void *priv) >> if (ret) >> dev_warn(inst->dev->dev, >> "Setting EOS for the bitstream, fail: %d\n", ret); >> + >> + v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx); > > Wave5 firmware have no function to cancel pending jobs. By finishing the job > without caring about the firmware state, wave5_vpu_dec_finish_decode() will be > called concurrently to another job. This change will effectively breaks seeking, > and will cause warning and possibly memory corruption. Ah. That's not what I intended. This patch would completely break the driver. Thank you for explaining that to me. > > In principle, setting the EOS flag should be enough to ensure that the drain > sequence is initiated, and that should allow finish_decoder() to be called, > which is the only clean way to get finish_job() to be called. > > This driver implementation uses PIC_END operating mode of the IP, that ensure > that each submitted job is assumed to be complete, which means each RUN will > lead to a matching IRQ. We had a similar stall during development which was > fixed with a firmware update, are you sure you have the very latest firmware ? Interesting. I double checked the firmware from linux-firmware: $ ~/work/upstream/linux-firmware/ main md5sum cnm/wave521c_k3_codec_fw.bin 02c5faa5405559bd59a7361a32c2695a cnm/wave521c_k3_codec_fw.bin $ ~/ adb shell md5sum /vendor/firmware/cnm/wave521c_k3_codec_fw.bin 02c5faa5405559bd59a7361a32c2695a /vendor/firmware/cnm/wave521c_k3_codec_fwbin Which should be: "FW version : 1.0.3" > Any chance you can use v4l2-tracer to share what your software have been doing ? I did not know v4l2-tracer. I tried to run it on android but it seems it segfaults when overriding mmap(). I will continue to try to get some traces. > > What you can though though, to fortify this a little, is to introduce a watchdog > here. You can trigger it from abort, and on timeout, you will have to fully > reset and reboot the chip (which is not very fast, you don't want to do this at > all time). When the reset have completed, you will have to carefully reset the > driver state before you can safely continue. You'll need to add thread safe > protection against spurious finish_decode() call, in case the watchdog timeout > raced with the firmware finally coming back to life. Ok, I can look into that as well. Given that i'm not super familiar with multimedia, this has helped me a lot. Thanks! > > regards, > Nicolas > > p.s. you should perhaps trace the firmware job counters to try and understand > more about your specific hang. I will look into it. > >> } >> >> static int wave5_vpu_dec_job_ready(void *priv) >> >> --- >> base-commit: 8c64f4cdf4e6cc5682c52523713af8c39c94e6d5 >> change-id: 20240228-wave5-fix-abort-f72d25881cbd >> >> Best regards,
diff --git a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c index ef227af72348..b03e3633a1bc 100644 --- a/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c +++ b/drivers/media/platform/chips-media/wave5/wave5-vpu-dec.c @@ -1715,6 +1715,7 @@ static void wave5_vpu_dec_device_run(void *priv) static void wave5_vpu_dec_job_abort(void *priv) { struct vpu_instance *inst = priv; + struct v4l2_m2m_ctx *m2m_ctx = inst->v4l2_fh.m2m_ctx; int ret; ret = switch_state(inst, VPU_INST_STATE_STOP); @@ -1725,6 +1726,8 @@ static void wave5_vpu_dec_job_abort(void *priv) if (ret) dev_warn(inst->dev->dev, "Setting EOS for the bitstream, fail: %d\n", ret); + + v4l2_m2m_job_finish(inst->v4l2_m2m_dev, m2m_ctx); } static int wave5_vpu_dec_job_ready(void *priv)