diff mbox series

[v2] media: verisilicon: Fix crash when probing encoder

Message ID	20230413104756.356695-1-benjamin.gaignard@collabora.com
State	New
Headers	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; sender: benjamin.gaignard) by madras.collabora.co.uk (Postfix) with ESMTPSA id 5A6AC660320E; Thu, 13 Apr 2023 11:48:04 +0100 (BST) From: Benjamin Gaignard <benjamin.gaignard@collabora.com> To: ezequiel@vanguardiasur.com.ar, p.zabel@pengutronix.de, mchehab@kernel.org, m.szyprowski@samsung.com Cc: linux-media@vger.kernel.org, linux-rockchip@lists.infradead.org, linux-kernel@vger.kernel.org, kernel@collabora.com, Benjamin Gaignard <benjamin.gaignard@collabora.com> Subject: [PATCH v2] media: verisilicon: Fix crash when probing encoder Date: Thu, 13 Apr 2023 12:47:56 +0200 Message-Id: <20230413104756.356695-1-benjamin.gaignard@collabora.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[v2] media: verisilicon: Fix crash when probing encoder \| [v2] media: verisilicon: Fix crash when probing encoder

Commit Message

Benjamin Gaignard April 13, 2023, 10:47 a.m. UTC

  ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt()
so assigne it to vpu_fmt led to crash the kernel.
Like for decoder case use 'fmt' as format for encoder and clean up
the code.

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions")
---
version 2:
- Remove useless vpu_fmt.

 drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

Comments

Nicolas Dufresne April 13, 2023, 7:49 p.m. UTC | #1

Hi,

cause I sent a Rb on old one after v2 was sent, the extra cleanup make sense to
me.

Le jeudi 13 avril 2023 à 12:47 +0200, Benjamin Gaignard a écrit :
> ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt()
> so assigne it to vpu_fmt led to crash the kernel.
> Like for decoder case use 'fmt' as format for encoder and clean up
> the code.
> 
> Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions")


Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>

> ---
> version 2:
> - Remove useless vpu_fmt.
> 
>  drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
> index 8f1414085f47..d71f79471396 100644
> --- a/drivers/media/platform/verisilicon/hantro_v4l2.c
> +++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
> @@ -275,7 +275,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
>  			  struct v4l2_pix_format_mplane *pix_mp,
>  			  enum v4l2_buf_type type)
>  {
> -	const struct hantro_fmt *fmt, *vpu_fmt;
> +	const struct hantro_fmt *fmt;
>  	bool capture = V4L2_TYPE_IS_CAPTURE(type);
>  	bool coded;
>  
> @@ -295,11 +295,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
>  
>  	if (coded) {
>  		pix_mp->num_planes = 1;
> -		vpu_fmt = fmt;
> -	} else if (ctx->is_encoder) {
> -		vpu_fmt = ctx->vpu_dst_fmt;
> -	} else {
> -		vpu_fmt = fmt;
> +	} else if (!ctx->is_encoder) {
>  		/*
>  		 * Width/height on the CAPTURE end of a decoder are ignored and
>  		 * replaced by the OUTPUT ones.
> @@ -311,7 +307,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
>  	pix_mp->field = V4L2_FIELD_NONE;
>  
>  	v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height,
> -				       &vpu_fmt->frmsize);
> +				       &fmt->frmsize);
>  
>  	if (!coded) {
>  		/* Fill remaining fields */
> -- 
> 2.34.1
> 
>

Nicolas Dufresne April 13, 2023, 7:52 p.m. UTC | #2

Hi,

Le jeudi 13 avril 2023 à 10:10 -0300, Ezequiel Garcia a écrit :
> Benjamin,
> 
> Please include the crash stracktrace in the commit.
> 
> 
Careful with HTML message, they don't always make it in these ML and tooling
might not play well with the tooling. Perhaps it can be edited while pulling ?
Here's the info from Marek's bug report:

hantro-vpu fdea0000.video-codec: Adding to iommu group 0
hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as 
/dev/video0
hantro-vpu fdee0000.video-codec: Adding to iommu group 1
hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as 
/dev/video1
Unable to handle kernel NULL pointer dereference at virtual address 
0000000000000008
Mem abort info:
   ESR = 0x0000000096000004
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x04: level 0 translation fault
Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=00000001f446f000
[0000000000000008] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
Modules linked in: hantro_vpu v4l2_vp9 v4l2_h264 v4l2_mem2mem 
videobuf2_dma_contig snd_soc_simple_card display_connector 
snd_soc_simple_card_utils videobuf2_memops crct10dif_ce dwmac_rk 
rockchip_thermal videobuf2_v4l2 stmmac_platform rockchip_saradc 
industrialio_triggered_buffer kfifo_buf stmmac videodev pcs_xpcs 
rtc_rk808 videobuf2_common rockchipdrm panfrost mc drm_shmem_helper 
analogix_dp gpu_sched dw_mipi_dsi dw_hdmi drm_display_helper ip_tables 
x_tables ipv6
CPU: 3 PID: 171 Comm: v4l_id Not tainted 6.3.0-rc2+ #13478
Hardware name: Hardkernel ODROID-M1 (DT)
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : hantro_try_fmt+0xb4/0x280 [hantro_vpu]
lr : hantro_try_fmt+0xa8/0x280 [hantro_vpu]
...
Call trace:
  hantro_try_fmt+0xb4/0x280 [hantro_vpu]
  hantro_set_fmt_out+0x3c/0x278 [hantro_vpu]
  hantro_reset_raw_fmt+0x94/0xb4 [hantro_vpu]
  hantro_set_fmt_cap+0x23c/0x250 [hantro_vpu]
  hantro_reset_fmts+0x94/0xcc [hantro_vpu]
  hantro_open+0xd4/0x20c [hantro_vpu]
  v4l2_open+0x80/0x120 [videodev]
  chrdev_open+0xc0/0x22c
  do_dentry_open+0x13c/0x490
  vfs_open+0x2c/0x38
  path_openat+0x550/0x938
  do_filp_open+0x80/0x12c
  do_sys_openat2+0xb4/0x16c
  __arm64_sys_openat+0x64/0xac
  invoke_syscall+0x48/0x114
  el0_svc_common.constprop.0+0xfc/0x11c
  do_el0_svc+0x38/0xa4
  el0_svc+0x48/0xb8
  el0t_64_sync_handler+0xb8/0xbc
  el0t_64_sync+0x190/0x194
Code: 97fe726c f940aa80 52864a61 72a686c1 (b9400800)
---[ end trace 0000000000000000 ]---


> 
> Thanks,
> Ezequiel
> 
> 
> On Thu, Apr 13, 2023 at 7:48 AM Benjamin Gaignard
> <benjamin.gaignard@collabora.com> wrote:
> > ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt()
> > so assigne it to vpu_fmt led to crash the kernel.
> > Like for decoder case use 'fmt' as format for encoder and clean up
> > the code.
> > 
> > Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
> > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats
> > in reset functions")
> > ---
> > version 2:
> > - Remove useless vpu_fmt.
> > 
> >  drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++-------
> >  1 file changed, 3 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c
> > b/drivers/media/platform/verisilicon/hantro_v4l2.c
> > index 8f1414085f47..d71f79471396 100644
> > --- a/drivers/media/platform/verisilicon/hantro_v4l2.c
> > +++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
> > @@ -275,7 +275,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
> >                           struct v4l2_pix_format_mplane *pix_mp,
> >                           enum v4l2_buf_type type)
> >  {
> > -       const struct hantro_fmt *fmt, *vpu_fmt;
> > +       const struct hantro_fmt *fmt;
> >         bool capture = V4L2_TYPE_IS_CAPTURE(type);
> >         bool coded;
> > 
> > @@ -295,11 +295,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
> > 
> >         if (coded) {
> >                 pix_mp->num_planes = 1;
> > -               vpu_fmt = fmt;
> > -       } else if (ctx->is_encoder) {
> > -               vpu_fmt = ctx->vpu_dst_fmt;
> > -       } else {
> > -               vpu_fmt = fmt;
> > +       } else if (!ctx->is_encoder) {
> >                 /*
> >                  * Width/height on the CAPTURE end of a decoder are ignored
> > and
> >                  * replaced by the OUTPUT ones.
> > @@ -311,7 +307,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
> >         pix_mp->field = V4L2_FIELD_NONE;
> > 
> >         v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height,
> > -                                      &vpu_fmt->frmsize);
> > +                                      &fmt->frmsize);
> > 
> >         if (!coded) {
> >                 /* Fill remaining fields */

AngeloGioacchino Del Regno April 14, 2023, 10:37 a.m. UTC | #3

Il 13/04/23 12:47, Benjamin Gaignard ha scritto:
> ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt()
> so assigne it to vpu_fmt led to crash the kernel.
> Like for decoder case use 'fmt' as format for encoder and clean up
> the code.
> 
> Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions")

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>

Diederik de Haas May 19, 2023, 10:34 p.m. UTC | #4

Hi,

On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
> Le jeudi 13 avril 2023 à 10:10 -0300, Ezequiel Garcia a écrit :
> > Benjamin,
> > 
> > Please include the crash stracktrace in the commit.
> 
> Careful with HTML message, they don't always make it in these ML and tooling
> might not play well with the tooling. Perhaps it can be edited while
> pulling ? Here's the info from Marek's bug report:
> 
> hantro-vpu fdea0000.video-codec: Adding to iommu group 0
> hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as
> /dev/video0
> hantro-vpu fdee0000.video-codec: Adding to iommu group 1
> hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as
> /dev/video1
> Unable to handle kernel NULL pointer dereference at virtual address
> 0000000000000008
> Mem abort info:
>    ESR = 0x0000000096000004
>    EC = 0x25: DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
>    FSC = 0x04: level 0 translation fault
> Data abort info:
>    ISV = 0, ISS = 0x00000004
>    CM = 0, WnR = 0
> user pgtable: 4k pages, 48-bit VAs, pgdp=00000001f446f000
> [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> Modules linked in: hantro_vpu v4l2_vp9 v4l2_h264 v4l2_mem2mem
> videobuf2_dma_contig snd_soc_simple_card display_connector
> snd_soc_simple_card_utils videobuf2_memops crct10dif_ce dwmac_rk
> rockchip_thermal videobuf2_v4l2 stmmac_platform rockchip_saradc
> industrialio_triggered_buffer kfifo_buf stmmac videodev pcs_xpcs
> rtc_rk808 videobuf2_common rockchipdrm panfrost mc drm_shmem_helper
> analogix_dp gpu_sched dw_mipi_dsi dw_hdmi drm_display_helper ip_tables
> x_tables ipv6
> CPU: 3 PID: 171 Comm: v4l_id Not tainted 6.3.0-rc2+ #13478
> Hardware name: Hardkernel ODROID-M1 (DT)
> pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : hantro_try_fmt+0xb4/0x280 [hantro_vpu]
> lr : hantro_try_fmt+0xa8/0x280 [hantro_vpu]
> ...
> Call trace:
>   hantro_try_fmt+0xb4/0x280 [hantro_vpu]
>   hantro_set_fmt_out+0x3c/0x278 [hantro_vpu]
>   hantro_reset_raw_fmt+0x94/0xb4 [hantro_vpu]
>   hantro_set_fmt_cap+0x23c/0x250 [hantro_vpu]
>   hantro_reset_fmts+0x94/0xcc [hantro_vpu]
>   hantro_open+0xd4/0x20c [hantro_vpu]
>   v4l2_open+0x80/0x120 [videodev]
>   chrdev_open+0xc0/0x22c
>   do_dentry_open+0x13c/0x490
>   vfs_open+0x2c/0x38
>   path_openat+0x550/0x938
>   do_filp_open+0x80/0x12c
>   do_sys_openat2+0xb4/0x16c
>   __arm64_sys_openat+0x64/0xac
>   invoke_syscall+0x48/0x114
>   el0_svc_common.constprop.0+0xfc/0x11c
>   do_el0_svc+0x38/0xa4
>   el0_svc+0x48/0xb8
>   el0t_64_sync_handler+0xb8/0xbc
>   el0t_64_sync+0x190/0x194
> Code: 97fe726c f940aa80 52864a61 72a686c1 (b9400800)
> ---[ end trace 0000000000000000 ]---

When I booted into my 6.4-rc1 (but also rc2) kernel on my
Pine64 Quartz64 Model A, I noticed a crash which seems the same as
above, but I didn't have such a crash with my 6.3 kernel.
Searching for 'hantro' led me to this commit as the most likely culprit
but when I build a new 6.4-rcX kernel with this commit reverted,
I still had this crash.
Do you have suggestions which commit would then be the likely culprit?

Cheers,
  Diederik

For completeness, this is the error I got with 6.4-rcX:

[   26.976766] panfrost fde60000.gpu: clock rate = 594000000
[   26.977297] panfrost fde60000.gpu: bus_clock rate = 500000000
[   26.996012] random: crng init done
[   27.072438] videodev: Linux video capture interface: v2.00
[   27.119012] Registered IR keymap rc-cec
[   27.125161] rc rc0: dw_hdmi as /devices/platform/fe0a0000.hdmi/rc/rc0
[   27.125427] panfrost fde60000.gpu: mali-g52 id 0x7402 major 0x1 minor 0x0 status 0x0
[   27.125905] input: dw_hdmi as /devices/platform/fe0a0000.hdmi/rc/rc0/input1
[   27.126427] panfrost fde60000.gpu: features: 00000000,00000cf7, issues: 00000000,00000400
[   27.127785] panfrost fde60000.gpu: Features: L2:0x07110206 Shader:0x00000002 Tiler:0x00000209 Mem:0x1 MMU:0x00002823 AS:0xff JS:0x7
[   27.128954] panfrost fde60000.gpu: shader_present=0x1 l2_present=0x1
[   27.148920] gpio-fan gpio_fan: GPIO fan initialized
[   27.191131] [drm] Initialized panfrost 1.2.0 20180908 for fde60000.gpu on minor 1
[   27.265920] hantro-vpu fdea0000.video-codec: Adding to iommu group 0
[   27.267535] hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as /dev/video0
[   27.270668] hantro-vpu fdee0000.video-codec: Adding to iommu group 1
[   27.272590] hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as /dev/video1
[   27.573417] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[   27.574238] Mem abort info:
[   27.574499]   ESR = 0x0000000096000004
[   27.574836]   EC = 0x25: DABT (current EL), IL = 32 bits
[   27.575310]   SET = 0, FnV = 0
[   27.575586]   EA = 0, S1PTW = 0
[   27.575868]   FSC = 0x04: level 0 translation fault
[   27.576368] Data abort info:
[   27.576637]   ISV = 0, ISS = 0x00000004
[   27.576980]   CM = 0, WnR = 0
[   27.577247] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010718b000
[   27.577818] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
[   27.578430] Internal error: Oops: 0000000096000004 [#1] SMP
[   27.578934] Modules linked in: polyval_generic(E+) ghash_ce(E) gf128mul(E) snd_soc_hdmi_codec(E+) sha2_ce(E) ecdh_generic(E+) sha256_arm64(E) sha1_ce(E) rfkill(E) snd_soc_spdif_tx(E) ecc(E) leds_gpio(E) snd_soc_simple_card(E) snd_soc_simple_card_utils(E) display_connector(E) gpio_fan(E) hantro_vpu(E) v4l2_vp9(E) snd_soc_rockchip_i2s_tdm(E) v4l2_h264(E) videobuf2_dma_contig(E) snd_soc_rockchip_spdif(E) snd_soc_rk817(E) v4l2_mem2mem(E) videobuf2_memops(E) governor_simpleondemand(E) rockchip_thermal(E) videobuf2_v4l2(E) dw_wdt(E) snd_soc_core(E) dw_hdmi_i2s_audio(E) dw_hdmi_cec(E) videodev(E) snd_pcm_dmaengine(E) panfrost(E) videobuf2_common(E) snd_pcm(E) rk805_pwrkey(E) snd_timer(E) gpu_sched(E) snd(E) soundcore(E) mc(E) drm_shmem_helper(E) cpufreq_dt(E) loop(E) fuse(E) efi_pstore(E) dm_mod(E) dax(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) xhci_plat_hcd(E) xhci_hcd(E) motorcomm(E) rk808_regulator(E) fan53555(E) gpio_rockchip(E) dwmac_rk(E) stmmac_platform(E)
[   27.579108]  crct10dif_ce(E) stmmac(E) pcs_xpcs(E) crct10dif_common(E) phylink(E) fixed(E) of_mdio(E) pinctrl_rockchip(E) phy_rockchip_inno_usb2(E) fixed_phy(E) sdhci_of_dwcmshc(E) dw_mmc_rockchip(E) fwnode_mdio(E) sdhci_pltfm(E) phy_rockchip_naneng_combphy(E) dw_mmc_pltfm(E) sdhci(E) dw_mmc(E) pl330(E) libphy(E) rockchipdrm(E) ptp(E) drm_dma_helper(E) analogix_dp(E) dw_hdmi(E) cec(E) rc_core(E) drm_display_helper(E) dw_mipi_dsi(E) pps_core(E) drm_kms_helper(E) ohci_platform(E) ohci_hcd(E) ehci_platform(E) io_domain(E) ehci_hcd(E) dwc3(E) i2c_rk3x(E) udc_core(E) usbcore(E) roles(E) ulpi(E) drm(E) usb_common(E)
[   27.591758] CPU: 1 PID: 407 Comm: v4l_id Tainted: G            E      6.4.0-0-pine64-arm64 #1  Debian 6.4~rc2-1~pine64
[   27.592705] Hardware name: Pine64 RK3566 Quartz64-A Board (DT)
[   27.593223] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   27.593843] pc : hantro_try_fmt+0xb8/0x290 [hantro_vpu]
[   27.594336] lr : hantro_try_fmt+0xac/0x290 [hantro_vpu]
[   27.594811] sp : ffff80000aa9b670
[   27.595107] x29: ffff80000aa9b670 x28: ffff80000aa9bb60 x27: 0000000000000000
[   27.595746] x26: 0000000000000000 x25: ffff000108380008 x24: ffff8000014ebac0
[   27.596383] x23: 0000000000000001 x22: ffff000108380000 x21: 0000000000000000
[   27.597019] x20: ffff8000014f10f0 x19: ffff80000aa9b708 x18: 0000000000000010
[   27.597657] x17: 0000000000000000 x16: 0000000000000000 x15: ffff80000aa9b6a0
[   27.598292] x14: ffff0001043fe280 x13: 0000000000000001 x12: 0000000000000001
[   27.598927] x11: 0000000000000002 x10: 0000000000000003 x9 : 0000000000000002
[   27.599563] x8 : 000000000000001f x7 : 000000000000005f x6 : 0000000000000003
[   27.600199] x5 : ffff8000013e2583 x4 : ffff8000013e2580 x3 : 00000000ffffffff
[   27.600834] x2 : 0000000000000010 x1 : 0000000000000000 x0 : 0000000034363253
[   27.601470] Call trace:
[   27.601693]  hantro_try_fmt+0xb8/0x290 [hantro_vpu]
[   27.602143]  hantro_set_fmt_out+0x44/0x388 [hantro_vpu]
[   27.602617]  hantro_reset_raw_fmt+0x7c/0xe0 [hantro_vpu]
[   27.603096]  hantro_set_fmt_cap+0x29c/0x2b8 [hantro_vpu]
[   27.603575]  hantro_reset_encoded_fmt+0x80/0xc0 [hantro_vpu]
[   27.604086]  hantro_reset_fmts+0x20/0x48 [hantro_vpu]
[   27.604545]  hantro_open+0xe0/0x208 [hantro_vpu]
[   27.604965]  v4l2_open+0x84/0x130 [videodev]
[   27.605389]  chrdev_open+0xd8/0x2d8
[   27.605712]  do_dentry_open+0x1bc/0x490
[   27.606060]  vfs_open+0x34/0x40
[   27.606345]  path_openat+0x9c8/0xf20
[   27.606670]  do_filp_open+0xa4/0x160
[   27.606991]  do_sys_openat2+0xc8/0x188
[   27.607327]  __arm64_sys_openat+0x6c/0xb8
[   27.607687]  invoke_syscall+0x78/0x108
[   27.608029]  el0_svc_common.constprop.0+0xd4/0x100
[   27.608456]  do_el0_svc+0x40/0xa8
[   27.608755]  el0_svc+0x34/0xd8
[   27.609036]  el0t_64_sync_handler+0xf4/0x120
[   27.609420]  el0t_64_sync+0x190/0x198
[   27.609753] Code: 97fbc8e2 f94056c1 52864a60 72a686c0 (b9400821) 
[   27.610297] ---[ end trace 0000000000000000 ]---

Benjamin Gaignard May 22, 2023, 4:17 p.m. UTC | #5

Le 20/05/2023 à 00:34, Diederik de Haas a écrit :
> Hi,
>
> On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
>> Le jeudi 13 avril 2023 à 10:10 -0300, Ezequiel Garcia a écrit :
>>> Benjamin,
>>>
>>> Please include the crash stracktrace in the commit.
>> Careful with HTML message, they don't always make it in these ML and tooling
>> might not play well with the tooling. Perhaps it can be edited while
>> pulling ? Here's the info from Marek's bug report:
>>
>> hantro-vpu fdea0000.video-codec: Adding to iommu group 0
>> hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as
>> /dev/video0
>> hantro-vpu fdee0000.video-codec: Adding to iommu group 1
>> hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as
>> /dev/video1
>> Unable to handle kernel NULL pointer dereference at virtual address
>> 0000000000000008
>> Mem abort info:
>>     ESR = 0x0000000096000004
>>     EC = 0x25: DABT (current EL), IL = 32 bits
>>     SET = 0, FnV = 0
>>     EA = 0, S1PTW = 0
>>     FSC = 0x04: level 0 translation fault
>> Data abort info:
>>     ISV = 0, ISS = 0x00000004
>>     CM = 0, WnR = 0
>> user pgtable: 4k pages, 48-bit VAs, pgdp=00000001f446f000
>> [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
>> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>> Modules linked in: hantro_vpu v4l2_vp9 v4l2_h264 v4l2_mem2mem
>> videobuf2_dma_contig snd_soc_simple_card display_connector
>> snd_soc_simple_card_utils videobuf2_memops crct10dif_ce dwmac_rk
>> rockchip_thermal videobuf2_v4l2 stmmac_platform rockchip_saradc
>> industrialio_triggered_buffer kfifo_buf stmmac videodev pcs_xpcs
>> rtc_rk808 videobuf2_common rockchipdrm panfrost mc drm_shmem_helper
>> analogix_dp gpu_sched dw_mipi_dsi dw_hdmi drm_display_helper ip_tables
>> x_tables ipv6
>> CPU: 3 PID: 171 Comm: v4l_id Not tainted 6.3.0-rc2+ #13478
>> Hardware name: Hardkernel ODROID-M1 (DT)
>> pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : hantro_try_fmt+0xb4/0x280 [hantro_vpu]
>> lr : hantro_try_fmt+0xa8/0x280 [hantro_vpu]
>> ...
>> Call trace:
>>    hantro_try_fmt+0xb4/0x280 [hantro_vpu]
>>    hantro_set_fmt_out+0x3c/0x278 [hantro_vpu]
>>    hantro_reset_raw_fmt+0x94/0xb4 [hantro_vpu]
>>    hantro_set_fmt_cap+0x23c/0x250 [hantro_vpu]
>>    hantro_reset_fmts+0x94/0xcc [hantro_vpu]
>>    hantro_open+0xd4/0x20c [hantro_vpu]
>>    v4l2_open+0x80/0x120 [videodev]
>>    chrdev_open+0xc0/0x22c
>>    do_dentry_open+0x13c/0x490
>>    vfs_open+0x2c/0x38
>>    path_openat+0x550/0x938
>>    do_filp_open+0x80/0x12c
>>    do_sys_openat2+0xb4/0x16c
>>    __arm64_sys_openat+0x64/0xac
>>    invoke_syscall+0x48/0x114
>>    el0_svc_common.constprop.0+0xfc/0x11c
>>    do_el0_svc+0x38/0xa4
>>    el0_svc+0x48/0xb8
>>    el0t_64_sync_handler+0xb8/0xbc
>>    el0t_64_sync+0x190/0x194
>> Code: 97fe726c f940aa80 52864a61 72a686c1 (b9400800)
>> ---[ end trace 0000000000000000 ]---
> When I booted into my 6.4-rc1 (but also rc2) kernel on my
> Pine64 Quartz64 Model A, I noticed a crash which seems the same as
> above, but I didn't have such a crash with my 6.3 kernel.
> Searching for 'hantro' led me to this commit as the most likely culprit
> but when I build a new 6.4-rcX kernel with this commit reverted,
> I still had this crash.
> Do you have suggestions which commit would then be the likely culprit?
>
This patch fix the crash at boot time, revert it doesn't seem to be the solution.
Maybe this proposal from Marek can help you ?
https://patchwork.kernel.org/project/linux-media/patch/20230421104759.2236463-1-m.szyprowski@samsung.com/

Regards,
Benjamin

> Cheers,
>    Diederik
>
> For completeness, this is the error I got with 6.4-rcX:
>
> [   26.976766] panfrost fde60000.gpu: clock rate = 594000000
> [   26.977297] panfrost fde60000.gpu: bus_clock rate = 500000000
> [   26.996012] random: crng init done
> [   27.072438] videodev: Linux video capture interface: v2.00
> [   27.119012] Registered IR keymap rc-cec
> [   27.125161] rc rc0: dw_hdmi as /devices/platform/fe0a0000.hdmi/rc/rc0
> [   27.125427] panfrost fde60000.gpu: mali-g52 id 0x7402 major 0x1 minor 0x0 status 0x0
> [   27.125905] input: dw_hdmi as /devices/platform/fe0a0000.hdmi/rc/rc0/input1
> [   27.126427] panfrost fde60000.gpu: features: 00000000,00000cf7, issues: 00000000,00000400
> [   27.127785] panfrost fde60000.gpu: Features: L2:0x07110206 Shader:0x00000002 Tiler:0x00000209 Mem:0x1 MMU:0x00002823 AS:0xff JS:0x7
> [   27.128954] panfrost fde60000.gpu: shader_present=0x1 l2_present=0x1
> [   27.148920] gpio-fan gpio_fan: GPIO fan initialized
> [   27.191131] [drm] Initialized panfrost 1.2.0 20180908 for fde60000.gpu on minor 1
> [   27.265920] hantro-vpu fdea0000.video-codec: Adding to iommu group 0
> [   27.267535] hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as /dev/video0
> [   27.270668] hantro-vpu fdee0000.video-codec: Adding to iommu group 1
> [   27.272590] hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as /dev/video1
> [   27.573417] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
> [   27.574238] Mem abort info:
> [   27.574499]   ESR = 0x0000000096000004
> [   27.574836]   EC = 0x25: DABT (current EL), IL = 32 bits
> [   27.575310]   SET = 0, FnV = 0
> [   27.575586]   EA = 0, S1PTW = 0
> [   27.575868]   FSC = 0x04: level 0 translation fault
> [   27.576368] Data abort info:
> [   27.576637]   ISV = 0, ISS = 0x00000004
> [   27.576980]   CM = 0, WnR = 0
> [   27.577247] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010718b000
> [   27.577818] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
> [   27.578430] Internal error: Oops: 0000000096000004 [#1] SMP
> [   27.578934] Modules linked in: polyval_generic(E+) ghash_ce(E) gf128mul(E) snd_soc_hdmi_codec(E+) sha2_ce(E) ecdh_generic(E+) sha256_arm64(E) sha1_ce(E) rfkill(E) snd_soc_spdif_tx(E) ecc(E) leds_gpio(E) snd_soc_simple_card(E) snd_soc_simple_card_utils(E) display_connector(E) gpio_fan(E) hantro_vpu(E) v4l2_vp9(E) snd_soc_rockchip_i2s_tdm(E) v4l2_h264(E) videobuf2_dma_contig(E) snd_soc_rockchip_spdif(E) snd_soc_rk817(E) v4l2_mem2mem(E) videobuf2_memops(E) governor_simpleondemand(E) rockchip_thermal(E) videobuf2_v4l2(E) dw_wdt(E) snd_soc_core(E) dw_hdmi_i2s_audio(E) dw_hdmi_cec(E) videodev(E) snd_pcm_dmaengine(E) panfrost(E) videobuf2_common(E) snd_pcm(E) rk805_pwrkey(E) snd_timer(E) gpu_sched(E) snd(E) soundcore(E) mc(E) drm_shmem_helper(E) cpufreq_dt(E) loop(E) fuse(E) efi_pstore(E) dm_mod(E) dax(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) xhci_plat_hcd(E) xhci_hcd(E) motorcomm(E) rk808_regulator(E) fan53555(E) gpio_rockchip(E) dwmac_rk(E) stmmac_platform(E)
> [   27.579108]  crct10dif_ce(E) stmmac(E) pcs_xpcs(E) crct10dif_common(E) phylink(E) fixed(E) of_mdio(E) pinctrl_rockchip(E) phy_rockchip_inno_usb2(E) fixed_phy(E) sdhci_of_dwcmshc(E) dw_mmc_rockchip(E) fwnode_mdio(E) sdhci_pltfm(E) phy_rockchip_naneng_combphy(E) dw_mmc_pltfm(E) sdhci(E) dw_mmc(E) pl330(E) libphy(E) rockchipdrm(E) ptp(E) drm_dma_helper(E) analogix_dp(E) dw_hdmi(E) cec(E) rc_core(E) drm_display_helper(E) dw_mipi_dsi(E) pps_core(E) drm_kms_helper(E) ohci_platform(E) ohci_hcd(E) ehci_platform(E) io_domain(E) ehci_hcd(E) dwc3(E) i2c_rk3x(E) udc_core(E) usbcore(E) roles(E) ulpi(E) drm(E) usb_common(E)
> [   27.591758] CPU: 1 PID: 407 Comm: v4l_id Tainted: G            E      6.4.0-0-pine64-arm64 #1  Debian 6.4~rc2-1~pine64
> [   27.592705] Hardware name: Pine64 RK3566 Quartz64-A Board (DT)
> [   27.593223] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   27.593843] pc : hantro_try_fmt+0xb8/0x290 [hantro_vpu]
> [   27.594336] lr : hantro_try_fmt+0xac/0x290 [hantro_vpu]
> [   27.594811] sp : ffff80000aa9b670
> [   27.595107] x29: ffff80000aa9b670 x28: ffff80000aa9bb60 x27: 0000000000000000
> [   27.595746] x26: 0000000000000000 x25: ffff000108380008 x24: ffff8000014ebac0
> [   27.596383] x23: 0000000000000001 x22: ffff000108380000 x21: 0000000000000000
> [   27.597019] x20: ffff8000014f10f0 x19: ffff80000aa9b708 x18: 0000000000000010
> [   27.597657] x17: 0000000000000000 x16: 0000000000000000 x15: ffff80000aa9b6a0
> [   27.598292] x14: ffff0001043fe280 x13: 0000000000000001 x12: 0000000000000001
> [   27.598927] x11: 0000000000000002 x10: 0000000000000003 x9 : 0000000000000002
> [   27.599563] x8 : 000000000000001f x7 : 000000000000005f x6 : 0000000000000003
> [   27.600199] x5 : ffff8000013e2583 x4 : ffff8000013e2580 x3 : 00000000ffffffff
> [   27.600834] x2 : 0000000000000010 x1 : 0000000000000000 x0 : 0000000034363253
> [   27.601470] Call trace:
> [   27.601693]  hantro_try_fmt+0xb8/0x290 [hantro_vpu]
> [   27.602143]  hantro_set_fmt_out+0x44/0x388 [hantro_vpu]
> [   27.602617]  hantro_reset_raw_fmt+0x7c/0xe0 [hantro_vpu]
> [   27.603096]  hantro_set_fmt_cap+0x29c/0x2b8 [hantro_vpu]
> [   27.603575]  hantro_reset_encoded_fmt+0x80/0xc0 [hantro_vpu]
> [   27.604086]  hantro_reset_fmts+0x20/0x48 [hantro_vpu]
> [   27.604545]  hantro_open+0xe0/0x208 [hantro_vpu]
> [   27.604965]  v4l2_open+0x84/0x130 [videodev]
> [   27.605389]  chrdev_open+0xd8/0x2d8
> [   27.605712]  do_dentry_open+0x1bc/0x490
> [   27.606060]  vfs_open+0x34/0x40
> [   27.606345]  path_openat+0x9c8/0xf20
> [   27.606670]  do_filp_open+0xa4/0x160
> [   27.606991]  do_sys_openat2+0xc8/0x188
> [   27.607327]  __arm64_sys_openat+0x6c/0xb8
> [   27.607687]  invoke_syscall+0x78/0x108
> [   27.608029]  el0_svc_common.constprop.0+0xd4/0x100
> [   27.608456]  do_el0_svc+0x40/0xa8
> [   27.608755]  el0_svc+0x34/0xd8
> [   27.609036]  el0t_64_sync_handler+0xf4/0x120
> [   27.609420]  el0t_64_sync+0x190/0x198
> [   27.609753] Code: 97fbc8e2 f94056c1 52864a60 72a686c0 (b9400821)
> [   27.610297] ---[ end trace 0000000000000000 ]---

Diederik de Haas May 22, 2023, 10:38 p.m. UTC | #6

On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote:
> Le 20/05/2023 à 00:34, Diederik de Haas a écrit :
> > On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
> >> Le jeudi 13 avril 2023 à 10:10 -0300, Ezequiel Garcia a écrit :
> >>> Benjamin,
> >>> 
> >>> Please include the crash stracktrace in the commit.
> >> 
> >> Careful with HTML message, they don't always make it in these ML and
> >> tooling might not play well with the tooling. Perhaps it can be edited
> >> while pulling ? Here's the info from Marek's bug report:
> >> 
> >> hantro-vpu fdea0000.video-codec: Adding to iommu group 0
> >> hantro-vpu fdea0000.video-codec: registered rockchip,rk3568-vpu-dec as
> >> /dev/video0
> >> hantro-vpu fdee0000.video-codec: Adding to iommu group 1
> >> hantro-vpu fdee0000.video-codec: registered rockchip,rk3568-vepu-enc as
> >> /dev/video1
> >> Unable to handle kernel NULL pointer dereference at virtual address
> >> 0000000000000008
> >> Mem abort info:
> >> ESR = 0x0000000096000004
> >> EC = 0x25: DABT (current EL), IL = 32 bits
> >> SET = 0, FnV = 0
> >> EA = 0, S1PTW = 0
> >> FSC = 0x04: level 0 translation fault
> >> Data abort info:
> >> ISV = 0, ISS = 0x00000004
> >> CM = 0, WnR = 0
> >> user pgtable: 4k pages, 48-bit VAs, pgdp=00000001f446f000
> >> [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
> >> Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> >> Modules linked in: hantro_vpu v4l2_vp9 v4l2_h264 v4l2_mem2mem
> >> videobuf2_dma_contig snd_soc_simple_card display_connector
> >> snd_soc_simple_card_utils videobuf2_memops crct10dif_ce dwmac_rk
> >> rockchip_thermal videobuf2_v4l2 stmmac_platform rockchip_saradc
> >> industrialio_triggered_buffer kfifo_buf stmmac videodev pcs_xpcs
> >> rtc_rk808 videobuf2_common rockchipdrm panfrost mc drm_shmem_helper
> >> analogix_dp gpu_sched dw_mipi_dsi dw_hdmi drm_display_helper ip_tables
> >> x_tables ipv6
> >> CPU: 3 PID: 171 Comm: v4l_id Not tainted 6.3.0-rc2+ #13478
> >> Hardware name: Hardkernel ODROID-M1 (DT)
> >> pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >> pc : hantro_try_fmt+0xb4/0x280 [hantro_vpu]
> >> lr : hantro_try_fmt+0xa8/0x280 [hantro_vpu]
> >> ...
> >> Call trace:
> >> hantro_try_fmt+0xb4/0x280 [hantro_vpu]
> >> hantro_set_fmt_out+0x3c/0x278 [hantro_vpu]
> >> hantro_reset_raw_fmt+0x94/0xb4 [hantro_vpu]
> >> hantro_set_fmt_cap+0x23c/0x250 [hantro_vpu]
> >> hantro_reset_fmts+0x94/0xcc [hantro_vpu]
> >> hantro_open+0xd4/0x20c [hantro_vpu]
> >> v4l2_open+0x80/0x120 [videodev]
> >> chrdev_open+0xc0/0x22c
> >> do_dentry_open+0x13c/0x490
> >> vfs_open+0x2c/0x38
> >> path_openat+0x550/0x938
> >> do_filp_open+0x80/0x12c
> >> do_sys_openat2+0xb4/0x16c
> >> __arm64_sys_openat+0x64/0xac
> >> invoke_syscall+0x48/0x114
> >> el0_svc_common.constprop.0+0xfc/0x11c
> >> do_el0_svc+0x38/0xa4
> >> el0_svc+0x48/0xb8
> >> el0t_64_sync_handler+0xb8/0xbc
> >> el0t_64_sync+0x190/0x194
> >> Code: 97fe726c f940aa80 52864a61 72a686c1 (b9400800)
> >> ---[ end trace 0000000000000000 ]---
> > 
> > When I booted into my 6.4-rc1 (but also rc2) kernel on my
> > Pine64 Quartz64 Model A, I noticed a crash which seems the same as
> > above, but I didn't have such a crash with my 6.3 kernel.
> > Searching for 'hantro' led me to this commit as the most likely culprit
> > but when I build a new 6.4-rcX kernel with this commit reverted,
> > I still had this crash.
> > Do you have suggestions which commit would then be the likely culprit?
> 
> This patch fix the crash at boot time, revert it doesn't seem to be the
> solution. Maybe this proposal from Marek can help you ?
> https://patchwork.kernel.org/project/linux-media/patch/20230421104759.223646
> 3-1-m.szyprowski@samsung.com/

That helped :) After applying that patch I no longer have the crash.
Thanks!

Regards,
  Diederik

Linux regression tracking (Thorsten Leemhuis) May 23, 2023, 10:35 a.m. UTC | #7

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few templates
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 20.05.23 00:34, Diederik de Haas wrote:
> 
> On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
>> Le jeudi 13 avril 2023 à 10:10 -0300, Ezequiel Garcia a écrit :
> [....]
> 
> When I booted into my 6.4-rc1 (but also rc2) kernel on my
> Pine64 Quartz64 Model A, I noticed a crash which seems the same as
> above, but I didn't have such a crash with my 6.3 kernel.
> Searching for 'hantro' led me to this commit as the most likely culprit
> but when I build a new 6.4-rcX kernel with this commit reverted,
> I still had this crash.
> Do you have suggestions which commit would then be the likely culprit?
> [...]

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced db6f68b51e5c
#regzbot title media: verisilicon: null pointer dereference in try_fmt
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

Linux regression tracking (Thorsten Leemhuis) May 23, 2023, 10:50 a.m. UTC | #8

CCing the Regression list and a bunch of other people that were CCed in
threads that look related:

On 23.05.23 00:38, Diederik de Haas wrote:
> On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote:
>> Le 20/05/2023 à 00:34, Diederik de Haas a écrit :
>>> On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
> [...]
>>> When I booted into my 6.4-rc1 (but also rc2) kernel on my
>>> Pine64 Quartz64 Model A, I noticed a crash which seems the same as
>>> above, but I didn't have such a crash with my 6.3 kernel.
>>> Searching for 'hantro' led me to this commit as the most likely culprit
>>> but when I build a new 6.4-rcX kernel with this commit reverted,
>>> I still had this crash.
>>> Do you have suggestions which commit would then be the likely culprit?
>>
>> This patch fix the crash at boot time, revert it doesn't seem to be the
>> solution. Maybe this proposal from Marek can help you ?
>>
>> https://patchwork.kernel.org/project/linux-media/patch/20230421104759.2236463-1-m.szyprowski@samsung.com/
> 
> That helped :) After applying that patch I no longer have the crash.
> Thanks!

That regression fix is now a month old, but not yet merged afaics --
guess due to Nicolas comment that wasn't addressed yet and likely
requires a updated patch.

Michael afaics a week ago posted a patch that to my *very limited
understanding of things* (I hope I don't confuse matters here!) seems to
address the same problem, but slightly differently:
https://lore.kernel.org/all/20230516091209.3098262-1-m.tretter@pengutronix.de/

No reply yet.

That's all a bit unfortunate, as it's not how regression fixes should be
dealt with -- and caused multiple people headaches that could have been
avoided. :-/

But well, things happen. But it leads to the question:

How can we finally address the issue quickly now to ensure is doesn't
cause headaches for even more people?

Marek, Michael, could you work on a patch together that we then get
somewhat fast-tracked to Linus to avoid him getting even more unhappy
about the state of things[1]?

Ciao, Thorsten

[1] see
https://lore.kernel.org/all/CAHk-=wgzU8_dGn0Yg+DyX7ammTkDUCyEJ4C=NvnHRhxKWC7Wpw@mail.gmail.com/

Michael Tretter May 23, 2023, 2:54 p.m. UTC | #9

On Tue, 23 May 2023 12:50:42 +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> CCing the Regression list and a bunch of other people that were CCed in
> threads that look related:

Thanks!

> 
> On 23.05.23 00:38, Diederik de Haas wrote:
> > On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote:
> >> Le 20/05/2023 à 00:34, Diederik de Haas a écrit :
> >>> On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
> > [...]
> >>> When I booted into my 6.4-rc1 (but also rc2) kernel on my
> >>> Pine64 Quartz64 Model A, I noticed a crash which seems the same as
> >>> above, but I didn't have such a crash with my 6.3 kernel.
> >>> Searching for 'hantro' led me to this commit as the most likely culprit
> >>> but when I build a new 6.4-rcX kernel with this commit reverted,
> >>> I still had this crash.
> >>> Do you have suggestions which commit would then be the likely culprit?
> >>
> >> This patch fix the crash at boot time, revert it doesn't seem to be the
> >> solution. Maybe this proposal from Marek can help you ?
> >>
> >> https://patchwork.kernel.org/project/linux-media/patch/20230421104759.2236463-1-m.szyprowski@samsung.com/
> > 
> > That helped :) After applying that patch I no longer have the crash.
> > Thanks!
> 
> That regression fix is now a month old, but not yet merged afaics --
> guess due to Nicolas comment that wasn't addressed yet and likely
> requires a updated patch.

I agree with Nicolas comment on that patch and it needs to be updated.

> 
> Michael afaics a week ago posted a patch that to my *very limited
> understanding of things* (I hope I don't confuse matters here!) seems to
> address the same problem, but slightly differently:
> https://lore.kernel.org/all/20230516091209.3098262-1-m.tretter@pengutronix.de/

Correct, my patch addresses the same problem.

> 
> No reply yet.
> 
> That's all a bit unfortunate, as it's not how regression fixes should be
> dealt with -- and caused multiple people headaches that could have been
> avoided. :-/
> 
> But well, things happen. But it leads to the question:
> 
> How can we finally address the issue quickly now to ensure is doesn't
> cause headaches for even more people?
> 
> Marek, Michael, could you work on a patch together that we then get
> somewhat fast-tracked to Linus to avoid him getting even more unhappy
> about the state of things[1]?

Marek, if you have an updated patch, I will happily test and review it.
Otherwise, please take a look at my patch.

Michael

Marek Szyprowski May 23, 2023, 4:32 p.m. UTC | #10

On 23.05.2023 16:54, Michael Tretter wrote:
> On Tue, 23 May 2023 12:50:42 +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
>> CCing the Regression list and a bunch of other people that were CCed in
>> threads that look related:
> Thanks!
>
>> On 23.05.23 00:38, Diederik de Haas wrote:
>>> On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote:
>>>> Le 20/05/2023 à 00:34, Diederik de Haas a écrit :
>>>>> On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
>>> [...]
>>>>> When I booted into my 6.4-rc1 (but also rc2) kernel on my
>>>>> Pine64 Quartz64 Model A, I noticed a crash which seems the same as
>>>>> above, but I didn't have such a crash with my 6.3 kernel.
>>>>> Searching for 'hantro' led me to this commit as the most likely culprit
>>>>> but when I build a new 6.4-rcX kernel with this commit reverted,
>>>>> I still had this crash.
>>>>> Do you have suggestions which commit would then be the likely culprit?
>>>> This patch fix the crash at boot time, revert it doesn't seem to be the
>>>> solution. Maybe this proposal from Marek can help you ?
>>>>
>>>> https://patchwork.kernel.org/project/linux-media/patch/20230421104759.2236463-1-m.szyprowski@samsung.com/
>>> That helped :) After applying that patch I no longer have the crash.
>>> Thanks!
>> That regression fix is now a month old, but not yet merged afaics --
>> guess due to Nicolas comment that wasn't addressed yet and likely
>> requires a updated patch.
> I agree with Nicolas comment on that patch and it needs to be updated.
>
>> Michael afaics a week ago posted a patch that to my *very limited
>> understanding of things* (I hope I don't confuse matters here!) seems to
>> address the same problem, but slightly differently:
>> https://lore.kernel.org/all/20230516091209.3098262-1-m.tretter@pengutronix.de/
> Correct, my patch addresses the same problem.
>
>> No reply yet.
>>
>> That's all a bit unfortunate, as it's not how regression fixes should be
>> dealt with -- and caused multiple people headaches that could have been
>> avoided. :-/
>>
>> But well, things happen. But it leads to the question:
>>
>> How can we finally address the issue quickly now to ensure is doesn't
>> cause headaches for even more people?
>>
>> Marek, Michael, could you work on a patch together that we then get
>> somewhat fast-tracked to Linus to avoid him getting even more unhappy
>> about the state of things[1]?
> Marek, if you have an updated patch, I will happily test and review it.
> Otherwise, please take a look at my patch.

Go ahead with your, my was just a blind try eliminating the oops during 
driver probe.


Best regards

Nicolas Dufresne May 25, 2023, 2:32 p.m. UTC | #11

Hi Micheal,

Le mardi 23 mai 2023 à 16:54 +0200, Michael Tretter a écrit :
> On Tue, 23 May 2023 12:50:42 +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> > CCing the Regression list and a bunch of other people that were CCed in
> > threads that look related:
> 
> Thanks!
> 
> > 
> > On 23.05.23 00:38, Diederik de Haas wrote:
> > > On Monday, 22 May 2023 18:17:39 CEST Benjamin Gaignard wrote:
> > > > Le 20/05/2023 à 00:34, Diederik de Haas a écrit :
> > > > > On Thursday, 13 April 2023 21:52:50 CEST Nicolas Dufresne wrote:
> > > [...]
> > > > > When I booted into my 6.4-rc1 (but also rc2) kernel on my
> > > > > Pine64 Quartz64 Model A, I noticed a crash which seems the same as
> > > > > above, but I didn't have such a crash with my 6.3 kernel.
> > > > > Searching for 'hantro' led me to this commit as the most likely culprit
> > > > > but when I build a new 6.4-rcX kernel with this commit reverted,
> > > > > I still had this crash.
> > > > > Do you have suggestions which commit would then be the likely culprit?
> > > > 
> > > > This patch fix the crash at boot time, revert it doesn't seem to be the
> > > > solution. Maybe this proposal from Marek can help you ?
> > > > 
> > > > https://patchwork.kernel.org/project/linux-media/patch/20230421104759.2236463-1-m.szyprowski@samsung.com/
> > > 
> > > That helped :) After applying that patch I no longer have the crash.
> > > Thanks!
> > 
> > That regression fix is now a month old, but not yet merged afaics --
> > guess due to Nicolas comment that wasn't addressed yet and likely
> > requires a updated patch.
> 
> I agree with Nicolas comment on that patch and it needs to be updated.
> 
> > 
> > Michael afaics a week ago posted a patch that to my *very limited
> > understanding of things* (I hope I don't confuse matters here!) seems to
> > address the same problem, but slightly differently:
> > https://lore.kernel.org/all/20230516091209.3098262-1-m.tretter@pengutronix.de/
> 
> Correct, my patch addresses the same problem.

Sorry, just got really busy and missed the second fix. From a hot fix stand
point, your patch seems a lot safer. It does not go as far, and probably does
not make the driver better, but considering we had such a slow response we need
to do something about it.

Ezequiel, will you be fine with the approach ?

> 
> > 
> > No reply yet.
> > 
> > That's all a bit unfortunate, as it's not how regression fixes should be
> > dealt with -- and caused multiple people headaches that could have been
> > avoided. :-/
> > 
> > But well, things happen. But it leads to the question:
> > 
> > How can we finally address the issue quickly now to ensure is doesn't
> > cause headaches for even more people?
> > 
> > Marek, Michael, could you work on a patch together that we then get
> > somewhat fast-tracked to Linus to avoid him getting even more unhappy
> > about the state of things[1]?
> 
> Marek, if you have an updated patch, I will happily test and review it.
> Otherwise, please take a look at my patch.
> 
> Michael
>

Michael Tretter June 1, 2023, 1:27 p.m. UTC | #12

Hi Benjamin,

On Thu, 13 Apr 2023 12:47:56 +0200, Benjamin Gaignard wrote:
> ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt()
> so assigne it to vpu_fmt led to crash the kernel.
> Like for decoder case use 'fmt' as format for encoder and clean up
> the code.
> 
> Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions")
> ---
> version 2:
> - Remove useless vpu_fmt.
> 
>  drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
> index 8f1414085f47..d71f79471396 100644
> --- a/drivers/media/platform/verisilicon/hantro_v4l2.c
> +++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
> @@ -275,7 +275,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
>  			  struct v4l2_pix_format_mplane *pix_mp,
>  			  enum v4l2_buf_type type)
>  {
> -	const struct hantro_fmt *fmt, *vpu_fmt;
> +	const struct hantro_fmt *fmt;
>  	bool capture = V4L2_TYPE_IS_CAPTURE(type);
>  	bool coded;
>  
> @@ -295,11 +295,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
>  
>  	if (coded) {
>  		pix_mp->num_planes = 1;
> -		vpu_fmt = fmt;
> -	} else if (ctx->is_encoder) {
> -		vpu_fmt = ctx->vpu_dst_fmt;
> -	} else {
> -		vpu_fmt = fmt;
> +	} else if (!ctx->is_encoder) {
>  		/*
>  		 * Width/height on the CAPTURE end of a decoder are ignored and
>  		 * replaced by the OUTPUT ones.
> @@ -311,7 +307,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
>  	pix_mp->field = V4L2_FIELD_NONE;
>  
>  	v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height,
> -				       &vpu_fmt->frmsize);
> +				       &fmt->frmsize);

This causes a regression on the OUTPUT device of the encoder. fmt->frmsize is
only valid for coded ("bitstream") formats, but fmt on the OUTPUT of an
encoder will be a raw format. This results in width and height to be clamped
to 0.

I think the correct fix would be to apply the frmsize constraints of the
currently configured coded format, but as ctx->vpu_dst_fmt is not initialized
before calling this code, I don't know how to get the coded format.

Michael

>  
>  	if (!coded) {
>  		/* Fill remaining fields */
> -- 
> 2.34.1
>

Benjamin Gaignard June 1, 2023, 2:08 p.m. UTC | #13

Le 01/06/2023 à 15:27, Michael Tretter a écrit :
> Hi Benjamin,
>
> On Thu, 13 Apr 2023 12:47:56 +0200, Benjamin Gaignard wrote:
>> ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt()
>> so assigne it to vpu_fmt led to crash the kernel.
>> Like for decoder case use 'fmt' as format for encoder and clean up
>> the code.
>>
>> Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
>> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions")
>> ---
>> version 2:
>> - Remove useless vpu_fmt.
>>
>>   drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++-------
>>   1 file changed, 3 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
>> index 8f1414085f47..d71f79471396 100644
>> --- a/drivers/media/platform/verisilicon/hantro_v4l2.c
>> +++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
>> @@ -275,7 +275,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
>>   			  struct v4l2_pix_format_mplane *pix_mp,
>>   			  enum v4l2_buf_type type)
>>   {
>> -	const struct hantro_fmt *fmt, *vpu_fmt;
>> +	const struct hantro_fmt *fmt;
>>   	bool capture = V4L2_TYPE_IS_CAPTURE(type);
>>   	bool coded;
>>   
>> @@ -295,11 +295,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
>>   
>>   	if (coded) {
>>   		pix_mp->num_planes = 1;
>> -		vpu_fmt = fmt;
>> -	} else if (ctx->is_encoder) {
>> -		vpu_fmt = ctx->vpu_dst_fmt;
>> -	} else {
>> -		vpu_fmt = fmt;
>> +	} else if (!ctx->is_encoder) {
>>   		/*
>>   		 * Width/height on the CAPTURE end of a decoder are ignored and
>>   		 * replaced by the OUTPUT ones.
>> @@ -311,7 +307,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
>>   	pix_mp->field = V4L2_FIELD_NONE;
>>   
>>   	v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height,
>> -				       &vpu_fmt->frmsize);
>> +				       &fmt->frmsize);
> This causes a regression on the OUTPUT device of the encoder. fmt->frmsize is
> only valid for coded ("bitstream") formats, but fmt on the OUTPUT of an
> encoder will be a raw format. This results in width and height to be clamped
> to 0.
>
> I think the correct fix would be to apply the frmsize constraints of the
> currently configured coded format, but as ctx->vpu_dst_fmt is not initialized
> before calling this code, I don't know how to get the coded format.

if ctx->dst_fmt is correctly set (and it should be) then doing:

pix_mp->width = ctx->dst_fmt.width;
pix_mp->height = ctx->dst_fmt.height;

should solve the issue.

Benjamin

>
> Michael
>
>>   
>>   	if (!coded) {
>>   		/* Fill remaining fields */
>> -- 
>> 2.34.1
>>

Michael Tretter July 6, 2023, 7:37 a.m. UTC | #14

On Thu, 01 Jun 2023 16:08:13 +0200, Benjamin Gaignard wrote:
> Le 01/06/2023 à 15:27, Michael Tretter a écrit :
> > On Thu, 13 Apr 2023 12:47:56 +0200, Benjamin Gaignard wrote:
> > > ctx->vpu_dst_fmt is no more initialized before calling hantro_try_fmt()
> > > so assigne it to vpu_fmt led to crash the kernel.
> > > Like for decoder case use 'fmt' as format for encoder and clean up
> > > the code.
> > > 
> > > Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
> > > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > > Fixes: db6f68b51e5c ("media: verisilicon: Do not set context src/dst formats in reset functions")
> > > ---
> > > version 2:
> > > - Remove useless vpu_fmt.
> > > 
> > >   drivers/media/platform/verisilicon/hantro_v4l2.c | 10 +++-------
> > >   1 file changed, 3 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
> > > index 8f1414085f47..d71f79471396 100644
> > > --- a/drivers/media/platform/verisilicon/hantro_v4l2.c
> > > +++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
> > > @@ -275,7 +275,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
> > >   			  struct v4l2_pix_format_mplane *pix_mp,
> > >   			  enum v4l2_buf_type type)
> > >   {
> > > -	const struct hantro_fmt *fmt, *vpu_fmt;
> > > +	const struct hantro_fmt *fmt;
> > >   	bool capture = V4L2_TYPE_IS_CAPTURE(type);
> > >   	bool coded;
> > > @@ -295,11 +295,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
> > >   	if (coded) {
> > >   		pix_mp->num_planes = 1;
> > > -		vpu_fmt = fmt;
> > > -	} else if (ctx->is_encoder) {
> > > -		vpu_fmt = ctx->vpu_dst_fmt;
> > > -	} else {
> > > -		vpu_fmt = fmt;
> > > +	} else if (!ctx->is_encoder) {
> > >   		/*
> > >   		 * Width/height on the CAPTURE end of a decoder are ignored and
> > >   		 * replaced by the OUTPUT ones.
> > > @@ -311,7 +307,7 @@ static int hantro_try_fmt(const struct hantro_ctx *ctx,
> > >   	pix_mp->field = V4L2_FIELD_NONE;
> > >   	v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height,
> > > -				       &vpu_fmt->frmsize);
> > > +				       &fmt->frmsize);
> > This causes a regression on the OUTPUT device of the encoder. fmt->frmsize is
> > only valid for coded ("bitstream") formats, but fmt on the OUTPUT of an
> > encoder will be a raw format. This results in width and height to be clamped
> > to 0.
> > 
> > I think the correct fix would be to apply the frmsize constraints of the
> > currently configured coded format, but as ctx->vpu_dst_fmt is not initialized
> > before calling this code, I don't know how to get the coded format.
> 
> if ctx->dst_fmt is correctly set (and it should be) then doing:
> 
> pix_mp->width = ctx->dst_fmt.width;
> pix_mp->height = ctx->dst_fmt.height;
> 
> should solve the issue.

Using the width and height of dst_fmt for OUTPUT is not correct, since the
v4l2 stateless encoder spec dictates that the size is set on the OUTPUT device
and will be ignored on the CAPTURE device.

I sent a patch [0] to apply the frame constraints using dst_fmt.

Michael

[0] https://lore.kernel.org/linux-media/20230706071510.1717684-1-m.tretter@pengutronix.de/

> > >   	if (!coded) {
> > >   		/* Fill remaining fields */
> > > -- 
> > > 2.34.1
> > > 
>

diff mbox series

Patch

diff --git a/drivers/media/platform/verisilicon/hantro_v4l2.c b/drivers/media/platform/verisilicon/hantro_v4l2.c
index 8f1414085f47..d71f79471396 100644
--- a/drivers/media/platform/verisilicon/hantro_v4l2.c
+++ b/drivers/media/platform/verisilicon/hantro_v4l2.c
@@ -275,7 +275,7 @@  static int hantro_try_fmt(const struct hantro_ctx *ctx,
 			  struct v4l2_pix_format_mplane *pix_mp,
 			  enum v4l2_buf_type type)
 {
-	const struct hantro_fmt *fmt, *vpu_fmt;
+	const struct hantro_fmt *fmt;
 	bool capture = V4L2_TYPE_IS_CAPTURE(type);
 	bool coded;
 
@@ -295,11 +295,7 @@  static int hantro_try_fmt(const struct hantro_ctx *ctx,
 
 	if (coded) {
 		pix_mp->num_planes = 1;
-		vpu_fmt = fmt;
-	} else if (ctx->is_encoder) {
-		vpu_fmt = ctx->vpu_dst_fmt;
-	} else {
-		vpu_fmt = fmt;
+	} else if (!ctx->is_encoder) {
 		/*
 		 * Width/height on the CAPTURE end of a decoder are ignored and
 		 * replaced by the OUTPUT ones.
@@ -311,7 +307,7 @@  static int hantro_try_fmt(const struct hantro_ctx *ctx,
 	pix_mp->field = V4L2_FIELD_NONE;
 
 	v4l2_apply_frmsize_constraints(&pix_mp->width, &pix_mp->height,
-				       &vpu_fmt->frmsize);
+				       &fmt->frmsize);
 
 	if (!coded) {
 		/* Fill remaining fields */