Revert "venus: firmware: Correct non-pix start and end addresses"

Message ID 20230207102254.1446461-1-javierm@redhat.com

Commit Message

Javier Martinez Canillas Feb. 7, 2023, 10:22 a.m. UTC
  This reverts commit a837e5161cfffbb3242cc0eb574f8bf65fd32640, which broke
probing of the venus driver, at least on the SC7180 SoC HP X2 Chromebook:

  [   11.455782] qcom-venus aa00000.video-codec: Adding to iommu group 11
  [   11.506980] qcom-venus aa00000.video-codec: non legacy binding
  [   12.143432] qcom-venus aa00000.video-codec: failed to reset venus core
  [   12.156440] qcom-venus: probe of aa00000.video-codec failed with error -110

Matthias Kaehlcke also reported that the same change caused a regression in
SC7180 and sc7280, that prevents AOSS from entering sleep mode during system
suspend. So let's revert this commit for now to fix both issues.

Fixes: a837e5161cff ("venus: firmware: Correct non-pix start and end addresses")
Reported-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
---

 drivers/media/platform/qcom/venus/firmware.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

Vikash Garodia Feb. 7, 2023, 4:40 p.m. UTC | #1
Hi Javier and Matthias,
Could you try the attached patch and check whether it fixes the suspend issue for sc7180 and sc7280?

> -----Original Message-----
> From: Javier Martinez Canillas <javierm@redhat.com>
> Sent: Tuesday, February 7, 2023 3:53 PM
> To: linux-kernel@vger.kernel.org
> Cc: Albert Esteve <aesteve@redhat.com>; stanimir.varbanov@linaro.org;
> Matthias Kaehlcke <mka@chromium.org>; Enric Balletbo i Serra
> <eballetb@redhat.com>; Javier Martinez Canillas <javierm@redhat.com>; Andy
> Gross <agross@kernel.org>; Bjorn Andersson <andersson@kernel.org>; Konrad
> Dybcio <konrad.dybcio@linaro.org>; Mauro Carvalho Chehab
> <mchehab@kernel.org>; Stanimir Varbanov
> <stanimir.k.varbanov@gmail.com>; Vikash Garodia (QUIC)
> <quic_vgarodia@quicinc.com>; linux-arm-msm@vger.kernel.org; linux-
> media@vger.kernel.org
> Subject: [PATCH] Revert "venus: firmware: Correct non-pix start and end
> addresses"
> 
> This reverts commit a837e5161cfffbb3242cc0eb574f8bf65fd32640, which
> broke probing of the venus driver, at least on the SC7180 SoC HP X2
> Chromebook:
> 
>   [   11.455782] qcom-venus aa00000.video-codec: Adding to iommu group 11
>   [   11.506980] qcom-venus aa00000.video-codec: non legacy binding
>   [   12.143432] qcom-venus aa00000.video-codec: failed to reset venus core
>   [   12.156440] qcom-venus: probe of aa00000.video-codec failed with error -110
> 
> Matthias Kaehlcke also reported that the same change caused a regression in
> SC7180 and sc7280, that prevents AOSS from entering sleep mode during
> system suspend. So let's revert this commit for now to fix both issues.
> 
> Fixes: a837e5161cff ("venus: firmware: Correct non-pix start and end addresses")
> Reported-by: Matthias Kaehlcke <mka@chromium.org>
> Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
> ---
> 
>  drivers/media/platform/qcom/venus/firmware.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/media/platform/qcom/venus/firmware.c b/drivers/media/platform/qcom/venus/firmware.c
> index 142d4c74017c..d59ecf776715 100644
> --- a/drivers/media/platform/qcom/venus/firmware.c
> +++ b/drivers/media/platform/qcom/venus/firmware.c
> @@ -38,8 +38,8 @@ static void venus_reset_cpu(struct venus_core *core)
>         writel(fw_size, wrapper_base + WRAPPER_FW_END_ADDR);
>         writel(0, wrapper_base + WRAPPER_CPA_START_ADDR);
>         writel(fw_size, wrapper_base + WRAPPER_CPA_END_ADDR);
> -       writel(0, wrapper_base + WRAPPER_NONPIX_START_ADDR);
> -       writel(0, wrapper_base + WRAPPER_NONPIX_END_ADDR);
> +       writel(fw_size, wrapper_base + WRAPPER_NONPIX_START_ADDR);
> +       writel(fw_size, wrapper_base + WRAPPER_NONPIX_END_ADDR);
> 
>         if (IS_V6(core)) {
>                 /* Bring XTSS out of reset */
> --
> 2.39.1

Thanks,
Vikash
  
Matthias Kaehlcke Feb. 7, 2023, 5:50 p.m. UTC | #2
Hi Vikash,

On Tue, Feb 07, 2023 at 04:40:24PM +0000, Vikash Garodia wrote:
> Hi Javier and Matthias,
> Can we try the attached patch if that fixes the suspend issue for sc7180 and sc7280 ?

On my side the patch fixes the issue for sc7280, but not sc7180.

> > -----Original Message-----
> > From: Javier Martinez Canillas <javierm@redhat.com>
> > Sent: Tuesday, February 7, 2023 3:53 PM
> > To: linux-kernel@vger.kernel.org
> > Cc: Albert Esteve <aesteve@redhat.com>; stanimir.varbanov@linaro.org;
> > Matthias Kaehlcke <mka@chromium.org>; Enric Balletbo i Serra
> > <eballetb@redhat.com>; Javier Martinez Canillas <javierm@redhat.com>; Andy
> > Gross <agross@kernel.org>; Bjorn Andersson <andersson@kernel.org>; Konrad
> > Dybcio <konrad.dybcio@linaro.org>; Mauro Carvalho Chehab
> > <mchehab@kernel.org>; Stanimir Varbanov
> > <stanimir.k.varbanov@gmail.com>; Vikash Garodia (QUIC)
> > <quic_vgarodia@quicinc.com>; linux-arm-msm@vger.kernel.org; linux-
> > media@vger.kernel.org
> > Subject: [PATCH] Revert "venus: firmware: Correct non-pix start and end
> > addresses"
> > 
> > This reverts commit a837e5161cfffbb3242cc0eb574f8bf65fd32640, which
> > broke probing of the venus driver, at least on the SC7180 SoC HP X2
> > Chromebook:
> > 
> >   [   11.455782] qcom-venus aa00000.video-codec: Adding to iommu group 11
> >   [   11.506980] qcom-venus aa00000.video-codec: non legacy binding
> >   [   12.143432] qcom-venus aa00000.video-codec: failed to reset venus core
> >   [   12.156440] qcom-venus: probe of aa00000.video-codec failed with error -110
> > 
> > Matthias Kaehlcke also reported that the same change caused a regression in
> > SC7180 and sc7280, that prevents AOSS from entering sleep mode during
> > system suspend. So let's revert this commit for now to fix both issues.
> > 
> > Fixes: a837e5161cff ("venus: firmware: Correct non-pix start and end addresses")
> > Reported-by: Matthias Kaehlcke <mka@chromium.org>
> > Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
> > ---
> > 
> >  drivers/media/platform/qcom/venus/firmware.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/media/platform/qcom/venus/firmware.c b/drivers/media/platform/qcom/venus/firmware.c
> > index 142d4c74017c..d59ecf776715 100644
> > --- a/drivers/media/platform/qcom/venus/firmware.c
> > +++ b/drivers/media/platform/qcom/venus/firmware.c
> > @@ -38,8 +38,8 @@ static void venus_reset_cpu(struct venus_core *core)
> >         writel(fw_size, wrapper_base + WRAPPER_FW_END_ADDR);
> >         writel(0, wrapper_base + WRAPPER_CPA_START_ADDR);
> >         writel(fw_size, wrapper_base + WRAPPER_CPA_END_ADDR);
> > -       writel(0, wrapper_base + WRAPPER_NONPIX_START_ADDR);
> > -       writel(0, wrapper_base + WRAPPER_NONPIX_END_ADDR);
> > +       writel(fw_size, wrapper_base + WRAPPER_NONPIX_START_ADDR);
> > +       writel(fw_size, wrapper_base + WRAPPER_NONPIX_END_ADDR);
> > 
> >         if (IS_V6(core)) {
> >                 /* Bring XTSS out of reset */
> > --
> > 2.39.1
> 
> Thanks,
> Vikash
  
Matthias Kaehlcke Feb. 7, 2023, 10:39 p.m. UTC | #3
On Tue, Feb 07, 2023 at 05:50:19PM +0000, mka@chromium.org wrote:
> Hi Vikash,
> 
> On Tue, Feb 07, 2023 at 04:40:24PM +0000, Vikash Garodia wrote:
> > Hi Javier and Matthias,
> > Can we try the attached patch if that fixes the suspend issue for sc7180 and sc7280 ?
> 
> On my side the patch fixes the issue for sc7280, but not sc7180.

Some more info for sc7180:

[   10.313055] qcom-venus aa00000.video-codec: failed to reset venus core
[   10.331454] qcom-venus: probe of aa00000.video-codec failed with error -110

So venus didn't probe successfully. As a result sync_state() of its rpmhpd and
interconnects isn't called and they keep running at max speed, which prevents
the Always-On subsystem from suspending:

[   30.171148] qcom-rpmhpd 18200000.rsc:power-controller: Consumer 'aa00000.video-codec' did not probe (successfully)
[   30.682950] qnoc-sc7180 9680000.interconnect: Consumer 'aa00000.video-codec' did not probe (successfully)
[   30.701843] qnoc-sc7180 1740000.interconnect: Consumer 'aa00000.video-codec' did not probe (successfully)
[   30.720168] qnoc-sc7180 1638000.interconnect: Consumer 'aa00000.video-codec' did not probe (successfully)
[   30.738478] qnoc-sc7180 1500000.interconnect: Consumer 'aa00000.video-codec' did not probe (successfully)

(these debug logs are not upstream)
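
To make the chain of events above a bit more concrete, here is a toy userspace model of the idea. It is only a sketch under my own assumptions: the `toy_*` names, the struct and the levels are made up for illustration and are not the kernel's sync_state() API. A supplier holds its boot-time level until every consumer has probed, and only then drops to what the consumers actually request:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_supplier {
	int consumers_total;   /* consumers described for this supplier */
	int consumers_probed;  /* consumers that probed successfully */
	int boot_level;        /* level held while waiting for consumers */
	int requested_level;   /* level the probed consumers really need */
};

/* True once every consumer has probed, i.e. the point where the
 * supplier is allowed to stop holding its boot-time level. */
static bool toy_synced(const struct toy_supplier *s)
{
	return s->consumers_probed == s->consumers_total;
}

/* Record one successful consumer probe. */
static void toy_consumer_probed(struct toy_supplier *s)
{
	assert(s->consumers_probed < s->consumers_total);
	s->consumers_probed++;
}

/* Level the supplier actually runs at. */
static int toy_effective_level(const struct toy_supplier *s)
{
	return toy_synced(s) ? s->requested_level : s->boot_level;
}
```

In this model a single consumer that never probes (here: venus) is enough to keep `toy_effective_level()` at `boot_level` forever, which matches the "did not probe" messages above.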

> > > -----Original Message-----
> > > From: Javier Martinez Canillas <javierm@redhat.com>
> > > Sent: Tuesday, February 7, 2023 3:53 PM
> > > To: linux-kernel@vger.kernel.org
> > > Cc: Albert Esteve <aesteve@redhat.com>; stanimir.varbanov@linaro.org;
> > > Matthias Kaehlcke <mka@chromium.org>; Enric Balletbo i Serra
> > > <eballetb@redhat.com>; Javier Martinez Canillas <javierm@redhat.com>; Andy
> > > Gross <agross@kernel.org>; Bjorn Andersson <andersson@kernel.org>; Konrad
> > > Dybcio <konrad.dybcio@linaro.org>; Mauro Carvalho Chehab
> > > <mchehab@kernel.org>; Stanimir Varbanov
> > > <stanimir.k.varbanov@gmail.com>; Vikash Garodia (QUIC)
> > > <quic_vgarodia@quicinc.com>; linux-arm-msm@vger.kernel.org; linux-
> > > media@vger.kernel.org
> > > Subject: [PATCH] Revert "venus: firmware: Correct non-pix start and end
> > > addresses"
> > > 
> > > This reverts commit a837e5161cfffbb3242cc0eb574f8bf65fd32640, which
> > > broke probing of the venus driver, at least on the SC7180 SoC HP X2
> > > Chromebook:
> > > 
> > >   [   11.455782] qcom-venus aa00000.video-codec: Adding to iommu group 11
> > >   [   11.506980] qcom-venus aa00000.video-codec: non legacy binding
> > >   [   12.143432] qcom-venus aa00000.video-codec: failed to reset venus core
> > >   [   12.156440] qcom-venus: probe of aa00000.video-codec failed with error -110
> > > 
> > > Matthias Kaehlcke also reported that the same change caused a regression in
> > > SC7180 and sc7280, that prevents AOSS from entering sleep mode during
> > > system suspend. So let's revert this commit for now to fix both issues.
> > > 
> > > Fixes: a837e5161cff ("venus: firmware: Correct non-pix start and end addresses")
> > > Reported-by: Matthias Kaehlcke <mka@chromium.org>
> > > Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
> > > ---
> > > 
> > >  drivers/media/platform/qcom/venus/firmware.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/media/platform/qcom/venus/firmware.c b/drivers/media/platform/qcom/venus/firmware.c
> > > index 142d4c74017c..d59ecf776715 100644
> > > --- a/drivers/media/platform/qcom/venus/firmware.c
> > > +++ b/drivers/media/platform/qcom/venus/firmware.c
> > > @@ -38,8 +38,8 @@ static void venus_reset_cpu(struct venus_core *core)
> > >         writel(fw_size, wrapper_base + WRAPPER_FW_END_ADDR);
> > >         writel(0, wrapper_base + WRAPPER_CPA_START_ADDR);
> > >         writel(fw_size, wrapper_base + WRAPPER_CPA_END_ADDR);
> > > -       writel(0, wrapper_base + WRAPPER_NONPIX_START_ADDR);
> > > -       writel(0, wrapper_base + WRAPPER_NONPIX_END_ADDR);
> > > +       writel(fw_size, wrapper_base + WRAPPER_NONPIX_START_ADDR);
> > > +       writel(fw_size, wrapper_base + WRAPPER_NONPIX_END_ADDR);
> > > 
> > >         if (IS_V6(core)) {
> > >                 /* Bring XTSS out of reset */
> > > --
> > > 2.39.1
> > 
> > Thanks,
> > Vikash
> 
>
  
Javier Martinez Canillas Feb. 8, 2023, 9:06 a.m. UTC | #4
Hello Vikash,

On 2/7/23 17:40, Vikash Garodia wrote:
> Hi Javier and Matthias,
> Can we try the attached patch if that fixes the suspend issue for sc7180 and sc7280 ?
> 

I tested your attached patch on an SC7180 machine (HP X2 Chromebook) and as Matthias
mentioned, it still causes the driver's probe to fail:

[ 2119.063779] qcom-venus aa00000.video-codec: non legacy binding
[ 2119.085695] platform video-firmware.0: Adding to iommu group 11
[ 2119.156302] arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402, iova=0x000000b0, fsynr=0x61, cbfrsynra=0xc40, cb=7
[ 2119.259382] qcom-venus aa00000.video-codec: failed to reset venus core
[ 2119.267782] platform video-firmware.0: Removing from iommu group 11
[ 2119.275052] qcom-venus: probe of aa00000.video-codec failed with error -110
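
For reference, the `-110` in the last log line is a negated errno value. A minimal sketch for decoding it, assuming a Linux userspace build on an architecture with the generic errno numbering (arm64 and x86 use it); the helper names are made up for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn a kernel-style return code (zero or a
 * negated errno, e.g. -110 from a failed probe) into a readable string.
 */
static const char *describe_probe_error(int ret)
{
	assert(ret <= 0);
	/* Kernel code returns -errno; strerror() expects the positive value. */
	return strerror(-ret);
}

/* Print one line in the spirit of the log above. */
static void print_probe_error(int ret)
{
	printf("probe failed with error %d (%s)\n", ret, describe_probe_error(ret));
}
```

With the generic numbering, 110 is `ETIMEDOUT`, so `print_probe_error(-110)` reports a timeout, which is consistent with the "failed to reset venus core" message right before it.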
  
Javier Martinez Canillas Feb. 10, 2023, 8:15 a.m. UTC | #5
On 2/8/23 10:06, Javier Martinez Canillas wrote:
> Hello Vikash,
> 
> On 2/7/23 17:40, Vikash Garodia wrote:
>> Hi Javier and Matthias,
>> Can we try the attached patch if that fixes the suspend issue for sc7180 and sc7280 ?
>>
> 
> I tested your attached patch on an SC7180 machine (HP X2 Chromebook) and as Matthias
> mentioned, it still causes the driver's probe to fail:
> 
> [ 2119.063779] qcom-venus aa00000.video-codec: non legacy binding
> [ 2119.085695] platform video-firmware.0: Adding to iommu group 11
> [ 2119.156302] arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402, iova=0x000000b0, fsynr=0x61, cbfrsynra=0xc40, cb=7
> [ 2119.259382] qcom-venus aa00000.video-codec: failed to reset venus core
> [ 2119.267782] platform video-firmware.0: Removing from iommu group 11
> [ 2119.275052] qcom-venus: probe of aa00000.video-codec failed with error -110
> 

So what should we do about this folks? Since not allowing the driver to probe
on at least SC7180 is a quite serious regression, can we revert for now until
a proper fix is figured out?
  
Vikash Garodia Feb. 10, 2023, 9:22 a.m. UTC | #6
Hi Javier,

>-----Original Message-----
>From: Javier Martinez Canillas <javierm@redhat.com>
>Sent: Friday, February 10, 2023 1:45 PM
>To: Vikash Garodia <vgarodia@qti.qualcomm.com>; linux-
>kernel@vger.kernel.org; mka@chromium.org
>Cc: Albert Esteve <aesteve@redhat.com>; stanimir.varbanov@linaro.org; Enric
>Balletbo i Serra <eballetb@redhat.com>; Andy Gross <agross@kernel.org>;
>Bjorn Andersson <andersson@kernel.org>; Konrad Dybcio
><konrad.dybcio@linaro.org>; Mauro Carvalho Chehab <mchehab@kernel.org>;
>Stanimir Varbanov <stanimir.k.varbanov@gmail.com>; Vikash Garodia (QUIC)
><quic_vgarodia@quicinc.com>; linux-arm-msm@vger.kernel.org; linux-
>media@vger.kernel.org; Fritz Koenig <frkoenig@google.com>; Dikshita Agarwal
>(QUIC) <quic_dikshita@quicinc.com>; Rajeshwar Kurapaty (QUIC)
><quic_rkurapat@quicinc.com>
>Subject: Re: [PATCH] Revert "venus: firmware: Correct non-pix start and end
>addresses"
>
>On 2/8/23 10:06, Javier Martinez Canillas wrote:
>> Hello Vikash,
>>
>> On 2/7/23 17:40, Vikash Garodia wrote:
>>> Hi Javier and Matthias,
>>> Can we try the attached patch if that fixes the suspend issue for sc7180 and sc7280 ?
>>>
>>
>> I tested your attached patch on an SC7180 machine (HP X2 Chromebook)
>> and as Matthias mentioned, it still causes the driver's probe to fail:
>>
>> [ 2119.063779] qcom-venus aa00000.video-codec: non legacy binding
>> [ 2119.085695] platform video-firmware.0: Adding to iommu group 11
>> [ 2119.156302] arm-smmu 15000000.iommu: Unhandled context fault: fsr=0x402, iova=0x000000b0, fsynr=0x61, cbfrsynra=0xc40, cb=7
>> [ 2119.259382] qcom-venus aa00000.video-codec: failed to reset venus core
>> [ 2119.267782] platform video-firmware.0: Removing from iommu group 11
>> [ 2119.275052] qcom-venus: probe of aa00000.video-codec failed with error -110
>>
>
>So what should we do about this folks? Since not allowing the driver to probe on
>at least SC7180 is a quite serious regression, can we revert for now until a proper
>fix is figured out?

I am able to reproduce this issue on sc7180 and am discussing the cause of the reset
failure with the firmware team. The original patch was raised to fix rare SMMU faults
during warm boot of the video hardware. Hence I would like to understand the regressing
part before we proceed with the revert.

>--
>Best regards,
>
>Javier Martinez Canillas
>Core Platforms
>Red Hat
  
Javier Martinez Canillas Feb. 10, 2023, 10:07 a.m. UTC | #7
Hello Vikash,

On 2/10/23 10:22, Vikash Garodia wrote:

[...]

>>
>> So what should we do about this folks? Since not allowing the driver to probe on
>> at least SC7180 is a quite serious regression, can we revert for now until a proper
>> fix is figured out?
> 
> I am able to repro this issue on sc7180 and discussing with firmware team on the cause
> of reset failure. The original patch was raised for fixing rare SMMU faults during warm
> boot of video hardware. Hence looking to understand the regressing part before we
> proceed to revert.
> 

Great, if you are working on a proper fix then that would be much better indeed.

Thanks for the follow-up!
  
Linux regression tracking (Thorsten Leemhuis) Feb. 11, 2023, 2:27 p.m. UTC | #8
On 10.02.23 11:07, Javier Martinez Canillas wrote:
> On 2/10/23 10:22, Vikash Garodia wrote:
>
>>> So what should we do about this folks? Since not allowing the driver to probe on
>>> at least SC7180 is a quite serious regression, can we revert for now until a proper
>>> fix is figured out?
>>
>> I am able to repro this issue on sc7180 and discussing with firmware team on the cause
>> of reset failure. The original patch was raised for fixing rare SMMU faults during warm
>> boot of video hardware. Hence looking to understand the regressing part before we
>> proceed to revert.
> 
> Great, if you are working on a proper fix then that would be much better indeed.

Yeah, that's great, but OTOH: there is almost certainly just one week
before 6.2 will be released. Ideally this should be fixed by then.
Vikash, do you think that's in the cards? If not: why not revert this
now to make sure 6.2 works fine?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
  
Linux regression tracking (Thorsten Leemhuis) Feb. 15, 2023, 10:53 a.m. UTC | #9
On 11.02.23 15:27, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 10.02.23 11:07, Javier Martinez Canillas wrote:
>> On 2/10/23 10:22, Vikash Garodia wrote:
>>
>>>> So what should we do about this folks? Since not allowing the driver to probe on
>>>> at least SC7180 is a quite serious regression, can we revert for now until a proper
>>>> fix is figured out?
>>>
>>> I am able to repro this issue on sc7180 and discussing with firmware team on the cause
>>> of reset failure. The original patch was raised for fixing rare SMMU faults during warm
>>> boot of video hardware. Hence looking to understand the regressing part before we
>>> proceed to revert.
>>
>> Great, if you are working on a proper fix then that would be much better indeed.
> 
> Yeah, that's great, but OTOH: there is almost certainly just one week
> before 6.2 will be released. Ideally this should be fixed by then.
> Vikash, do you think that's in the cards? If not: why not revert this
> now to make sure 6.2 works fine?

Hmm, no reply. And we meanwhile have Wednesday and 6.2 is almost
certainly going to be out on Sunday. And the problem was called "a quite
serious regression" above. So why not quickly fix this with the revert,
as proposed earlier?

Vikash? Javier?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot ignore-activity
  
Javier Martinez Canillas Feb. 15, 2023, 10:57 a.m. UTC | #10
On Wed, Feb 15, 2023 at 11:53 AM Linux regression tracking (Thorsten
Leemhuis) <regressions@leemhuis.info> wrote:
>
> On 11.02.23 15:27, Linux regression tracking (Thorsten Leemhuis) wrote:
> > On 10.02.23 11:07, Javier Martinez Canillas wrote:
> >> On 2/10/23 10:22, Vikash Garodia wrote:
> >>
> >>>> So what should we do about this folks? Since not allowing the driver to probe on
> >>>> at least SC7180 is a quite serious regression, can we revert for now until a proper
> >>>> fix is figured out?
> >>>
> >>> I am able to repro this issue on sc7180 and discussing with firmware team on the cause
> >>> of reset failure. The original patch was raised for fixing rare SMMU faults during warm
> >>> boot of video hardware. Hence looking to understand the regressing part before we
> >>> proceed to revert.
> >>
> >> Great, if you are working on a proper fix then that would be much better indeed.
> >
> > Yeah, that's great, but OTOH: there is almost certainly just one week
> > before 6.2 will be released. Ideally this should be fixed by then.
> > Vikash, do you think that's in the cards? If not: why not revert this
> > now to make sure 6.2 works fine?
>
> Hmm, no reply. And we meanwhile have Wednesday and 6.2 is almost
> certainly going to be out on Sunday. And the problem was called "a quite
> serious regression" above. So why not quickly fix this with the revert,
> as proposed earlier?
>
> Vikash? Javier?
>

I agree with you, that we should land this revert and then properly
fix the page fault issue in v6.3.

But it's not my call, the v4l2/media folks have to decide that.
  
Linux regression tracking (Thorsten Leemhuis) Feb. 15, 2023, 1:18 p.m. UTC | #11
On 15.02.23 11:57, Javier Martinez Canillas wrote:
> On Wed, Feb 15, 2023 at 11:53 AM Linux regression tracking (Thorsten
> Leemhuis) <regressions@leemhuis.info> wrote:
>> On 11.02.23 15:27, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> On 10.02.23 11:07, Javier Martinez Canillas wrote:
>>>> On 2/10/23 10:22, Vikash Garodia wrote:
>>>>
>>>>>> So what should we do about this folks? Since not allowing the driver to probe on
>>>>>> at least SC7180 is a quite serious regression, can we revert for now until a proper
>>>>>> fix is figured out?
>>>>> I am able to repro this issue on sc7180 and discussing with firmware team on the cause
>>>>> of reset failure. The original patch was raised for fixing rare SMMU faults during warm
>>>>> boot of video hardware. Hence looking to understand the regressing part before we
>>>>> proceed to revert.
>>>> Great, if you are working on a proper fix then that would be much better indeed.
>>> Yeah, that's great, but OTOH: there is almost certainly just one week
>>> before 6.2 will be released. Ideally this should be fixed by then.
>>> Vikash, do you think that's in the cards? If not: why not revert this
>>> now to make sure 6.2 works fine?
>> Hmm, no reply. And we meanwhile have Wednesday and 6.2 is almost
>> certainly going to be out on Sunday. And the problem was called "a quite
>> serious regression" above. So why not quickly fix this with the revert,
>> as proposed earlier?
>> Vikash? Javier?
>
> I agree with you, that we should land this revert and then properly
> fix the page fault issue in v6.3.
> 
> But it's not my call, the v4l2/media folks have to decide that.

In that case: Mauro, what's your opinion here?

Thread starts here:
https://lore.kernel.org/lkml/20230207102254.1446461-1-javierm@redhat.com/

Regression report:
https://lore.kernel.org/lkml/Y9LSMap%2BjRxbtpC8@google.com/

Ciao, Thorsten
  
Linux regression tracking (Thorsten Leemhuis) Feb. 21, 2023, 3:03 p.m. UTC | #12
On 15.02.23 14:18, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 15.02.23 11:57, Javier Martinez Canillas wrote:
>> On Wed, Feb 15, 2023 at 11:53 AM Linux regression tracking (Thorsten
>> Leemhuis) <regressions@leemhuis.info> wrote:
>>> On 11.02.23 15:27, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>> On 10.02.23 11:07, Javier Martinez Canillas wrote:
>>>>> On 2/10/23 10:22, Vikash Garodia wrote:
>>>>>
>>>>>>> So what should we do about this folks? Since not allowing the driver to probe on
>>>>>>> at least SC7180 is a quite serious regression, can we revert for now until a proper
>>>>>>> fix is figured out?
>>>>>> I am able to repro this issue on sc7180 and discussing with firmware team on the cause
>>>>>> of reset failure. The original patch was raised for fixing rare SMMU faults during warm
>>>>>> boot of video hardware. Hence looking to understand the regressing part before we
>>>>>> proceed to revert.
>>>>> Great, if you are working on a proper fix then that would be much better indeed.
>>>> Yeah, that's great, but OTOH: there is almost certainly just one week
>>>> before 6.2 will be released. Ideally this should be fixed by then.
>>>> Vikash, do you think that's in the cards? If not: why not revert this
>>>> now to make sure 6.2 works fine?
>>> Hmm, no reply. And we meanwhile have Wednesday and 6.2 is almost
>>> certainly going to be out on Sunday. And the problem was called "a quite
>>> serious regression" above. So why not quickly fix this with the revert,
>>> as proposed earlier?
>>> Vikash? Javier?
>>
>> I agree with you, that we should land this revert and then properly
>> fix the page fault issue in v6.3.
>>
>> But it's not my call, the v4l2/media folks have to decide that.
> 
> In that case: Mauro, what's your opinion here?
> 
> Thread starts here:
> https://lore.kernel.org/lkml/20230207102254.1446461-1-javierm@redhat.com/
> 
> Regression report:
> https://lore.kernel.org/lkml/Y9LSMap%2BjRxbtpC8@google.com/

No reply from Mauro and Linus chose to not apply the revert I pointed
him to. That at this point leads to the question:

Vikash, did you or somebody else make any progress to fix this properly?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke
  
Vikash Garodia Feb. 23, 2023, 5:45 a.m. UTC | #13
Hi All,

>-----Original Message-----
>From: Thorsten Leemhuis <regressions@leemhuis.info>
>Sent: Tuesday, February 21, 2023 8:33 PM
>To: Vikash Garodia <vgarodia@qti.qualcomm.com>
>Cc: linux-kernel@vger.kernel.org; mka@chromium.org; Albert Esteve
><aesteve@redhat.com>; stanimir.varbanov@linaro.org; Enric Balletbo i Serra
><eballetb@redhat.com>; Andy Gross <agross@kernel.org>; Bjorn Andersson
><andersson@kernel.org>; Konrad Dybcio <konrad.dybcio@linaro.org>; Stanimir
>Varbanov <stanimir.k.varbanov@gmail.com>; Vikash Garodia (QUIC)
><quic_vgarodia@quicinc.com>; linux-arm-msm@vger.kernel.org; linux-
>media@vger.kernel.org; Fritz Koenig <frkoenig@google.com>; Dikshita Agarwal
>(QUIC) <quic_dikshita@quicinc.com>; Rajeshwar Kurapaty (QUIC)
><quic_rkurapat@quicinc.com>; Javier Martinez Canillas <javierm@redhat.com>;
>Linux regressions mailing list <regressions@lists.linux.dev>; Mauro Carvalho
>Chehab <mchehab@kernel.org>
>Subject: Re: [PATCH] Revert "venus: firmware: Correct non-pix start and end
>addresses"
>
>On 15.02.23 14:18, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 15.02.23 11:57, Javier Martinez Canillas wrote:
>>> On Wed, Feb 15, 2023 at 11:53 AM Linux regression tracking (Thorsten
>>> Leemhuis) <regressions@leemhuis.info> wrote:
>>>> On 11.02.23 15:27, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>>> On 10.02.23 11:07, Javier Martinez Canillas wrote:
>>>>>> On 2/10/23 10:22, Vikash Garodia wrote:
>>>>>>
>>>>>>>> So what should we do about this folks? Since not allowing the
>>>>>>>> driver to probe on at least SC7180 is a quite serious
>>>>>>>> regression, can we revert for now until a proper fix is figured out?
>>>>>>> I am able to repro this issue on sc7180 and discussing with
>>>>>>> firmware team on the cause of reset failure. The original patch
>>>>>>> was raised for fixing rare SMMU faults during warm boot of video
>>>>>>> hardware. Hence looking to understand the regressing part before we
>>>>>>> proceed to revert.
>>>>>> Great, if you are working on a proper fix then that would be much better
>>>>>> indeed.
>>>>> Yeah, that's great, but OTOH: there is almost certainly just one
>>>>> week before 6.2 will be released. Ideally this should be fixed by then.
>>>>> Vikash, do you think that's in the cards? If not: why not revert
>>>>> this now to make sure 6.2 works fine?
>>>> Hmm, no reply. And we meanwhile have Wednesday and 6.2 is almost
>>>> certainly going to be out on Sunday. And the problem was called "a
>>>> quite serious regression" above. So why not quickly fix this with
>>>> the revert, as proposed earlier?
>>>> Vikash? Javier?
>>>
>>> I agree with you, that we should land this revert and then properly
>>> fix the page fault issue in v6.3.
>>>
>>> But it's not my call, the v4l2/media folks have to decide that.
>>
>> In that case: Mauro, what's your opinion here?
>>
>> Thread starts here:
>> https://lore.kernel.org/lkml/20230207102254.1446461-1-javierm@redhat.com/
>>
>> Regression report:
>> https://lore.kernel.org/lkml/Y9LSMap%2BjRxbtpC8@google.com/
>
>No reply from Mauro and Linus chose to not apply the revert I pointed him to.
>That at this point leads to the question:
>
>Vikash, did you or somebody else make any progress to fix this properly?

We tried different settings for the registers and arrived at the conclusion that
the original configuration was proper. There is no need to explicitly configure
the secure non-pixel region when there is no support for the use case. So, in summary,
we are good to have the revert.
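
For anyone following along, the register sequence in question can be mocked in userspace roughly as below. This is only a sketch: the `mock_*` names and the register indices are placeholders for illustration (not the real WRAPPER_* offsets), and it models only the writes visible in the diff quoted earlier in this thread:

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder indices into a mock register file, NOT real offsets. */
enum mock_reg {
	MOCK_FW_END,
	MOCK_CPA_START,
	MOCK_CPA_END,
	MOCK_NONPIX_START,
	MOCK_NONPIX_END,
	MOCK_REG_COUNT
};

static uint32_t mock_regs[MOCK_REG_COUNT];

/* Stand-in for writel(): store the value in the mock register file. */
static void mock_writel(uint32_t val, enum mock_reg reg)
{
	assert(reg < MOCK_REG_COUNT);
	mock_regs[reg] = val;
}

/*
 * Mirrors the writes shown in the diff. With reverted != 0 both
 * non-pixel registers get fw_size (the configuration after the
 * revert); otherwise both get 0 (the configuration being reverted).
 */
static void mock_reset_cpu(uint32_t fw_size, int reverted)
{
	uint32_t nonpix = reverted ? fw_size : 0;

	mock_writel(fw_size, MOCK_FW_END);
	mock_writel(0, MOCK_CPA_START);
	mock_writel(fw_size, MOCK_CPA_END);
	mock_writel(nonpix, MOCK_NONPIX_START);
	mock_writel(nonpix, MOCK_NONPIX_END);
}
```

Calling `mock_reset_cpu(size, 1)` leaves the non-pixel start and end registers equal to the firmware size, which is the state the revert restores.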

Stan, could you please help with the revert and a pull request having this revert
along with other pending changes?

>Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>--
>Everything you wanna know about Linux kernel regression tracking:
>https://linux-regtracking.leemhuis.info/about/#tldr
>If I did something stupid, please tell me, as explained on that page.
>
>#regzbot poke
  
Javier Martinez Canillas Feb. 23, 2023, 8:05 a.m. UTC | #14
Vikash Garodia <vgarodia@qti.qualcomm.com> writes:

Hello Vikash,

> Hi All,
>

[...]

>>
>>No reply from Mauro and Linus chose to not apply the revert I pointed him to.
>>That at this point leads to the question:
>>
>>Vikash, did you or somebody else make any progress to fix this properly?
>
> We tried with different settings for the registers and arrive at a conclusion that
> the original configuration was proper. There is no need to explicitly configure
> the secure non-pixel region when there is no support for the usecase. So, in summary,
> we are good to have the revert.
>

Perfect. Thanks a lot for looking at this.

> Stan, could you please help with the revert and a pull request having this revert
> alongwith other pending changes ?
>

Other fix posted is "media: venus: dec: Fix capture formats enumeration order":

https://patchwork.kernel.org/project/linux-media/patch/20230210081835.2054482-1-javierm@redhat.com/
  
Javier Martinez Canillas Feb. 28, 2023, 4:03 p.m. UTC | #15
Javier Martinez Canillas <javierm@redhat.com> writes:

> Vikash Garodia <vgarodia@qti.qualcomm.com> writes:
>
> Hello Vikash,
>
>> Hi All,
>>
>
> [...]
>
>>>
>>>No reply from Mauro and Linus chose to not apply the revert I pointed him to.
>>>That at this point leads to the question:
>>>
>>>Vikash, did you or somebody else make any progress to fix this properly?
>>
>> We tried different settings for the registers and arrived at the conclusion that
>> the original configuration was proper. There is no need to explicitly configure
>> the secure non-pixel region when there is no support for the use case. So, in summary,
>> we are good to have the revert.
>>
>
> Perfect. Thanks a lot for looking at this.
>
>> Stan, could you please help with the revert and a pull request having this revert
>> along with other pending changes?
>>
>
> Other fix posted is "media: venus: dec: Fix capture formats enumeration order":
>
> https://patchwork.kernel.org/project/linux-media/patch/20230210081835.2054482-1-javierm@redhat.com/
>

Vikash,

Could you or someone else from QC please Review/Ack these two patches,
since it seems that Stanimir has moved on and may no longer be working on
this driver?
  
Javier Martinez Canillas March 6, 2023, 10:43 a.m. UTC | #16
Dikshita Agarwal <quic_dikshita@quicinc.com> writes:

Hello Dikshita,

> On 3/1/2023 3:15 PM, Dikshita Agarwal wrote:
>>
>>
>> On 2/28/2023 9:33 PM, Javier Martinez Canillas wrote:
>>> Javier Martinez Canillas<javierm@redhat.com>  writes:
>>>
>>>> Vikash Garodia<vgarodia@qti.qualcomm.com>  writes:
>>>>
>>>> Hello Vikash,
>>>>
>>>>> Hi All,
>>>>>
>>>> [...]
>>>>
>>>>>> No reply from Mauro and Linus chose to not apply the revert I pointed him to.
>>>>>> That at this point leads to the question:
>>>>>>
>>>>>> Vikash, did you or somebody else make any progress to fix this properly?
>>>>> We tried different settings for the registers and arrived at the conclusion that
>>>>> the original configuration was proper. There is no need to explicitly configure
>>>>> the secure non-pixel region when there is no support for the use case. So, in summary,
>>>>> we are good to have the revert.
>>>>>
>>>> Perfect. Thanks a lot for looking at this.
>>>>
>>>>> Stan, could you please help with the revert and a pull request having this revert
>>>>> along with other pending changes?
>>>>>
>>>> Other fix posted is "media: venus: dec: Fix capture formats enumeration order":
>>>>
>>>> https://patchwork.kernel.org/project/linux-media/patch/20230210081835.2054482-1-javierm@redhat.com/
>
> Hi Javier,
>
> Thanks for this patch "media: venus: dec: Fix capture formats 
> enumeration order".
>
> Somehow I can't find it in my mailbox to be able to reply there.
>
> Could you please explain what is the regression you see here?
>

You can find the thread and explanation of the issue here:

https://lore.kernel.org/lkml/Y+KPW18o%2FDa+N8UI@google.com/T/

But Stanimir already picked it and sent a PR for v6.3 including it.
  
Leonard Lausen April 1, 2023, 8:53 p.m. UTC | #17
Hi Javier, Dikshita, Stan,

the revert wasn't applied to v6.2 series. Can you please apply it and include it for v6.2.10?

March 6, 2023 at 5:43 AM, "Javier Martinez Canillas" <javierm@redhat.com> wrote:
>> On 3/1/2023 3:15 PM, Dikshita Agarwal wrote:
>>> On 2/28/2023 9:33 PM, Javier Martinez Canillas wrote:
>>>> Javier Martinez Canillas<javierm@redhat.com>  writes:
>>>>> Vikash Garodia<vgarodia@qti.qualcomm.com>  writes:
>>>>>
>>>>>> Stan, could you please help with the revert and a pull request having this revert
>>>>>> along with other pending changes?
>>>>>>
>>>>> Other fix posted is "media: venus: dec: Fix capture formats enumeration order":
>>>>>
>>>>> https://patchwork.kernel.org/project/linux-media/patch/20230210081835.2054482-1-javierm@redhat.com/
>>
>> Hi Javier,
>>
>> Thanks for this patch "media: venus: dec: Fix capture formats
>> enumeration order".
>>
>> Somehow I can't find it in my mailbox to be able to reply there.
>>
>> Could you please explain what is the regression you see here?
>>
>
>You can find the thread and explanation of the issue here:
>
>https://lore.kernel.org/lkml/Y+KPW18o%2FDa+N8UI@google.com/T/
>
>But Stanimir already picked it and sent a PR for v6.3 including it.

While "media: venus: dec: Fix capture formats enumeration order" may have been
applied to v6.3, this still leaves the regression introduced by "venus:
firmware: Correct non-pix start and end addresses". As pointed out by Matthias
Kaehlcke, the commit prevents SC7180 and sc7280 AOSS from entering sleep mode
during system suspend. This is a serious regression in v6.2 kernel series.

Best regards,
Leonard Lausen
  
Linux regression tracking (Thorsten Leemhuis) April 2, 2023, 5:02 a.m. UTC | #18
On 01.04.23 22:53, Leonard Lausen wrote:
> Hi Javier, Dikshita, Stan,
> 
> the revert wasn't applied to v6.2 series. Can you please apply it and include it for v6.2.10?
> 
> March 6, 2023 at 5:43 AM, "Javier Martinez Canillas" <javierm@redhat.com> wrote:
>>> On 3/1/2023 3:15 PM, Dikshita Agarwal wrote:
>>>> On 2/28/2023 9:33 PM, Javier Martinez Canillas wrote:
>>>>> Javier Martinez Canillas<javierm@redhat.com>  writes:
>>>>>> Vikash Garodia<vgarodia@qti.qualcomm.com>  writes:
>>>>>>
>>>>>>> Stan, could you please help with the revert and a pull request having this revert
>>>>>>> along with other pending changes?
>>>>>>>
>>>>>> Other fix posted is "media: venus: dec: Fix capture formats enumeration order":
>>>>>>
>>>>>> https://patchwork.kernel.org/project/linux-media/patch/20230210081835.2054482-1-javierm@redhat.com/
>>>
>>> Hi Javier,
>>>
>>> Thanks for this patch "media: venus: dec: Fix capture formats
>>> enumeration order".
>>>
>>> Somehow I can't find it in my mailbox to be able to reply there.
>>>
>>> Could you please explain what is the regression you see here?
>>>
>>
>> You can find the thread and explanation of the issue here:
>>
>> https://lore.kernel.org/lkml/Y+KPW18o%2FDa+N8UI@google.com/T/
>>
>> But Stanimir already picked it and sent a PR for v6.3 including it.
> 
> While "media: venus: dec: Fix capture formats enumeration order" may have been
> applied to v6.3,

To me it looks like it was submitted[1], but not yet applied even to the
media tree[2] -- wild guess, maybe due to problems mentioned in [3]? Or am
I missing something?

[1]
https://lore.kernel.org/all/20230329211655.100276-1-stanimir.k.varbanov@gmail.com/
[2] https://git.linuxtv.org/media_tree.git/log/?h=fixes
[3]
https://lore.kernel.org/all/20230329214310.2503484-1-jenkins@linuxtv.org/

> this still leaves the regression introduced by "venus:
> firmware: Correct non-pix start and end addresses". As pointed out by Matthias
> Kaehlcke, the commit prevents SC7180 and sc7280 AOSS from entering sleep mode
> during system suspend. This is a serious regression in v6.2 kernel series.

That fix has been sitting in the media tree for a while and afaics still
hasn't been sent to Linus (which is needed to get this fixed in 6.2.y).

Mauro, could you maybe take care of that?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
  
Leonard Lausen April 3, 2023, 12:27 a.m. UTC | #19
April 2, 2023 at 1:02 AM, <regressions@leemhuis.info> wrote:
> > this still leaves the regression introduced by "venus:
> >  firmware: Correct non-pix start and end addresses". As pointed out by Matthias
> >  Kaehlcke, the commit prevents SC7180 and sc7280 AOSS from entering sleep mode
> >  during system suspend. This is a serious regression in v6.2 kernel series.
> > 
> That fix has been sitting in the media tree for a while and afaics still
> hasn't been sent to Linus (which is needed to get this fixed in 6.2.y).
> Mauro, could you maybe take care of that?

I see the revert made it to 6.3-rc5 as commit f95b8ea7. Now it just needs to
be included for v6.2.10.

Thank you
Leonard Lausen
  
Linux regression tracking (Thorsten Leemhuis) April 3, 2023, 6:32 a.m. UTC | #20
On 02.04.23 07:02, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 01.04.23 22:53, Leonard Lausen wrote:
>>
>> the revert wasn't applied to v6.2 series. Can you please apply it and include it for v6.2.10?

I pointed Linus to this and he merged the revert directly; and it's
already queued for the next 6.2.y release:

https://lore.kernel.org/all/CAHk-%3DwhRs_MavKCqtV3%3DK31dq9Z6HzbaG8Uxo-EV%3DuRxdsXduA@mail.gmail.com/
https://git.kernel.org/torvalds/c/f95b8ea79c47c0ad3d18f45ad538f9970e414d1f
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=902f9eb696dfdd40e88d99bafa34ea25f1f9e927

Now to the remaining venus regression:

>> March 6, 2023 at 5:43 AM, "Javier Martinez Canillas" <javierm@redhat.com> wrote:
>>>> On 3/1/2023 3:15 PM, Dikshita Agarwal wrote:
>>>>> On 2/28/2023 9:33 PM, Javier Martinez Canillas wrote:
>>>>>> Javier Martinez Canillas<javierm@redhat.com>  writes:
>>>>>>> Vikash Garodia<vgarodia@qti.qualcomm.com>  writes:
>>>>>>>
>>>>>>>> Stan, could you please help with the revert and a pull request having this revert
>>>>>>>> along with other pending changes?
>>>>>>>>
>>>>>>> Other fix posted is "media: venus: dec: Fix capture formats enumeration order":
>>>>>>>
>>>>>>> https://patchwork.kernel.org/project/linux-media/patch/20230210081835.2054482-1-javierm@redhat.com/
>>>>
>>>> Hi Javier,
>>>>
>>>> Thanks for this patch "media: venus: dec: Fix capture formats
>>>> enumeration order".
>>>>
>>>> Somehow I can't find it in my mailbox to be able to reply there.
>>>>
>>>> Could you please explain what is the regression you see here?
>>>>
>>>
>>> You can find the thread and explanation of the issue here:
>>>
>>> https://lore.kernel.org/lkml/Y+KPW18o%2FDa+N8UI@google.com/T/
>>>
>>> But Stanimir already picked it and sent a PR for v6.3 including it.
>>
>> While "media: venus: dec: Fix capture formats enumeration order" may have been
>> applied to v6.3,
> 
> To me it looks like it was submitted[1], but not yet applied even to the
> media tree[2] -- wild guess, maybe due to problems mentioned in [3]? Or am
> I missing something?
> 
> [1]
> https://lore.kernel.org/all/20230329211655.100276-1-stanimir.k.varbanov@gmail.com/
> [2] https://git.linuxtv.org/media_tree.git/log/?h=fixes
> [3]
> https://lore.kernel.org/all/20230329214310.2503484-1-jenkins@linuxtv.org/

I only notice now: from [1] above it looks like that regression fix was
applied to a tree that seems to be intended for 6.4. Is that okay for
everybody, or should we ask Linus to pick this up as well (unless, of
course, Mauro shows up and forwards the patch)? It fixes a
regression from 5.19 afaics, so not a fresh problem, but apparently one
that bugged a few people recently.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
  

Patch

diff --git a/drivers/media/platform/qcom/venus/firmware.c b/drivers/media/platform/qcom/venus/firmware.c
index 142d4c74017c..d59ecf776715 100644
--- a/drivers/media/platform/qcom/venus/firmware.c
+++ b/drivers/media/platform/qcom/venus/firmware.c
@@ -38,8 +38,8 @@  static void venus_reset_cpu(struct venus_core *core)
 	writel(fw_size, wrapper_base + WRAPPER_FW_END_ADDR);
 	writel(0, wrapper_base + WRAPPER_CPA_START_ADDR);
 	writel(fw_size, wrapper_base + WRAPPER_CPA_END_ADDR);
-	writel(0, wrapper_base + WRAPPER_NONPIX_START_ADDR);
-	writel(0, wrapper_base + WRAPPER_NONPIX_END_ADDR);
+	writel(fw_size, wrapper_base + WRAPPER_NONPIX_START_ADDR);
+	writel(fw_size, wrapper_base + WRAPPER_NONPIX_END_ADDR);
 
 	if (IS_V6(core)) {
 		/* Bring XTSS out of reset */