Message ID | 20221118074810.380368-1-lizhenneng@kylinos.cn |
---|---|
State | New |
Headers | From: Zhenneng Li <lizhenneng@kylinos.cn> To: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com>, Xinhui.Pan@amd.com, David Airlie <airlied@gmail.com>, Daniel Vetter <daniel@ffwll.ch>, amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, Zhenneng Li <lizhenneng@kylinos.cn> Subject: [PATCH] drm/amdgpu: add mb for si Date: Fri, 18 Nov 2022 15:48:10 +0800 Message-Id: <20221118074810.380368-1-lizhenneng@kylinos.cn> X-Mailer: git-send-email 2.25.1 |
Series | drm/amdgpu: add mb for si |
Commit Message
李真能
Nov. 18, 2022, 7:48 a.m. UTC
During reboot tests on an arm64 platform, boot may fail,
so add this mb() in the SMC code.
The error messages are as follows:
[ 6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR*
late_init of IP block <si_dpm> failed -22
[ 7.006919][ 7] [ T295] amdgpu 0000:04:00.0: amdgpu_device_ip_late_init failed
[ 7.014224][ 7] [ T295] amdgpu 0000:04:00.0: Fatal error during GPU init
Signed-off-by: Zhenneng Li <lizhenneng@kylinos.cn>
---
drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c | 2 ++
1 file changed, 2 insertions(+)
Comments
On 18.11.22 at 08:48, Zhenneng Li wrote:
> During reboot tests on an arm64 platform, boot may fail,
> so add this mb() in the SMC code.
>
> The error messages are as follows:
> [ 6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR*
> late_init of IP block <si_dpm> failed -22
> [ 7.006919][ 7] [ T295] amdgpu 0000:04:00.0: amdgpu_device_ip_late_init failed
> [ 7.014224][ 7] [ T295] amdgpu 0000:04:00.0: Fatal error during GPU init

Memory barriers are not supposed to be sprinkled around like this; you need to give a detailed explanation of why this is necessary.

Regards,
Christian.

>
> Signed-off-by: Zhenneng Li <lizhenneng@kylinos.cn>
> ---
>  drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c
> index 8f994ffa9cd1..c7656f22278d 100644
> --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c
> +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c
> @@ -155,6 +155,8 @@ bool amdgpu_si_is_smc_running(struct amdgpu_device *adev)
>  	u32 rst = RREG32_SMC(SMC_SYSCON_RESET_CNTL);
>  	u32 clk = RREG32_SMC(SMC_SYSCON_CLOCK_CNTL_0);
>
> +	mb();
> +
>  	if (!(rst & RST_REG) && !(clk & CK_DISABLE))
>  		return true;
>
On 11/18/22 09:01, Christian König wrote:
> On 18.11.22 at 08:48, Zhenneng Li wrote:
>
> Memory barriers are not supposed to be sprinkled around like this; you need to give a detailed explanation of why this is necessary.
>
>> @@ -155,6 +155,8 @@ bool amdgpu_si_is_smc_running(struct amdgpu_device *adev)
>>  	u32 rst = RREG32_SMC(SMC_SYSCON_RESET_CNTL);
>>  	u32 clk = RREG32_SMC(SMC_SYSCON_CLOCK_CNTL_0);
>> +	mb();
>> +
>>  	if (!(rst & RST_REG) && !(clk & CK_DISABLE))
>>  		return true;

In particular, it makes no sense in this specific place, since it cannot directly affect the values of rst & clk.
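The point about placement can be checked outside the kernel. The following is a minimal user-space sketch, not driver code. It assumes that two plain `volatile` variables stand in for the SMC registers, that C11 `atomic_thread_fence()` stands in for the kernel's `mb()`, and that the bit masks are placeholders. All helper names are hypothetical. Because `rst` and `clk` are already held in local variables when the fence runs, both variants return the same result for the same register contents.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit masks (assumption); the real ones live in the driver headers. */
#define RST_REG    (1u << 0)
#define CK_DISABLE (1u << 0)

/* Hypothetical stand-ins for the two SMC registers the driver reads. */
volatile uint32_t fake_reset_cntl;
volatile uint32_t fake_clock_cntl;

/* Hypothetical helper: load both fake registers for an experiment. */
void set_fake_regs(uint32_t rst, uint32_t clk)
{
	fake_reset_cntl = rst;
	fake_clock_cntl = clk;
}

/* The check as it looks without the patch. */
bool smc_running_plain(void)
{
	uint32_t rst = fake_reset_cntl;
	uint32_t clk = fake_clock_cntl;

	return !(rst & RST_REG) && !(clk & CK_DISABLE);
}

/* The same check with a full fence where the patch adds mb(). */
bool smc_running_fenced(void)
{
	uint32_t rst = fake_reset_cntl;
	uint32_t clk = fake_clock_cntl;

	/* Both values are already in locals; the fence cannot change them. */
	atomic_thread_fence(memory_order_seq_cst);

	return !(rst & RST_REG) && !(clk & CK_DISABLE);
}
```

Calling `set_fake_regs()` with any pair of values and comparing the two functions should always give matching results, so any effect of the barrier would have to come from something other than the values being tested, such as timing.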
On 2022/11/18 17:18, Michel Dänzer wrote:
> In particular, it makes no sense in this specific place, since it cannot directly affect the values of rst & clk.

I think so too.

However, when I run the reboot test on nine desktop machines, one or two of them report this error after hundreds or thousands of reboots. At first I used msleep() instead of mb(); both approaches work, but I don't know the root cause.

When I use this method on another vendor's Oland card, the error message is reported again.

What could be the root cause?

Test environment:

graphics card: OLAND 0x1002:0x6611 0x1642:0x1869 0x87
driver: amdgpu
os: ubuntu 2004
platform: arm64
kernel: 5.4.18
[AMD Official Use Only - General]

Could the attached patch help?

Evan

> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of 李真能
> Sent: Friday, November 18, 2022 5:25 PM
> To: Michel Dänzer <michel.daenzer@mailbox.org>; Koenig, Christian <Christian.Koenig@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>
> Cc: amd-gfx@lists.freedesktop.org; Pan, Xinhui <Xinhui.Pan@amd.com>; linux-kernel@vger.kernel.org; dri-devel@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: add mb for si
>
> At first I used msleep() instead of mb(); both approaches work, but I don't know the root cause.
>
> What could be the root cause?
That's not a patch but some binary file?

Christian.

On 24.11.22 at 11:04, Quan, Evan wrote:
> [AMD Official Use Only - General]
>
> Could the attached patch help?
>
> Evan
On 11/24/2022 3:34 PM, Quan, Evan wrote:
> [AMD Official Use Only - General]
>
> Could the attached patch help?
>
> Evan
>>>>> The error messages are as follows:
>>>>> [ 6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR*
>>>>> late_init of IP block <si_dpm> failed -22

The issue is happening in late_init(), which eventually does

    ret = si_thermal_enable_alert(adev, false);

Just before this, si_thermal_start_thermal_controller() is called in hw_init, and that enables the thermal alert.

Maybe the issue is with enabling and disabling thermal alerts in quick succession. Adding a delay inside si_thermal_start_thermal_controller() might help.

Thanks,
Lijo
On 11/24/2022 4:11 PM, Lazar, Lijo wrote:
> The issue is happening in late_init(), which eventually does
>
>     ret = si_thermal_enable_alert(adev, false);
>
> Just before this, si_thermal_start_thermal_controller() is called in
> hw_init, and that enables the thermal alert.
>
> Maybe the issue is with enabling and disabling thermal alerts in quick
> succession. Adding a delay inside si_thermal_start_thermal_controller()
> might help.

On a second look, the temperature range is already set as part of si_thermal_start_thermal_controller() in hw_init:
https://elixir.bootlin.com/linux/v6.1-rc6/source/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c#L6780

There is no need to set it again here:
https://elixir.bootlin.com/linux/v6.1-rc6/source/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c#L7635

I think it is safe to remove the call from late_init altogether. Alex/Evan?

Thanks,
Lijo
Did you see that? It's a patch that I created with git format-patch. Anyway, I will paste the changes below.
I suspect that we may need to wait for the SMU to be running.

diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
index 49c398ec0aaf..9f308a021b2d 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
@@ -6814,6 +6814,7 @@ static int si_dpm_enable(struct amdgpu_device *adev)
 	struct si_power_info *si_pi = si_get_pi(adev);
 	struct amdgpu_ps *boot_ps = adev->pm.dpm.boot_ps;
 	int ret;
+	int i;
 
 	if (amdgpu_si_is_smc_running(adev))
 		return -EINVAL;
@@ -6909,6 +6910,17 @@ static int si_dpm_enable(struct amdgpu_device *adev)
 	si_program_response_times(adev);
 	si_program_ds_registers(adev);
 	si_dpm_start_smc(adev);
+	/* Waiting for smc alive */
+	for (i = 0; i < adev->usec_timeout; i++) {
+		if (amdgpu_si_is_smc_running(adev))
+			break;
+		udelay(1);
+	}
+	if (i >= adev->usec_timeout) {
+		DRM_ERROR("Timedout on waiting for smu running\n");
+		return -EINVAL;
+	}
+
 	ret = si_notify_smc_display_change(adev, false);
 	if (ret) {
 		DRM_ERROR("si_notify_smc_display_change failed\n");

BR
Evan

> -----Original Message-----
> From: Christian König <ckoenig.leichtzumerken@gmail.com>
> Sent: Thursday, November 24, 2022 6:06 PM
> To: Quan, Evan <Evan.Quan@amd.com>; 李真能 <lizhenneng@kylinos.cn>; Michel Dänzer <michel.daenzer@mailbox.org>; Koenig, Christian <Christian.Koenig@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>
> Cc: dri-devel@lists.freedesktop.org; Pan, Xinhui <Xinhui.Pan@amd.com>; linux-kernel@vger.kernel.org; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: add mb for si
>
> That's not a patch but some binary file?
>
> Christian.
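The bounded wait in the pasted patch can be exercised in user space. This is only a sketch under stated assumptions: `struct fake_smc`, `fake_smc_is_running()` and `wait_for_smc()` are hypothetical names, the fake device simply reports "running" after a fixed number of polls, the `udelay(1)` between polls is omitted, and the timeout returns -1 rather than the patch's -EINVAL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model: reports "running" after a number of polls. */
struct fake_smc {
	uint32_t polls_until_running;
};

/* Stand-in for amdgpu_si_is_smc_running(); counts down on each call. */
bool fake_smc_is_running(struct fake_smc *smc)
{
	if (smc->polls_until_running == 0)
		return true;
	smc->polls_until_running--;
	return false;
}

/*
 * Poll at most max_polls times.
 * Returns 0 as soon as the device reports running, or -1 on timeout.
 * The kernel patch sleeps 1 us between polls; this sketch does not.
 */
int wait_for_smc(struct fake_smc *smc, uint32_t max_polls)
{
	uint32_t i;

	for (i = 0; i < max_polls; i++) {
		if (fake_smc_is_running(smc))
			return 0;
	}
	return -1;
}
```

A device that needs 3 polls succeeds within a budget of 10, while one that needs 20 polls times out. The loop turns an unbounded assumption ("the firmware is up by now") into an explicit check with a bounded cost.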
> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> Sent: Thursday, November 24, 2022 6:49 PM
> To: Quan, Evan <Evan.Quan@amd.com>; 李真能 <lizhenneng@kylinos.cn>; Michel Dänzer <michel.daenzer@mailbox.org>; Koenig, Christian <Christian.Koenig@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>
> Cc: amd-gfx@lists.freedesktop.org; Pan, Xinhui <Xinhui.Pan@amd.com>; linux-kernel@vger.kernel.org; dri-devel@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: add mb for si
>
> On a second look, the temperature range is already set as part of
> si_thermal_start_thermal_controller() in hw_init:
> https://elixir.bootlin.com/linux/v6.1-rc6/source/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c#L6780
>
> There is no need to set it again here:
> https://elixir.bootlin.com/linux/v6.1-rc6/source/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c#L7635
>
> I think it is safe to remove the call from late_init altogether. Alex/Evan?

[Quan, Evan] Yes, that makes sense to me. But I'm not sure whether it is related to the issue here.
As I understand it, if the issue were caused by enabling the thermal alert twice, it would fail every time.
That cannot explain why adding some delay or an mb() call helps.

BR
Evan
On 11/25/2022 7:43 AM, Quan, Evan wrote:
> [AMD Official Use Only - General]
>
>
>
>> -----Original Message-----
>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
>> Sent: Thursday, November 24, 2022 6:49 PM
>> To: Quan, Evan <Evan.Quan@amd.com>; 李真能 <lizhenneng@kylinos.cn>;
>> Michel Dänzer <michel.daenzer@mailbox.org>; Koenig, Christian
>> <Christian.Koenig@amd.com>; Deucher, Alexander
>> <Alexander.Deucher@amd.com>
>> Cc: amd-gfx@lists.freedesktop.org; Pan, Xinhui <Xinhui.Pan@amd.com>;
>> linux-kernel@vger.kernel.org; dri-devel@lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu: add mb for si
>>
>>
>>
>> On 11/24/2022 4:11 PM, Lazar, Lijo wrote:
>>>
>>> On 11/24/2022 3:34 PM, Quan, Evan wrote:
>>>> [AMD Official Use Only - General]
>>>>
>>>> Could the attached patch help?
>>>>
>>>> Evan
>>>>> -----Original Message-----
>>>>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of ???
>>>>> Sent: Friday, November 18, 2022 5:25 PM
>>>>> To: Michel Dänzer <michel.daenzer@mailbox.org>; Koenig, Christian
>>>>> <Christian.Koenig@amd.com>; Deucher, Alexander
>>>>> <Alexander.Deucher@amd.com>
>>>>> Cc: amd-gfx@lists.freedesktop.org; Pan, Xinhui <Xinhui.Pan@amd.com>;
>>>>> linux-kernel@vger.kernel.org; dri-devel@lists.freedesktop.org
>>>>> Subject: Re: [PATCH] drm/amdgpu: add mb for si
>>>>>
>>>>>
>>>>> On 2022/11/18 17:18, Michel Dänzer wrote:
>>>>>> On 11/18/22 09:01, Christian König wrote:
>>>>>>> On 18.11.22 at 08:48, Zhenneng Li wrote:
>>>>>>>> During reboot tests on an arm64 platform, boot may fail, so
>>>>>>>> add this mb in smc.
>>>>>>>>
>>>>>>>> The error messages are as follows:
>>>>>>>> [ 6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <si_dpm> failed -22
>>>>>>>> [ 7.006919][ 7] [ T295] amdgpu 0000:04:00.0:
>>> The issue is happening in late_init() which eventually does
>>>
>>> ret = si_thermal_enable_alert(adev, false);
>>>
>>> Just before this, si_thermal_start_thermal_controller is called in
>>> hw_init and that enables thermal alert.
>>>
>>> Maybe the issue is with enable/disable of thermal alerts in quick
>>> succession. Adding a delay inside si_thermal_start_thermal_controller
>>> might help.
>>>
>> On a second look, temperature range is already set as part of
>> si_thermal_start_thermal_controller in hw_init
>> https://elixir.bootlin.com/linux/v6.1-rc6/source/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c#L6780
>>
>> There is no need to set it again here -
>>
>> https://elixir.bootlin.com/linux/v6.1-rc6/source/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c#L7635
>>
>> I think it is safe to remove the call from late_init altogether. Alex/Evan?
>>
> [Quan, Evan] Yes, it makes sense to me. But I'm not sure whether that's related to the issue here.
> As I understand it, if the issue were caused by calling the thermal_alert enablement twice, it would fail every time.
> That cannot explain why adding some delay or an mb() call helps.

The side effect of the patch is just some random delay introduced for
every SMC message.

The issue happens in late_init(). Between late_init() and dpm
enablement, there are many SMC messages sent which don't have this
issue. So I think the issue is not with FW not running.

Thus the only case I see is enable/disable of thermal alert in random
succession.

Thanks,
Lijo

> BR
> Evan
>> Thanks,
>> Lijo
>>
>>> Thanks,
>>> Lijo
>>>
>>>>>>>> amdgpu_device_ip_late_init failed
>>>>>>>> [ 7.014224][ 7] [ T295] amdgpu 0000:04:00.0: Fatal error during GPU init
>>>>>>> Memory barriers are not supposed to be sprinkled around like this, you
>>>>>>> need to give a detailed explanation why this is necessary.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>> Signed-off-by: Zhenneng Li <lizhenneng@kylinos.cn>
>>>>>>>> ---
>>>>>>>> drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c | 2 ++
>>>>>>>> 1 file changed, 2 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c
>>>>>>>> index 8f994ffa9cd1..c7656f22278d 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c
>>>>>>>> @@ -155,6 +155,8 @@ bool amdgpu_si_is_smc_running(struct amdgpu_device *adev)
>>>>>>>>  	u32 rst = RREG32_SMC(SMC_SYSCON_RESET_CNTL);
>>>>>>>>  	u32 clk = RREG32_SMC(SMC_SYSCON_CLOCK_CNTL_0);
>>>>>>>> +	mb();
>>>>>>>> +
>>>>>>>>  	if (!(rst & RST_REG) && !(clk & CK_DISABLE))
>>>>>>>>  		return true;
>>>>>> In particular, it makes no sense in this specific place, since it
>>>>>> cannot directly affect the values of rst & clk.
>>>>>
>>>>> I think so too.
>>>>>
>>>>> But when I run reboot tests on nine desktop machines, one or two
>>>>> machines may report this error after hundreds or thousands of
>>>>> reboots. At first I used msleep() instead of mb(); both methods
>>>>> work, but I don't know what the root cause is.
>>>>>
>>>>> I used this method on another vendor's Oland card, and this error
>>>>> message was reported again.
>>>>>
>>>>> What could be the root cause?
>>>>>
>>>>> test environment:
>>>>>
>>>>> graphics card: OLAND 0x1002:0x6611 0x1642:0x1869 0x87
>>>>>
>>>>> driver: amdgpu
>>>>>
>>>>> os: ubuntu 2004
>>>>>
>>>>> platform: arm64
>>>>>
>>>>> kernel: 5.4.18
>>>>>
diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c
index 8f994ffa9cd1..c7656f22278d 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_smc.c
@@ -155,6 +155,8 @@ bool amdgpu_si_is_smc_running(struct amdgpu_device *adev)
 	u32 rst = RREG32_SMC(SMC_SYSCON_RESET_CNTL);
 	u32 clk = RREG32_SMC(SMC_SYSCON_CLOCK_CNTL_0);
+	mb();
+
 	if (!(rst & RST_REG) && !(clk & CK_DISABLE))
 		return true;
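The objection raised in the thread is that the barrier sits after both register reads, so it cannot change the values already held in rst and clk. The following is a minimal userspace sketch of that point, not driver code. The names struct mock_smc, smc_running_plain, smc_running_with_barrier and check_equivalence are hypothetical, the bit values are assumed for illustration only, and __sync_synchronize() (a GCC/Clang builtin) stands in for the kernel's mb().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed for this sketch; check the driver headers for the real ones. */
#define RST_REG    (1u << 0)
#define CK_DISABLE (1u << 0)

/* Hypothetical stand-in for the two SMC registers; not the real hardware interface. */
struct mock_smc {
	uint32_t reset_cntl;
	uint32_t clock_cntl_0;
};

/* The check from the patch context, without a barrier. */
static bool smc_running_plain(const volatile struct mock_smc *smc)
{
	uint32_t rst = smc->reset_cntl;
	uint32_t clk = smc->clock_cntl_0;

	return !(rst & RST_REG) && !(clk & CK_DISABLE);
}

/* The same check with a full barrier placed after both reads, as in the patch. */
static bool smc_running_with_barrier(const volatile struct mock_smc *smc)
{
	uint32_t rst = smc->reset_cntl;
	uint32_t clk = smc->clock_cntl_0;

	/* Userspace stand-in for mb(); rst and clk are already in locals here. */
	__sync_synchronize();

	return !(rst & RST_REG) && !(clk & CK_DISABLE);
}

/* Compare both variants for all four combinations of the two status bits. */
static void check_equivalence(void)
{
	for (uint32_t rst = 0; rst <= 1; rst++) {
		for (uint32_t clk = 0; clk <= 1; clk++) {
			volatile struct mock_smc smc = { rst, clk };

			assert(smc_running_plain(&smc) ==
			       smc_running_with_barrier(&smc));
		}
	}
}
```

Calling check_equivalence() passes for every bit combination in this mock. That is consistent with the view expressed in the thread that any effect of the patch would come from timing rather than from the values being tested, although a userspace mock cannot reproduce the actual hardware behaviour.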