[5.10] coresight: cti: Fix hang in cti_disable_hw()

Message ID 20221102112003.2318583-1-james.clark@arm.com
State New
Headers
Series [5.10] coresight: cti: Fix hang in cti_disable_hw() |

Commit Message

James Clark Nov. 2, 2022, 11:20 a.m. UTC
  commit 6746eae4bbaddcc16b40efb33dab79210828b3ce upstream.

cti_enable_hw() and cti_disable_hw() are called from an atomic context
so shouldn't use runtime PM because it can result in a sleep when
communicating with firmware.

This can cause a hang when running the Perf Coresight tests or running
this command:

  perf record -e cs_etm//u -- ls

With lock and scheduler debugging enabled the following is output:

   coresight cti_sys0: cti_enable_hw -- dev:cti_sys0  parent: 20020000.cti
   BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1151
   in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 330, name: perf-exec
   preempt_count: 2, expected: 0
   RCU nest depth: 0, expected: 0
   INFO: lockdep is turned off.
   irq event stamp: 0
   hardirqs last  enabled at (0): [<0000000000000000>] 0x0
   hardirqs last disabled at (0): [<ffff80000822b394>] copy_process+0xa0c/0x1948
   softirqs last  enabled at (0): [<ffff80000822b394>] copy_process+0xa0c/0x1948
   softirqs last disabled at (0): [<0000000000000000>] 0x0
   CPU: 3 PID: 330 Comm: perf-exec Not tainted 6.0.0-00053-g042116d99298 #7
   Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Sep 13 2022
   Call trace:
    dump_backtrace+0x134/0x140
    show_stack+0x20/0x58
    dump_stack_lvl+0x8c/0xb8
    dump_stack+0x18/0x34
    __might_resched+0x180/0x228
    __might_sleep+0x50/0x88
    __pm_runtime_resume+0xac/0xb0
    cti_enable+0x44/0x120
    coresight_control_assoc_ectdev+0xc0/0x150
    coresight_enable_path+0xb4/0x288
    etm_event_start+0x138/0x170
    etm_event_add+0x48/0x70
    event_sched_in.isra.122+0xb4/0x280
    merge_sched_in+0x1fc/0x3d0
    visit_groups_merge.constprop.137+0x16c/0x4b0
    ctx_sched_in+0x114/0x1f0
    perf_event_sched_in+0x60/0x90
    ctx_resched+0x68/0xb0
    perf_event_exec+0x138/0x508
    begin_new_exec+0x52c/0xd40
    load_elf_binary+0x6b8/0x17d0
    bprm_execve+0x360/0x7f8
    do_execveat_common.isra.47+0x218/0x238
    __arm64_sys_execve+0x48/0x60
    invoke_syscall+0x4c/0x110
    el0_svc_common.constprop.4+0xfc/0x120
    do_el0_svc+0x34/0xc0
    el0_svc+0x40/0x98
    el0t_64_sync_handler+0x98/0xc0
    el0t_64_sync+0x170/0x174

Fix the issue by removing the runtime PM calls completely. They are not
needed here because it must have already been done when building the
path for a trace.

Fixes: 835d722ba10a ("coresight: cti: Initial CoreSight CTI Driver")
Cc: stable <stable@kernel.org> # 5.10.x
Reported-by: Aishwarya TCV <Aishwarya.TCV@arm.com>
Reported-by: Cristian Marussi <Cristian.Marussi@arm.com>
Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Tested-by: Mike Leach <mike.leach@linaro.org>
[ Fix build warnings ]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20221025131032.1149459-1-suzuki.poulose@arm.com
Signed-off-by: James Clark <james.clark@arm.com>
---
 drivers/hwtracing/coresight/coresight-cti-core.c | 5 -----
 1 file changed, 5 deletions(-)
  

Comments

Greg KH Nov. 7, 2022, 9:11 a.m. UTC | #1
On Wed, Nov 02, 2022 at 11:20:03AM +0000, James Clark wrote:
> commit 6746eae4bbaddcc16b40efb33dab79210828b3ce upstream.

Isn't this commit 665c157e0204176023860b51a46528ba0ba62c33 instead?

confused,

greg k-h
  
Suzuki K Poulose Nov. 7, 2022, 9:23 a.m. UTC | #2
On 07/11/2022 09:11, Greg Kroah-Hartman wrote:
> On Wed, Nov 02, 2022 at 11:20:03AM +0000, James Clark wrote:
>> commit 6746eae4bbaddcc16b40efb33dab79210828b3ce upstream.
> 
> Isn't this commit 665c157e0204176023860b51a46528ba0ba62c33 instead?

This was reverted in commit d76308f03ee1 and pushed in later
with fixups as 6746eae4bbadd.

Cheers
Suzuki

> 
> confused,
> 
> greg k-h
  
Greg KH Nov. 7, 2022, 9:52 a.m. UTC | #3
On Mon, Nov 07, 2022 at 09:23:26AM +0000, Suzuki K Poulose wrote:
> On 07/11/2022 09:11, Greg Kroah-Hartman wrote:
> > On Wed, Nov 02, 2022 at 11:20:03AM +0000, James Clark wrote:
> > > commit 6746eae4bbaddcc16b40efb33dab79210828b3ce upstream.
> > 
> > Isn't this commit 665c157e0204176023860b51a46528ba0ba62c33 instead?
> 
> This was reverted in commit d76308f03ee1 and pushed in later
> with fixups as 6746eae4bbadd.

So which should be applied?

confused,

greg k-h
  
Suzuki K Poulose Nov. 7, 2022, 9:59 a.m. UTC | #4
On 07/11/2022 09:52, Greg Kroah-Hartman wrote:
> On Mon, Nov 07, 2022 at 09:23:26AM +0000, Suzuki K Poulose wrote:
>> On 07/11/2022 09:11, Greg Kroah-Hartman wrote:
>>> On Wed, Nov 02, 2022 at 11:20:03AM +0000, James Clark wrote:
>>>> commit 6746eae4bbaddcc16b40efb33dab79210828b3ce upstream.
>>>
>>> Isn't this commit 665c157e0204176023860b51a46528ba0ba62c33 instead?
>>
>> This was reverted in commit d76308f03ee1 and pushed in later
>> with fixups as 6746eae4bbadd.
> 
> So which should be applied?

Sorry, this particular post is a backport for v5.10 stable branch.

> 
> confused,

Now I am too. What is expected here ? My understanding is, we
should stick the "upstream" commit that is getting backported
at the beginning of the commit description. In that sense,
the commit id in this patch looks correct to me. Please let
me know if this is not the case.

As such, this is only for 5.10.x branch. The rest are taken
care of.

Please let us know if we are something missing.

Suzuki


> 
> greg k-h
  
Greg KH Nov. 7, 2022, 10:24 a.m. UTC | #5
On Mon, Nov 07, 2022 at 09:59:24AM +0000, Suzuki K Poulose wrote:
> On 07/11/2022 09:52, Greg Kroah-Hartman wrote:
> > On Mon, Nov 07, 2022 at 09:23:26AM +0000, Suzuki K Poulose wrote:
> > > On 07/11/2022 09:11, Greg Kroah-Hartman wrote:
> > > > On Wed, Nov 02, 2022 at 11:20:03AM +0000, James Clark wrote:
> > > > > commit 6746eae4bbaddcc16b40efb33dab79210828b3ce upstream.
> > > > 
> > > > Isn't this commit 665c157e0204176023860b51a46528ba0ba62c33 instead?
> > > 
> > > This was reverted in commit d76308f03ee1 and pushed in later
> > > with fixups as 6746eae4bbadd.
> > 
> > So which should be applied?
> 
> Sorry, this particular post is a backport for v5.10 stable branch.
> 
> > 
> > confused,
> 
> Now I am too. What is expected here ? My understanding is, we
> should stick the "upstream" commit that is getting backported
> at the beginning of the commit description. In that sense,
> the commit id in this patch looks correct to me. Please let
> me know if this is not the case.
> 
> As such, this is only for 5.10.x branch. The rest are taken
> care of.
> 
> Please let us know if we are something missing.

We already have commit 665c157e0204176023860b51a46528ba0ba62c33 queued
up in the 5.10 stable queue.  Is that not correct?  It has the same
subject line as this one.

Still confused.

thanks,

greg k-h
  
Suzuki K Poulose Nov. 7, 2022, 11:15 a.m. UTC | #6
Hi Greg,

On 07/11/2022 10:24, Greg Kroah-Hartman wrote:
> On Mon, Nov 07, 2022 at 09:59:24AM +0000, Suzuki K Poulose wrote:
>> On 07/11/2022 09:52, Greg Kroah-Hartman wrote:
>>> On Mon, Nov 07, 2022 at 09:23:26AM +0000, Suzuki K Poulose wrote:
>>>> On 07/11/2022 09:11, Greg Kroah-Hartman wrote:
>>>>> On Wed, Nov 02, 2022 at 11:20:03AM +0000, James Clark wrote:
>>>>>> commit 6746eae4bbaddcc16b40efb33dab79210828b3ce upstream.
>>>>>
>>>>> Isn't this commit 665c157e0204176023860b51a46528ba0ba62c33 instead?
>>>>
>>>> This was reverted in commit d76308f03ee1 and pushed in later
>>>> with fixups as 6746eae4bbadd.
>>>
>>> So which should be applied?
>>
>> Sorry, this particular post is a backport for v5.10 stable branch.
>>
>>>
>>> confused,
>>
>> Now I am too. What is expected here ? My understanding is, we
>> should stick the "upstream" commit that is getting backported
>> at the beginning of the commit description. In that sense,
>> the commit id in this patch looks correct to me. Please let
>> me know if this is not the case.
>>
>> As such, this is only for 5.10.x branch. The rest are taken
>> care of.
>>
>> Please let us know if we are something missing.
> 
> We already have commit 665c157e0204176023860b51a46528ba0ba62c33 queued
> up in the 5.10 stable queue.  Is that not correct?  It has the same

We pushed the fix as part of the coresight fixes for 6.1 [0], as

commit: 6746eae4bbad "coresight: cti: Fix hang in cti_disable_hw()"


But, the version in there, produced a build warning with "unused
variable" (with CONFIG_WERROR) [1] and thus was reverted in [2],

commit: d76308f03ee1: Revert "coresight: cti: Fix hang in cti_disable_hw()"


Later, we sent you the corrected fix separately [3], which was queued as
commit "6746eae4bbadd".

6746eae4bbad coresight: cti: Fix hang in cti_disable_hw()

So, in effect, here is what we have in the tree :

$ git log --oneline | grep "cti: Fix"
6746eae4bbad coresight: cti: Fix hang in cti_disable_hw()
d76308f03ee1 Revert "coresight: cti: Fix hang in cti_disable_hw()"
665c157e0204 coresight: cti: Fix hang in cti_disable_hw()

> subject line as this one.

I understand the "same" subject line has caused this confusion. Will
modify it in the future avoid this confusion.

So, kindly drop "665c157e0204" from the queue for 5.10, it would fail to
apply there anyway so does the correct "6746eae4bbad". This backport
is appropriate for 5.10 branch, with the correct version.

I have double checked the versions pulled into 6.0.x [4] and 5.15.x [5] 
branches and they all have the correct one (6746eae4bbad) applied.

[0] https://lkml.kernel.org/r/16664341837810@kroah.com
[1] https://lkml.kernel.org/r/20221024135752.2b83af97@canb.auug.org.au
[2] https://lkml.kernel.org/r/166659326494120@kroah.com
[3] https://lkml.kernel.org/r/1666717705115206@kroah.com
[4] https://lkml.kernel.org/r/166719866563237@kroah.com
[5] https://lkml.kernel.org/r/16671983698786@kroah.com


Hope this helps.

Suzuki

> 
> Still confused.
> 
> thanks,
> 
> greg k-h
  
Greg KH Nov. 7, 2022, 12:19 p.m. UTC | #7
On Mon, Nov 07, 2022 at 11:15:35AM +0000, Suzuki K Poulose wrote:
> Hi Greg,
> 
> On 07/11/2022 10:24, Greg Kroah-Hartman wrote:
> > On Mon, Nov 07, 2022 at 09:59:24AM +0000, Suzuki K Poulose wrote:
> > > On 07/11/2022 09:52, Greg Kroah-Hartman wrote:
> > > > On Mon, Nov 07, 2022 at 09:23:26AM +0000, Suzuki K Poulose wrote:
> > > > > On 07/11/2022 09:11, Greg Kroah-Hartman wrote:
> > > > > > On Wed, Nov 02, 2022 at 11:20:03AM +0000, James Clark wrote:
> > > > > > > commit 6746eae4bbaddcc16b40efb33dab79210828b3ce upstream.
> > > > > > 
> > > > > > Isn't this commit 665c157e0204176023860b51a46528ba0ba62c33 instead?
> > > > > 
> > > > > This was reverted in commit d76308f03ee1 and pushed in later
> > > > > with fixups as 6746eae4bbadd.
> > > > 
> > > > So which should be applied?
> > > 
> > > Sorry, this particular post is a backport for v5.10 stable branch.
> > > 
> > > > 
> > > > confused,
> > > 
> > > Now I am too. What is expected here ? My understanding is, we
> > > should stick the "upstream" commit that is getting backported
> > > at the beginning of the commit description. In that sense,
> > > the commit id in this patch looks correct to me. Please let
> > > me know if this is not the case.
> > > 
> > > As such, this is only for 5.10.x branch. The rest are taken
> > > care of.
> > > 
> > > Please let us know if we are something missing.
> > 
> > We already have commit 665c157e0204176023860b51a46528ba0ba62c33 queued
> > up in the 5.10 stable queue.  Is that not correct?  It has the same
> 
> We pushed the fix as part of the coresight fixes for 6.1 [0], as
> 
> commit: 6746eae4bbad "coresight: cti: Fix hang in cti_disable_hw()"
> 
> 
> But, the version in there, produced a build warning with "unused
> variable" (with CONFIG_WERROR) [1] and thus was reverted in [2],
> 
> commit: d76308f03ee1: Revert "coresight: cti: Fix hang in cti_disable_hw()"
> 
> 
> Later, we sent you the corrected fix separately [3], which was queued as
> commit "6746eae4bbadd".
> 
> 6746eae4bbad coresight: cti: Fix hang in cti_disable_hw()
> 
> So, in effect, here is what we have in the tree :
> 
> $ git log --oneline | grep "cti: Fix"
> 6746eae4bbad coresight: cti: Fix hang in cti_disable_hw()
> d76308f03ee1 Revert "coresight: cti: Fix hang in cti_disable_hw()"
> 665c157e0204 coresight: cti: Fix hang in cti_disable_hw()
> 
> > subject line as this one.
> 
> I understand the "same" subject line has caused this confusion. Will
> modify it in the future avoid this confusion.
> 
> So, kindly drop "665c157e0204" from the queue for 5.10, it would fail to
> apply there anyway so does the correct "6746eae4bbad". This backport
> is appropriate for 5.10 branch, with the correct version.

Ok, I dropped 665c157e0204 from the queue, but note that it _was_
backported there properly.  But only there, which is odd, Sasha's
scripts might not have caught up.

I'll queue this one up now instead.

thanks,

greg k-h
  

Patch

diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c
index 0276700c246d..90270696206c 100644
--- a/drivers/hwtracing/coresight/coresight-cti-core.c
+++ b/drivers/hwtracing/coresight/coresight-cti-core.c
@@ -90,11 +90,9 @@  void cti_write_all_hw_regs(struct cti_drvdata *drvdata)
 static int cti_enable_hw(struct cti_drvdata *drvdata)
 {
 	struct cti_config *config = &drvdata->config;
-	struct device *dev = &drvdata->csdev->dev;
 	unsigned long flags;
 	int rc = 0;
 
-	pm_runtime_get_sync(dev->parent);
 	spin_lock_irqsave(&drvdata->spinlock, flags);
 
 	/* no need to do anything if enabled or unpowered*/
@@ -119,7 +117,6 @@  static int cti_enable_hw(struct cti_drvdata *drvdata)
 	/* cannot enable due to error */
 cti_err_not_enabled:
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
-	pm_runtime_put(dev->parent);
 	return rc;
 }
 
@@ -153,7 +150,6 @@  static void cti_cpuhp_enable_hw(struct cti_drvdata *drvdata)
 static int cti_disable_hw(struct cti_drvdata *drvdata)
 {
 	struct cti_config *config = &drvdata->config;
-	struct device *dev = &drvdata->csdev->dev;
 
 	spin_lock(&drvdata->spinlock);
 
@@ -174,7 +170,6 @@  static int cti_disable_hw(struct cti_drvdata *drvdata)
 	coresight_disclaim_device_unlocked(drvdata->base);
 	CS_LOCK(drvdata->base);
 	spin_unlock(&drvdata->spinlock);
-	pm_runtime_put(dev->parent);
 	return 0;
 
 	/* not disabled this call */