diff mbox series

perf/arm-cmn: Fix DTC reset

Message ID	f0b61513276ee2c448ae02a6840135571039cea7.1680792373.git.robin.murphy@arm.com
State	New
Headers	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; From: Robin Murphy <robin.murphy@arm.com> To: will@kernel.org Cc: mark.rutland@arm.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, ilkka@os.amperecomputing.com, Geoff Blake <blakgeof@amazon.com> Subject: [PATCH] perf/arm-cmn: Fix DTC reset Date: Thu, 6 Apr 2023 15:49:15 +0100 Message-Id: <f0b61513276ee2c448ae02a6840135571039cea7.1680792373.git.robin.murphy@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	perf/arm-cmn: Fix DTC reset \| perf/arm-cmn: Fix DTC reset

Commit Message

Robin Murphy April 6, 2023, 2:49 p.m. UTC

  It turns out that my naive DTC reset logic fails to work as intended,
since clearing PMCR.PMU_EN appears to result in writes to PMOVSR_CLR
being ignored, while some hard-to-characterise combination of conditions
(differently between DTC0 and secondary DTCs) also appears to result in
PMOVSR reading as zero even when an overflow remains asserted. Thus
rather than resetting the PMU to a nice clean state, we can currently
end up with screaming spurious interrupts from secondary DTCs which we
can neither see nor clear. This behaviour is of course not documented.

Resetting PMCR to disable the interrupt output but enable the PMU itself
seems to at least make the PMOVSR_CLR write work as expected on DTC0
(although it looks like writing to PMCR twice has actually been having
some hidden side-effect of clearing any pending overflows there).
Unfortunately this still does not seem to help secondary DTCs, but going
beyond PMU scope and additionally resetting DTC_CTL does seems to make
everything work out, and superficially looks sensible. Therefore pile
that onto the house of empirical cards too, until I can check with the
hardware team whether there's actually any proper recommended way of
recovering from an arbitrary PMU state after an oops/kexec/whatever.

Fixes: 0ba64770a2f2 ("perf: Add Arm CMN-600 PMU driver")
Reported-by: Geoff Blake <blakgeof@amazon.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
This supersedes the previous shutdown/IRQ patches, now that I've
finally managed to make *some* sense of what's really going on. If
anyone's interested, this is the contrivance I used for testing:

https://gitlab.arm.com/linux-arm/linux-rm/-/commit/d8f1035c5bc510516d6e4f0b7bf0b875a749daf7
---
 drivers/perf/arm-cmn.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Geoff Blake April 6, 2023, 9:25 p.m. UTC | #1

Ran this patch on an AWS C6g.metal and unfortunately still see the 
spurious IRQs trigger quickly (within 10 tries) when using the following 
flow:

perf stat -a -e arm_cmn_0/event=0x5,type=0x5/ -- sleep 600
kexec -e 

Adding in the simple shutdown routine, I have run over 100 of the 
above cycles and the spurious IRQs haven't triggered.  I think we still 
need both for now.

-Geoff  

On Thu, 6 Apr 2023, Robin Murphy wrote:

> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> It turns out that my naive DTC reset logic fails to work as intended,
> since clearing PMCR.PMU_EN appears to result in writes to PMOVSR_CLR
> being ignored, while some hard-to-characterise combination of conditions
> (differently between DTC0 and secondary DTCs) also appears to result in
> PMOVSR reading as zero even when an overflow remains asserted. Thus
> rather than resetting the PMU to a nice clean state, we can currently
> end up with screaming spurious interrupts from secondary DTCs which we
> can neither see nor clear. This behaviour is of course not documented.
> 
> Resetting PMCR to disable the interrupt output but enable the PMU itself
> seems to at least make the PMOVSR_CLR write work as expected on DTC0
> (although it looks like writing to PMCR twice has actually been having
> some hidden side-effect of clearing any pending overflows there).
> Unfortunately this still does not seem to help secondary DTCs, but going
> beyond PMU scope and additionally resetting DTC_CTL does seems to make
> everything work out, and superficially looks sensible. Therefore pile
> that onto the house of empirical cards too, until I can check with the
> hardware team whether there's actually any proper recommended way of
> recovering from an arbitrary PMU state after an oops/kexec/whatever.
> 
> Fixes: 0ba64770a2f2 ("perf: Add Arm CMN-600 PMU driver")
> Reported-by: Geoff Blake <blakgeof@amazon.com>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
> This supersedes the previous shutdown/IRQ patches, now that I've
> finally managed to make *some* sense of what's really going on. If
> anyone's interested, this is the contrivance I used for testing:
> 
> https://gitlab.arm.com/linux-arm/linux-rm/-/commit/d8f1035c5bc510516d6e4f0b7bf0b875a749daf7
> ---
>  drivers/perf/arm-cmn.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
> index 144cc08d9e04..81fe01171e33 100644
> --- a/drivers/perf/arm-cmn.c
> +++ b/drivers/perf/arm-cmn.c
> @@ -1899,7 +1899,10 @@ static int arm_cmn_init_dtc(struct arm_cmn *cmn, struct arm_cmn_node *dn, int id
>         if (dtc->irq < 0)
>                 return dtc->irq;
> 
> -       writel_relaxed(0, dtc->base + CMN_DT_PMCR);
> +       if (idx == 0)
> +               writel_relaxed(0, dtc->base + CMN_DT_DTC_CTL);
> +
> +       writel_relaxed(CMN_DT_PMCR_PMU_EN, dtc->base + CMN_DT_PMCR);
>         writel_relaxed(0x1ff, dtc->base + CMN_DT_PMOVSR_CLR);
>         writel_relaxed(CMN_DT_PMCR_OVFL_INTR_EN, dtc->base + CMN_DT_PMCR);
> 
> --
> 2.39.2.101.g768bb238c484.dirty
> 
>

Will Deacon April 14, 2023, 12:42 p.m. UTC | #2

Hi Geoff,

On Thu, Apr 06, 2023 at 04:25:39PM -0500, Geoff Blake wrote:
> Ran this patch on an AWS C6g.metal and unfortunately still see the 
> spurious IRQs trigger quickly (within 10 tries) when using the following 
> flow:
> 
> perf stat -a -e arm_cmn_0/event=0x5,type=0x5/ -- sleep 600
> kexec -e 
> 
> Adding in the simple shutdown routine, I have run over 100 of the 
> above cycles and the spurious IRQs haven't triggered.  I think we still 
> need both for now.

Does that mean you're happy for me to queue this change as-is?

Will

Geoff Blake April 14, 2023, 2:04 p.m. UTC | #3

Sorry for the confusion Will, but no, the patch does not solve the issue, 
the spurious interrupt storm still exists.  The patch needs the 
additional shutdown routine Robin had in his older set.

Geoff

Robin Murphy April 14, 2023, 2:19 p.m. UTC | #4

On 2023-04-06 22:25, Geoff Blake wrote:
> Ran this patch on an AWS C6g.metal and unfortunately still see the
> spurious IRQs trigger quickly (within 10 tries) when using the following
> flow:
> 
> perf stat -a -e arm_cmn_0/event=0x5,type=0x5/ -- sleep 600
> kexec -e
> 
> Adding in the simple shutdown routine, I have run over 100 of the
> above cycles and the spurious IRQs haven't triggered.  I think we still
> need both for now.

There is no "need both" - if this patch doesn't work to reset the PMU as 
intended then we still need a better patch that does. After yet more 
trying, I still cannot reproduce your results, but I do suspect this 
patch isn't as good as it initially seemed.

I got my hands on a C6g.metal instance, and I'm building the mainline 
version of arm-cmn.c from my cmn-dev branch (including the two other 
pending fixes that I've sent recently) against the 5.15.0-1031-aws 
kernel that it came with, as a standalone module with a trivial 
makefile. Even running "stress -m 60" in the background, as the most 
effective thing I've found so far, that hnf_pocq_reqs_recvd event takes 
well over 8 minutes to overflow, so I have failed to achieve the 
necessary timing to kexec at just the right point for the residual 
interconnect traffic to add up and overflow the event during the handful 
of seconds that the kexec takes. For completeness, I have managed to run 
the perf stat/kexec, then run stress for 10 minutes under the new 
kernel, *then* finally load the module to achieve the right conditions, 
but that's so utterly contrived and long-winded that I don't really have 
the patience to do it more than the twice that I already did.

What I can do instantly and reliably is reproduce equivalent conditions 
with my (now even more stripped-down) remove hack[1] and a simple 
rmmod/insmod (with a few seconds in between for good measure), leading 
to demonstrable latent overflows on all 4 DTCs every time. The existing 
code does seem to manage to reset DTC0 such that its interrupt (IRQ 27) 
does not fire, consistent with what I've observed on other machines, 
while I see the secondary DTCs (IRQs 28, 29 and 30) each fire 100000 
times spuriously and get disabled. With this patch on top[2], that 
consistently does not happen over 100 unload/reload cycles.

Given that you say the same write to clear DTC_CTL, but a few seconds 
earlier in the form of the shutdown hook, does seem to work, I have 
still been wary of some kind of weird timing issue all along, but the 
fact that I was getting such consistent behaviour even on C6g seemed to 
be pointing away from that :/

The closest I've got so far is by leaving this even more involved test 
loop (with real PMU programming in between) running overnight:

for i in {1..10000}; do sudo insmod arm-cmn.ko && sudo perf stat -e 
arm_cmn_0/eventid=5,type=5/ sleep 1 && sudo rmmod arm-cmn && sleep 4; done

and now coming back to find /proc/interrupts saying this:

  27:          1          0          0...
  28:          1          0          0...
  29:          2          0          0...
  30:          1          0          0...

I've quite often seen a single IRQ firing earlier than expected (not 
necessarily spuriously), so I still need to check what's up with that - 
it may be that writing to the counters doesn't always take either. 
However, the single extra incidence of IRQ 29 which has happened at some 
point after I went home is more of a smoking gun:

[84581.790043] WARNING: CPU: 0 PID: 0 at /home/ubuntu/arm-cmn.c:1828 
arm_cmn_handle_irq+0x148/0x1cc [arm_cmn]

So something still snuck through reset, but it *was* at least visible 
and clearable by the time the IRQ was enabled. Interestingly the other 
warning for !dtc->cycles did not fire at the same time, even though the 
hack normally overflows PMCCNTR before PMEVCNTR(0). I'll keep digging...

Thanks,
Robin.

[1] 
https://gitlab.arm.com/linux-arm/linux-rm/-/commit/d3bdc783411fd71d5948ce7c7e7fa6cc6b388b6c
[2] 
https://gitlab.arm.com/linux-arm/linux-rm/-/commit/c742e8be1a783430151828ed27287ad3d61ff9d1

> 
> -Geoff
> 
> On Thu, 6 Apr 2023, Robin Murphy wrote:
> 
>> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>>
>>
>>
>> It turns out that my naive DTC reset logic fails to work as intended,
>> since clearing PMCR.PMU_EN appears to result in writes to PMOVSR_CLR
>> being ignored, while some hard-to-characterise combination of conditions
>> (differently between DTC0 and secondary DTCs) also appears to result in
>> PMOVSR reading as zero even when an overflow remains asserted. Thus
>> rather than resetting the PMU to a nice clean state, we can currently
>> end up with screaming spurious interrupts from secondary DTCs which we
>> can neither see nor clear. This behaviour is of course not documented.
>>
>> Resetting PMCR to disable the interrupt output but enable the PMU itself
>> seems to at least make the PMOVSR_CLR write work as expected on DTC0
>> (although it looks like writing to PMCR twice has actually been having
>> some hidden side-effect of clearing any pending overflows there).
>> Unfortunately this still does not seem to help secondary DTCs, but going
>> beyond PMU scope and additionally resetting DTC_CTL does seems to make
>> everything work out, and superficially looks sensible. Therefore pile
>> that onto the house of empirical cards too, until I can check with the
>> hardware team whether there's actually any proper recommended way of
>> recovering from an arbitrary PMU state after an oops/kexec/whatever.
>>
>> Fixes: 0ba64770a2f2 ("perf: Add Arm CMN-600 PMU driver")
>> Reported-by: Geoff Blake <blakgeof@amazon.com>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>> This supersedes the previous shutdown/IRQ patches, now that I've
>> finally managed to make *some* sense of what's really going on. If
>> anyone's interested, this is the contrivance I used for testing:
>>
>> https://gitlab.arm.com/linux-arm/linux-rm/-/commit/d8f1035c5bc510516d6e4f0b7bf0b875a749daf7
>> ---
>>   drivers/perf/arm-cmn.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
>> index 144cc08d9e04..81fe01171e33 100644
>> --- a/drivers/perf/arm-cmn.c
>> +++ b/drivers/perf/arm-cmn.c
>> @@ -1899,7 +1899,10 @@ static int arm_cmn_init_dtc(struct arm_cmn *cmn, struct arm_cmn_node *dn, int id
>>          if (dtc->irq < 0)
>>                  return dtc->irq;
>>
>> -       writel_relaxed(0, dtc->base + CMN_DT_PMCR);
>> +       if (idx == 0)
>> +               writel_relaxed(0, dtc->base + CMN_DT_DTC_CTL);
>> +
>> +       writel_relaxed(CMN_DT_PMCR_PMU_EN, dtc->base + CMN_DT_PMCR);
>>          writel_relaxed(0x1ff, dtc->base + CMN_DT_PMOVSR_CLR);
>>          writel_relaxed(CMN_DT_PMCR_OVFL_INTR_EN, dtc->base + CMN_DT_PMCR);
>>
>> --
>> 2.39.2.101.g768bb238c484.dirty
>>
>>

Robin Murphy April 25, 2023, 3:49 p.m. UTC | #5

On 14/04/2023 3:19 pm, Robin Murphy wrote:
> On 2023-04-06 22:25, Geoff Blake wrote:
>> Ran this patch on an AWS C6g.metal and unfortunately still see the
>> spurious IRQs trigger quickly (within 10 tries) when using the following
>> flow:
>>
>> perf stat -a -e arm_cmn_0/event=0x5,type=0x5/ -- sleep 600
>> kexec -e
>>
>> Adding in the simple shutdown routine, I have run over 100 of the
>> above cycles and the spurious IRQs haven't triggered.  I think we still
>> need both for now.
> 
> There is no "need both" - if this patch doesn't work to reset the PMU as 
> intended then we still need a better patch that does. After yet more 
> trying, I still cannot reproduce your results, but I do suspect this 
> patch isn't as good as it initially seemed.
> 
> I got my hands on a C6g.metal instance, and I'm building the mainline 
> version of arm-cmn.c from my cmn-dev branch (including the two other 
> pending fixes that I've sent recently) against the 5.15.0-1031-aws 
> kernel that it came with, as a standalone module with a trivial 
> makefile. Even running "stress -m 60" in the background, as the most 
> effective thing I've found so far, that hnf_pocq_reqs_recvd event takes 
> well over 8 minutes to overflow, so I have failed to achieve the 
> necessary timing to kexec at just the right point for the residual 
> interconnect traffic to add up and overflow the event during the handful 
> of seconds that the kexec takes. For completeness, I have managed to run 
> the perf stat/kexec, then run stress for 10 minutes under the new 
> kernel, *then* finally load the module to achieve the right conditions, 
> but that's so utterly contrived and long-winded that I don't really have 
> the patience to do it more than the twice that I already did.
> 
> What I can do instantly and reliably is reproduce equivalent conditions 
> with my (now even more stripped-down) remove hack[1] and a simple 
> rmmod/insmod (with a few seconds in between for good measure), leading 
> to demonstrable latent overflows on all 4 DTCs every time. The existing 
> code does seem to manage to reset DTC0 such that its interrupt (IRQ 27) 
> does not fire, consistent with what I've observed on other machines, 
> while I see the secondary DTCs (IRQs 28, 29 and 30) each fire 100000 
> times spuriously and get disabled. With this patch on top[2], that 
> consistently does not happen over 100 unload/reload cycles.
> 
> Given that you say the same write to clear DTC_CTL, but a few seconds 
> earlier in the form of the shutdown hook, does seem to work, I have 
> still been wary of some kind of weird timing issue all along, but the 
> fact that I was getting such consistent behaviour even on C6g seemed to 
> be pointing away from that :/
> 
> The closest I've got so far is by leaving this even more involved test 
> loop (with real PMU programming in between) running overnight:
> 
> for i in {1..10000}; do sudo insmod arm-cmn.ko && sudo perf stat -e 
> arm_cmn_0/eventid=5,type=5/ sleep 1 && sudo rmmod arm-cmn && sleep 4; done
> 
> and now coming back to find /proc/interrupts saying this:
> 
>   27:          1          0          0...
>   28:          1          0          0...
>   29:          2          0          0...
>   30:          1          0          0...
> 
> I've quite often seen a single IRQ firing earlier than expected (not 
> necessarily spuriously), so I still need to check what's up with that - 
> it may be that writing to the counters doesn't always take either. 
> However, the single extra incidence of IRQ 29 which has happened at some 
> point after I went home is more of a smoking gun:
> 
> [84581.790043] WARNING: CPU: 0 PID: 0 at /home/ubuntu/arm-cmn.c:1828 
> arm_cmn_handle_irq+0x148/0x1cc [arm_cmn]
> 
> So something still snuck through reset, but it *was* at least visible 
> and clearable by the time the IRQ was enabled. Interestingly the other 
> warning for !dtc->cycles did not fire at the same time, even though the 
> hack normally overflows PMCCNTR before PMEVCNTR(0). I'll keep digging...

I realise I neglected to follow up on this - where I got to was adding 
an extra read back of CMN_DTC_CTL after the write, tweaking my remove 
hack to generate overflows even more reliably for good measure, and that 
then ran for ~56,000 test cycles (until my time on the instance ran out) 
without any stray interrupts at all. However I wasn't keen on posting 
that as a v2 without any better justification than it being a "add 
random things to change the timing a bit" bodge. Since we're in the 
merge window now, I'll see if I can get a better answer by -rc1.

Thanks,
Robin.

diff mbox series

Patch

diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index 144cc08d9e04..81fe01171e33 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -1899,7 +1899,10 @@  static int arm_cmn_init_dtc(struct arm_cmn *cmn, struct arm_cmn_node *dn, int id
 	if (dtc->irq < 0)
 		return dtc->irq;
 
-	writel_relaxed(0, dtc->base + CMN_DT_PMCR);
+	if (idx == 0)
+		writel_relaxed(0, dtc->base + CMN_DT_DTC_CTL);
+
+	writel_relaxed(CMN_DT_PMCR_PMU_EN, dtc->base + CMN_DT_PMCR);
 	writel_relaxed(0x1ff, dtc->base + CMN_DT_PMOVSR_CLR);
 	writel_relaxed(CMN_DT_PMCR_OVFL_INTR_EN, dtc->base + CMN_DT_PMCR);