drivers/clocksource/arm_arch_timer: Tighten Allwinner arch timer workaround

Message ID 20221109221049.4bf3c5bb@rosh
State New

Commit Message

Ilya Dikariev Nov. 9, 2022, 9:10 p.m. UTC
As we know, the Allwinner A64 SoC has a buggy RCU time unit. The
workaround named UNKNOWN1 was not sufficient to cover some further buggy
batches of this SoC. This patch narrows the mask to 8 bits instead
of 9.

An example run of the timer test tool https://github.com/smaeul/timer-tools
on a PinePhone (which uses the A64 SoC) gives the following result on an
unpatched kernel (truncated):

Running parallel counter test...
0: Failed after 5507 reads (0.003578 s)
0: 0x0000000c8272cbf1 -> 0x0000000c8272ccff -> 0x0000000c8272cc0e (     0.011 ms)
2: Failed after 14518 reads (0.009248 s)
2: 0x0000000c827513f1 -> 0x0000000c82751300 -> 0x0000000c8275140e (    -0.010 ms)
3: Failed after 14112 reads (0.008730 s)
3: 0x0000000c8274f3f2 -> 0x0000000c8274f300 -> 0x0000000c8274f40d (    -0.010 ms)
1: Failed after 12030 reads (0.008409 s)
1: 0x0000000c8274abf1 -> 0x0000000c8274acff -> 0x0000000c8274ac0f (     0.011 ms)
1: 0x0000000c827759f2 -> 0x0000000c82775aff -> 0x0000000c82775a0e (     0.011 ms)
0: 0x0000000c8277a9f2 -> 0x0000000c8277aaff -> 0x0000000c8277aa0d (     0.011 ms)
2: 0x0000000c8278f3f1 -> 0x0000000c8278f300 -> 0x0000000c8278f40e (    -0.010 ms)
0: 0x0000000c82785ff2 -> 0x0000000c82784300 -> 0x0000000c8278600d (    -0.309 ms)

With the proposed patch applied, the test runs correctly
(~2 hours of testing with the tool above without failures).

Signed-off-by: Ilya Dikariev <dikarill@b-tu.de>
---
 drivers/clocksource/arm_arch_timer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Marc Zyngier Nov. 10, 2022, 8:31 a.m. UTC | #1
On Wed, 09 Nov 2022 21:10:49 +0000,
Ilya Dikariev <dikarill@b-tu.de> wrote:
> 
> As we know, the Allwinner A64 SoC has a buggy RCU time unit. The

What is RCU?

> workaround named UNKNOWN1 was not sufficient to cover some further buggy
> batches of this SoC. This patch narrows the mask to 8 bits instead
> of 9.
> 
> An example run of the timer test tool https://github.com/smaeul/timer-tools
> on a PinePhone (which uses the A64 SoC) gives the following result on an
> unpatched kernel (truncated):
> 
> Running parallel counter test...
> 0: Failed after 5507 reads (0.003578 s)
> 0: 0x0000000c8272cbf1 -> 0x0000000c8272ccff -> 0x0000000c8272cc0e (     0.011 ms)
> 2: Failed after 14518 reads (0.009248 s)
> 2: 0x0000000c827513f1 -> 0x0000000c82751300 -> 0x0000000c8275140e (    -0.010 ms)
> 3: Failed after 14112 reads (0.008730 s)
> 3: 0x0000000c8274f3f2 -> 0x0000000c8274f300 -> 0x0000000c8274f40d (    -0.010 ms)
> 1: Failed after 12030 reads (0.008409 s)
> 1: 0x0000000c8274abf1 -> 0x0000000c8274acff -> 0x0000000c8274ac0f (     0.011 ms)
> 1: 0x0000000c827759f2 -> 0x0000000c82775aff -> 0x0000000c82775a0e (     0.011 ms)
> 0: 0x0000000c8277a9f2 -> 0x0000000c8277aaff -> 0x0000000c8277aa0d (     0.011 ms)
> 2: 0x0000000c8278f3f1 -> 0x0000000c8278f300 -> 0x0000000c8278f40e (    -0.010 ms)
> 0: 0x0000000c82785ff2 -> 0x0000000c82784300 -> 0x0000000c8278600d (    -0.309 ms)
> 
> With the proposed patch applied, the test runs correctly
> (~2 hours of testing with the tool above without failures).

2 hours seems like an incredibly small amount of time given that the
existing workaround was believed to be correct. Run it continuously
for a couple of weeks on several different machines with varying
workloads and let us know the outcome.

Thanks,

	M.
  
Ilya Dikariev Nov. 10, 2022, 11:10 a.m. UTC | #2
On Thu, 10 Nov 2022 08:31:21 +0000
Marc Zyngier <maz@kernel.org> wrote:

MZ> > 
MZ> > As we know, the Allwinner A64 SoC has a buggy RCU time unit. The  
MZ> 
MZ> What is RCU?

I think I used the wrong name. I mean the HR timer of the A64.

MZ> 
MZ> > workaround named UNKNOWN1 was not sufficient to cover some further buggy
MZ> > batches of this SoC. This patch narrows the mask to 8 bits instead
MZ> > of 9.
MZ> > 
MZ> > An example run of the timer test tool https://github.com/smaeul/timer-tools
MZ> > on a PinePhone (which uses the A64 SoC) gives the following result on an
MZ> > unpatched kernel (truncated):
MZ> > 
MZ> > Running parallel counter test...
MZ> > 0: Failed after 5507 reads (0.003578 s)
MZ> > 0: 0x0000000c8272cbf1 -> 0x0000000c8272ccff -> 0x0000000c8272cc0e (     0.011 ms)

[......]

MZ> > With the proposed patch applied, the test runs correctly
MZ> > (~2 hours of testing with the tool above without failures).
MZ> 
MZ> 2 hours seems like an incredibly small amount of time given that the
MZ> existing workaround was believed to be correct. Run it continuously
MZ> for a couple of weeks on several different machines with varying
MZ> workloads and let us know the outcome.

The only A64 machine I own is the PinePhone. I first made this patch
~9 months ago (on behalf of Samuel). Before it, the system suffered hangs
every 15-20 minutes and backward time jumps about once a day.
Since applying this patch, neither has occurred.
To be honest, I never ran long tests (weeks). I will put the device
under test for some weeks and let you know then.


Best regards,
Ilya
  

Patch

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index a7ff77550e17..3019faa263f5 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -371,7 +371,7 @@  static u64 notrace arm64_858921_read_cntvct_el0(void)
 	do {								\
 		_val = read_sysreg(reg);				\
 		_retries--;						\
-	} while (((_val + 1) & GENMASK(8, 0)) <= 1 && _retries);	\
+	} while (((_val + 1) & GENMASK(7, 0)) <= 1 && _retries);	\
 									\
 	WARN_ON_ONCE(!_retries);					\
 	_val;								\