diff mbox series

[RFC] cpufreq: qcom-cpufreq-hw: allow work to be done on other CPU for PREEMPT_RT

Message ID	20230315164910.302265-1-krzysztof.kozlowski@linaro.org
State	New
Headers	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; From: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> To: Andy Gross <agross@kernel.org>, Bjorn Andersson <andersson@kernel.org>, Konrad Dybcio <konrad.dybcio@linaro.org>, "Rafael J. Wysocki" <rafael@kernel.org>, Viresh Kumar <viresh.kumar@linaro.org>, linux-arm-msm@vger.kernel.org, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Adrien Thierry <athierry@redhat.com>, Brian Masney <bmasney@redhat.com>, linux-rt-users@vger.kernel.org, Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Subject: [RFC PATCH] cpufreq: qcom-cpufreq-hw: allow work to be done on other CPU for PREEMPT_RT Date: Wed, 15 Mar 2023 17:49:10 +0100 Message-Id: <20230315164910.302265-1-krzysztof.kozlowski@linaro.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	[RFC] cpufreq: qcom-cpufreq-hw: allow work to be done on other CPU for PREEMPT_RT \| [RFC] cpufreq: qcom-cpufreq-hw: allow work to be done on other CPU for PREEMPT_RT

Commit Message

Krzysztof Kozlowski March 15, 2023, 4:49 p.m. UTC

  Qualcomm cpufreq driver configures interrupts with affinity to each
cluster, e.g.  dcvsh-irq-0, dcvsh-irq-4 and dcvsh-irq-7 on SM8250.
Triggered interrupt will schedule delayed work, but, since workqueue
prefers local CPUs, it might get executed on a CPU dedicated to realtime
tasks causing unexpected latencies in realtime workload.

Use unbound workqueue for such case.  This might come with performance
or energy penalty, e.g. because of cache miss or when other CPU is
sleeping.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
---
 drivers/cpufreq/qcom-cpufreq-hw.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Krzysztof Kozlowski March 16, 2023, 12:28 p.m. UTC | #1

On 15/03/2023 17:49, Krzysztof Kozlowski wrote:
> Qualcomm cpufreq driver configures interrupts with affinity to each
> cluster, e.g.  dcvsh-irq-0, dcvsh-irq-4 and dcvsh-irq-7 on SM8250.
> Triggered interrupt will schedule delayed work, but, since workqueue
> prefers local CPUs, it might get executed on a CPU dedicated to realtime
> tasks causing unexpected latencies in realtime workload.
> 
> Use unbound workqueue for such case.  This might come with performance
> or energy penalty, e.g. because of cache miss or when other CPU is
> sleeping.
> 
> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> ---
>  drivers/cpufreq/qcom-cpufreq-hw.c | 11 ++++++++++-

Let me also paste impact of this patch - rtla osnoise on entirely idle
system (cores 2-7 isolated for Realtime):

BEFORE:
       osnoise/7-2967    [007] d..h2..  3937.898311: irq_noise: dcvsh-irq-7:179 start 3937.898310871 duration 104 ns
 irq/179-dcvsh-i-343     [007] d..h2..  3937.898318: irq_noise: IPI:6 start 3937.898317537 duration 104 ns
 irq/179-dcvsh-i-343     [007] d...3..  3937.898321: thread_noise: irq/179-dcvsh-i:343 start 3937.898316287 duration 4740 ns
     kworker/7:0-85      [007] d..h2..  3937.898323: irq_noise: IPI:6 start 3937.898322381 duration 104 ns
     kworker/7:0-85      [007] d...3..  3937.898343: thread_noise: kworker/7:0:85 start 3937.898321339 duration 20990 ns
       osnoise/7-2967    [007] .......  3937.898343: sample_threshold: start 3937.898308475 duration 34531 ns interference 5

Noise duration: 34 us

AFTER:
       osnoise/7-2637    [007] d..h2..   530.563819: irq_noise: dcvsh-irq-7:178 start 530.563817139 duration 260 ns
       osnoise/7-2637    [007] d..h2..   530.563827: irq_noise: IPI:6 start 530.563826670 duration 156 ns
       osnoise/7-2637    [007] .......   530.563828: sample_threshold: start 530.563814587 duration 12864 ns interference 2

Noise duration: 13 us

Best regards,
Krzysztof

Krzysztof Kozlowski March 17, 2023, 8:13 a.m. UTC | #2

On 17/03/2023 00:57, Hillf Danton wrote:
> On 16 Mar 2023 13:28:18 +0100 Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
>> On 15/03/2023 17:49, Krzysztof Kozlowski wrote:
>>> Qualcomm cpufreq driver configures interrupts with affinity to each
>>> cluster, e.g.  dcvsh-irq-0, dcvsh-irq-4 and dcvsh-irq-7 on SM8250.
>>> Triggered interrupt will schedule delayed work, but, since workqueue
>>> prefers local CPUs, it might get executed on a CPU dedicated to realtime
>>> tasks causing unexpected latencies in realtime workload.
>>>
>>> Use unbound workqueue for such case.  This might come with performance
>>> or energy penalty, e.g. because of cache miss or when other CPU is
>>> sleeping.
>>>
>>> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
>>> ---
>>>  drivers/cpufreq/qcom-cpufreq-hw.c | 11 ++++++++++-
>>
>> Let me also paste impact of this patch - rtla osnoise on entirely idle
>> system (cores 2-7 isolated for Realtime):
> 
> Are cores 2-7 the non-housekeeping CPUs[1]?

Yes, since their are isolated for Realtime this is synonymous.

> 
> [1] https://lore.kernel.org/lkml/20230223150624.GA29739@lst.de/



Best regards,
Krzysztof

Sebastian Andrzej Siewior March 21, 2023, 10:04 a.m. UTC | #3

On 2023-03-15 17:49:10 [+0100], Krzysztof Kozlowski wrote:
> Qualcomm cpufreq driver configures interrupts with affinity to each
> cluster, e.g.  dcvsh-irq-0, dcvsh-irq-4 and dcvsh-irq-7 on SM8250.
> Triggered interrupt will schedule delayed work, but, since workqueue
> prefers local CPUs, it might get executed on a CPU dedicated to realtime
> tasks causing unexpected latencies in realtime workload.
> 
> Use unbound workqueue for such case.  This might come with performance
> or energy penalty, e.g. because of cache miss or when other CPU is
> sleeping.

I miss the point where it explains that only PREEMPT_RT is affected by
this.

> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> ---
>  drivers/cpufreq/qcom-cpufreq-hw.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> index 2f581d2d617d..c5ff8d25fabb 100644
> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> @@ -390,7 +390,16 @@ static irqreturn_t qcom_lmh_dcvs_handle_irq(int irq, void *data)
>  
>  	/* Disable interrupt and enable polling */
>  	disable_irq_nosync(c_data->throttle_irq);
> -	schedule_delayed_work(&c_data->throttle_work, 0);
> +
> +	/*
> +	 * Workqueue prefers local CPUs and since interrupts have set affinity,
> +	 * the work might execute on a CPU dedicated to realtime tasks.
> +	 */
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> +		queue_delayed_work_on(WORK_CPU_UNBOUND, system_unbound_wq,
> +				      &c_data->throttle_work, 0);
> +	else
> +		schedule_delayed_work(&c_data->throttle_work, 0);

You isolated CPUs and use this on PREEMPT_RT. And this special use-case
is your reasoning to make this change and let it depend on PREEMPT_RT?

If you do PREEMPT_RT and you care about latency I would argue that you
either disable cpufreq and set it to PERFORMANCE so that the highest
available frequency is set once and not changed afterwards.

>  	if (qcom_cpufreq.soc_data->reg_intr_clr)
>  		writel_relaxed(GT_IRQ_STATUS,
> -- 
> 2.34.1
>

Krzysztof Kozlowski March 21, 2023, 10:24 a.m. UTC | #4

On 21/03/2023 11:04, Sebastian Andrzej Siewior wrote:
> On 2023-03-15 17:49:10 [+0100], Krzysztof Kozlowski wrote:
>> Qualcomm cpufreq driver configures interrupts with affinity to each
>> cluster, e.g.  dcvsh-irq-0, dcvsh-irq-4 and dcvsh-irq-7 on SM8250.
>> Triggered interrupt will schedule delayed work, but, since workqueue
>> prefers local CPUs, it might get executed on a CPU dedicated to realtime
>> tasks causing unexpected latencies in realtime workload.
>>
>> Use unbound workqueue for such case.  This might come with performance
>> or energy penalty, e.g. because of cache miss or when other CPU is
>> sleeping.
> 
> I miss the point where it explains that only PREEMPT_RT is affected by
> this.

I assume "realtime tasks" implies this, but I can make it clearer.

> 
>> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
>> ---
>>  drivers/cpufreq/qcom-cpufreq-hw.c | 11 ++++++++++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
>> index 2f581d2d617d..c5ff8d25fabb 100644
>> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
>> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
>> @@ -390,7 +390,16 @@ static irqreturn_t qcom_lmh_dcvs_handle_irq(int irq, void *data)
>>  
>>  	/* Disable interrupt and enable polling */
>>  	disable_irq_nosync(c_data->throttle_irq);
>> -	schedule_delayed_work(&c_data->throttle_work, 0);
>> +
>> +	/*
>> +	 * Workqueue prefers local CPUs and since interrupts have set affinity,
>> +	 * the work might execute on a CPU dedicated to realtime tasks.
>> +	 */
>> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
>> +		queue_delayed_work_on(WORK_CPU_UNBOUND, system_unbound_wq,
>> +				      &c_data->throttle_work, 0);
>> +	else
>> +		schedule_delayed_work(&c_data->throttle_work, 0);
> 
> You isolated CPUs and use this on PREEMPT_RT. And this special use-case
> is your reasoning to make this change and let it depend on PREEMPT_RT?
> 
> If you do PREEMPT_RT and you care about latency I would argue that you
> either disable cpufreq and set it to PERFORMANCE so that the highest
> available frequency is set once and not changed afterwards.

The cpufreq is set to performance. It will be changed anyway because
underlying FW notifies through such interrupts about thermal mitigation
happening.

The only other solution is to disable the cpufreq device, e.g. by not
compiling it.

Best regards,
Krzysztof

Sebastian Andrzej Siewior March 21, 2023, 10:57 a.m. UTC | #5

On 2023-03-21 11:24:46 [+0100], Krzysztof Kozlowski wrote:
> >> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> >> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> >> @@ -390,7 +390,16 @@ static irqreturn_t qcom_lmh_dcvs_handle_irq(int irq, void *data)
> >>  
> >>  	/* Disable interrupt and enable polling */
> >>  	disable_irq_nosync(c_data->throttle_irq);
> >> -	schedule_delayed_work(&c_data->throttle_work, 0);
> >> +
> >> +	/*
> >> +	 * Workqueue prefers local CPUs and since interrupts have set affinity,
> >> +	 * the work might execute on a CPU dedicated to realtime tasks.
> >> +	 */
> >> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> >> +		queue_delayed_work_on(WORK_CPU_UNBOUND, system_unbound_wq,
> >> +				      &c_data->throttle_work, 0);
> >> +	else
> >> +		schedule_delayed_work(&c_data->throttle_work, 0);
> > 
> > You isolated CPUs and use this on PREEMPT_RT. And this special use-case
> > is your reasoning to make this change and let it depend on PREEMPT_RT?
> > 
> > If you do PREEMPT_RT and you care about latency I would argue that you
> > either disable cpufreq and set it to PERFORMANCE so that the highest
> > available frequency is set once and not changed afterwards.
> 
> The cpufreq is set to performance. It will be changed anyway because
> underlying FW notifies through such interrupts about thermal mitigation
> happening.

I still fail to understand why this is PREEMPT_RT specific and not a
problem in general when it comes not NO_HZ_FULL and/ or CPU isolation.
However the thermal notifications have nothing to do with cpufreq.

> The only other solution is to disable the cpufreq device, e.g. by not
> compiling it.

People often disable cpufreq because _usually_ the system boots at
maximum performance. There are however exceptions and even x86 system
are configured sometimes to a lower clock speed by the firmware/ BIOS.
In this case it is nice to have a cpufreq so it is possible to set the
system during boot to a higher clock speed. And then remain idle unless
the cpufreq governor changed.

> Best regards,
> Krzysztof

Sebastian

Krzysztof Kozlowski March 21, 2023, 11:27 a.m. UTC | #6

On 21/03/2023 11:57, Sebastian Andrzej Siewior wrote:
> On 2023-03-21 11:24:46 [+0100], Krzysztof Kozlowski wrote:
>>>> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
>>>> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
>>>> @@ -390,7 +390,16 @@ static irqreturn_t qcom_lmh_dcvs_handle_irq(int irq, void *data)
>>>>  
>>>>  	/* Disable interrupt and enable polling */
>>>>  	disable_irq_nosync(c_data->throttle_irq);
>>>> -	schedule_delayed_work(&c_data->throttle_work, 0);
>>>> +
>>>> +	/*
>>>> +	 * Workqueue prefers local CPUs and since interrupts have set affinity,
>>>> +	 * the work might execute on a CPU dedicated to realtime tasks.
>>>> +	 */
>>>> +	if (IS_ENABLED(CONFIG_PREEMPT_RT))
>>>> +		queue_delayed_work_on(WORK_CPU_UNBOUND, system_unbound_wq,
>>>> +				      &c_data->throttle_work, 0);
>>>> +	else
>>>> +		schedule_delayed_work(&c_data->throttle_work, 0);
>>>
>>> You isolated CPUs and use this on PREEMPT_RT. And this special use-case
>>> is your reasoning to make this change and let it depend on PREEMPT_RT?
>>>
>>> If you do PREEMPT_RT and you care about latency I would argue that you
>>> either disable cpufreq and set it to PERFORMANCE so that the highest
>>> available frequency is set once and not changed afterwards.
>>
>> The cpufreq is set to performance. It will be changed anyway because
>> underlying FW notifies through such interrupts about thermal mitigation
>> happening.
> 
> I still fail to understand why this is PREEMPT_RT specific and not a
> problem in general when it comes not NO_HZ_FULL and/ or CPU isolation.

Hm, good point, I actually don't know what is the workqueue
recommendation for NO_HZ_FULL CPUs - is still locality of the workqueue
preferred?

And how such code would look like?
if (tick_nohz_tick_stopped())?

> However the thermal notifications have nothing to do with cpufreq.

They have. The FW notifies that thermal mitigation is happening and
maximum allowed frequency is now XYZ. The cpufreq receives this and sets
maximum allowed scaling frequency for governor.

> 
>> The only other solution is to disable the cpufreq device, e.g. by not
>> compiling it.
> 
> People often disable cpufreq because _usually_ the system boots at
> maximum performance. There are however exceptions and even x86 system
> are configured sometimes to a lower clock speed by the firmware/ BIOS.
> In this case it is nice to have a cpufreq so it is possible to set the
> system during boot to a higher clock speed. And then remain idle unless
> the cpufreq governor changed.

Which we do not want here, thus disabling cpufreq is not the interesting
solution...

Best regards,
Krzysztof

Sebastian Andrzej Siewior March 21, 2023, 1:39 p.m. UTC | #7

On 2023-03-21 12:27:42 [+0100], Krzysztof Kozlowski wrote:
> > I still fail to understand why this is PREEMPT_RT specific and not a
> > problem in general when it comes not NO_HZ_FULL and/ or CPU isolation.
> 
> Hm, good point, I actually don't know what is the workqueue
> recommendation for NO_HZ_FULL CPUs - is still locality of the workqueue
> preferred?

If you isolate a CPU you want the kernel to stay away from it. The idea
is that something is done on that CPU and the kernel should leave it
alone. That is why the HZ tick avoided. That is why timers migrate to
the "housekeeping" CPU and do not fire on the CPU that it was programmed
on (unless the timer has to fire on this CPU).

> And how such code would look like?
> if (tick_nohz_tick_stopped())?

Yeah closer :) The CPU-mask for workqueues can still be different on
non-NOHZ-full CPUs. Still you interrupt the CPU doing in-userland work
and this is not desired.

You have a threaded-IRQ which does nothing but schedules a worker. Why?
Why not sleep and remain in that threaded IRQ until the work is done?
You _can_ sleep in the threaded IRQ if you have to. Force-threaded is
different but this is one is explicit threaded so you could do it.

> > However the thermal notifications have nothing to do with cpufreq.
> 
> They have. The FW notifies that thermal mitigation is happening and
> maximum allowed frequency is now XYZ. The cpufreq receives this and sets
> maximum allowed scaling frequency for governor.

I see. So the driver is doing something in worst case. This interrupt,
you have per-CPU and you need to do this CPU? I mean could you change
the affinity of the interrupt to another CPU?

> Best regards,
> Krzysztof

Sebastian

Krzysztof Kozlowski March 23, 2023, 8:16 a.m. UTC | #8

On 21/03/2023 14:39, Sebastian Andrzej Siewior wrote:
> On 2023-03-21 12:27:42 [+0100], Krzysztof Kozlowski wrote:
>>> I still fail to understand why this is PREEMPT_RT specific and not a
>>> problem in general when it comes not NO_HZ_FULL and/ or CPU isolation.
>>
>> Hm, good point, I actually don't know what is the workqueue
>> recommendation for NO_HZ_FULL CPUs - is still locality of the workqueue
>> preferred?
> 
> If you isolate a CPU you want the kernel to stay away from it. The idea
> is that something is done on that CPU and the kernel should leave it
> alone. That is why the HZ tick avoided. That is why timers migrate to
> the "housekeeping" CPU and do not fire on the CPU that it was programmed
> on (unless the timer has to fire on this CPU).
> 
>> And how such code would look like?
>> if (tick_nohz_tick_stopped())?
> 
> Yeah closer :) The CPU-mask for workqueues can still be different on
> non-NOHZ-full CPUs. Still you interrupt the CPU doing in-userland work
> and this is not desired.

Probably this should be done by workqueue core code.  Individual drivers
should not need to investigate which CPUs are isolated.


> You have a threaded-IRQ which does nothing but schedules a worker. Why?
> Why not sleep and remain in that threaded IRQ until the work is done?
> You _can_ sleep in the threaded IRQ if you have to. Force-threaded is
> different but this is one is explicit threaded so you could do it.

If I get your point correctly, you want the IRQ handler thread to do the
actual work instead of scheduling work? The answer to this is probably here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e0e27c3d4e20dab861566f1c348ae44e4b498630

> 	
>>> However the thermal notifications have nothing to do with cpufreq.
>>
>> They have. The FW notifies that thermal mitigation is happening and
>> maximum allowed frequency is now XYZ. The cpufreq receives this and sets
>> maximum allowed scaling frequency for governor.
> 
> I see. So the driver is doing something in worst case. This interrupt,
> you have per-CPU and you need to do this CPU? I mean could you change
> the affinity of the interrupt to another CPU?

I don't know. The commit introducing it:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3ed6dfbd3bb987b3d2de86304ae45972ebff5870
claimed it helps to reduce number of interrupts hitting CPU 10x-100x
times... I don't see it - neither in tests nor in the code, so I am just
thinking to revert that one.

Best regards,
Krzysztof

Sebastian Andrzej Siewior March 23, 2023, 11:37 a.m. UTC | #9

On 2023-03-23 09:16:27 [+0100], Krzysztof Kozlowski wrote:
> > Yeah closer :) The CPU-mask for workqueues can still be different on
> > non-NOHZ-full CPUs. Still you interrupt the CPU doing in-userland work
> > and this is not desired.
> 
> Probably this should be done by workqueue core code.  Individual drivers
> should not need to investigate which CPUs are isolated.

_Either_ this is part of the interrupt service routine or it is not.
Sometimes work can be offloaded.
However this interrupt is short and offloads work to a workqueue.
Can the interrupt be moved to another CPU without breaking something? 
The use can only change the CPUs on which the workqueue can run but also
the affinity of the IRQ itself. If the user wishes to isolate CPU X and
move workqueues and interrupts away from the CPU the question is why is
this a problem for you.

> > You have a threaded-IRQ which does nothing but schedules a worker. Why?
> > Why not sleep and remain in that threaded IRQ until the work is done?
> > You _can_ sleep in the threaded IRQ if you have to. Force-threaded is
> > different but this is one is explicit threaded so you could do it.
> 
> If I get your point correctly, you want the IRQ handler thread to do the
> actual work instead of scheduling work? The answer to this is probably here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e0e27c3d4e20dab861566f1c348ae44e4b498630

Let me look.

| Re-enabling an interrupt from its own interrupt handler may cause
| an interrupt storm, if there is a pending interrupt and because its
| handling is disabled due to already done entrance into the handler
| above in the stack.

I have hard time parsing this.
disable_irq_nosync() and enable enable_irq() shouldn't be invoked from
within the interrupt handler itself. This interrupt is already requested
as a threaded handler with IRQF_ONESHOT. This means the IRQ-chip already
disables the interrupt while the threaded handler is invoked. No need
for that.

I don't know what the purpose of reg_intr_clr here is. Acking the
interrupt before reading the status and doing any work does not look
right.

| Also, apparently it is improper to lock a mutex in an interrupt contex.

Again, this is an interrupt handler requested as a threaded handler.
This handler is invoked with enabled interrupts and preemption. The code
in this handler can invoke ssleep() or mutex_lock(). A might_sleep()
does not produce a plat here. It okay to acquire a mutex. This is why we
have threaded interrupts.

You can't acquire a mutex in a forced-threaded handler. This is not the
case here.

> > 	
> >>> However the thermal notifications have nothing to do with cpufreq.
> >>
> >> They have. The FW notifies that thermal mitigation is happening and
> >> maximum allowed frequency is now XYZ. The cpufreq receives this and sets
> >> maximum allowed scaling frequency for governor.
> > 
> > I see. So the driver is doing something in worst case. This interrupt,
> > you have per-CPU and you need to do this CPU? I mean could you change
> > the affinity of the interrupt to another CPU?
> 
> I don't know. The commit introducing it:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3ed6dfbd3bb987b3d2de86304ae45972ebff5870
> claimed it helps to reduce number of interrupts hitting CPU 10x-100x
> times... I don't see it - neither in tests nor in the code, so I am just
> thinking to revert that one.

So it may run on another CPU but doing it on the right cluster reduces
the received interrupt 10-100 times. Do we have per-cluster register or
is the interrupt ACK wrong and this what is observed?

The questions are:
- What is the interrupt signaling.
- What must be done to acknowledge the interrupt.

This needs to be figured out and verified that it actually works as
intended. The hardware might keep sending interrupt because the source
is either not acknowledge properly or the source of the interrupt (the
reason why it was generated in first place) is still pending/ not
handled. The changes you reference look like "if we do this then it
seems better.". No explanation about the issue/ root cause and the
targeted solution.

> Best regards,
> Krzysztof

Sebastian

diff mbox series

Patch

diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 2f581d2d617d..c5ff8d25fabb 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -390,7 +390,16 @@  static irqreturn_t qcom_lmh_dcvs_handle_irq(int irq, void *data)
 
 	/* Disable interrupt and enable polling */
 	disable_irq_nosync(c_data->throttle_irq);
-	schedule_delayed_work(&c_data->throttle_work, 0);
+
+	/*
+	 * Workqueue prefers local CPUs and since interrupts have set affinity,
+	 * the work might execute on a CPU dedicated to realtime tasks.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		queue_delayed_work_on(WORK_CPU_UNBOUND, system_unbound_wq,
+				      &c_data->throttle_work, 0);
+	else
+		schedule_delayed_work(&c_data->throttle_work, 0);
 
 	if (qcom_cpufreq.soc_data->reg_intr_clr)
 		writel_relaxed(GT_IRQ_STATUS,