gpiolib: Don't implicitly disable irq when masking

Message ID 20230510001151.3946931-1-chris.packham@alliedtelesis.co.nz
State New
Headers
Series gpiolib: Don't implicitly disable irq when masking |

Commit Message

Chris Packham May 10, 2023, 12:11 a.m. UTC
  When preparing to kexec into a new kernel the kexec code will mask all
interrupts for all interrupt domains before disabling them. In the case
of a gpio chip which has a mix of gpio and irq pins a warning would be
triggered as follows

  [root@localhost ~]# echo c >/proc/sysrq-trigger
  [  175.677728] sysrq: Trigger a crash
  <snip expected dump from SysRq>
  [  175.803409] WARNING: CPU: 1 PID: 2345 at drivers/gpio/gpiolib.c:3284 gpiochip_disable_irq+0x68/0xac
  [  175.918321] CPU: 1 PID: 2345 Comm: sh Kdump: loaded Tainted: G           O      5.15.109+ #5
  [  175.926742] Hardware name: Allied Telesis x240-26GHXm (DT)
  [  175.932216] pstate: 804000c9 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  [  175.939165] pc : gpiochip_disable_irq+0x68/0xac
  [  175.943686] lr : gpiochip_disable_irq+0x58/0xac
  [  175.948208] sp : ffff80000c03ba00
  [  175.951513] x29: ffff80000c03ba00 x28: ffff00002843c4c0 x27: 0000000000000000
  [  175.958638] x26: 0000000000000000 x25: ffff800008df0790 x24: 0000000000000000
  [  175.965763] x23: ffff8000095397f0 x22: ffff800009485178 x21: ffff000016ea38f0
  [  175.972888] x20: ffff000016ea3a20 x19: ffff000016073cc0 x18: ffffffffffffffff
  [  175.980012] x17: 20204f2020202020 x16: 2020202020204720 x15: ffff80008c03b667
  [  175.987136] x14: 0000000000000000 x13: ffff800009415148 x12: 0000000000000555
  [  175.994260] x11: 00000000000001c7 x10: ffff800009415148 x9 : ffff800009415148
  [  176.001385] x8 : 00000000ffffefff x7 : ffff80000946d148 x6 : ffff80000946d148
  [  176.008510] x5 : ffff00003fdc59a0 x4 : 0000000000000000 x3 : 0000000000000000
  [  176.015634] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000007
  [  176.022757] Call trace:
  [  176.025197]  gpiochip_disable_irq+0x68/0xac
  [  176.029373]  gpiochip_irq_mask+0x30/0x40
  [  176.033289]  machine_crash_shutdown+0xb4/0x11c
  [  176.037727]  __crash_kexec+0x88/0x16c
  [  176.041384]  panic+0x194/0x348
  [  176.044433]  sysrq_reset_seq_param_set+0x0/0x90
  [  176.048954]  __handle_sysrq+0x8c/0x1a0
  [  176.052695]  write_sysrq_trigger+0x88/0xc0
  [  176.056783]  proc_reg_write+0xa8/0x100
  [  176.060527]  vfs_write+0xf0/0x290
  [  176.063835]  ksys_write+0x68/0xf4
  [  176.067144]  __arm64_sys_write+0x1c/0x2c
  [  176.071058]  invoke_syscall+0x48/0x114
  [  176.074800]  el0_svc_common.constprop.0+0x44/0xfc
  [  176.079495]  do_el0_svc+0x28/0xa0
  [  176.082803]  el0_svc+0x28/0x80
  [  176.085851]  el0t_64_sync_handler+0xa4/0x130
  [  176.090112]  el0t_64_sync+0x1a0/0x1a4
  [  176.093768] ---[ end trace 486d53a4eb15ab42 ]---
  <more warnings for each gpio pin that is not used as an interrupt>

This is because gpiochip_irq_mask was being used to mask all possible
irqs in the domain but gpiochip_disable_irq will WARN if any of those
gpios haven't been requested as interrupts yet. Remove the call to
gpiochip_disable_irq to stop the warning.

Fixes: a8173820f441 ("gpio: gpiolib: Allow GPIO IRQs to lazy disable")
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
---
 drivers/gpio/gpiolib.c | 1 -
 1 file changed, 1 deletion(-)
  

Comments

Andy Shevchenko May 10, 2023, 7:42 a.m. UTC | #1
Wed, May 10, 2023 at 12:11:51PM +1200, Chris Packham kirjoitti:
> When preparing to kexec into a new kernel the kexec code will mask all
> interrupts for all interrupt domains before disabling them. In the case
> of a gpio chip which has a mix of gpio and irq pins a warning would be
> triggered as follows

>   [root@localhost ~]# echo c >/proc/sysrq-trigger

Besides the very noisy traceback in the commit message (read
https://kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-messages)
see below.

> This is because gpiochip_irq_mask was being used to mask all possible

We refer to the functions in the form as follows gpiochip_irq_mask().


> irqs in the domain but gpiochip_disable_irq will WARN if any of those

IRQs
gpiochip_disable_irq()

> gpios haven't been requested as interrupts yet. Remove the call to

GPIOs

> gpiochip_disable_irq to stop the warning.

gpiochip_disable_irq()

> Fixes: a8173820f441 ("gpio: gpiolib: Allow GPIO IRQs to lazy disable")
> Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
> ---
>  drivers/gpio/gpiolib.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
> index 8c041a8dd9d8..903f5185ae55 100644
> --- a/drivers/gpio/gpiolib.c
> +++ b/drivers/gpio/gpiolib.c
> @@ -1451,7 +1451,6 @@ static void gpiochip_irq_mask(struct irq_data *d)
>  
>  	if (gc->irq.irq_mask)
>  		gc->irq.irq_mask(d);
> -	gpiochip_disable_irq(gc, d->hwirq);
>  }

At the same time the gpiochip_irq_unmask() has the symmetrical call. Why?

Also it's obvious that you have used outdated repository. You need to rebase
against subsystem tree for-next branch.

P.S. It's also makes sense to Cc to Marc Zyngier <maz@kernel.org>.
  
Chris Packham May 10, 2023, 8:58 p.m. UTC | #2
Hi Andy,

On 10/05/23 19:42, andy.shevchenko@gmail.com wrote:
> Wed, May 10, 2023 at 12:11:51PM +1200, Chris Packham kirjoitti:
>> When preparing to kexec into a new kernel the kexec code will mask all
>> interrupts for all interrupt domains before disabling them. In the case
>> of a gpio chip which has a mix of gpio and irq pins a warning would be
>> triggered as follows
>>    [root@localhost ~]# echo c >/proc/sysrq-trigger
> Besides the very noisy traceback in the commit message (read
> https://kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-messages)
> see below.
>
>> This is because gpiochip_irq_mask was being used to mask all possible
> We refer to the functions in the form as follows gpiochip_irq_mask().
>
>
>> irqs in the domain but gpiochip_disable_irq will WARN if any of those
> IRQs
> gpiochip_disable_irq()
>
>> gpios haven't been requested as interrupts yet. Remove the call to
> GPIOs
>
>> gpiochip_disable_irq to stop the warning.
> gpiochip_disable_irq()
Will take the above points onboard for v2.
>
>> Fixes: a8173820f441 ("gpio: gpiolib: Allow GPIO IRQs to lazy disable")
>> Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
>> ---
>>   drivers/gpio/gpiolib.c | 1 -
>>   1 file changed, 1 deletion(-)
>>
>> diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
>> index 8c041a8dd9d8..903f5185ae55 100644
>> --- a/drivers/gpio/gpiolib.c
>> +++ b/drivers/gpio/gpiolib.c
>> @@ -1451,7 +1451,6 @@ static void gpiochip_irq_mask(struct irq_data *d)
>>   
>>   	if (gc->irq.irq_mask)
>>   		gc->irq.irq_mask(d);
>> -	gpiochip_disable_irq(gc, d->hwirq);
>>   }
> At the same time the gpiochip_irq_unmask() has the symmetrical call. Why?

Hmm you're right I never noticed that. I think that would also trigger a 
similar warning if it were ever hit. It's not hit in my use-case because 
nothing is running through all the irq domains unmasking interrupts.

The coupling of gpiochip_irq_mask()/gpiochip_irq_unmask() with 
gpiochip_disable_irq()/gpiochip_enable_irq() goes back to the same 
commit a8173820f441 ("gpio: gpiolib: Allow GPIO IRQs to lazy disable"). 
It's not immediately obvious to me why the coupling is needed. I was 
hoping that someone seeing my patch would confirm that it's not needed 
or say why it's needed suggest an alternative approach.

> Also it's obvious that you have used outdated repository. You need to rebase
> against subsystem tree for-next branch.
Yeah that's the tricky part. I'm currently based on lts-5.15 and in 
order to actually test this I need all of the support for my platform so 
I can use kdump to demonstrate the issue. I might be able to use a 
different platform that is already supported in a newer kernel
>
> P.S. It's also makes sense to Cc to Marc Zyngier <maz@kernel.org>.
>
Added.
  
Linus Walleij May 11, 2023, 8 a.m. UTC | #3
On Wed, May 10, 2023 at 10:59 PM Chris Packham
<Chris.Packham@alliedtelesis.co.nz> wrote:

> The coupling of gpiochip_irq_mask()/gpiochip_irq_unmask() with
> gpiochip_disable_irq()/gpiochip_enable_irq() goes back to the same
> commit a8173820f441 ("gpio: gpiolib: Allow GPIO IRQs to lazy disable").
> It's not immediately obvious to me why the coupling is needed.

That is just a refactoring of what existed before.

The use case is here:
drivers/media/cec/platform/cec-gpio/cec-gpio.c

The driver needs to switch, at runtime, between actively driving a GPIO
line with gpiod_set_value(), and setting the same line into input mode
and listening for signalling triggering IRQs on it, and then back to
output mode and driving the line again. It's a bidirectional GPIO line.
This use case yields a high need of control.

> I was
> hoping that someone seeing my patch would confirm that it's not needed
> or say why it's needed suggest an alternative approach.

Which IRQ-enabled gpiochip is this? Has it been converted to be immutable?
I think that could be part of the problem.

Yours,
Linus Walleij
  
Chris Packham May 11, 2023, 8:36 p.m. UTC | #4
Hi Linus,

On 11/05/23 20:00, Linus Walleij wrote:
> On Wed, May 10, 2023 at 10:59 PM Chris Packham
> <Chris.Packham@alliedtelesis.co.nz> wrote:
>
>> The coupling of gpiochip_irq_mask()/gpiochip_irq_unmask() with
>> gpiochip_disable_irq()/gpiochip_enable_irq() goes back to the same
>> commit a8173820f441 ("gpio: gpiolib: Allow GPIO IRQs to lazy disable").
>> It's not immediately obvious to me why the coupling is needed.
> That is just a refactoring of what existed before.
>
> The use case is here:
> drivers/media/cec/platform/cec-gpio/cec-gpio.c
>
> The driver needs to switch, at runtime, between actively driving a GPIO
> line with gpiod_set_value(), and setting the same line into input mode
> and listening for signalling triggering IRQs on it, and then back to
> output mode and driving the line again. It's a bidirectional GPIO line.
> This use case yields a high need of control.
>
>> I was
>> hoping that someone seeing my patch would confirm that it's not needed
>> or say why it's needed suggest an alternative approach.
> Which IRQ-enabled gpiochip is this? Has it been converted to be immutable?
> I think that could be part of the problem.

For me it's a pca9555. I spent yesterday trying to demonstrate the 
problem on a newer kernel. Some teething issues aside I can trigger the 
warning if I have a gpio-button using one of the pca9555 pins as an 
interrupt and then I export some of the other pins via sysfs.

Interestingly the warning isn't triggered if I use a gpio-hog instead of 
exporting the pins. I haven't figured out why that is but I'm assuming 
it's something to do with the hogged pins being excluded from the irq 
domain before it is registered.
  
Linus Walleij May 11, 2023, 8:48 p.m. UTC | #5
On Thu, May 11, 2023 at 10:36 PM Chris Packham
<Chris.Packham@alliedtelesis.co.nz> wrote:

> I spent yesterday trying to demonstrate the
> problem on a newer kernel. Some teething issues aside I can trigger the
> warning if I have a gpio-button using one of the pca9555 pins as an
> interrupt and then I export some of the other pins via sysfs.
>
> Interestingly the warning isn't triggered if I use a gpio-hog instead of
> exporting the pins.

What happens if you use the gpio character device instead of sysfs?

Like for example with the tools in tools/gpio or using libgpiod
example tools?

> I haven't figured out why that is but I'm assuming
> it's something to do with the hogged pins being excluded from the irq
> domain before it is registered.

If you write something to the "edge" file I can easily see things
going sidewise. The sysfs is really not a nice tool, which is why
it is deprecated.

Yours,
Linus Walleij
  
Chris Packham May 12, 2023, 3:50 a.m. UTC | #6
On 12/05/23 08:36, Chris Packham wrote:
> Hi Linus,
>
> On 11/05/23 20:00, Linus Walleij wrote:
>> On Wed, May 10, 2023 at 10:59 PM Chris Packham
>> <Chris.Packham@alliedtelesis.co.nz> wrote:
>>
>>> The coupling of gpiochip_irq_mask()/gpiochip_irq_unmask() with
>>> gpiochip_disable_irq()/gpiochip_enable_irq() goes back to the same
>>> commit a8173820f441 ("gpio: gpiolib: Allow GPIO IRQs to lazy disable").
>>> It's not immediately obvious to me why the coupling is needed.
>> That is just a refactoring of what existed before.
>>
>> The use case is here:
>> drivers/media/cec/platform/cec-gpio/cec-gpio.c
>>
>> The driver needs to switch, at runtime, between actively driving a GPIO
>> line with gpiod_set_value(), and setting the same line into input mode
>> and listening for signalling triggering IRQs on it, and then back to
>> output mode and driving the line again. It's a bidirectional GPIO line.
>> This use case yields a high need of control.
>>
>>> I was
>>> hoping that someone seeing my patch would confirm that it's not needed
>>> or say why it's needed suggest an alternative approach.
>> Which IRQ-enabled gpiochip is this? Has it been converted to be 
>> immutable?
>> I think that could be part of the problem.
>
> For me it's a pca9555. I spent yesterday trying to demonstrate the 
> problem on a newer kernel. Some teething issues aside I can trigger 
> the warning if I have a gpio-button using one of the pca9555 pins as 
> an interrupt and then I export some of the other pins via sysfs.
>
> Interestingly the warning isn't triggered if I use a gpio-hog instead 
> of exporting the pins. I haven't figured out why that is but I'm 
> assuming it's something to do with the hogged pins being excluded from 
> the irq domain before it is registered.

I'm starting to understand things.

When the gpio is exported to userland the irq_desc is created via 
device_add()/gpio_is_visible()/gpiod_to_irq()/gpiochip_to_irq(). I think 
that might be a bug because if the user wanted an interrupt they would 
have said so via edge_store() which also does the gpiod_to_irq() that 
ultimately creates the irq_desc. Having the gpio turned into an 
interrupt seems like an odd side-effect of gpio_is_visible().
  

Patch

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 8c041a8dd9d8..903f5185ae55 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -1451,7 +1451,6 @@  static void gpiochip_irq_mask(struct irq_data *d)
 
 	if (gc->irq.irq_mask)
 		gc->irq.irq_mask(d);
-	gpiochip_disable_irq(gc, d->hwirq);
 }
 
 static void gpiochip_irq_unmask(struct irq_data *d)