Message ID | 20240220111019.133697-3-herve.codina@bootlin.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-72877-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:693c:2685:b0:108:e6aa:91d0 with SMTP id mn5csp324808dyc; Tue, 20 Feb 2024 03:11:14 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCVzLGqkpLhDxzkuMuVzlgUFewgQyUN/Ib+Gd9O9W3AK6nkUFUiR66rcJa7LFLN0OpF8il2GROPQkYg4nCXteF/EAgLJmQ== X-Google-Smtp-Source: AGHT+IGFSMjVdvacBxZWP05MLpijnjTccdKX8OZfrshUwJVzKlJ6NwpO91gsWcfGdYnMXpKQZjrk X-Received: by 2002:a17:906:3915:b0:a3e:c680:ee3a with SMTP id f21-20020a170906391500b00a3ec680ee3amr2724404eje.30.1708427474217; Tue, 20 Feb 2024 03:11:14 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1708427474; cv=pass; d=google.com; s=arc-20160816; b=E6ywwaWVEFEDNEdsZoDWJo78CICwD6AO/DUwFBSfSLCpNCh05oRBtqlzECdfJwvQf+ HTaQxuSuUj9Om0HhNwSN7+5PqpUJderbF5yu8VfPtxbYQ95CFGhUBr2gCWPUE7dDHCpT RvUL1M9VYnvHkjgC3iXuCBh6QUZRsJjhU2fnEdbC0BYS+/7L5wwBL4E1pgamny270O3q UH6LqKKI6xf82X1066Gx1SLavmjuAugbHU6vBCNiEZfTjIlHZtQ7HDmtNoufD3hLgoxj CD3ZlPpvXnlIRDOFafsNUY/nkI3TmFI5meRHC+A8n6l6Uy+dya/CI+8rFIpGs5IbP4AJ V9qA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=c+dSIBMW24B1QEdw+4NnCXt+Ftn4RqDE56kLBoNW620=; fh=bQDfEZGNvUJOgi5NqmHXoHCcxSTUxiah6hyMMNL5D+k=; b=0Hg6usYmj+B5sEGzsSQSQ1YZ1w4dOWX0EHSW0eXFwxYsXRMJK++beNtu1wsJtON9nb GE/M+Mgo48levNHXXtChYd+8s/brYkGtT6+AnEybKEoshTVJ/SGFRQBYbwtRajtSGvJy IlAUTg57VEFcsm8bJQAA1hWPtiCQhUS4vy4tphkh/awRufC1PS7jkskoh67YgiOZ5m0D Hh6zFhzkrHU38qIxzGVZ1/C/er2Db49RtkQ79frK1WtMsH9z3PQHd9053/LNiqFLUu7N RQOy3kghW48RP7te/rKXVql4S/+hDMEKc3LyPnZ8jv/A9PYQHzZSvxvjt+TNp6SF4Ac+ dcqA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@bootlin.com header.s=gm1 header.b=BVyvJKXa; arc=pass (i=1 spf=pass spfdomain=bootlin.com dkim=pass dkdomain=bootlin.com dmarc=pass fromdomain=bootlin.com); spf=pass (google.com: domain of linux-kernel+bounces-72877-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-72877-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=bootlin.com Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [147.75.80.249]) by mx.google.com with ESMTPS id hd13-20020a170907968d00b00a3eb2a3c3e9si1576539ejc.1006.2024.02.20.03.11.14 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 Feb 2024 03:11:14 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-72877-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; dkim=pass header.i=@bootlin.com header.s=gm1 header.b=BVyvJKXa; arc=pass (i=1 spf=pass spfdomain=bootlin.com dkim=pass dkdomain=bootlin.com dmarc=pass fromdomain=bootlin.com); spf=pass (google.com: domain of linux-kernel+bounces-72877-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-72877-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=bootlin.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id CE8961F242B9 for <ouuuleilei@gmail.com>; Tue, 20 Feb 2024 11:11:13 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 3526367754; Tue, 20 Feb 2024 11:10:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bootlin.com header.i=@bootlin.com header.b="BVyvJKXa" Received: from relay8-d.mail.gandi.net (relay8-d.mail.gandi.net [217.70.183.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 70C4465BD7; Tue, 20 Feb 2024 11:10:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.70.183.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708427439; cv=none; b=JRUqY9cJ/Sa1Cue5fjF6KnRaZKpUmH9JOYvh2Xq4VFSnmKo1mn3bDnt/0RUZmFma4wd0Zq8iMPh62XhSmKU6x7P5IXI2rH+4ST6vAMJmp9DVqIdM6dK19idPNfT8fQUYtba8k5u/j1yRmizWkX8wyXeENRWl2mC2UACMlP385uA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1708427439; c=relaxed/simple; bh=+gtPF82Psk+0scOoKi//Kqv72HSichWCQkcD4dfiT54=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Nkmalsi8tThYEDQFzZqvPp4IjvTlFLFJ1S3e5b0hFKBCb4kQCZoEJYRF2L2VXIe8AwWtqsYcfftXzvbTg4j0NEqfIcgw5BMO3UcI0LVVZr3mWvehjA1Q1jsveXjyI21jtz/xEUVguUIjAMZbrHC9/9UtxbemkHpX9po7CDmSHas= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=bootlin.com; spf=pass smtp.mailfrom=bootlin.com; dkim=pass (2048-bit key) header.d=bootlin.com header.i=@bootlin.com header.b=BVyvJKXa; arc=none smtp.client-ip=217.70.183.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=bootlin.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bootlin.com Received: by mail.gandi.net (Postfix) with ESMTPA id DF7C51BF216; Tue, 20 Feb 2024 11:10:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bootlin.com; s=gm1; t=1708427428; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=c+dSIBMW24B1QEdw+4NnCXt+Ftn4RqDE56kLBoNW620=; b=BVyvJKXaQGyrX5QqXRvRovkMd9I52tgnpBvFb8sGJW2ToyYfkTjQSpmRPbC7MWlr87qARC UuykfAJ2Gzw/xw5HTJEEqnS9JgcllN4ozixRVIdJYUmaJ03EwrSA/HPHMJ0LyBIfwuFF1T Vn1eXzZi+zEtjlgWodDARgePGpWww+XhLl3uCVA3aeN0DFXy7cgyb1C+jG8yyVHnXCLaiV irE83aWO1lAbdCOCsco0uZvMOIkdgUuXesKYs7D8/MbzY6shCK+YPobQUfruQ4uRjyYmG/ JGkV8ouvIR9DZ5KZSudRIcYRPo7yRuxpmIc3uM6rb3FMCWiLekvld3vCTbWM1g== From: Herve Codina <herve.codina@bootlin.com> To: Bartosz Golaszewski <brgl@bgdev.pl>, Kent Gibson <warthog618@gmail.com>, Linus Walleij <linus.walleij@linaro.org> Cc: linux-gpio@vger.kernel.org, linux-kernel@vger.kernel.org, Luca Ceresoli <luca.ceresoli@bootlin.com>, Thomas Petazzoni <thomas.petazzoni@bootlin.com>, Herve Codina <herve.codina@bootlin.com> Subject: [PATCH 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed Date: Tue, 20 Feb 2024 12:10:18 +0100 Message-ID: <20240220111019.133697-3-herve.codina@bootlin.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240220111019.133697-1-herve.codina@bootlin.com> References: <20240220111019.133697-1-herve.codina@bootlin.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-GND-Sasl: herve.codina@bootlin.com X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1791416047157736740 X-GMAIL-MSGID: 1791416047157736740 |
Series |
gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal
|
|
Commit Message
Herve Codina
Feb. 20, 2024, 11:10 a.m. UTC
When gpio chip device is removed while some related gpio are used by the
user-space, the following warning can appear:
remove_proc_entry: removing non-empty directory 'irq/233', leaking at least 'gpiomon'
WARNING: CPU: 2 PID: 72 at fs/proc/generic.c:717 remove_proc_entry+0x190/0x19c
...
Call trace:
remove_proc_entry+0x190/0x19c
unregister_irq_proc+0xd0/0x104
free_desc+0x4c/0xc4
irq_free_descs+0x6c/0x90
irq_dispose_mapping+0x104/0x14c
gpiochip_irqchip_remove+0xcc/0x1a4
gpiochip_remove+0x48/0x100
...
Indeed, the gpio cdev uses an IRQ but this IRQ is not released when the
gpio chip device is removed.
Release IRQs used in the device removal notifier functions.
Also move one of these function definition in order to avoid a forward
declaration (move after the edge_detector_stop() definition).
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
---
drivers/gpio/gpiolib-cdev.c | 33 ++++++++++++++++++++++-----------
1 file changed, 22 insertions(+), 11 deletions(-)
Comments
On Tue, Feb 20, 2024 at 12:10:18PM +0100, Herve Codina wrote: > When gpio chip device is removed while some related gpio are used by the > user-space, the following warning can appear: > remove_proc_entry: removing non-empty directory 'irq/233', leaking at least 'gpiomon' > WARNING: CPU: 2 PID: 72 at fs/proc/generic.c:717 remove_proc_entry+0x190/0x19c > ... > Call trace: > remove_proc_entry+0x190/0x19c > unregister_irq_proc+0xd0/0x104 > free_desc+0x4c/0xc4 > irq_free_descs+0x6c/0x90 > irq_dispose_mapping+0x104/0x14c > gpiochip_irqchip_remove+0xcc/0x1a4 > gpiochip_remove+0x48/0x100 > ... > > Indeed, the gpio cdev uses an IRQ but this IRQ is not released when the > gpio chip device is removed. > > Release IRQs used in the device removal notifier functions. > Also move one of these function definition in order to avoid a forward > declaration (move after the edge_detector_stop() definition). > > Signed-off-by: Herve Codina <herve.codina@bootlin.com> > --- > drivers/gpio/gpiolib-cdev.c | 33 ++++++++++++++++++++++----------- > 1 file changed, 22 insertions(+), 11 deletions(-) > > diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c > index 2a88736629ef..aec4a4c8490a 100644 > --- a/drivers/gpio/gpiolib-cdev.c > +++ b/drivers/gpio/gpiolib-cdev.c > @@ -688,17 +688,6 @@ static void line_set_debounce_period(struct line *line, > GPIO_V2_LINE_FLAG_EVENT_CLOCK_HTE | \ > GPIO_V2_LINE_EDGE_FLAGS) > > -static int linereq_unregistered_notify(struct notifier_block *nb, > - unsigned long action, void *data) > -{ > - struct linereq *lr = container_of(nb, struct linereq, > - device_unregistered_nb); > - > - wake_up_poll(&lr->wait, EPOLLIN | EPOLLERR); > - > - return NOTIFY_OK; > -} > - > static void linereq_put_event(struct linereq *lr, > struct gpio_v2_line_event *le) > { > @@ -1189,6 +1178,23 @@ static int edge_detector_update(struct line *line, > return edge_detector_setup(line, lc, line_idx, edflags); > } > > +static int linereq_unregistered_notify(struct notifier_block *nb, > + unsigned long action, void *data) > +{ > + struct linereq *lr = container_of(nb, struct linereq, > + device_unregistered_nb); > + int i; > + > + for (i = 0; i < lr->num_lines; i++) { > + if (lr->lines[i].desc) > + edge_detector_stop(&lr->lines[i]); > + } > + Firstly, the re-ordering in the previous patch creates a race, as the NULLing of the gdev->chip serves to numb the cdev ioctls, so there is now a window between the notifier being called and that numbing, during which userspace may call linereq_set_config() and re-request the irq. There is also a race here with linereq_set_config(). That can be prevented by holding the lr->config_mutex - assuming the notifier is not being called from atomic context. You also have a race with the line being freed that could pull the lr out from under you, so a use after free problem. I'd rather live with the warning :(. Fixing that requires rethinking the lifecycle management for the linereq/lineevent. Cheers, Kent. > + wake_up_poll(&lr->wait, EPOLLIN | EPOLLERR); > + > + return NOTIFY_OK; > +} > + > static u64 gpio_v2_line_config_flags(struct gpio_v2_line_config *lc, > unsigned int line_idx) > { > @@ -1898,6 +1904,11 @@ static int lineevent_unregistered_notify(struct notifier_block *nb, > struct lineevent_state *le = container_of(nb, struct lineevent_state, > device_unregistered_nb); > > + if (le->irq) { > + free_irq(le->irq, le); > + le->irq = 0; > + } > + > wake_up_poll(&le->wait, EPOLLIN | EPOLLERR); > > return NOTIFY_OK; > -- > 2.43.0 >
On Tue, Feb 20, 2024 at 07:26:57PM +0100, Herve Codina wrote: > Hi Kent, > > On Tue, 20 Feb 2024 22:29:59 +0800 > Kent Gibson <warthog618@gmail.com> wrote: > .. > > I can call gcdev_unregister() right after gdev->chip = NULL. > The needed things is to have free_irq() (from the gcdev_unregister()) called > before calling gpiochip_irqchip_remove(). > > And so, why not: > --- 8< --- > ... > gpiochip_free_hogs(gc); > /* Numb the device, cancelling all outstanding operations */ > gdev->chip = NULL; > gcdev_unregister(gdev); > gpiochip_irqchip_remove(gc); > acpi_gpiochip_remove(gc); > of_gpiochip_remove(gc); > gpiochip_remove_pin_ranges(gc); > ... > --- 8< --- > Exactly - why not. Though adding some comments to the code as to why that ordering is required, as per the numbing, would be helpful for maintainability. > > > > There is also a race here with linereq_set_config(). That can be prevented > > by holding the lr->config_mutex - assuming the notifier is not being called > > from atomic context. > > I missed that one and indeed, I probably can take the mutex. With the mutex > holded, no more race condition with linereq_set_config() and so the IRQ cannot > be re-requested. > > > > > You also have a race with the line being freed that could pull the > > lr out from under you, so a use after free problem. > > I probably missed something but I don't see this use after free. > Can you give me some details/pointers ? > What is to prevent userspace releasing the request and freeing the linereq while you use it? The use after free is anywhere that is possible. AIUI, from the userspace side that is prevented by the file handle not being closed, and so linereq_release() not being called, while ioctls are in flight. But as you are calling in from the kernel side there is nothing in place to prevent userspace freeing the linereq while you are using it. > > > I'd rather live with the warning :(. > > Fixing that requires rethinking the lifecycle management for the > > linereq/lineevent. > > Well, currently the warning is a big one with a dump_stack included. > It will be interesting to have it fixed. > > The need to fix it is to have free_irq() called before > gpiochip_irqchip_remove(); > > Is there really no way to have this correct sequence without rethinking all > the lifecycle management ? > Not that I am aware of. You need to protect against the linereq being freed while you are using it, which is by definition is lifecycle management. Though it isn't necessarily __all__ the lifecycle management. > Also, after the warning related to the IRQ, the following one is present: > --- 8< --- > [ 9593.527961] gpio gpiochip9: REMOVING GPIOCHIP WITH GPIOS STILL REQUESTED > [ 9593.535602] ------------[ cut here ]------------ > [ 9593.540244] WARNING: CPU: 0 PID: 309 at drivers/gpio/gpiolib.c:2352 gpiod_free.part.0+0x20/0x48 > ... > [ 9593.725016] Call trace: > [ 9593.727468] gpiod_free.part.0+0x20/0x48 > [ 9593.731404] gpiod_free+0x14/0x24 > [ 9593.734728] lineevent_free+0x40/0x74 > [ 9593.738402] lineevent_release+0x14/0x24 > [ 9593.742335] __fput+0x70/0x2bc > [ 9593.745403] __fput_sync+0x50/0x5c > [ 9593.748817] __arm64_sys_close+0x38/0x7c > [ 9593.752751] invoke_syscall+0x48/0x114 > ... > [ 9593.815299] ---[ end trace 0000000000000000 ]--- > [ 9593.820616] hotplug-manager dock-hotplug-manager: remove overlay 0 (ovcs id 1) > gpiomon: error waiting for events: No such device > # > --- 8< --- > My understanding is that we now handle that case - that is what the gdev->device_notifier chain is for, and gpiolib and gpiolib-cdev now perform a controlled cleanupi - so that warning is obsolete and should be removed from gpiochip_remove(). IIRC, previously you would've gotten a panic shortly after that warning. Now you get gpiomon noticing that the device has been removed. Btw, where I mention linereq here, the same applies to lineevent and linehandle - the uAPI v1 equivalents of linereq. Cheers, Kent.
On Wed, Feb 21, 2024 at 08:25:55AM +0800, Kent Gibson wrote: > On Tue, Feb 20, 2024 at 07:26:57PM +0100, Herve Codina wrote: > > Hi Kent, > > > > > > I probably missed something but I don't see this use after free. > > Can you give me some details/pointers ? > > > > What is to prevent userspace releasing the request and freeing the > linereq while you use it? The use after free is anywhere that is > possible. > To answer my own question - the notifier call chain itself will prevent that - linereq_free() will get blocked on the notifier chain semaphore until the notifier call returns. So there is no use after free problem. My bad - sorry for the added confusion. Cheers, Kent.
On Tue, Feb 20, 2024 at 10:29:59PM +0800, Kent Gibson wrote: > On Tue, Feb 20, 2024 at 12:10:18PM +0100, Herve Codina wrote: .. > > } > > > > +static int linereq_unregistered_notify(struct notifier_block *nb, > > + unsigned long action, void *data) > > +{ > > + struct linereq *lr = container_of(nb, struct linereq, > > + device_unregistered_nb); > > + int i; > > + > > + for (i = 0; i < lr->num_lines; i++) { > > + if (lr->lines[i].desc) > > + edge_detector_stop(&lr->lines[i]); > > + } > > + > > Firstly, the re-ordering in the previous patch creates a race, > as the NULLing of the gdev->chip serves to numb the cdev ioctls, so > there is now a window between the notifier being called and that numbing, > during which userspace may call linereq_set_config() and re-request > the irq. > > There is also a race here with linereq_set_config(). That can be prevented > by holding the lr->config_mutex - assuming the notifier is not being called > from atomic context. > It occurs to me that the fixed reordering in patch 1 would place the notifier call AFTER the NULLing of the ioctls, so there will no longer be any chance of a race with linereq_set_config() - so holding the config_mutex semaphore is not necessary. In which case this patch is fine - it is only patch 1 that requires updating. Cheers, Kent.
On Thu, Feb 22, 2024 at 08:57:44AM +0800, Kent Gibson wrote: > On Tue, Feb 20, 2024 at 10:29:59PM +0800, Kent Gibson wrote: > > On Tue, Feb 20, 2024 at 12:10:18PM +0100, Herve Codina wrote: > > ... > > > > } > > > > > > +static int linereq_unregistered_notify(struct notifier_block *nb, > > > + unsigned long action, void *data) > > > +{ > > > + struct linereq *lr = container_of(nb, struct linereq, > > > + device_unregistered_nb); > > > + int i; > > > + > > > + for (i = 0; i < lr->num_lines; i++) { > > > + if (lr->lines[i].desc) > > > + edge_detector_stop(&lr->lines[i]); > > > + } > > > + > > > > Firstly, the re-ordering in the previous patch creates a race, > > as the NULLing of the gdev->chip serves to numb the cdev ioctls, so > > there is now a window between the notifier being called and that numbing, > > during which userspace may call linereq_set_config() and re-request > > the irq. > > > > There is also a race here with linereq_set_config(). That can be prevented > > by holding the lr->config_mutex - assuming the notifier is not being called > > from atomic context. > > > > It occurs to me that the fixed reordering in patch 1 would place > the notifier call AFTER the NULLing of the ioctls, so there will no longer > be any chance of a race with linereq_set_config() - so holding the > config_mutex semaphore is not necessary. > NULLing -> numbing The gdev->chip is NULLed, so the ioctls are numbed. And I need to let the coffee soak in before sending. > In which case this patch is fine - it is only patch 1 that requires > updating. > > Cheers, > Kent.
On Thu, 22 Feb 2024 02:05:30 +0100, Kent Gibson <warthog618@gmail.com> said: > On Thu, Feb 22, 2024 at 08:57:44AM +0800, Kent Gibson wrote: >> On Tue, Feb 20, 2024 at 10:29:59PM +0800, Kent Gibson wrote: >> > On Tue, Feb 20, 2024 at 12:10:18PM +0100, Herve Codina wrote: >> >> ... >> >> > > } >> > > >> > > +static int linereq_unregistered_notify(struct notifier_block *nb, >> > > + unsigned long action, void *data) >> > > +{ >> > > + struct linereq *lr = container_of(nb, struct linereq, >> > > + device_unregistered_nb); >> > > + int i; >> > > + >> > > + for (i = 0; i < lr->num_lines; i++) { >> > > + if (lr->lines[i].desc) >> > > + edge_detector_stop(&lr->lines[i]); >> > > + } >> > > + >> > >> > Firstly, the re-ordering in the previous patch creates a race, >> > as the NULLing of the gdev->chip serves to numb the cdev ioctls, so >> > there is now a window between the notifier being called and that numbing, >> > during which userspace may call linereq_set_config() and re-request >> > the irq. >> > >> > There is also a race here with linereq_set_config(). That can be prevented >> > by holding the lr->config_mutex - assuming the notifier is not being called >> > from atomic context. >> > >> >> It occurs to me that the fixed reordering in patch 1 would place >> the notifier call AFTER the NULLing of the ioctls, so there will no longer >> be any chance of a race with linereq_set_config() - so holding the >> config_mutex semaphore is not necessary. >> > > NULLing -> numbing > > The gdev->chip is NULLed, so the ioctls are numbed. > And I need to let the coffee soak in before sending. > >> In which case this patch is fine - it is only patch 1 that requires >> updating. >> >> Cheers, >> Kent. > The fix for the user-space issue may be more-or-less correct but the problem is deeper and this won't fix it for in-kernel users. Herve: please consider the following DT snippet: gpio0 { compatible = "foo"; gpio-controller; #gpio-cells = <2>; interrupt-controller; #interrupt-cells = <1>; ngpios = <8>; }; consumer { compatible = "bar"; interrupts-extended = <&gpio0 0>; }; If you unbind the "gpio0" device after the consumer requested the interrupt, you'll get the same splat. And device links will not help you here (on that note: Saravana: is there anything we could do about it? Have you even considered making the irqchip subsystem use the driver model in any way? Is it even feasible?). I would prefer this to be fixed at a lower lever than the GPIOLIB character device. Bartosz
Hi Bartosz, On Thu, 22 Feb 2024 00:31:08 -0800 Bartosz Golaszewski <brgl@bgdev.pl> wrote: > On Thu, 22 Feb 2024 02:05:30 +0100, Kent Gibson <warthog618@gmail.com> said: > > On Thu, Feb 22, 2024 at 08:57:44AM +0800, Kent Gibson wrote: > >> On Tue, Feb 20, 2024 at 10:29:59PM +0800, Kent Gibson wrote: > >> > On Tue, Feb 20, 2024 at 12:10:18PM +0100, Herve Codina wrote: > >> > >> ... > >> > >> > > } > >> > > > >> > > +static int linereq_unregistered_notify(struct notifier_block *nb, > >> > > + unsigned long action, void *data) > >> > > +{ > >> > > + struct linereq *lr = container_of(nb, struct linereq, > >> > > + device_unregistered_nb); > >> > > + int i; > >> > > + > >> > > + for (i = 0; i < lr->num_lines; i++) { > >> > > + if (lr->lines[i].desc) > >> > > + edge_detector_stop(&lr->lines[i]); > >> > > + } > >> > > + > >> > > >> > Firstly, the re-ordering in the previous patch creates a race, > >> > as the NULLing of the gdev->chip serves to numb the cdev ioctls, so > >> > there is now a window between the notifier being called and that numbing, > >> > during which userspace may call linereq_set_config() and re-request > >> > the irq. > >> > > >> > There is also a race here with linereq_set_config(). That can be prevented > >> > by holding the lr->config_mutex - assuming the notifier is not being called > >> > from atomic context. > >> > > >> > >> It occurs to me that the fixed reordering in patch 1 would place > >> the notifier call AFTER the NULLing of the ioctls, so there will no longer > >> be any chance of a race with linereq_set_config() - so holding the > >> config_mutex semaphore is not necessary. > >> > > > > NULLing -> numbing > > > > The gdev->chip is NULLed, so the ioctls are numbed. > > And I need to let the coffee soak in before sending. > > > >> In which case this patch is fine - it is only patch 1 that requires > >> updating. > >> > >> Cheers, > >> Kent. > > > > The fix for the user-space issue may be more-or-less correct but the problem is > deeper and this won't fix it for in-kernel users. > > Herve: please consider the following DT snippet: > > gpio0 { > compatible = "foo"; > > gpio-controller; > #gpio-cells = <2>; > interrupt-controller; > #interrupt-cells = <1>; > ngpios = <8>; > }; > > consumer { > compatible = "bar"; > > interrupts-extended = <&gpio0 0>; > }; > > If you unbind the "gpio0" device after the consumer requested the interrupt, > you'll get the same splat. And device links will not help you here (on that > note: Saravana: is there anything we could do about it? Have you even > considered making the irqchip subsystem use the driver model in any way? Is it > even feasible?). > > I would prefer this to be fixed at a lower lever than the GPIOLIB character > device. I think this use case is covered. When the consumer device related to the consumer DT node is added, a consumer/supplier relationship is created: parse_interrupts() parses the 'interrups-extended' property https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/property.c#L1316 and so, of_link_to_phandle() creates the consumer/supplier link. https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/property.c#L1316 We that link present, if the supplier is removed, the consumer is removed before. The consumer should release the interrupt during its remove process (i.e explicit in its .remove() or explicit because of a devm_*() call). At least, it is my understanding. Best regards, Hervé
On Thu, Feb 22, 2024 at 12:36 PM Herve Codina <herve.codina@bootlincom> wrote: > > Hi Bartosz, > > On Thu, 22 Feb 2024 00:31:08 -0800 > Bartosz Golaszewski <brgl@bgdev.pl> wrote: > > > On Thu, 22 Feb 2024 02:05:30 +0100, Kent Gibson <warthog618@gmail.com> said: > > > On Thu, Feb 22, 2024 at 08:57:44AM +0800, Kent Gibson wrote: > > >> On Tue, Feb 20, 2024 at 10:29:59PM +0800, Kent Gibson wrote: > > >> > On Tue, Feb 20, 2024 at 12:10:18PM +0100, Herve Codina wrote: > > >> > > >> ... > > >> > > >> > > } > > >> > > > > >> > > +static int linereq_unregistered_notify(struct notifier_block *nb, > > >> > > + unsigned long action, void *data) > > >> > > +{ > > >> > > + struct linereq *lr = container_of(nb, struct linereq, > > >> > > + device_unregistered_nb); > > >> > > + int i; > > >> > > + > > >> > > + for (i = 0; i < lr->num_lines; i++) { > > >> > > + if (lr->lines[i].desc) > > >> > > + edge_detector_stop(&lr->lines[i]); > > >> > > + } > > >> > > + > > >> > > > >> > Firstly, the re-ordering in the previous patch creates a race, > > >> > as the NULLing of the gdev->chip serves to numb the cdev ioctls, so > > >> > there is now a window between the notifier being called and that numbing, > > >> > during which userspace may call linereq_set_config() and re-request > > >> > the irq. > > >> > > > >> > There is also a race here with linereq_set_config(). That can be prevented > > >> > by holding the lr->config_mutex - assuming the notifier is not being called > > >> > from atomic context. > > >> > > > >> > > >> It occurs to me that the fixed reordering in patch 1 would place > > >> the notifier call AFTER the NULLing of the ioctls, so there will no longer > > >> be any chance of a race with linereq_set_config() - so holding the > > >> config_mutex semaphore is not necessary. > > >> > > > > > > NULLing -> numbing > > > > > > The gdev->chip is NULLed, so the ioctls are numbed. > > > And I need to let the coffee soak in before sending. > > > > > >> In which case this patch is fine - it is only patch 1 that requires > > >> updating. > > >> > > >> Cheers, > > >> Kent. > > > > > > > The fix for the user-space issue may be more-or-less correct but the problem is > > deeper and this won't fix it for in-kernel users. > > > > Herve: please consider the following DT snippet: > > > > gpio0 { > > compatible = "foo"; > > > > gpio-controller; > > #gpio-cells = <2>; > > interrupt-controller; > > #interrupt-cells = <1>; > > ngpios = <8>; > > }; > > > > consumer { > > compatible = "bar"; > > > > interrupts-extended = <&gpio0 0>; > > }; > > > > If you unbind the "gpio0" device after the consumer requested the interrupt, > > you'll get the same splat. And device links will not help you here (on that > > note: Saravana: is there anything we could do about it? Have you even > > considered making the irqchip subsystem use the driver model in any way? Is it > > even feasible?). > > > > I would prefer this to be fixed at a lower lever than the GPIOLIB character > > device. > > I think this use case is covered. > When the consumer device related to the consumer DT node is added, a > consumer/supplier relationship is created: > parse_interrupts() parses the 'interrups-extended' property > https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/property.c#L1316 > and so, of_link_to_phandle() creates the consumer/supplier link. > https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/property.c#L1316 > > We that link present, if the supplier is removed, the consumer is removed > before. > The consumer should release the interrupt during its remove process (i.e > explicit in its .remove() or explicit because of a devm_*() call). > > At least, it is my understanding. Well, then it doesn't work, because I literally just tried it before sending my previous email. Please try it yourself, you'll see. Also: an interrupt controller may not even have a device consuming its DT node (see IRQCHIP_DECLARE()), what happens then? Bart > > Best regards, > Hervé
On Thu, Feb 22, 2024 at 4:21 AM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > > On Thu, Feb 22, 2024 at 12:36 PM Herve Codina <herve.codina@bootlin.com> wrote: > > > > Hi Bartosz, > > > > On Thu, 22 Feb 2024 00:31:08 -0800 > > Bartosz Golaszewski <brgl@bgdev.pl> wrote: > > > > > On Thu, 22 Feb 2024 02:05:30 +0100, Kent Gibson <warthog618@gmail.com> said: > > > > On Thu, Feb 22, 2024 at 08:57:44AM +0800, Kent Gibson wrote: > > > >> On Tue, Feb 20, 2024 at 10:29:59PM +0800, Kent Gibson wrote: > > > >> > On Tue, Feb 20, 2024 at 12:10:18PM +0100, Herve Codina wrote: > > > >> > > > >> ... > > > >> > > > >> > > } > > > >> > > > > > >> > > +static int linereq_unregistered_notify(struct notifier_block *nb, > > > >> > > + unsigned long action, void *data) > > > >> > > +{ > > > >> > > + struct linereq *lr = container_of(nb, struct linereq, > > > >> > > + device_unregistered_nb); > > > >> > > + int i; > > > >> > > + > > > >> > > + for (i = 0; i < lr->num_lines; i++) { > > > >> > > + if (lr->lines[i].desc) > > > >> > > + edge_detector_stop(&lr->lines[i]); > > > >> > > + } > > > >> > > + > > > >> > > > > >> > Firstly, the re-ordering in the previous patch creates a race, > > > >> > as the NULLing of the gdev->chip serves to numb the cdev ioctls, so > > > >> > there is now a window between the notifier being called and that numbing, > > > >> > during which userspace may call linereq_set_config() and re-request > > > >> > the irq. > > > >> > > > > >> > There is also a race here with linereq_set_config(). That can be prevented > > > >> > by holding the lr->config_mutex - assuming the notifier is not being called > > > >> > from atomic context. > > > >> > > > > >> > > > >> It occurs to me that the fixed reordering in patch 1 would place > > > >> the notifier call AFTER the NULLing of the ioctls, so there will no longer > > > >> be any chance of a race with linereq_set_config() - so holding the > > > >> config_mutex semaphore is not necessary. > > > >> > > > > > > > > NULLing -> numbing > > > > > > > > The gdev->chip is NULLed, so the ioctls are numbed. > > > > And I need to let the coffee soak in before sending. > > > > > > > >> In which case this patch is fine - it is only patch 1 that requires > > > >> updating. > > > >> > > > >> Cheers, > > > >> Kent. > > > > > > > > > > The fix for the user-space issue may be more-or-less correct but the problem is > > > deeper and this won't fix it for in-kernel users. > > > > > > Herve: please consider the following DT snippet: > > > > > > gpio0 { > > > compatible = "foo"; > > > > > > gpio-controller; > > > #gpio-cells = <2>; > > > interrupt-controller; > > > #interrupt-cells = <1>; > > > ngpios = <8>; > > > }; > > > > > > consumer { > > > compatible = "bar"; > > > > > > interrupts-extended = <&gpio0 0>; > > > }; > > > > > > If you unbind the "gpio0" device after the consumer requested the interrupt, > > > you'll get the same splat. And device links will not help you here (on that > > > note: Saravana: is there anything we could do about it? Have you even > > > considered making the irqchip subsystem use the driver model in any way? Is it > > > even feasible?). I did add support to irqchip to use the driver model. See IRQCHIP_PLATFORM_DRIVER_BEGIN() and uses of it. So this makes sure the probe ordering is correct. But when I added that support, there was some pushback on making the modules removable[1]. But that's why you'll see that the IRQCHIP_PLATFORM_DRIVER_BEGIN() macro set .suppress_bind_attrs = true. Do you have a way to unregister an interrupt controller in your example? If so, how do you unregister it? It shouldn't be too hard to extend those macros to add removal support. We could add a IRQCHIP_MATCH2() that also takes in an exit() function op that gets called on device unbind. [1] - https://lore.kernel.org/lkml/86sghas7so.wl-maz@kernel.org/#t > > > > > > I would prefer this to be fixed at a lower lever than the GPIOLIB character > > > device. > > > > I think this use case is covered. > > When the consumer device related to the consumer DT node is added, a > > consumer/supplier relationship is created: > > parse_interrupts() parses the 'interrups-extended' property > > https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/property.c#L1316 > > and so, of_link_to_phandle() creates the consumer/supplier link. > > https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/property.c#L1316 > > > > We that link present, if the supplier is removed, the consumer is removed > > before. > > The consumer should release the interrupt during its remove process (i.e > > explicit in its .remove() or explicit because of a devm_*() call). > > > > At least, it is my understanding. > > Well, then it doesn't work, because I literally just tried it before > sending my previous email. For your gpio0 device, can you see why __device_release_driver() doesn't end up calling device_links_unbind_consumers()? Also, can you look at /sys/class/devlink/<bus:gpio0-devicename>--<consumer device name> folders and see what the status file says before you try to unbind the gpio0 device? It should say "active". > Please try it yourself, you'll see. > > Also: an interrupt controller may not even have a device consuming its > DT node (see IRQCHIP_DECLARE()), what happens then? Yeah, we are screwed in those cases. Ideally we are rejecting all submissions for irqchip drivers that use IRQCHIP_DECLARE(). -Saravana
On Fri, Feb 23, 2024 at 12:51 AM Saravana Kannan <saravanak@google.com> wrote: > [snip] > > > > > > > > If you unbind the "gpio0" device after the consumer requested the interrupt, > > > > you'll get the same splat. And device links will not help you here (on that > > > > note: Saravana: is there anything we could do about it? Have you even > > > > considered making the irqchip subsystem use the driver model in any way? Is it > > > > even feasible?). > > I did add support to irqchip to use the driver model. See > IRQCHIP_PLATFORM_DRIVER_BEGIN() and uses of it. So this makes sure > the probe ordering is correct. > > But when I added that support, there was some pushback on making the > modules removable[1]. But that's why you'll see that the > IRQCHIP_PLATFORM_DRIVER_BEGIN() macro set .suppress_bind_attrs = true. > > Do you have a way to unregister an interrupt controller in your > example? If so, how do you unregister it? It shouldn't be too hard to > extend those macros to add removal support. We could add a > IRQCHIP_MATCH2() that also takes in an exit() function op that gets > called on device unbind. > The provider in my example is a GPIO chip that registers its own IRQ domain. The domain is destroyed when the GPIO controller is unregistered but interrupts can be requested orthogonally to the GPIO subsystem and we don't have the knowledge about any interrupt users as the .to_irq() callback is never called. Let me know if anything can be done in this case. I used the gpio-sim testing module for it (instantiated over DT) but any such GPIO chip that is also an interrupt-controller would behave the same. > [1] - https://lore.kernel.org/lkml/86sghas7so.wl-maz@kernel.org/#t > > > > > > > > > I would prefer this to be fixed at a lower lever than the GPIOLIB character > > > > device. > > > > > > I think this use case is covered. > > > When the consumer device related to the consumer DT node is added, a > > > consumer/supplier relationship is created: > > > parse_interrupts() parses the 'interrups-extended' property > > > https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/property.c#L1316 > > > and so, of_link_to_phandle() creates the consumer/supplier link. > > > https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/property.c#L1316 > > > > > > We that link present, if the supplier is removed, the consumer is removed > > > before. > > > The consumer should release the interrupt during its remove process (i.e > > > explicit in its .remove() or explicit because of a devm_*() call). > > > > > > At least, it is my understanding. > > > > Well, then it doesn't work, because I literally just tried it before > > sending my previous email. > > For your gpio0 device, can you see why __device_release_driver() > doesn't end up calling device_links_unbind_consumers()? > It never gets into the while (device_links_busy(dev)) loop. > Also, can you look at > /sys/class/devlink/<bus:gpio0-devicename>--<consumer device name> > folders and see what the status file says before you try to unbind the > gpio0 device? It should say "active". > It says "available". If you have some board supported upstream you could use for testing, I think I could prepare you a DT snippet for testing. Bart [snip]
diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c index 2a88736629ef..aec4a4c8490a 100644 --- a/drivers/gpio/gpiolib-cdev.c +++ b/drivers/gpio/gpiolib-cdev.c @@ -688,17 +688,6 @@ static void line_set_debounce_period(struct line *line, GPIO_V2_LINE_FLAG_EVENT_CLOCK_HTE | \ GPIO_V2_LINE_EDGE_FLAGS) -static int linereq_unregistered_notify(struct notifier_block *nb, - unsigned long action, void *data) -{ - struct linereq *lr = container_of(nb, struct linereq, - device_unregistered_nb); - - wake_up_poll(&lr->wait, EPOLLIN | EPOLLERR); - - return NOTIFY_OK; -} - static void linereq_put_event(struct linereq *lr, struct gpio_v2_line_event *le) { @@ -1189,6 +1178,23 @@ static int edge_detector_update(struct line *line, return edge_detector_setup(line, lc, line_idx, edflags); } +static int linereq_unregistered_notify(struct notifier_block *nb, + unsigned long action, void *data) +{ + struct linereq *lr = container_of(nb, struct linereq, + device_unregistered_nb); + int i; + + for (i = 0; i < lr->num_lines; i++) { + if (lr->lines[i].desc) + edge_detector_stop(&lr->lines[i]); + } + + wake_up_poll(&lr->wait, EPOLLIN | EPOLLERR); + + return NOTIFY_OK; +} + static u64 gpio_v2_line_config_flags(struct gpio_v2_line_config *lc, unsigned int line_idx) { @@ -1898,6 +1904,11 @@ static int lineevent_unregistered_notify(struct notifier_block *nb, struct lineevent_state *le = container_of(nb, struct lineevent_state, device_unregistered_nb); + if (le->irq) { + free_irq(le->irq, le); + le->irq = 0; + } + wake_up_poll(&le->wait, EPOLLIN | EPOLLERR); return NOTIFY_OK;