Message ID | 20221110064147.343514404@goodmis.org |
---|---|
State | New |
Headers |
Message-ID: <20221110064147.343514404@goodmis.org>
User-Agent: quilt/0.66
Date: Thu, 10 Nov 2022 01:41:05 -0500
From: Steven Rostedt <rostedt@goodmis.org>
To: linux-kernel@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>, Thomas Gleixner <tglx@linutronix.de>, Stephen Boyd <sboyd@kernel.org>, Guenter Roeck <linux@roeck-us.net>, Anna-Maria Gleixner <anna-maria@linutronix.de>, Andrew Morton <akpm@linux-foundation.org>, Julia Lawall <Julia.Lawall@inria.fr>
Subject: [PATCH v6 4/6] timers: Add timer_shutdown_sync() to be called before freeing timers
References: <20221110064101.429013735@goodmis.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
List-ID: <linux-kernel.vger.kernel.org> |
Series |
timers: Use timer_shutdown*() before freeing timers
Commit Message
Steven Rostedt
Nov. 10, 2022, 6:41 a.m. UTC
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>

We are hitting a common bug where a timer is being triggered after it is
freed. This causes a corruption in the timer linked list and crashes the
kernel. Unfortunately it is not easy to know what timer it was that was
freed. Looking at the code, it appears that there are several cases where
del_timer() is used when del_timer_sync() should have been.

Add a timer_shutdown_sync() that not only does a del_timer_sync() but also
marks the timer as terminated, so that if it gets rearmed, a WARN_ON will
trigger.

Developers who are about to free a timer are more likely to use
timer_shutdown_sync() than del_timer_sync(), as it is not obvious that the
latter is needed before freeing. Having the word "shutdown" in the name of
the function will hopefully help developers know that this function needs
to be called before freeing. The added bonus is that the timer is marked as
being freed, so a warning triggers if it gets rearmed. That way, if the
system crashes on a freed timer, we may at least see which timer it was.

There are some situations where it is already known that the timer is shut
down and the synchronization is not needed (or cannot be done due to the
calling context). For these locations there's timer_shutdown(), which only
shuts down the timer (prevents it from being rearmed) but does not check
whether the timer callback is currently running.

This code is taken from Thomas Gleixner's "untested" version of my original
patch and modified after testing, with some other comments from Linus
addressed. Some extra comments were added as well.

Link: https://lore.kernel.org/all/87pmlrkgi3.ffs@tglx/
Link: https://lkml.kernel.org/r/20221106212702.363575800@goodmis.org
Link: https://lore.kernel.org/all/20221105060024.598488967@goodmis.org/
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 include/linux/timer.h | 27 ++++++++++++++++++++++-----
 kernel/time/timer.c   | 43 ++++++++++++++++++++++++++-----------------
 2 files changed, 48 insertions(+), 22 deletions(-)
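The difference between timer_shutdown() and timer_shutdown_sync() described in the commit message can be sketched with a small userspace model. Everything in it is hypothetical: struct model_timer, its running flag and the model_* functions are illustrative stand-ins, not kernel API. The -1 result for a running callback mirrors how try_to_del_timer_sync() reports that case; the sync variant would retry until the result is >= 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct timer_list (not kernel API). */
struct model_timer {
	void (*function)(struct model_timer *t); /* NULL marks "shut down" */
	int pending;	/* queued on the (modelled) timer wheel */
	int running;	/* callback currently executing on another CPU */
};

/* Placeholder callback so that a live timer has a non-NULL function. */
void model_noop_callback(struct model_timer *t)
{
	(void)t;
}

/*
 * Model of timer_shutdown(): dequeue and forbid rearming, but do not
 * look at whether the callback is still running. Returns 1 if the
 * timer was pending, 0 otherwise.
 */
int model_timer_shutdown(struct model_timer *t)
{
	int was_pending = t->pending;

	t->pending = 0;
	t->function = NULL;
	return was_pending;
}

/*
 * One attempt of the synchronizing variant. Like try_to_del_timer_sync(),
 * it returns -1 and changes nothing while the callback is running; a
 * *_sync() caller retries until the result is >= 0, and only then is it
 * safe to free the memory that contains the timer.
 */
int model_try_shutdown_sync(struct model_timer *t)
{
	if (t->running)
		return -1;
	return model_timer_shutdown(t);
}
```

Returning -1 instead of blocking keeps the model single-threaded; the retry loop is left to the caller.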
Comments
On Thu, Nov 10 2022 at 01:41, Steven Rostedt wrote:

$Subject: -ENOPARSE

  timers: Provide timer_shutdown_sync()

and then have some reasonable explanation in the change log?

> We are hitting a common bug where a timer is being triggered after it
> is

We are hitting? Talking in pluralis majestatis by now?

> freed. This causes a corruption in the timer linked list and crashes the
> kernel. Unfortunately it is not easy to know what timer it was that was

Well, that's not entirely true. debugobjects can tell you exactly what
happens.

> freed. Looking at the code, it appears that there are several cases where
> del_timer() is used when del_timer_sync() should have been.
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 717fcb9fb14a..111a3550b3f2 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1017,7 +1017,8 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
>  	unsigned int idx = UINT_MAX;
>  	int ret = 0;
>  
> -	BUG_ON(!timer->function);
> +	if (WARN_ON_ONCE(!timer->function))
> +		return -EINVAL;

Can you please make these BUG -> WARN conversions a separate patch?

> +/**
> + * timer_shutdown_sync - called before freeing the timer

1) The sentence after the dash starts with an upper case letter as all
   sentences do.

2) "called before freeing the timer" tells us what?

See below.

> + * @timer: The timer to be freed
> + *
> + * Shutdown the timer before freeing. This will return when all pending timers
> + * have finished and it is safe to free the timer.

"_ALL_ pending timers have finished?"

This is about exactly _ONE_ timer, i.e. the one which is handed in via
the @timer argument.

You want to educate people to do the right thing and then you go and
provide them incomprehensible documentation garbage. How is that
supposed to work?

Can you please stop this frenzy and get your act together?

> + *
> + * Note, after calling this, if the timer is added back to the queue
> + * it will fail to be added and a WARNING will be triggered.

There is surely a way to express this so that the average driver writer
who does not have the background of you working on this understands this
"note".

> + *
> + * Returns if it deactivated a pending timer or not.

Please look up the kernel-doc syntax for documenting return values.

Thanks,

        tglx
On Thu, Nov 10 2022 at 01:41, Steven Rostedt wrote:
> +static inline int timer_shutdown_sync(struct timer_list *timer)
> +{
> +	return __del_timer_sync(timer, true);
> +}

> +static int __try_to_del_timer_sync(struct timer_list *timer, bool free)
>  {
>  	struct timer_base *base;
>  	unsigned long flags;
> @@ -1285,11 +1281,25 @@ int try_to_del_timer_sync(struct timer_list *timer)
>  
>  	if (base->running_timer != timer)
>  		ret = detach_if_pending(timer, base, true);
> +	if (free)
> +		timer->function = NULL;

Same problem as in the timer_shutdown() case, just more subtle:

CPU0                                    CPU1

lock_timer(timer);
base->running_timer = timer;
fn = timer->function;
unlock_timer(timer);
fn(timer) {
                                        __try_to_del_timer_sync(timer, free=true)
                                          lock_timer(timer);
                                          if (base->running_timer != timer)
                                             // Not taken
                                          if (free)
  mod_timer(timer);
    if (WARN_ON_ONCE(!timer->function))
       return; // not taken
                                             timer->function = NULL;
                                          unlock_timer(timer);
    lock_timer(timer);
    enqueue_timer(timer);
    unlock_timer(timer);
}

//timer expires
lock_timer(timer);
fn = timer->function;
unlock_timer(timer);
fn(timer);    <--- NULL pointer dereference

You surely have spent a massive amount of analysis on this!

Can you please explain how you came up with the brilliant idea of asking
Linus to pull this post -rc4 without a review from the timer maintainers
or anyone else who understands concurrency?

If we really want to make this work, then this needs at least a sanity
check of timer->function in the mod/add*_timer() path _after_ locking
the timer.

Though I'm not convinced that this would really cut it, simply because
the circular dependencies of a timer scheduling work and the work arming
the timer are, as demonstrated above, not as trivial as you might think.

In the worst case the concurrent code path might still end up in a UAF
as far as I can tell.

But what's worse is that you try to create the illusion that
timer_shutdown_sync() is actually preventing people from shooting
themselves in the foot. As implemented right now it's just a bandaid which
makes it less likely, but prevents neither the hard-to-debug shutdown
issues nor the resulting holes in people's feet.

Thanks,

        tglx
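The interleaving above can be replayed deterministically in userspace. This is a hypothetical model: sim_timer, v6_mod_precheck(), v6_mod_enqueue(), fixed_mod_timer(), sim_shutdown() and sim_expire() are illustrative names, not kernel functions. Each function stands for one step that the real code performs either unlocked or under the timer base lock. An unlocked check followed by a locked enqueue lets a shut-down timer be queued with a NULL callback, while a check done inside the locked section, as suggested in the reply, rejects the rearm in either order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded model; not kernel code. Each function
 * below stands for one indivisible step (an unlocked read or a section
 * under the base lock), so a test can replay a specific interleaving. */
struct sim_timer {
	void (*function)(struct sim_timer *t);	/* NULL means shut down */
	int pending;				/* enqueued, will expire */
	int fired;				/* callback invocations */
};

void sim_callback(struct sim_timer *t)
{
	t->fired++;
}

/* v6 mod_timer(), step 1: WARN_ON_ONCE(!timer->function) before locking. */
int v6_mod_precheck(const struct sim_timer *t)
{
	return t->function != NULL;
}

/* v6 mod_timer(), step 2: lock, enqueue, unlock; no second check. */
void v6_mod_enqueue(struct sim_timer *t)
{
	t->pending = 1;
}

/* Proposed fix: test the function pointer and enqueue in one locked step.
 * Returns 1 if the timer was enqueued (model-specific convention). */
int fixed_mod_timer(struct sim_timer *t)
{
	if (!t->function)
		return 0;	/* shut down: ignore the rearm */
	t->pending = 1;
	return 1;
}

/* Shutdown step under the lock: dequeue and clear the function pointer. */
void sim_shutdown(struct sim_timer *t)
{
	t->pending = 0;
	t->function = NULL;
}

/* Expiry: returns -1 where the kernel would call a NULL function pointer. */
int sim_expire(struct sim_timer *t)
{
	if (!t->pending)
		return 0;
	t->pending = 0;
	if (!t->function)
		return -1;
	t->function(t);
	return 1;
}
```

The model covers only this one interleaving; as the reply notes, a lock-protected check does not by itself resolve circular teardown dependencies between timers and work items.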
On Sun, 13 Nov 2022 22:52:16 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Thu, Nov 10 2022 at 01:41, Steven Rostedt wrote:
>
> $Subject: -ENOPARSE
>
>   timers: Provide timer_shutdown_sync()
>
> and then have some reasonable explanation in the change log?
>
> > We are hitting a common bug where a timer is being triggered after it
> > is
>
> We are hitting? Talking in pluralis majestatis by now?

Should I say Chromebooks are hitting?

>
> > freed. This causes a corruption in the timer linked list and crashes the
> > kernel. Unfortunately it is not easy to know what timer it was that was
>
> Well, that's not entirely true. debugobjects can tell you exactly what
> happens.

Only if you have it enabled when it happens, and it has too much
overhead to run in production. The full series changes debug object
timers to report an issue if there's a timer not in the shutdown state
when it is freed. This catches potential issues similar to how lockdep
can catch potential deadlocks without having to hit the deadlock. The
current debug object timers only catches it if the race condition is hit.

>
> > freed. Looking at the code, it appears that there are several cases where
> > del_timer() is used when del_timer_sync() should have been.
> > diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> > index 717fcb9fb14a..111a3550b3f2 100644
> > --- a/kernel/time/timer.c
> > +++ b/kernel/time/timer.c
> > @@ -1017,7 +1017,8 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
> >  	unsigned int idx = UINT_MAX;
> >  	int ret = 0;
> >  
> > -	BUG_ON(!timer->function);
> > +	if (WARN_ON_ONCE(!timer->function))
> > +		return -EINVAL;
>
> Can you please make these BUG -> WARN conversions a separate patch?

OK.

>
> > +/**
> > + * timer_shutdown_sync - called before freeing the timer
>
> 1) The sentence after the dash starts with an upper case letter as all
>    sentences do.
>
> 2) "called before freeing the timer" tells us what?
>
> See below.
>
> > + * @timer: The timer to be freed
> > + *
> > + * Shutdown the timer before freeing. This will return when all pending timers
> > + * have finished and it is safe to free the timer.
>
> "_ALL_ pending timers have finished?"
>
> This is about exactly _ONE_ timer, i.e. the one which is handed in via
> the @timer argument.
>
> You want to educate people to do the right thing and then you go and
> provide them incomprehensible documentation garbage. How is that
> supposed to work?

I don't know. Other people I showed this to appeared to understand it.
But I'm all for updates.

>
> Can you please stop this frenzy and get your act together?

What the hell. I'm just trying to get this in because it's a thorn in
our side. Sorry I'm not up to par with your expectations. I'm willing to
make changes, but let's leave out the insults. This work is being done on
top of my day job.

>
> > + *
> > + * Note, after calling this, if the timer is added back to the queue
> > + * it will fail to be added and a WARNING will be triggered.
>
> There is surely a way to express this so that the average driver writer
> who does not have the background of you working on this understands this
> "note".
>
> > + *
> > + * Returns if it deactivated a pending timer or not.
>
> Please look up the kernel-doc syntax for documenting return values.
>

Will do.

-- Steve
On Mon, 14 Nov 2022 00:18:21 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> > @@ -1285,11 +1281,25 @@ int try_to_del_timer_sync(struct timer_list *timer)
> >  
> >  	if (base->running_timer != timer)
> >  		ret = detach_if_pending(timer, base, true);
> > +	if (free)
> > +		timer->function = NULL;
>
> Same problem as in the timer_shutdown() case just more subtle:
>
> CPU0                                    CPU1
>
> lock_timer(timer);
> base->running_timer = timer;
> fn = timer->function;
> unlock_timer(timer);
> fn(timer) {
>                                         __try_to_del_timer_sync(timer, free=true)
>                                           lock_timer(timer);
>                                           if (base->running_timer != timer)
>                                              // Not taken
>                                           if (free)
>   mod_timer(timer);
>     if (WARN_ON_ONCE(!timer->function))
>        return; // not taken
>                                              timer->function = NULL;
>                                           unlock_timer(timer);
>     lock_timer(timer);
>     enqueue_timer(timer);
>     unlock_timer(timer);
> }
>
> //timer expires
> lock_timer(timer);
> fn = timer->function;
> unlock_timer(timer);
> fn(timer);    <--- NULL pointer dereference
>
> You surely have spent a massive amount of analysis on this!
>
> Can you please explain how you came up with the brilliant idea of asking
> Linus to pull this post -rc4 without a review from the timer maintainers
> or anyone else who understands concurrency?

I trusted the source of this code:

  https://lore.kernel.org/all/87pmlrkgi3.ffs@tglx/

-- Steve
On Sun, Nov 13 2022 at 19:15, Steven Rostedt wrote:
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> You surely have spent a massive amount of analysis on this!
>>
>> Can you please explain how you came up with the brilliant idea of asking
>> Linus to pull this post -rc4 without a review from the timer maintainers
>> or anyone else who understands concurrency?
>
> I trusted the source of this code:
>
>   https://lore.kernel.org/all/87pmlrkgi3.ffs@tglx/

Sure, because uncompiled suggestions are the ultimate source of truth and
correctness, right?

I'm terribly sorry that I misled you on this, but OTOH it's pretty
obvious that you decided to ignore:

  https://lore.kernel.org/all/87v8vjiaih.ffs@tglx/

Thanks,

        tglx
On Sun, Nov 13 2022 at 19:11, Steven Rostedt wrote:
> On Sun, 13 Nov 2022 22:52:16 +0100
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> > We are hitting a common bug where a timer is being triggered after it
>> > is
>>
>> We are hitting? Talking in pluralis majestatis by now?
>
> Should I say Chromebooks are hitting?

That would be at least more comprehensible than 'We', unless you (or
whoever is 'We') is a synonym for chromeborks.

>> > freed. This causes a corruption in the timer linked list and crashes the
>> > kernel. Unfortunately it is not easy to know what timer it was that was
>>
>> Well, that's not entirely true. debugobjects can tell you exactly what
>> happens.
>
> Only if you have it enabled when it happens, and it has too much
> overhead to run in production. The full series changes debug object
> timers to report an issue if there's a timer not in the shutdown state
> when it is freed.

The series changes 'debug object timers' to report an issue?

Can you pretty please stop this completely nonsensical blurb? This
series has absolutely nothing to do with debugobjects, at least not to
my knowledge. If the series expands the magics of debugobjects then
you fundamentally failed to explain that.

> This catches potential issues similar to how lockdep can catch
> potential deadlocks without having to hit the deadlock.

By introducing new problems?

> The current debug object timers only catches it if the race condition
> is hit.

True. But most if not all of the mentioned issues have been reported
before via debugobject enabled kernels. So what's the actual benefit?

>> > + * @timer: The timer to be freed
>> > + *
>> > + * Shutdown the timer before freeing. This will return when all pending timers
>> > + * have finished and it is safe to free the timer.
>>
>> "_ALL_ pending timers have finished?"
>>
>> This is about exactly _ONE_ timer, i.e. the one which is handed in via
>> the @timer argument.
>>
>> You want to educate people to do the right thing and then you go and
>> provide them incomprehensible documentation garbage. How is that
>> supposed to work?
>
> I don't know. Other people I showed this to appeared to understand it.
> But I'm all for updates.

Do I really need to explain to you what the difference between 'all
pending timers' and the one which is subject of the function call is?

No, I'm not rewriting this for you and your peers who care obviously as
much about correctness as you do.

>> Can you please stop this frenzy and get your act together?
>
> What the hell. I'm just trying to get this in because it's a thorn in
> our side.

It's not a thorn in 'our' (whoever is our) side. It's a fundamental
problem of circular shutdown dependencies as I explained to you long
ago.

> Sorry I'm not up to par with your expectations. I'm willing to make
> changes, but let's leave out the insults. This work is being done on
> top of my day job.

Sure, and because of that you are talking about this as a 'thorn on our
side'. If that's a thorn at (I assume) your employer's side, which is
then related to your day job, then you should have the backing of that
company to spend company time on it and not inflict half-baked changes
on the kernel which solve nothing.

Coming back to your claim that I'm insulting. Please point me to the
actual insult I committed and I'm happy to apologize.

Thanks,

        Thomas
On Mon, 14 Nov 2022 01:33:25 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Sun, Nov 13 2022 at 19:15, Steven Rostedt wrote:
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> >> You surely have spent a massive amount of analysis on this!
> >>
> >> Can you please explain how you came up with the brilliant idea of asking
> >> Linus to pull this post -rc4 without a review from the timer maintainers
> >> or anyone else who understands concurrency?
> >
> > I trusted the source of this code:
> >
> >   https://lore.kernel.org/all/87pmlrkgi3.ffs@tglx/
>
> Sure, because uncompiled suggestions are the ultimate source of truth and
> correctness, right?

Well, I figured it covered the race conditions.

>
> I'm terribly sorry that I misled you on this, but OTOH it's pretty
> obvious that you decided to ignore:
>
>   https://lore.kernel.org/all/87v8vjiaih.ffs@tglx/
>

I'm not sure what you mean by that. The idea is that once
timer_shutdown() is called, we still warn on re-arming the timer.

Yeah, I did not follow Linus's suggestion that we just use shutdown to
prevent the race and let it re-arm if it wants. That is, I did not
blindly convert all del_timer_sync() to timer_shutdown(). The script
only converts it if there's an immediate free of the object that holds
the timer in the same function without any paths to avoid it.

The final patch series
(https://lore.kernel.org/all/20221104054053.431922658@goodmis.org/)
works to make sure that after the shutdown is called, it does not get
re-armed.

-- Steve
On Mon, 14 Nov 2022 02:04:56 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Sun, Nov 13 2022 at 19:11, Steven Rostedt wrote:
> > On Sun, 13 Nov 2022 22:52:16 +0100
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> >> > We are hitting a common bug where a timer is being triggered after it
> >> > is
> >>
> >> We are hitting? Talking in pluralis majestatis by now?
> >
> > Should I say Chromebooks are hitting?
>
> That would be at least more comprehensible than 'We', unless you (or
> whoever is 'We') is a synonym for chromeborks.

Sure, I'll update it to start with:

  Out in the field, the main cause of kernel crashes for Chromebooks is in
  the timer code.

>
> >> > freed. This causes a corruption in the timer linked list and crashes the
> >> > kernel. Unfortunately it is not easy to know what timer it was that was
> >>
> >> Well, that's not entirely true. debugobjects can tell you exactly what
> >> happens.
> >
> > Only if you have it enabled when it happens, and it has too much
> > overhead to run in production. The full series changes debug object
> > timers to report an issue if there's a timer not in the shutdown state
> > when it is freed.
>
> The series changes 'debug object timers' to report an issue?

The full series does. This isn't the full series, but only the part that
Linus asked for.

  https://lore.kernel.org/lkml/20221104054917.915205356@goodmis.org/

>
> Can you pretty please stop this completely nonsensical blurb? This
> series has absolutely nothing to do with debugobjects, at least not to
> my knowledge. If the series expands the magics of debugobjects then
> you fundamentally failed to explain that.

The full series does, but I was asked by Linus to only give the part
that he could take early. The changes to debugobjects can only be done
after we convert the other users of timers to make sure they are shut
down before being freed. Otherwise you will get a lot of false positives.

>
> > This catches potential issues similar to how lockdep can catch
> > potential deadlocks without having to hit the deadlock.
>
> By introducing new problems?
>
> > The current debug object timers only catches it if the race condition
> > is hit.
>
> True. But most if not all of the mentioned issues have been reported
> before via debugobject enabled kernels. So what's the actual benefit?

Because we are still hitting bugs in the field and have no idea who the
culprit is. The bugs are triggered by what users are doing (probably
unplugging some USB device or something) and we have not been able to
reproduce them in the lab. The users' activities cause a crash later on
in the timer code, and the crash report shows the backtrace in the timer
code where the timer linked list is corrupted, which is what would happen
if the object had been freed.

>
> >> > + * @timer: The timer to be freed
> >> > + *
> >> > + * Shutdown the timer before freeing. This will return when all pending timers
> >> > + * have finished and it is safe to free the timer.
> >>
> >> "_ALL_ pending timers have finished?"
> >>
> >> This is about exactly _ONE_ timer, i.e. the one which is handed in via
> >> the @timer argument.
> >>
> >> You want to educate people to do the right thing and then you go and
> >> provide them incomprehensible documentation garbage. How is that
> >> supposed to work?
> >
> > I don't know. Other people I showed this to appeared to understand it.
> > But I'm all for updates.
>
> Do I really need to explain to you what the difference between 'all
> pending timers' and the one which is subject of the function call is?
>
> No, I'm not rewriting this for you and your peers who care obviously as
> much about correctness as you do.

I'm not asking you to rewrite it, I'm fine doing it. My response here was
due to your condescending remarks. That is:

Instead of saying:

  You want to educate people to do the right thing and then you go and
  provide them incomprehensible documentation garbage. How is that
  supposed to work?

say:

  You want to educate people to do the right thing, then please be more
  accurate in your terminology. "All pending timers" is confusing because
  this is about _ONE_ timer, i.e. the one which is handed in via the
  @timer argument. Please rewrite the kernel doc to reflect this.

>
> >> Can you please stop this frenzy and get your act together?
> >
> > What the hell. I'm just trying to get this in because it's a thorn in
> > our side.
>
> It's not a thorn in 'our' (whoever is our) side. It's a fundamental
> problem of circular shutdown dependencies as I explained to you long
> ago.

The thorn is in the side of the Chromebook users, who are having their
machines crash due to something freeing an active timer.

>
> > Sorry I'm not up to par with your expectations. I'm willing to make
> > changes, but let's leave out the insults. This work is being done on
> > top of my day job.
>
> Sure, and because of that you are talking about this as a 'thorn on our
> side'. If that's a thorn at (I assume) your employer's side, which is
> then related to your day job, then you should have the backing of that
> company to spend company time on it and not inflict half-baked changes
> on the kernel which solve nothing.

It may be my employer's issue, but not my team's. It's Guenter's team's
issue; I looked at a bug report that he posted and figured I could help.
But I have other responsibilities that did not go away when I decided to
help here. Thus, I just extended my work week.

This is why I came back to it. I reported this back in April, but then
found myself too busy with my current job to follow through with it.
Then recently Guenter reported that the timer crashes are still the #1
reason for kernel crashes, and I figured I should then finish this
series.

>
> Coming back to your claim that I'm insulting. Please point me to the
> actual insult I committed and I'm happy to apologize.

It's more the condescending attitude than a direct insult.

-- Steve
On Mon, Nov 14 2022 at 01:33, Thomas Gleixner wrote:
> https://lore.kernel.org/all/87v8vjiaih.ffs@tglx/
I went back to the original thread and looked at the Bluetooth example
and then at commit 72ef98445aca ("Bluetooth: hci_qca: Use del_timer_sync()
before freeing"). That commit fixes the obvious problem of using
del_timer() instead of del_timer_sync(). Also the reordering of the
timer teardown vs. the workqueue teardown makes it less likely to
explode, but it's still fundamentally broken.
        destroy_workqueue(wq);
        /* After this point @wq cannot be touched anymore */

--->    timer expires
          queue_work(wq)  <---- Explodes with a NULL pointer dereference
                                deep in the work queue core code.
        del_timer_sync(t);
As I said in the above mail:
"So well written drivers have a priv->shutdown flag which makes timer
callbacks and workqueue functions aware that a shutdown is in progress
so they can take appropriate action."
That's exactly why I was not convinced that any form of
timer_shutdown_sync() will solve this kind of problem. It might just
lure people into the false expectation that all teardown ordering
problems go magically away when this function is used.
The above commit is proof of that.
timer_shutdown_sync() can solve the problem in that driver, but you
_cannot_ issue a warning if any of the enqueue functions is invoked with
timer->function == NULL. Why?
The ordering in that driver would have to go back to the original
ordering to prevent the above problem.
        timer_shutdown_sync(t);

        Now t->function == NULL, right?

        destroy_workqueue(wq)
          drain_workqueue(wq)
            bt_work()
              mod_timer(t);   <- would warn because t->function == NULL
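The conflict between this teardown ordering and a warning on rearm can be shown with a small hypothetical model (model_timer, mod_timer_warn(), mod_timer_ignore() and teardown_then_drain() are illustrative names, not kernel API). Variant A follows the v6 patch, which warns and returns -EINVAL; variant B follows the proposal to ignore the rearm silently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_timer {
	void (*function)(struct model_timer *t);	/* NULL: shut down */
	int pending;
};

void model_noop(struct model_timer *t)
{
	(void)t;
}

/* Counts what WARN_ON_ONCE() would have reported in variant A. */
int model_warnings;

/* Variant A (v6 patch): rearming a shut-down timer warns and fails. */
int mod_timer_warn(struct model_timer *t)
{
	if (!t->function) {
		model_warnings++;
		return -EINVAL;
	}
	t->pending = 1;
	return 0;
}

/* Variant B (proposal): rearming a shut-down timer is silently ignored. */
int mod_timer_ignore(struct model_timer *t)
{
	if (!t->function)
		return 0;
	t->pending = 1;
	return 0;
}

/* Shutdown: dequeue and mark as shut down (callback wait not modelled). */
void model_shutdown_sync(struct model_timer *t)
{
	t->pending = 0;
	t->function = NULL;
}

/*
 * The ordering the BT driver needs: shut the timer down, then destroy the
 * workqueue, whose drain runs a pending work item that rearms the timer
 * (destroy_workqueue() -> drain_workqueue() -> bt_work() -> mod_timer()).
 */
int teardown_then_drain(struct model_timer *t,
			int (*rearm)(struct model_timer *t))
{
	model_shutdown_sync(t);
	return rearm(t);
}
```

In both variants the timer stays dequeued; only variant A turns a legitimate teardown path into a warning.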
So if we want to make this solid and make the life of driver writers
easier, then we cannot issue a warning as I said in the original thread
already.
The semantics of timer_shutdown_sync() have to be:
After return:
- the timer is not queued
- the timer callback is not running
- the timer cannot be enqueued again
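The three guarantees listed above, and how they differ from plain del_timer_sync(), can be checked against a hypothetical userspace model of the timer_delete_sync(timer, shutdown) helper proposed in the patch. The model_* names are illustrative and waiting for a running callback is not modelled. Return values follow the kernel-doc in the patch (%1 if the timer was pending, %0 otherwise) and mod_timer()'s convention of reporting whether the timer was pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_timer {
	void (*function)(struct model_timer *t);	/* NULL: shut down */
	bool pending;
};

void model_noop(struct model_timer *t)
{
	(void)t;
}

/* Model of timer_delete_sync(timer, shutdown); returns 1 if it was pending. */
int model_timer_delete_sync(struct model_timer *t, bool shutdown)
{
	int ret = t->pending ? 1 : 0;

	t->pending = false;		/* guarantee: not queued */
	/* guarantee: callback not running (waiting is not modelled) */
	if (shutdown)
		t->function = NULL;	/* guarantee: cannot be enqueued again */
	return ret;
}

int model_del_timer_sync(struct model_timer *t)
{
	return model_timer_delete_sync(t, false);
}

int model_timer_shutdown_sync(struct model_timer *t)
{
	return model_timer_delete_sync(t, true);
}

/* mod_timer() with the proposed shut-down check; returns 1 if it was pending. */
int model_mod_timer(struct model_timer *t)
{
	int ret;

	if (!t->function)
		return 0;		/* shut down: silently ignored */
	ret = t->pending ? 1 : 0;
	t->pending = true;
	return ret;
}
```

Only the shutdown flavour clears the function pointer, so a later model_mod_timer() on it leaves the timer dequeued.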
For that BT case this is the right thing to do because the draining of
the pending work via destroy_workqueue() must not rearm the timers.
There is no functional requirement to do so because the device is
on the way out already.
It won't solve all of those problems, but it probably solves quite a few
of them. Each usage site needs a careful look.
So something like the below should do the trick. It's compiled this time
and I spent more than 5 seconds staring at it. It still needs some
eyeballs and splitting apart into more digestible pieces.
The only downside of this is that timers which are not properly
initialized are now silently ignored. That's not a real problem, as
driver writers should run their code with debugobjects enabled at least
once, which will tell them nicely. So if someone has to scratch their head
over why their timer is not firing, then it's well deserved.
Thanks,
tglx
---
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -183,12 +183,47 @@ extern int timer_reduce(struct timer_lis
extern void add_timer(struct timer_list *timer);
extern int try_to_del_timer_sync(struct timer_list *timer);
+extern int timer_delete_sync(struct timer_list *timer, bool shutdown);
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
- extern int del_timer_sync(struct timer_list *timer);
-#else
-# define del_timer_sync(t) del_timer(t)
-#endif
+/**
+ * del_timer_sync - Delete a pending timer and wait for a running callback
+ * @timer: The timer to be deleted
+ *
+ * The function ensures under timer_base(@timer)->lock that:
+ * - @timer is not queued
+ * - The callback function of @timer is not running
+ *
+ * But this function cannot guarantee that the timer is not rearmed again
+ * by some concurrent or preempting code, right after it dropped the base
+ * lock.
+ *
+ * If this guarantee is needed, e.g. for teardown, then use
+ * timer_shutdown_sync() instead.
+ *
+ * Returns: %0 if the timer was not pending
+ * %1 if the timer was pending
+ */
+static inline int del_timer_sync(struct timer_list *timer)
+{
+ return timer_delete_sync(timer, false);
+}
+
+/**
+ * timer_shutdown_sync - Shutdown a timer and prevent rearming
+ * @timer: The timer to be shutdown
+ *
+ * When the function returns it is guaranteed that:
+ * - @timer is not queued
+ * - The callback function of @timer is not running
+ * - @timer cannot be enqueued again
+ *
+ * Returns: %0 if the timer was not pending
+ * %1 if the timer was pending
+ */
+static inline int timer_shutdown_sync(struct timer_list *timer)
+{
+ return timer_delete_sync(timer, true);
+}
#define del_singleshot_timer_sync(t) del_timer_sync(t)
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1017,8 +1017,6 @@ static inline int
unsigned int idx = UINT_MAX;
int ret = 0;
- BUG_ON(!timer->function);
-
/*
* This is a common optimization triggered by the networking code - if
* the timer is re-modified to have the same timeout or ends up in the
@@ -1044,6 +1042,15 @@ static inline int
* dequeue/enqueue dance.
*/
base = lock_timer_base(timer, &flags);
+ /*
+ * Has @timer been shutdown? This needs to be evaluated
+ * while holding base lock to prevent a race against the
+ * shutdown code.
+ */
+ if (!timer->function) {
+ ret = 0;
+ goto out_unlock;
+ }
forward_timer_base(base);
if (timer_pending(timer) && (options & MOD_TIMER_REDUCE) &&
@@ -1070,6 +1077,15 @@ static inline int
}
} else {
base = lock_timer_base(timer, &flags);
+ /*
+ * Has @timer been shutdown? This needs to be evaluated
+ * while holding base lock to prevent a race against the
+ * shutdown code.
+ */
+ if (!timer->function) {
+ ret = 0;
+ goto out_unlock;
+ }
forward_timer_base(base);
}
@@ -1193,7 +1209,8 @@ EXPORT_SYMBOL(timer_reduce);
*/
void add_timer(struct timer_list *timer)
{
- BUG_ON(timer_pending(timer));
+ if (WARN_ON_ONCE(timer_pending(timer)))
+ return;
__mod_timer(timer, timer->expires, MOD_TIMER_NOTPENDING);
}
EXPORT_SYMBOL(add_timer);
@@ -1210,7 +1227,8 @@ void add_timer_on(struct timer_list *tim
struct timer_base *new_base, *base;
unsigned long flags;
- BUG_ON(timer_pending(timer) || !timer->function);
+ if (WARN_ON_ONCE(timer_pending(timer)))
+ return;
new_base = get_timer_cpu_base(timer->flags, cpu);
@@ -1220,6 +1238,13 @@ void add_timer_on(struct timer_list *tim
* wrong base locked. See lock_timer_base().
*/
base = lock_timer_base(timer, &flags);
+ /*
+ * Has @timer been shutdown? This needs to be evaluated while
+ * holding base lock to prevent a race against the shutdown code.
+ */
+ if (!timer->function)
+ goto out_unlock;
+
if (base != new_base) {
timer->flags |= TIMER_MIGRATING;
@@ -1233,20 +1258,22 @@ void add_timer_on(struct timer_list *tim
debug_timer_activate(timer);
internal_add_timer(base, timer);
+out_unlock:
raw_spin_unlock_irqrestore(&base->lock, flags);
}
EXPORT_SYMBOL_GPL(add_timer_on);
/**
- * del_timer - deactivate a timer.
- * @timer: the timer to be deactivated
+ * del_timer - Deactivate a timer.
+ * @timer: The timer to be deactivated
*
- * del_timer() deactivates a timer - this works on both active and inactive
- * timers.
+ * Returns: %0 If the timer was not pending
+ * %1 If the timer was pending and deactivated
*
- * The function returns whether it has deactivated a pending timer or not.
- * (ie. del_timer() of an inactive timer returns 0, del_timer() of an
- * active timer returns 1.)
+ * Note, the function does not wait for a possibly running timer
+ * callback on a different CPU, nor does it prevent rearming of
+ * the timer. See del_timer_sync() and timer_shutdown_sync() for
+ * alternative options.
*/
int del_timer(struct timer_list *timer)
{
@@ -1267,13 +1294,24 @@ int del_timer(struct timer_list *timer)
EXPORT_SYMBOL(del_timer);
/**
- * try_to_del_timer_sync - Try to deactivate a timer
- * @timer: timer to delete
+ * __try_to_del_timer_sync - Internal function: Try to deactivate a timer
+ * @timer: Timer to deactivate
+ * @shutdown: If true this indicates that the timer is about to be
+ * shutdown permanently.
+ *
+ * This function tries to deactivate @timer.
+ *
+ * If @shutdown is true then @timer->function is set to NULL under the
+ * timer base lock which prevents further rearming of the timer.
+ *
+ * Returns: %0 If the timer was not pending
+ * %1 If the timer was pending and deactivated
+ * %-1 If the timer callback is running on a different CPU
*
- * This function tries to deactivate a timer. Upon successful (ret >= 0)
- * exit the timer is not queued and the handler is not running on any CPU.
+ * Note: This function cannot guarantee that the timer cannot be rearmed
+ * after dropping the base lock unless @shutdown is true.
*/
-int try_to_del_timer_sync(struct timer_list *timer)
+static int __try_to_del_timer_sync(struct timer_list *timer, bool shutdown)
{
struct timer_base *base;
unsigned long flags;
@@ -1285,11 +1323,30 @@ int try_to_del_timer_sync(struct timer_l
if (base->running_timer != timer)
ret = detach_if_pending(timer, base, true);
+ if (shutdown)
+ timer->function = NULL;
raw_spin_unlock_irqrestore(&base->lock, flags);
return ret;
}
+
+/**
+ * try_to_del_timer_sync - Try to deactivate a timer
+ * @timer: Timer to deactivate
+ *
+ * Returns: %0 If the timer was not pending
+ * %1 If the timer was pending and deactivated
+ * %-1 If the timer callback is running on a different CPU
+ *
+ * Note: This function cannot guarantee that the timer cannot be rearmed
+ * right after dropping the base lock. That needs to be prevented
+ * by the calling code if necessary.
+ */
+int try_to_del_timer_sync(struct timer_list *timer)
+{
+ return __try_to_del_timer_sync(timer, false);
+}
EXPORT_SYMBOL(try_to_del_timer_sync);
#ifdef CONFIG_PREEMPT_RT
@@ -1365,16 +1422,13 @@ static inline void timer_sync_wait_runni
static inline void del_timer_wait_running(struct timer_list *timer) { }
#endif
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
/**
- * del_timer_sync - deactivate a timer and wait for the handler to finish.
- * @timer: the timer to be deactivated
+ * timer_delete_sync - Deactivate a timer and wait for the handler to finish.
+ * @timer: The timer to be deactivated
+ * @shutdown: If true @timer->function will be set to NULL under the
+ * timer base lock which prevents rearming of @timer
*
- * This function only differs from del_timer() on SMP: besides deactivating
- * the timer it also makes sure the handler has finished executing on other
- * CPUs.
- *
- * Synchronization rules: Callers must prevent restarting of the timer,
+ * SMP synchronization rules: Callers must prevent restarting of the timer,
* otherwise this function is meaningless. It must not be called from
* interrupt contexts unless the timer is an irqsafe one. The caller must
* not hold locks which would prevent completion of the timer's
@@ -1400,9 +1454,15 @@ static inline void del_timer_wait_runnin
* The interrupt on the other CPU is waiting to grab somelock but
* it has interrupted the softirq that CPU0 is waiting to finish.
*
- * The function returns whether it has deactivated a pending timer or not.
+ * If @shutdown is not set the timer can be rearmed later. If it is set
+ * then @timer->function is set to NULL under timer base lock which
+ * prevents rearming of the timer. If the timer should be reused after
+ * shutdown it has to be initialized again.
+ *
+ * Returns: %0 If the timer was not pending
+ * %1 If the timer was pending and deactivated
*/
-int del_timer_sync(struct timer_list *timer)
+int timer_delete_sync(struct timer_list *timer, bool shutdown)
{
int ret;
@@ -1432,7 +1492,7 @@ int del_timer_sync(struct timer_list *ti
lockdep_assert_preemption_enabled();
do {
- ret = try_to_del_timer_sync(timer);
+ ret = __try_to_del_timer_sync(timer, shutdown);
if (unlikely(ret < 0)) {
del_timer_wait_running(timer);
@@ -1442,8 +1502,7 @@ int del_timer_sync(struct timer_list *ti
return ret;
}
-EXPORT_SYMBOL(del_timer_sync);
-#endif
+EXPORT_SYMBOL(timer_delete_sync);
static void call_timer_fn(struct timer_list *timer,
void (*fn)(struct timer_list *),
@@ -1509,6 +1568,12 @@ static void expire_timers(struct timer_b
fn = timer->function;
+ if (WARN_ON_ONCE(!fn)) {
+ /* Should never happen. Emphasis on should! */
+ base->running_timer = NULL;
+ return;
+ }
+
if (timer->flags & TIMER_IRQSAFE) {
raw_spin_unlock(&base->lock);
call_timer_fn(timer, fn, baseclk);
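The core of the patch above is that timer_delete_sync(timer, true) clears ->function while holding the base lock, and every arming path re-checks ->function under that same lock before enqueueing. The following is a minimal userspace sketch of that rule, not kernel code: struct model_timer, the model_* functions and the spinning atomic_flag that stands in for timer_base->lock are all hypothetical, and waiting for a running callback is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a timer; not struct timer_list. */
struct model_timer {
	atomic_flag base_lock;                   /* stands in for timer_base->lock */
	void (*function)(struct model_timer *);  /* NULL means "shut down" */
	bool pending;                            /* stands in for timer_pending() */
	unsigned long expires;
};

static void model_lock(struct model_timer *t)
{
	while (atomic_flag_test_and_set(&t->base_lock))
		;  /* spin until the lock is free */
}

static void model_unlock(struct model_timer *t)
{
	atomic_flag_clear(&t->base_lock);
}

/* Hypothetical callback, only used to give the timer a non-NULL function. */
static void model_noop_callback(struct model_timer *t)
{
	(void)t;
}

static void model_timer_setup(struct model_timer *t,
			      void (*fn)(struct model_timer *))
{
	atomic_flag_clear(&t->base_lock);
	t->function = fn;
	t->pending = false;
	t->expires = 0;
}

/* Models __mod_timer(): returns 1 if the timer was pending, else 0. */
static int model_mod_timer(struct model_timer *t, unsigned long expires)
{
	int ret = 0;

	model_lock(t);
	/* Checked under the lock, so it cannot race with shutdown. */
	if (t->function) {
		ret = t->pending;
		t->expires = expires;
		t->pending = true;
	}
	model_unlock(t);
	return ret;
}

/*
 * Models timer_delete_sync(timer, shutdown) minus the wait for a
 * running callback: dequeue, and on shutdown also clear ->function.
 */
static int model_delete_sync(struct model_timer *t, bool shutdown)
{
	int ret;

	model_lock(t);
	ret = t->pending;
	t->pending = false;
	if (shutdown)
		t->function = NULL;
	assert(!t->pending);
	model_unlock(t);
	return ret;
}
```

With shutdown == false the next model_mod_timer() simply rearms the timer, which is the gap del_timer_sync() leaves open; with shutdown == true the same call returns 0 and the timer stays idle.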
On Mon, 14 Nov 2022 16:42:22 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> So something like the below should do the trick. It's compiled this time
> and I spent more than 5 seconds staring at it. Still needs some
> eyeballs and splitting apart into more digestible pieces.

Thanks Thomas. I really appreciate this.

>
> The only downside of this is that timers which are not properly
> initialized are now silently ignored. That's not a real problem as
> driver writers should run their code with debugobjects enabled at least
> once, which will tell them nicely. So if someone has to scratch his head
> over why his timer is not firing, then it's well deserved.

I just came back from my trip with over 300 patches to review :-p

Luckily, for me, Masami is now a co-maintainer and has started that
process already :-)

When I catch up, I'll take a look at this more closely, and we (Guenter
and I) will be running with DEBUG_OBJECTS enabled which will hopefully
help catch missed places. At least for the drivers we care about ;-)

-- Steve
On Mon, Nov 14, 2022 at 7:42 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> So if we want to make this solid and make the life of driver writers
> easier, then we cannot issue a warning as I said in the original thread
> already.

So I think that there are two issues at play:

 (a) do we want to *find* problem places after the conversion

 (b) do we want to make driver writing easier

and (a) argues for warning on timer re-arming, but (b) just says
"don't warn, just ignore it, the driver is being shut down".

I'm personally ok with either of those approaches, and it's literally
just a question of mindset.

> The semantics of timer_shutdown_sync() have to be:
>
>     After return:
>       - the timer is not queued
>       - the timer callback is not running
>       - the timer cannot be enqueued again

Yes, but that last case is literally a "do we expect the *driver* to
not enqueue it and warn if it tries, or do we just silently enforce
it"?

I agree with all three points. I'm just not sure about who we expect
to do the "don't enqueue again".

There's a big argument for "make it easy for driver writers" in just
saying "make mod_timer() silently just ignore a re-arming". Making
things easier for driver writers is a good thing.

But maybe it's a "you shouldn't have done that in the first place"
thing, and merits a warning?

I have no strong opinions on that.

What I *do* still want to happen is for subsystems to be able to start
doing the conversion one by one. Which is why I'd still prefer to have
the new names available just so that we don't have to have one
50-patch series, but we can have subsystems apply the obvious cases.

And I'd still like the mindless "let's get the non-semantic changes
out of the way" as one single patch, to get rid of mindless noise.
And honestly, for that to happen I'd be perfectly happy with something like

    #define timer_shutdown(t) del_timer(t)
    #define timer_shutdown_sync(t) del_timer_sync(t)

(obviously with the patches that first remove the existing
'timer_shutdown()' uses). That wouldn't introduce the *new*
semantics, but it would at least allow the different subsystems to do
the obvious cases, and let the networking people wonder about the much
less obvious ones.

            Linus
On Mon, 14 Nov 2022 09:16:31 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> And honestly, for that to happen I'd be perfectly happy with something like
>
>     #define timer_shutdown(t) del_timer(t)
>     #define timer_shutdown_sync(t) del_timer_sync(t)
>
> (obviously with the patches that first remove the existing
> 'timer_shutdown()' uses). That wouldn't introduce the *new*
> semantics, but it would at least allow the different subsystems to do
> the obvious cases, and let the networking people wonder about the much
> less obvious ones.

I can create the above series, if Thomas is OK with this approach.

-- Steve
On Mon, Nov 14, 2022 at 9:49 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I can create the above series, if Thomas is OK with this approach.

Note that I'd definitely be more comfortable with a "real"
implementation, but only if people are happy with it.

Of course, the alternative is to just keep it entirely as one single
separate branch that does all of this, and _not_ have subsystems merge
things on their own at all.

The only complicated cases I've seen (but maybe I just missed some)
were networking, and they could do their stuff later.

So I guess I don't care _that_ deeply, and if Thomas is happier with
that "keep it separate" thing, I won't object.

            Linus
On Mon, Nov 14 2022 at 09:08, Steven Rostedt wrote:
> On Mon, 14 Nov 2022 02:04:56 +0100
> Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> Coming back to your claim that I'm insulting. Please point me to the
>> actual insult I committed and I'm happy to apologize.
>
> It's more the condescending attitude than a direct insult.

I can see that. TBH, this was just my last line of defense to not being
insulting, because I was seriously grumpy about this whole thing and
even more so when I discovered that it was just hastily cobbled together
and then sold as the panacea for solving driver teardown issues.

You surely can do better and you very well know how kernel development
works.

I'm sorry if I offended you. I might have to adjust my expectations.

Thanks,

        tglx
On Mon, Nov 14 2022 at 08:36, Steven Rostedt wrote:
> On Mon, 14 Nov 2022 01:33:25 +0100
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> https://lore.kernel.org/all/87v8vjiaih.ffs@tglx/
>>
> I'm not sure what you mean by that. The idea is that once timer_shutdown()
> is called, we still warn on re-arming the timer.

That's the whole point. As Linus and I discussed in that thread:

  "That would mean, that we still check the function pointer for NULL
   without warning and just return. That would indeed be a good argument
   for not having the warning at all."

and as I demonstrated to you with the example of the BT driver which you
"fixed", this is the only sensible way to handle this.

The warning does not buy us anything, unless you want to go and amend
all the usage sites which trigger it with 'if (mystruct->shutdown)'
conditionals.

It's very similar to the work->canceling logic for kthreads that Linus
mentioned in this thread which prevents the work timer from being
rearmed concurrently. The difference is that timer_shutdown() is a final
decision which renders the timer unusable unless it is explicitly
reinitialized.

But that's mostly a matter of documentation and it has to be made clear
that nothing in a shutdown path which has the BT pattern:

     timer_shutdown();
     destroy_workqueue();

relies on the timer being functional after the shutdown point. I'm
pretty sure that the vast majority of such use cases do not care, but
given the size of the driver zoo I'm also sure that you'll find at least
one which depends on the timer working across teardown.

Thanks,

        tglx
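The BT teardown order discussed here can be illustrated with a deliberately single-threaded sketch in which a work item still queued at teardown tries to rearm the timer while destroy_workqueue() drains the queue. Everything below is hypothetical: struct bt_model and the bt_* helpers are illustrative names, locking is omitted, the assumption that the work handler rearms the timer is made for the sake of the example, and modelling destroy_workqueue() as "run whatever is still queued" is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-threaded model of a driver with one timer and a
 * workqueue holding at most one work item. No locking is modelled.
 */
struct bt_model {
	bool timer_shut_down;  /* models timer->function == NULL */
	bool timer_pending;
	bool work_queued;      /* one work item still on the workqueue */
	int ignored_rearms;    /* rearm attempts dropped after shutdown */
};

/* mod_timer() with the "silently ignore after shutdown" rule. */
static void bt_mod_timer(struct bt_model *d)
{
	if (d->timer_shut_down) {
		d->ignored_rearms++;
		return;
	}
	d->timer_pending = true;
}

/* Assumption for this sketch: the work handler rearms the timer. */
static void bt_work_fn(struct bt_model *d)
{
	bt_mod_timer(d);
}

static void bt_timer_shutdown(struct bt_model *d)
{
	d->timer_pending = false;
	d->timer_shut_down = true;
}

/* Simplified destroy_workqueue(): run what is still queued, then stop. */
static void bt_destroy_workqueue(struct bt_model *d)
{
	if (d->work_queued) {
		d->work_queued = false;
		bt_work_fn(d);
	}
}

/* The BT teardown order: timer_shutdown(); destroy_workqueue(); */
static void bt_teardown(struct bt_model *d)
{
	bt_timer_shutdown(d);
	bt_destroy_workqueue(d);
	/* Holds only because the rearm from the work item was ignored. */
	assert(!d->timer_pending && !d->work_queued);
}
```

If the rearm triggered a warning instead of being ignored, this order would produce a splat whenever the work item happened to be queued at teardown, and the driver would need extra 'if (mydev->inshutdown)' style checks to avoid it.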
On Mon, 14 Nov 2022 19:53:37 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> I can see that. TBH, this was just my last line of defense to not being
> insulting, because I was seriously grumpy about this whole thing and
> even more so when I discovered that it was just hastily cobbled together
> and then sold as the panacea for solving driver teardown issues.
>
> You surely can do better and you very well know how kernel development
> works.
>
> I'm sorry if I offended you. I might have to adjust my expectations.

No problem. I've also been under a lot of stress lately and not getting
enough rest, which is one reason I was a bit slack in my development. Now
that I'm back home and not working from a hotel room, I'm a bit more
focused and will not be rushing as much.

Cheers!

-- Steve
On Mon, 14 Nov 2022 20:13:28 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Mon, Nov 14 2022 at 08:36, Steven Rostedt wrote:
> > On Mon, 14 Nov 2022 01:33:25 +0100
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> >> https://lore.kernel.org/all/87v8vjiaih.ffs@tglx/
> >>
> > I'm not sure what you mean by that. The idea is that once timer_shutdown()
> > is called, we still warn on re-arming the timer.
>
> That's the whole point. As Linus and I discussed in that thread:
>
>   "That would mean, that we still check the function pointer for NULL
>    without warning and just return. That would indeed be a good argument
>    for not having the warning at all."
>
> and as I demonstrated to you with the example of the BT driver which you
> "fixed", this is the only sensible way to handle this.

I agree that it wasn't a complete fix, but as I mentioned before, I was
pulled off before I could do more.

>
> The warning does not buy us anything, unless you want to go and amend
> all the usage sites which trigger it with 'if (mystruct->shutdown)'
> conditionals.

The rationale for the warning was that it would let us know what drivers
need to be fixed for older kernels without the shutdown state. I'm
perfectly fine with removing the warning. We may just add it to the field
kernels so that we can know if there are any drivers that have issues that
we need to look at.

>
> It's very similar to the work->canceling logic for kthreads that Linus
> mentioned in this thread which prevents the work timer from being
> rearmed concurrently. The difference is that timer_shutdown() is a final
> decision which renders the timer unusable unless it is explicitly
> reinitialized.
>
> But that's mostly a matter of documentation and it has to be made clear
> that nothing in a shutdown path which has the BT pattern:
>
>      timer_shutdown();
>      destroy_workqueue();
>
> relies on the timer being functional after the shutdown point. I'm
> pretty sure that the vast majority of such use cases do not care, but
> given the size of the driver zoo I'm also sure that you'll find at least
> one which depends on the timer working across teardown.
>

Agreed.

-- Steve
Linus!

On Mon, Nov 14 2022 at 09:16, Linus Torvalds wrote:
> On Mon, Nov 14, 2022 at 7:42 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> So if we want to make this solid and make the life of driver writers
>> easier, then we cannot issue a warning as I said in the original thread
>> already.
>
> So I think that there are two issues at play:
>
>  (a) do we want to *find* problem places after the conversion
>
>  (b) do we want to make driver writing easier
>
> and (a) argues for warning on timer re-arming, but (b) just says
> "don't warn, just ignore it, the driver is being shut down".
>
> I'm personally ok with either of those approaches, and it's literally
> just a question of mindset.

Correct. I'm very much for (b).

Look at the bluetooth example. The "fix" was obviously right and then
introduced a new subtle bug which will only happen every 7th half-moon.

But if you turn it around then:

     timer_shutdown();
     destroy_workqueue();

will trigger the warning in mod_timer() every 6.5th half-moon.

And then you have to go and sprinkle 'if (mydev->inshutdown)'
conditionals all over the place with a high probability that they will
not cut it completely. Or you end up with the reverse order of shutdown
calls which is wrong too.

So I'd rather have the very simple semantics that attempts to arm a
shutdown timer are silently ignored.

As I said to Steven in the other mail, I'm sure that the vast majority
of teardown sites will not depend on the timer(s) being functional.
The two other esoteric cases will have to be treated specially.

>> The semantics of timer_shutdown_sync() have to be:
>>
>>     After return:
>>       - the timer is not queued
>>       - the timer callback is not running
>>       - the timer cannot be enqueued again
>
> Yes, but that last case is literally a "do we expect the *driver* to
> not enqueue it and warn if it tries, or do we just silently enforce
> it"?
>
> I agree with all three points. I'm just not sure about who we expect
> to do the "don't enqueue again".
>
> There's a big argument for "make it easy for driver writers" in just
> saying "make mod_timer() silently just ignore a re-arming". Making
> things easier for driver writers is a good thing.
>
> But maybe it's a "you shouldn't have done that in the first place"
> thing, and merits a warning?

See above.

> I have no strong opinions on that.
>
> What I *do* still want to happen is for subsystems to be able to start
> doing the conversion one by one. Which is why I'd still prefer to have
> the new names available just so that we don't have to have one
> 50-patch series, but we can have subsystems apply the obvious cases.
>
> And I'd still like the mindless "let's get the non-semantic changes
> out of the way" as one single patch, to get rid of mindless noise.
>
> And honestly, for that to happen I'd be perfectly happy with something like
>
>     #define timer_shutdown(t) del_timer(t)
>     #define timer_shutdown_sync(t) del_timer_sync(t)
>
> (obviously with the patches that first remove the existing
> 'timer_shutdown()' uses). That wouldn't introduce the *new*
> semantics, but it would at least allow the different subsystems to do
> the obvious cases, and let the networking people wonder about the much
> less obvious ones.

As we are at -rc5 now and the core code is not yet ready, I suggest that
we get the core changes done for the next merge window and have some
obvious fixes which demonstrate the usage, e.g. the borked BT fix
replacement, and then subsystem people can queue their stuff for 6.3 or
send in the obvious bugfixes during the 6.2-rc series.

I'm not a fan of having

     #define timer_shutdown_sync(t) del_timer_sync(t)

as a stop-gap measure right now. That's just going to make things worse
because the semantic difference between the two functions is
significant and I don't want people to run around and replace their
'if (mydev->in_shutdown)' conditionals prematurely or do any other
fancy "fixes" which cause more problems than they solve.
This problem has existed forever, so there is no need to rush this just
because.

If we all agree that the semantics of timer_shutdown_sync() are:

  After return:
    - the timer is not queued
    - the timer callback is not running
    - the timer cannot be enqueued again. Any attempts to do so
      are silently ignored (needs some more explanation...)

and the semantics of timer_shutdown() are:

  After return:
    - the timer is not queued
    - the timer cannot be enqueued again. Any attempts to do so
      are silently ignored (needs some more explanation...)
    - the timer callback might still be running

then we can definitely get this in shape for 6.2.

Thanks,

        tglx
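The two sets of semantics listed above differ only in whether the caller waits for a callback that is already running on another CPU. The following hypothetical single-threaded sketch models only those listed semantics, not any particular implementation: running_ticks stands in for a callback still executing elsewhere, shut_wait_running() stands in for del_timer_wait_running(), and the retry loop mirrors the do/while loop around __try_to_del_timer_sync() in the patch's timer_delete_sync(). All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; running_ticks > 0 means the callback runs elsewhere. */
struct shut_model {
	bool pending;
	bool shut_down;     /* models timer->function == NULL */
	int running_ticks;  /* polls until the remote callback finishes */
};

/* Models __try_to_del_timer_sync(): -1 while the callback runs elsewhere. */
static int shut_try_to_del(struct shut_model *t, bool shutdown)
{
	int ret = -1;

	if (t->running_ticks == 0) {
		ret = t->pending;
		t->pending = false;
	}
	/* As in the patch, ->function is cleared even when ret == -1. */
	if (shutdown)
		t->shut_down = true;
	return ret;
}

/* Models del_timer_wait_running(): let the remote callback make progress. */
static void shut_wait_running(struct shut_model *t)
{
	if (t->running_ticks > 0)
		t->running_ticks--;
}

/* timer_shutdown_sync() semantics: return only once the callback is done. */
static int shut_shutdown_sync(struct shut_model *t)
{
	int ret;

	do {
		ret = shut_try_to_del(t, true);
		if (ret < 0)
			shut_wait_running(t);
	} while (ret < 0);
	return ret;
}

/* timer_shutdown() semantics: dequeue and block rearming, but do not wait. */
static int shut_shutdown(struct shut_model *t)
{
	int ret = t->pending;

	t->pending = false;
	t->shut_down = true;
	return ret;
}
```

After shut_shutdown() on a timer whose callback is still running, running_ticks is unchanged, so the caller must not free the timer yet; shut_shutdown_sync() only returns once running_ticks has reached 0.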
On Mon, Nov 14 2022 at 14:28, Steven Rostedt wrote:
> On Mon, 14 Nov 2022 20:13:28 +0100
> Thomas Gleixner <tglx@linutronix.de> wrote:
>> The warning does not buy us anything, unless you want to go and amend
>> all the usage sites which trigger it with 'if (mystruct->shutdown)'
>> conditionals.
>
> The rationale for the warning was that it would let us know what drivers
> need to be fixed for older kernels without the shutdown state. I'm
> perfectly fine with removing the warning. We may just add it to the field
> kernels so that we can know if there are any drivers that have issues that
> we need to look at.

The warning is not guaranteed to catch the subtle cases. It might
happen once in a blue moon.

I'd rather argue that (once we have agreed on the semantics) we should
backport timer_shutdown() and the fixes which we add to Linus' tree.

Searching for potentially problematic places is a job for Coccinelle,
though fixing them needs deep human inspection.

Backporting the core code and the corresponding fixes is way simpler
than identifying the problematic cases via the unreliable warning and
then coming up with a per driver solution by sprinkling
'if (in_shutdown)' conditionals all over the place.

Thanks,

        tglx
diff --git a/include/linux/timer.h b/include/linux/timer.h
index 648f00105f58..4d56e20613eb 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -183,12 +183,29 @@ extern int timer_reduce(struct timer_list *timer, unsigned long expires);
extern void add_timer(struct timer_list *timer);
extern int try_to_del_timer_sync(struct timer_list *timer);
+extern int __del_timer_sync(struct timer_list *timer, bool free);
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
- extern int del_timer_sync(struct timer_list *timer);
-#else
-# define del_timer_sync(t) del_timer(t)
-#endif
+static inline int del_timer_sync(struct timer_list *timer)
+{
+ return __del_timer_sync(timer, false);
+}
+
+/**
+ * timer_shutdown_sync - called before freeing the timer
+ * @timer: The timer to be freed
+ *
+ * Shutdown the timer before freeing. This will return when all pending timers
+ * have finished and it is safe to free the timer.
+ *
+ * Note, after calling this, if the timer is added back to the queue
+ * it will fail to be added and a WARNING will be triggered.
+ *
+ * Returns if it deactivated a pending timer or not.
+ */
+static inline int timer_shutdown_sync(struct timer_list *timer)
+{
+ return __del_timer_sync(timer, true);
+}
#define del_singleshot_timer_sync(t) del_timer_sync(t)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 717fcb9fb14a..111a3550b3f2 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1017,7 +1017,8 @@ __mod_timer(struct timer_list *timer, unsigned long expires, unsigned int option
unsigned int idx = UINT_MAX;
int ret = 0;
- BUG_ON(!timer->function);
+ if (WARN_ON_ONCE(!timer->function))
+ return -EINVAL;
/*
* This is a common optimization triggered by the networking code - if
@@ -1193,7 +1194,8 @@ EXPORT_SYMBOL(timer_reduce);
*/
void add_timer(struct timer_list *timer)
{
- BUG_ON(timer_pending(timer));
+ if (WARN_ON_ONCE(timer_pending(timer)))
+ return;
__mod_timer(timer, timer->expires, MOD_TIMER_NOTPENDING);
}
EXPORT_SYMBOL(add_timer);
@@ -1210,7 +1212,8 @@ void add_timer_on(struct timer_list *timer, int cpu)
struct timer_base *new_base, *base;
unsigned long flags;
- BUG_ON(timer_pending(timer) || !timer->function);
+ if (WARN_ON_ONCE(timer_pending(timer) || !timer->function))
+ return;
new_base = get_timer_cpu_base(timer->flags, cpu);
@@ -1266,14 +1269,7 @@ int del_timer(struct timer_list *timer)
}
EXPORT_SYMBOL(del_timer);
-/**
- * try_to_del_timer_sync - Try to deactivate a timer
- * @timer: timer to delete
- *
- * This function tries to deactivate a timer. Upon successful (ret >= 0)
- * exit the timer is not queued and the handler is not running on any CPU.
- */
-int try_to_del_timer_sync(struct timer_list *timer)
+static int __try_to_del_timer_sync(struct timer_list *timer, bool free)
{
struct timer_base *base;
unsigned long flags;
@@ -1285,11 +1281,25 @@ int try_to_del_timer_sync(struct timer_list *timer)
if (base->running_timer != timer)
ret = detach_if_pending(timer, base, true);
+ if (free)
+ timer->function = NULL;
raw_spin_unlock_irqrestore(&base->lock, flags);
return ret;
}
+
+/**
+ * try_to_del_timer_sync - Try to deactivate a timer
+ * @timer: timer to delete
+ *
+ * This function tries to deactivate a timer. Upon successful (ret >= 0)
+ * exit the timer is not queued and the handler is not running on any CPU.
+ */
+int try_to_del_timer_sync(struct timer_list *timer)
+{
+ return __try_to_del_timer_sync(timer, false);
+}
EXPORT_SYMBOL(try_to_del_timer_sync);
#ifdef CONFIG_PREEMPT_RT
@@ -1365,10 +1375,10 @@ static inline void timer_sync_wait_running(struct timer_base *base) { }
static inline void del_timer_wait_running(struct timer_list *timer) { }
#endif
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
/**
- * del_timer_sync - deactivate a timer and wait for the handler to finish.
+ * __del_timer_sync - deactivate a timer and wait for the handler to finish.
* @timer: the timer to be deactivated
+ * @free: Set to true if the timer is about to be freed
*
* This function only differs from del_timer() on SMP: besides deactivating
* the timer it also makes sure the handler has finished executing on other
@@ -1402,7 +1412,7 @@ static inline void del_timer_wait_running(struct timer_list *timer) { }
*
* The function returns whether it has deactivated a pending timer or not.
*/
-int del_timer_sync(struct timer_list *timer)
+int __del_timer_sync(struct timer_list *timer, bool free)
{
int ret;
@@ -1432,7 +1442,7 @@ int del_timer_sync(struct timer_list *timer)
lockdep_assert_preemption_enabled();
do {
- ret = try_to_del_timer_sync(timer);
+ ret = __try_to_del_timer_sync(timer, free);
if (unlikely(ret < 0)) {
del_timer_wait_running(timer);
@@ -1442,8 +1452,7 @@ int del_timer_sync(struct timer_list *timer)
return ret;
}
-EXPORT_SYMBOL(del_timer_sync);
-#endif
+EXPORT_SYMBOL(__del_timer_sync);
static void call_timer_fn(struct timer_list *timer,
void (*fn)(struct timer_list *),
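For comparison with the silent-ignore rule, Steven's patch above makes __mod_timer() do `if (WARN_ON_ONCE(!timer->function)) return -EINVAL;`, and it does so before taking the base lock. The following minimal userspace sketch contrasts that behaviour with silently ignoring the rearm. It is hypothetical throughout: struct warn_timer, warn_mod_timer(), silent_mod_timer() and warn_count are illustrative names, and WARN_ON_ONCE() is only approximated by a per-function static flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a timer; function == NULL means shut down. */
struct warn_timer {
	void (*function)(struct warn_timer *);
	bool pending;
	unsigned long expires;
};

static int warn_count;  /* number of warnings actually emitted */

/* Warning variant: warn once per call site, then return -EINVAL. */
static int warn_mod_timer(struct warn_timer *t, unsigned long expires)
{
	static bool warned;  /* approximates the WARN_ON_ONCE() one-shot state */
	int ret;

	if (!t->function) {
		if (!warned) {
			warned = true;
			warn_count++;
			fprintf(stderr, "WARNING: rearming a shut-down timer\n");
		}
		return -EINVAL;
	}
	ret = t->pending;
	t->expires = expires;
	t->pending = true;
	return ret;
}

/* Silent variant: ignore the rearm and report "was not pending". */
static int silent_mod_timer(struct warn_timer *t, unsigned long expires)
{
	int ret;

	if (!t->function)
		return 0;
	ret = t->pending;
	t->expires = expires;
	t->pending = true;
	return ret;
}
```

The warning fires only for the first offending call, and only if the rearm actually happens during a given run, which is why it cannot be relied on to catch the subtle cases.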