Message ID | 20231201092654.34614-13-anna-maria@linutronix.de |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp991726vqy; Fri, 1 Dec 2023 01:27:58 -0800 (PST) X-Google-Smtp-Source: AGHT+IF5qFET8aGFrzdyIVibjCJci+Ft6FCKqZOdkZWwH5mDKR9TKTNHNsLGawbNvbkRci6rVkOX X-Received: by 2002:aca:1b1a:0:b0:3b2:e6a4:e177 with SMTP id b26-20020aca1b1a000000b003b2e6a4e177mr2073265oib.51.1701422878068; Fri, 01 Dec 2023 01:27:58 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701422878; cv=none; d=google.com; s=arc-20160816; b=J6Csz8PRhF59mZG1Q81udY9b8EHfzdr+ik/zhu643B6fWL/8SPIXikmrtTcOHvQulN FbijCWcvU0PDgDfHdgCnlY++2RJtwzrmMoWVmp2RriMx3E0rpDlseDI2jgB4dQUjL9yR /5wLudxOgUn4GB96YfYuSjrLHRl/QAbN4iGmmCY856ARnH+TrlA0Senn3VJ9gfruivux VrddHkrYXRx6CxAJ8u9HTvamDuzUBjHUZNKTztKMmcd8fHLzYH//QkUPUV7oBT4oL28d u4ic2ON/MWJklUu7Vi7aKEiV1MWKAN6cI2/E58Ief00hX0550sd5BwZc+cxY+b4dhqko b5Gg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:dkim-signature :dkim-signature:from; bh=nZi/SXfoi2GICSzM+IYBAVSlO8Hm1DXEFr76jQHYOZ0=; fh=PG6uS4TiiUSDyl8D/joYkWbwCgDm4ug0ir2h7tHBJXQ=; b=HD1x18lJAYoG/c4S+FLkOjtAx0Cdp/pTcQX3Ro7VlzjkZkLaMgH7tJkfcD6SiBWbwA eqZ7BZc0gGBC3Ub3AgEgKhu+zJeOtfhktvsg2/BJbEA6t2pNCYCF9B3kOlMNe3sexEc9 Wv3rKHxMA2/dss5UUagIaHsjIn6Fl/WC0JjBrVRmIIhgVf3ZLiZDr59QXj9eyAliJELG HdpauBjgD8lzNb9R9QKEKF/cXvyVyT2nSnAFtPgwLKcdtWLOXkOPfdmk7fUl89zXy/kL JSRxRKNeecOv2x78GRnxGZUjSzI0CZo58EuXIxULKiOTLirz2LHGtVWfQS5nyIkkKZYb gdKw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b="lPUu/u37"; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=DGxDnst2; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from morse.vger.email (morse.vger.email. [2620:137:e000::3:1]) by mx.google.com with ESMTPS id bx41-20020a056a02052900b005c5e2ac7c3csi3279148pgb.732.2023.12.01.01.27.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 01 Dec 2023 01:27:58 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) client-ip=2620:137:e000::3:1; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b="lPUu/u37"; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=DGxDnst2; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by morse.vger.email (Postfix) with ESMTP id 24B31833AB02; Fri, 1 Dec 2023 01:27:55 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at morse.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1378043AbjLAJ1m (ORCPT <rfc822;ruipengqi7@gmail.com> + 99 others); Fri, 1 Dec 2023 04:27:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54358 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1378055AbjLAJ1W (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 1 Dec 2023 04:27:22 -0500 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A3A521725 for <linux-kernel@vger.kernel.org>; Fri, 1 Dec 2023 01:27:15 -0800 (PST) From: Anna-Maria Behnsen <anna-maria@linutronix.de> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1701422834; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nZi/SXfoi2GICSzM+IYBAVSlO8Hm1DXEFr76jQHYOZ0=; b=lPUu/u3782O76F4tssDfbbwJSu1UYJWRXA1YF1HFvK1m8kbs0zLQsUh/Qrq0SympHt1Xn4 fWs6pLLsziRXsDGGtVSkmgIluUmEPyOt49N0oJuW1GpqQsDrQiTwLCGHDYPxL2sHNl8pO8 uZrfXgDRa3Soql/IDz++WAZHigpe67VrkWm7MrhlkMdDJb1cuTVJRBpNVrj0k0qs67dVi3 SnfBc962XKiwdxw27XZSz4hJUgXOfPYnpxuSOBDbSvDj36Sfa0N/0H08AMFC8K3OePFC/t LsAllTQBY9b+fu99vTafsfRYXEb2PxTXLZsynXnrqJJ34TflfOYmg29flJ7ItA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1701422834; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nZi/SXfoi2GICSzM+IYBAVSlO8Hm1DXEFr76jQHYOZ0=; b=DGxDnst2vjquka22zFxBnL29sukEirkFvC4s2bWFecYQdCP9TQh2tKsQYBTAwCLj3xIhX6 heDteX2CRHR+6lAg== To: linux-kernel@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org>, John Stultz <jstultz@google.com>, Thomas Gleixner <tglx@linutronix.de>, Eric Dumazet <edumazet@google.com>, "Rafael J . Wysocki" <rafael.j.wysocki@intel.com>, Arjan van de Ven <arjan@infradead.org>, "Paul E . McKenney" <paulmck@kernel.org>, Frederic Weisbecker <frederic@kernel.org>, Rik van Riel <riel@surriel.com>, Steven Rostedt <rostedt@goodmis.org>, Sebastian Siewior <bigeasy@linutronix.de>, Giovanni Gherdovich <ggherdovich@suse.cz>, Lukasz Luba <lukasz.luba@arm.com>, "Gautham R . Shenoy" <gautham.shenoy@amd.com>, Srinivas Pandruvada <srinivas.pandruvada@intel.com>, K Prateek Nayak <kprateek.nayak@amd.com>, Anna-Maria Behnsen <anna-maria@linutronix.de> Subject: [PATCH v9 12/32] timers: Fix nextevt calculation when no timers are pending Date: Fri, 1 Dec 2023 10:26:34 +0100 Message-Id: <20231201092654.34614-13-anna-maria@linutronix.de> In-Reply-To: <20231201092654.34614-1-anna-maria@linutronix.de> References: <20231201092654.34614-1-anna-maria@linutronix.de> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.9 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on morse.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (morse.vger.email [0.0.0.0]); Fri, 01 Dec 2023 01:27:55 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1784071195826791695 X-GMAIL-MSGID: 1784071195826791695 |
Series |
timers: Move from a push remote at enqueue to a pull at expiry model
|
|
Commit Message
Anna-Maria Behnsen
Dec. 1, 2023, 9:26 a.m. UTC
When no timer is queued into an empty timer base, the next_expiry will not
be updated. It was originally calculated as
base->clk + NEXT_TIMER_MAX_DELTA
When the timer base stays empty long enough (> NEXT_TIMER_MAX_DELTA), the
next_expiry value of the empty base suggests that there is a timer pending
soon. This might be more a kind of a theoretical problem, but the fix
doesn't hurt.
Use only base->next_expiry value as nextevt when timers are
pending. Otherwise nextevt will be jiffies + NEXT_TIMER_MAX_DELTA. As all
information is in place, update base->next_expiry value of the empty timer
base as well.
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
---
v9: New patch
---
kernel/time/timer.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
Comments
On 2023-12-01 10:26:34 [+0100], Anna-Maria Behnsen wrote: > When no timer is queued into an empty timer base, the next_expiry will not > be updated. It was originally calculated as > > base->clk + NEXT_TIMER_MAX_DELTA > > When the timer base stays empty long enough (> NEXT_TIMER_MAX_DELTA), the > next_expiry value of the empty base suggests that there is a timer pending > soon. This might be more a kind of a theoretical problem, but the fix > doesn't hurt. So __run_timers() sets base::next_expiry to base->clk + NEXT_TIMER_MAX_DELTA and then we have no more timers enqueued. But wouldn't base->timers_pending remain false? Therefore it would use "expires = KTIME_MAX" as return value (well cmp_next_hrtimer_event())? Based on the code as of #11, it would only set timer_base::is_idle wrongly false if it wraps around. Other than that, I don't see an issue. What do I miss? If you update it regardless here then it would make a difference to run_local_timers() assuming we have still hrtimer which expire and this next_expiry check might raise a softirq since it does not consider the timers_pending value. > Use only base->next_expiry value as nextevt when timers are > pending. Otherwise nextevt will be jiffies + NEXT_TIMER_MAX_DELTA. As all > information is in place, update base->next_expiry value of the empty timer > base as well. or consider timers_pending in run_local_timers()? An additional read vs write? > --- a/kernel/time/timer.c > +++ b/kernel/time/timer.c > @@ -1944,10 +1943,20 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem) > __forward_timer_base(base, basej); > > if (base->timers_pending) { > + nextevt = base->next_expiry; > + > /* If we missed a tick already, force 0 delta */ > if (time_before(nextevt, basej)) > nextevt = basej; > expires = basem + (u64)(nextevt - basej) * TICK_NSEC; > + } else { > + /* > + * Move next_expiry for the empty base into the future to > + * prevent a unnecessary raise of the timer softirq when the an > + * next_expiry value will be reached even if there is no timer > + * pending. > + */ > + base->next_expiry = nextevt; > } > > /* Sebastian
Sebastian Siewior <bigeasy@linutronix.de> writes: > On 2023-12-01 10:26:34 [+0100], Anna-Maria Behnsen wrote: >> When no timer is queued into an empty timer base, the next_expiry will not >> be updated. It was originally calculated as >> >> base->clk + NEXT_TIMER_MAX_DELTA >> >> When the timer base stays empty long enough (> NEXT_TIMER_MAX_DELTA), the >> next_expiry value of the empty base suggests that there is a timer pending >> soon. This might be more a kind of a theoretical problem, but the fix >> doesn't hurt. > > So __run_timers() sets base::next_expiry to base->clk + > NEXT_TIMER_MAX_DELTA and then we have no more timers enqueued. > > But wouldn't base->timers_pending remain false? Therefore it would use > "expires = KTIME_MAX" as return value (well cmp_next_hrtimer_event())? Jupp. > Based on the code as of #11, it would only set timer_base::is_idle > wrongly false if it wraps around. Other than that, I don't see an issue. > What do I miss? And it will raise an unnecessary softirq when it wraps around as you also mentioned on the next paragraph. > If you update it regardless here then it would make a difference to > run_local_timers() assuming we have still hrtimer which expire and this > next_expiry check might raise a softirq since it does not consider the > timers_pending value. The only difference with this change would be that the softirq will not be raised when it wraps around. >> Use only base->next_expiry value as nextevt when timers are >> pending. Otherwise nextevt will be jiffies + NEXT_TIMER_MAX_DELTA. As all >> information is in place, update base->next_expiry value of the empty timer >> base as well. > > or consider timers_pending in run_local_timers()? An additional read vs > write? This would also be a possibility to add the check in run_local_timers() with timers_pending. And we also have to make the is_idle marking in get_next_timer_interrupt() dependant on base::timers_pending bit. But this also means, we cannot rely on next_expiry when no timer is pending. Frederic, what do you think? Thanks, Anna-Maria
Le Tue, Dec 05, 2023 at 12:53:03PM +0100, Anna-Maria Behnsen a écrit : > Sebastian Siewior <bigeasy@linutronix.de> writes: > >> Use only base->next_expiry value as nextevt when timers are > >> pending. Otherwise nextevt will be jiffies + NEXT_TIMER_MAX_DELTA. As all > >> information is in place, update base->next_expiry value of the empty timer > >> base as well. > > > > or consider timers_pending in run_local_timers()? An additional read vs > > write? > > This would also be a possibility to add the check in run_local_timers() > with timers_pending. We could but do we really care about avoiding a potential softirq every 12 days (on 1000 Hz...) > And we also have to make the is_idle marking in > get_next_timer_interrupt() dependant on base::timers_pending bit. Yes that, on the other hand, looks mandatory! Because if the CPU sleeps for 12 days and then gets an interrupt and then go back to sleep, base->is_idle will be set as false and remote enqueues won't be notified. > But this also means, we cannot rely on next_expiry when no timer is pending. But note that this patch only fixes that partially anyway. Suppose the tick is stopped entirely and the CPU sleeps for 13 days without any interruption. Then it's woken up with TIF_RESCHED without any timer queued, get_next_timer_interrupt() won't be called upon tick restart to fix ->next_expiry. > > Frederic, what do you think? So it looks like is_idle must be fixed. As for the timer softirq, ->next_expiry is already unreliable because when a timer is removed, ->next_expiry is not updated (even though that removed timer might have been the earliest). So ->next_expiry can already carry a "too early" value. The only constraint is that ->next_expiry can't be later than the first timer. So I'd rather put a comment somewhere about the fact that wrapping is expected to behave ok. But it's your call. Thanks.
Frederic Weisbecker <frederic@kernel.org> writes: > Le Tue, Dec 05, 2023 at 12:53:03PM +0100, Anna-Maria Behnsen a écrit : >> >> Frederic, what do you think? > > So it looks like is_idle must be fixed. > > As for the timer softirq, ->next_expiry is already unreliable because when > a timer is removed, ->next_expiry is not updated (even though that removed > timer might have been the earliest). So ->next_expiry can already carry a > "too early" value. The only constraint is that ->next_expiry can't be later > than the first timer. > > So I'd rather put a comment somewhere about the fact that wrapping is expected > to behave ok. But it's your call. Ok. If both solutions are fine, I would like to take the solution with updating the next_expiry values for empty bases. It will make the compare of expiry values of global and local timer base easier in one of the patches later on. Thanks, Anna-Maria
On Tue, Dec 12, 2023 at 02:21:25PM +0100, Anna-Maria Behnsen wrote: > Frederic Weisbecker <frederic@kernel.org> writes: > > > Le Tue, Dec 05, 2023 at 12:53:03PM +0100, Anna-Maria Behnsen a écrit : > >> > >> Frederic, what do you think? > > > > So it looks like is_idle must be fixed. > > > > As for the timer softirq, ->next_expiry is already unreliable because when > > a timer is removed, ->next_expiry is not updated (even though that removed > > timer might have been the earliest). So ->next_expiry can already carry a > > "too early" value. The only constraint is that ->next_expiry can't be later > > than the first timer. > > > > So I'd rather put a comment somewhere about the fact that wrapping is expected > > to behave ok. But it's your call. > > Ok. If both solutions are fine, I would like to take the solution with > updating the next_expiry values for empty bases. It will make the > compare of expiry values of global and local timer base easier in one of > the patches later on. Fine by me at least! Thanks. > Thanks, > > Anna-Maria >
Le Fri, Dec 01, 2023 at 10:26:34AM +0100, Anna-Maria Behnsen a écrit : > When no timer is queued into an empty timer base, the next_expiry will not > be updated. It was originally calculated as > > base->clk + NEXT_TIMER_MAX_DELTA > > When the timer base stays empty long enough (> NEXT_TIMER_MAX_DELTA), the > next_expiry value of the empty base suggests that there is a timer pending > soon. This might be more a kind of a theoretical problem, but the fix > doesn't hurt. This solves a real issue. I suggest removing the last sentence and add instead: If the CPU sleeps in idle for a bit more than NEXT_TIMER_MAX_DELTA (~12 days in HZ=1000) and then an interrupt fires, upon going back to idle get_next_timer_interrupt() will still return KTIME_MAX but incorrectly set is_idle to false. Therefore the CPU will keep the tick stopped and go back to sleep though further remote enqueue of timers to this CPU will fail to send an IPI. As a result the timer will remain ignored. Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 0826018d9873..4dffe966424c 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1922,8 +1922,8 @@ static u64 cmp_next_hrtimer_event(u64 basem, u64 expires) u64 get_next_timer_interrupt(unsigned long basej, u64 basem) { struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]); + unsigned long nextevt = basej + NEXT_TIMER_MAX_DELTA; u64 expires = KTIME_MAX; - unsigned long nextevt; /* * Pretend that there is no timer pending if the cpu is offline. @@ -1935,7 +1935,6 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem) raw_spin_lock(&base->lock); if (base->next_expiry_recalc) next_expiry_recalc(base); - nextevt = base->next_expiry; /* * We have a fresh next event. Check whether we can forward the @@ -1944,10 +1943,20 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem) __forward_timer_base(base, basej); if (base->timers_pending) { + nextevt = base->next_expiry; + /* If we missed a tick already, force 0 delta */ if (time_before(nextevt, basej)) nextevt = basej; expires = basem + (u64)(nextevt - basej) * TICK_NSEC; + } else { + /* + * Move next_expiry for the empty base into the future to + * prevent a unnecessary raise of the timer softirq when the + * next_expiry value will be reached even if there is no timer + * pending. + */ + base->next_expiry = nextevt; } /*