[v3] clocksource: Scale the max retry number of watchdog read according to CPU numbers
Message ID | 20240129134505.961208-1-feng.tang@intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-42835-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:2087:b0:106:209c:c626 with SMTP id gs7csp578232dyb; Mon, 29 Jan 2024 05:53:57 -0800 (PST) X-Google-Smtp-Source: AGHT+IFRnoPXAE4Tz2FeQTAc8YdQp4D8KFNLsUUtZoJmDK3y+N1LJCWy9paY17hF8m9AW4uR9vWC X-Received: by 2002:a81:de05:0:b0:5d7:1941:2c18 with SMTP id k5-20020a81de05000000b005d719412c18mr3410471ywj.69.1706536437424; Mon, 29 Jan 2024 05:53:57 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706536437; cv=pass; d=google.com; s=arc-20160816; b=J7hq+0/11lgfOSnubkmD3Iw/iTbEjRg0RzccrzwpwljZSlEIi6naPz3yNW4ZfTN0Ao md9dYGuDqnGJafNms3x/gpNcWEDTzGzcviwwcB56wEqeheERj6d+y5HVnAQzflEoU2eT Yh90fYxREHH8wViCKPKPdLjyL4gLbLRLy0Zby4lt5BL6BodyrDCGV1X8MpWtNf/aE5Uy 2WKlTgIYfElVRseFzrk8d3H9nVYG5PSodRY3WhGaM+2SwlfYrQST3fypecPCfQdNoLP1 MnAncHRubbQVrggNL23YO0bFe62YCySYgQFvANgjzBqFKpjXh4TVH/YtPira+8RXVvWu OxJQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:message-id:date:subject:cc:to :from:dkim-signature; bh=Gjz5pfo2MAgNnr7DQWiOES4fIYeZroTJ4petTJGYMKw=; fh=lKRi11FvCur4v6UHyVT9da5QdIyi/LE3rb/3hTFdpx8=; b=A6vmV7yNHJfaG86jkqcDRvaY44az2HhszCa8NFRnB2h2s/U3nR+OJSFIiy6VF4+Z85 F5qzjiJqRUzNhf3SXfBj1AXh6Nc+p0wK+aL2MBgcaeSE5tErUKh0Qs0R0ctcwma8uiJe gfwWuvaBR0rO0BM00Vl3pBJwqS8UPaj386gaY1XbK9OCHzROIZ4mI0h4bO/+vQhH8bc5 0l6IhGhWjELroKFgS8tfAHvjrCBJOrXKqBiyW+2jBNv67x/YUa2B0VHluEXsbEwMx4Zu NjjB7pt7HWPYgn/YM3Clu6uNjS3w2dBFOWYSk8DUyp2h/iJLeHPnvMoRqlN1aGTgCpTi j08w== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=EvEcwYMO; arc=pass (i=1 spf=pass spfdomain=intel.com dkim=pass dkdomain=intel.com dmarc=pass fromdomain=intel.com); spf=pass (google.com: domain of linux-kernel+bounces-42835-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-42835-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id bm42-20020a05620a19aa00b00783e1590eb6si6356309qkb.726.2024.01.29.05.53.57 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Jan 2024 05:53:57 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-42835-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=EvEcwYMO; arc=pass (i=1 spf=pass spfdomain=intel.com dkim=pass dkdomain=intel.com dmarc=pass fromdomain=intel.com); spf=pass (google.com: domain of linux-kernel+bounces-42835-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-42835-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 2ACDB1C243DD for <ouuuleilei@gmail.com>; Mon, 29 Jan 2024 13:53:57 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id B67DE657D9; Mon, 29 Jan 2024 13:53:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="EvEcwYMO" Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 621A6657A7 for <linux-kernel@vger.kernel.org>; Mon, 29 Jan 2024 13:53:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706536419; cv=none; b=lYueK+ysYC4M3/r8dxcRb7zIAIFhjAq9pX/JSVqNSynUo30+hUkUJOZ2dNBolAMiEr+G0z67kXjFKRqZY0Ahs5xEA1S3JgjyBbX2KiHxyLC54M3ZSHqIVrkPb6sy6F6iId0EFkdlvmgVxcWgfK8ulvx/EBQOd1s3OOJThDBY/C8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706536419; c=relaxed/simple; bh=oul+8EboxPRMOmzDiRgdrl50CO3bLgpxsn1+vCbxHUY=; h=From:To:Cc:Subject:Date:Message-Id:MIME-Version; b=fTQoIXvPoLD46KnXVA6utVjCc0EPSJCWfxxXdnUKWzCcIeo4hslkfLvvyCVkrvxDu4Wj4QQAFChA/32UNxYLCxvAPIQoYml5ViBuiAzEl7Zw3qfcNirtW9AHdKeh7qt1RA0W8Juci+hpflHj+UVOcJ8MzI1UgWznGIbxCmPYj4k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=EvEcwYMO; arc=none smtp.client-ip=192.198.163.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706536417; x=1738072417; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=oul+8EboxPRMOmzDiRgdrl50CO3bLgpxsn1+vCbxHUY=; b=EvEcwYMOPzSUSeGBF6NdIVMio2W46ZWDfsH1evwMANldGUUswKTFAp8w 99w7XoN1MRmDNeeZTD2SnT+kAmjNtn/1jB1aVWEmlM//QPyjp93nXmW4b CVwfHpV8uJzvSVIuNn8+s2krnJTioNWgSyGt7ria3fJpAkPBlmEIgj+C1 lcSYSNS1+CxP7btDvtEmkM6eFXFlIK5ronEeWwW1pZZFkVldq/Ijvd88l gBffxP14zZGni1oBfNREwMiEb2xP8X/7bP2j5Qpn13fkduyXtbq0ifAdk 2xIN2ox2OY/X9g2wrHeUfc86g2IYpwNMZMhEYb2Wv1lPnvv7wqROOLgIn g==; X-IronPort-AV: E=McAfee;i="6600,9927,10967"; a="1887228" X-IronPort-AV: E=Sophos;i="6.05,227,1701158400"; d="scan'208";a="1887228" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmvoesa107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jan 2024 05:53:37 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10967"; a="821855385" X-IronPort-AV: E=Sophos;i="6.05,227,1701158400"; d="scan'208";a="821855385" Received: from feng-clx.sh.intel.com ([10.239.159.50]) by orsmga001.jf.intel.com with ESMTP; 29 Jan 2024 05:53:33 -0800 From: Feng Tang <feng.tang@intel.com> To: John Stultz <jstultz@google.com>, Thomas Gleixner <tglx@linutronix.de>, Stephen Boyd <sboyd@kernel.org>, paulmck@kernel.org, Waiman Long <longman@redhat.com>, Peter Zijlstra <peterz@infradead.org>, linux-kernel@vger.kernel.org Cc: Feng Tang <feng.tang@intel.com>, Jin Wang <jin1.wang@intel.com> Subject: [PATCH v3] clocksource: Scale the max retry number of watchdog read according to CPU numbers Date: Mon, 29 Jan 2024 21:45:05 +0800 Message-Id: <20240129134505.961208-1-feng.tang@intel.com> X-Mailer: git-send-email 2.34.1 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1789147551119072009 X-GMAIL-MSGID: 1789433151354745279 |
Series |
[v3] clocksource: Scale the max retry number of watchdog read according to CPU numbers
|
|
Commit Message
Feng Tang
Jan. 29, 2024, 1:45 p.m. UTC
There was a bug on one 8-socket server that the TSC is wrongly marked as 'unstable' and disabled during boot time (reproduce rate is about every 120 rounds of reboot tests), with log: clocksource: timekeeping watchdog on CPU227: wd-tsc-wd excessive read-back delay of 153560ns vs. limit of 125000ns, wd-wd read-back delay only 11440ns, attempt 3, marking tsc unstable tsc: Marking TSC unstable due to clocksource watchdog TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'. sched_clock: Marking unstable (119294969739, 159204297)<-(125446229205, -5992055152) clocksource: Checking clocksource tsc synchronization from CPU 319 to CPUs 0,99,136,180,210,542,601,896. clocksource: Switched to clocksource hpet The reason is for platform with lots of CPU, there are sporadic big or huge read latency of read watchog/clocksource during boot or when system is under stress work load, and the frequency and maximum value of the latency goes up with the increasing of CPU numbers. Current code already has logic to detect and filter such high latency case by reading 3 times of watchdog, and check the 2 deltas. Due to the randomness of the latency, there is a low possibility situation that the first delta (latency) is big, but the second delta is small and looks valid, which can escape from the check, and there is a 'max_cswd_read_retries' for retrying that check covering this case, whose default value is only 2 and may be not enough for machines with huge number of CPUs. So scale and enlarge the max retry number according to CPU number to better filter those latency noise for large systems, which has been verified fine in 4 days and 670 rounds of reboot test on the 8-socket machine. Also add sanity check for user input value for 'max_cswd_read_retries', make it self-adaptive by default, and provide a general helper for getting this max retry number as suggested by Paul and Waiman. Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Feng Tang <feng.tang@intel.com> Tested-by: Jin Wang <jin1.wang@intel.com> --- Changelog: since v2: * Fix the unexported symbol of helper function being used by kernel module issue (Waiman) since v1: * Add santity check for user input value of 'max_cswd_read_retries' and a helper function for getting max retry nubmer (Paul) * Apply the same logic to watchdog test code (Waiman) include/linux/clocksource.h | 18 +++++++++++++++++- kernel/time/clocksource-wdtest.c | 12 +++++++----- kernel/time/clocksource.c | 10 ++++++---- 3 files changed, 30 insertions(+), 10 deletions(-)
Comments
On Mon, Jan 29, 2024 at 09:45:05PM +0800, Feng Tang wrote: > There was a bug on one 8-socket server that the TSC is wrongly marked as > 'unstable' and disabled during boot time (reproduce rate is about every > 120 rounds of reboot tests), with log: > > clocksource: timekeeping watchdog on CPU227: wd-tsc-wd excessive read-back delay of 153560ns vs. limit of 125000ns, > wd-wd read-back delay only 11440ns, attempt 3, marking tsc unstable > tsc: Marking TSC unstable due to clocksource watchdog > TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'. > sched_clock: Marking unstable (119294969739, 159204297)<-(125446229205, -5992055152) > clocksource: Checking clocksource tsc synchronization from CPU 319 to CPUs 0,99,136,180,210,542,601,896. > clocksource: Switched to clocksource hpet > > The reason is for platform with lots of CPU, there are sporadic big or huge > read latency of read watchog/clocksource during boot or when system is under > stress work load, and the frequency and maximum value of the latency goes up > with the increasing of CPU numbers. Current code already has logic to detect > and filter such high latency case by reading 3 times of watchdog, and check > the 2 deltas. Due to the randomness of the latency, there is a low possibility > situation that the first delta (latency) is big, but the second delta is small > and looks valid, which can escape from the check, and there is a > 'max_cswd_read_retries' for retrying that check covering this case, whose > default value is only 2 and may be not enough for machines with huge number > of CPUs. > > So scale and enlarge the max retry number according to CPU number to better > filter those latency noise for large systems, which has been verified fine > in 4 days and 670 rounds of reboot test on the 8-socket machine. > > Also add sanity check for user input value for 'max_cswd_read_retries', make > it self-adaptive by default, and provide a general helper for getting this > max retry number as suggested by Paul and Waiman. > > Cc: Paul E. McKenney <paulmck@kernel.org> > Cc: Waiman Long <longman@redhat.com> > Signed-off-by: Feng Tang <feng.tang@intel.com> > Tested-by: Jin Wang <jin1.wang@intel.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> > --- > Changelog: > > since v2: > * Fix the unexported symbol of helper function being used by > kernel module issue (Waiman) > > since v1: > * Add santity check for user input value of 'max_cswd_read_retries' > and a helper function for getting max retry nubmer (Paul) > * Apply the same logic to watchdog test code (Waiman) > > include/linux/clocksource.h | 18 +++++++++++++++++- > kernel/time/clocksource-wdtest.c | 12 +++++++----- > kernel/time/clocksource.c | 10 ++++++---- > 3 files changed, 30 insertions(+), 10 deletions(-) > > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h > index 1d42d4b17327..0483f7dd66a3 100644 > --- a/include/linux/clocksource.h > +++ b/include/linux/clocksource.h > @@ -291,7 +291,23 @@ static inline void timer_probe(void) {} > #define TIMER_ACPI_DECLARE(name, table_id, fn) \ > ACPI_DECLARE_PROBE_ENTRY(timer, name, table_id, 0, NULL, 0, fn) > > -extern ulong max_cswd_read_retries; > +extern long max_cswd_read_retries; > + > +static inline long clocksource_max_watchdog_read_retries(void) > +{ > + long max_retries = max_cswd_read_retries; > + > + if (max_cswd_read_retries <= 0) { > + /* santity check for user input value */ > + if (max_cswd_read_retries != -1) > + pr_warn_once("max_cswd_read_retries was set with an invalid number: %ld\n", > + max_cswd_read_retries); > + > + max_retries = ilog2(num_online_cpus()) + 1; > + } > + return max_retries; > +} > + > void clocksource_verify_percpu(struct clocksource *cs); > > #endif /* _LINUX_CLOCKSOURCE_H */ > diff --git a/kernel/time/clocksource-wdtest.c b/kernel/time/clocksource-wdtest.c > index df922f49d171..c70cea3c44a1 100644 > --- a/kernel/time/clocksource-wdtest.c > +++ b/kernel/time/clocksource-wdtest.c > @@ -106,6 +106,7 @@ static int wdtest_func(void *arg) > unsigned long j1, j2; > char *s; > int i; > + long max_retries; > > schedule_timeout_uninterruptible(holdoff * HZ); > > @@ -139,18 +140,19 @@ static int wdtest_func(void *arg) > WARN_ON_ONCE(time_before(j2, j1 + NSEC_PER_USEC)); > > /* Verify tsc-like stability with various numbers of errors injected. */ > - for (i = 0; i <= max_cswd_read_retries + 1; i++) { > - if (i <= 1 && i < max_cswd_read_retries) > + max_retries = clocksource_max_watchdog_read_retries(); > + for (i = 0; i <= max_retries + 1; i++) { > + if (i <= 1 && i < max_retries) > s = ""; > - else if (i <= max_cswd_read_retries) > + else if (i <= max_retries) > s = ", expect message"; > else > s = ", expect clock skew"; > - pr_info("--- Watchdog with %dx error injection, %lu retries%s.\n", i, max_cswd_read_retries, s); > + pr_info("--- Watchdog with %dx error injection, %ld retries%s.\n", i, max_retries, s); > WRITE_ONCE(wdtest_ktime_read_ndelays, i); > schedule_timeout_uninterruptible(2 * HZ); > WARN_ON_ONCE(READ_ONCE(wdtest_ktime_read_ndelays)); > - WARN_ON_ONCE((i <= max_cswd_read_retries) != > + WARN_ON_ONCE((i <= max_retries) != > !(clocksource_wdtest_ktime.flags & CLOCK_SOURCE_UNSTABLE)); > wdtest_ktime_clocksource_reset(); > } > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c > index c108ed8a9804..2e5a1d6c6712 100644 > --- a/kernel/time/clocksource.c > +++ b/kernel/time/clocksource.c > @@ -208,8 +208,8 @@ void clocksource_mark_unstable(struct clocksource *cs) > spin_unlock_irqrestore(&watchdog_lock, flags); > } > > -ulong max_cswd_read_retries = 2; > -module_param(max_cswd_read_retries, ulong, 0644); > +long max_cswd_read_retries = -1; > +module_param(max_cswd_read_retries, long, 0644); > EXPORT_SYMBOL_GPL(max_cswd_read_retries); > static int verify_n_cpus = 8; > module_param(verify_n_cpus, int, 0644); > @@ -225,8 +225,10 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow, > unsigned int nretries; > u64 wd_end, wd_end2, wd_delta; > int64_t wd_delay, wd_seq_delay; > + long max_retries; > > - for (nretries = 0; nretries <= max_cswd_read_retries; nretries++) { > + max_retries = clocksource_max_watchdog_read_retries(); > + for (nretries = 0; nretries <= max_retries; nretries++) { > local_irq_disable(); > *wdnow = watchdog->read(watchdog); > *csnow = cs->read(cs); > @@ -238,7 +240,7 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow, > wd_delay = clocksource_cyc2ns(wd_delta, watchdog->mult, > watchdog->shift); > if (wd_delay <= WATCHDOG_MAX_SKEW) { > - if (nretries > 1 || nretries >= max_cswd_read_retries) { > + if (nretries > 1 || nretries >= max_retries) { > pr_warn("timekeeping watchdog on CPU%d: %s retried %d times before success\n", > smp_processor_id(), watchdog->name, nretries); > } > -- > 2.34.1 >
On 1/29/24 08:45, Feng Tang wrote: > There was a bug on one 8-socket server that the TSC is wrongly marked as > 'unstable' and disabled during boot time (reproduce rate is about every > 120 rounds of reboot tests), with log: > > clocksource: timekeeping watchdog on CPU227: wd-tsc-wd excessive read-back delay of 153560ns vs. limit of 125000ns, > wd-wd read-back delay only 11440ns, attempt 3, marking tsc unstable > tsc: Marking TSC unstable due to clocksource watchdog > TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'. > sched_clock: Marking unstable (119294969739, 159204297)<-(125446229205, -5992055152) > clocksource: Checking clocksource tsc synchronization from CPU 319 to CPUs 0,99,136,180,210,542,601,896. > clocksource: Switched to clocksource hpet > > The reason is for platform with lots of CPU, there are sporadic big or huge > read latency of read watchog/clocksource during boot or when system is under > stress work load, and the frequency and maximum value of the latency goes up > with the increasing of CPU numbers. Current code already has logic to detect > and filter such high latency case by reading 3 times of watchdog, and check > the 2 deltas. Due to the randomness of the latency, there is a low possibility > situation that the first delta (latency) is big, but the second delta is small > and looks valid, which can escape from the check, and there is a > 'max_cswd_read_retries' for retrying that check covering this case, whose > default value is only 2 and may be not enough for machines with huge number > of CPUs. > > So scale and enlarge the max retry number according to CPU number to better > filter those latency noise for large systems, which has been verified fine > in 4 days and 670 rounds of reboot test on the 8-socket machine. > > Also add sanity check for user input value for 'max_cswd_read_retries', make > it self-adaptive by default, and provide a general helper for getting this > max retry number as suggested by Paul and Waiman. > > Cc: Paul E. McKenney <paulmck@kernel.org> > Cc: Waiman Long <longman@redhat.com> > Signed-off-by: Feng Tang <feng.tang@intel.com> > Tested-by: Jin Wang <jin1.wang@intel.com> > --- > Changelog: > > since v2: > * Fix the unexported symbol of helper function being used by > kernel module issue (Waiman) > > since v1: > * Add santity check for user input value of 'max_cswd_read_retries' > and a helper function for getting max retry nubmer (Paul) > * Apply the same logic to watchdog test code (Waiman) > > include/linux/clocksource.h | 18 +++++++++++++++++- > kernel/time/clocksource-wdtest.c | 12 +++++++----- > kernel/time/clocksource.c | 10 ++++++---- > 3 files changed, 30 insertions(+), 10 deletions(-) > > diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h > index 1d42d4b17327..0483f7dd66a3 100644 > --- a/include/linux/clocksource.h > +++ b/include/linux/clocksource.h > @@ -291,7 +291,23 @@ static inline void timer_probe(void) {} > #define TIMER_ACPI_DECLARE(name, table_id, fn) \ > ACPI_DECLARE_PROBE_ENTRY(timer, name, table_id, 0, NULL, 0, fn) > > -extern ulong max_cswd_read_retries; > +extern long max_cswd_read_retries; > + > +static inline long clocksource_max_watchdog_read_retries(void) > +{ > + long max_retries = max_cswd_read_retries; > + > + if (max_cswd_read_retries <= 0) { > + /* santity check for user input value */ > + if (max_cswd_read_retries != -1) > + pr_warn_once("max_cswd_read_retries was set with an invalid number: %ld\n", > + max_cswd_read_retries); > + > + max_retries = ilog2(num_online_cpus()) + 1; > + } > + return max_retries; > +} > + > void clocksource_verify_percpu(struct clocksource *cs); > > #endif /* _LINUX_CLOCKSOURCE_H */ > diff --git a/kernel/time/clocksource-wdtest.c b/kernel/time/clocksource-wdtest.c > index df922f49d171..c70cea3c44a1 100644 > --- a/kernel/time/clocksource-wdtest.c > +++ b/kernel/time/clocksource-wdtest.c > @@ -106,6 +106,7 @@ static int wdtest_func(void *arg) > unsigned long j1, j2; > char *s; > int i; > + long max_retries; > > schedule_timeout_uninterruptible(holdoff * HZ); > > @@ -139,18 +140,19 @@ static int wdtest_func(void *arg) > WARN_ON_ONCE(time_before(j2, j1 + NSEC_PER_USEC)); > > /* Verify tsc-like stability with various numbers of errors injected. */ > - for (i = 0; i <= max_cswd_read_retries + 1; i++) { > - if (i <= 1 && i < max_cswd_read_retries) > + max_retries = clocksource_max_watchdog_read_retries(); > + for (i = 0; i <= max_retries + 1; i++) { > + if (i <= 1 && i < max_retries) > s = ""; > - else if (i <= max_cswd_read_retries) > + else if (i <= max_retries) > s = ", expect message"; > else > s = ", expect clock skew"; > - pr_info("--- Watchdog with %dx error injection, %lu retries%s.\n", i, max_cswd_read_retries, s); > + pr_info("--- Watchdog with %dx error injection, %ld retries%s.\n", i, max_retries, s); > WRITE_ONCE(wdtest_ktime_read_ndelays, i); > schedule_timeout_uninterruptible(2 * HZ); > WARN_ON_ONCE(READ_ONCE(wdtest_ktime_read_ndelays)); > - WARN_ON_ONCE((i <= max_cswd_read_retries) != > + WARN_ON_ONCE((i <= max_retries) != > !(clocksource_wdtest_ktime.flags & CLOCK_SOURCE_UNSTABLE)); > wdtest_ktime_clocksource_reset(); > } > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c > index c108ed8a9804..2e5a1d6c6712 100644 > --- a/kernel/time/clocksource.c > +++ b/kernel/time/clocksource.c > @@ -208,8 +208,8 @@ void clocksource_mark_unstable(struct clocksource *cs) > spin_unlock_irqrestore(&watchdog_lock, flags); > } > > -ulong max_cswd_read_retries = 2; > -module_param(max_cswd_read_retries, ulong, 0644); > +long max_cswd_read_retries = -1; > +module_param(max_cswd_read_retries, long, 0644); > EXPORT_SYMBOL_GPL(max_cswd_read_retries); > static int verify_n_cpus = 8; > module_param(verify_n_cpus, int, 0644); > @@ -225,8 +225,10 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow, > unsigned int nretries; > u64 wd_end, wd_end2, wd_delta; > int64_t wd_delay, wd_seq_delay; > + long max_retries; > > - for (nretries = 0; nretries <= max_cswd_read_retries; nretries++) { > + max_retries = clocksource_max_watchdog_read_retries(); > + for (nretries = 0; nretries <= max_retries; nretries++) { > local_irq_disable(); > *wdnow = watchdog->read(watchdog); > *csnow = cs->read(cs); > @@ -238,7 +240,7 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow, > wd_delay = clocksource_cyc2ns(wd_delta, watchdog->mult, > watchdog->shift); > if (wd_delay <= WATCHDOG_MAX_SKEW) { > - if (nretries > 1 || nretries >= max_cswd_read_retries) { > + if (nretries > 1 || nretries >= max_retries) { > pr_warn("timekeeping watchdog on CPU%d: %s retried %d times before success\n", > smp_processor_id(), watchdog->name, nretries); > } Reviewed-by: Waiman Long <longman@redhat.com>
On Mon, Jan 29 2024 at 21:45, Feng Tang wrote: > +static inline long clocksource_max_watchdog_read_retries(void) > +{ > + long max_retries = max_cswd_read_retries; > + > + if (max_cswd_read_retries <= 0) { > + /* santity check for user input value */ > + if (max_cswd_read_retries != -1) > + pr_warn_once("max_cswd_read_retries was set with an invalid number: %ld\n", > + max_cswd_read_retries); > + > + max_retries = ilog2(num_online_cpus()) + 1; I'm getting tired of these knobs and the horrors behind them. Why not simply doing the obvious: retries = ilog2(num_online_cpus()) + 1; and remove the knob alltogether? Thanks, tglx
Hi Thomas, On Mon, Feb 19, 2024 at 12:32:05PM +0100, Thomas Gleixner wrote: > On Mon, Jan 29 2024 at 21:45, Feng Tang wrote: > > +static inline long clocksource_max_watchdog_read_retries(void) > > +{ > > + long max_retries = max_cswd_read_retries; > > + > > + if (max_cswd_read_retries <= 0) { > > + /* santity check for user input value */ > > + if (max_cswd_read_retries != -1) > > + pr_warn_once("max_cswd_read_retries was set with an invalid number: %ld\n", > > + max_cswd_read_retries); > > + > > + max_retries = ilog2(num_online_cpus()) + 1; > > I'm getting tired of these knobs and the horrors behind them. Why not > simply doing the obvious: > > retries = ilog2(num_online_cpus()) + 1; > > and remove the knob alltogether? Thanks for the suggestion! Yes, this makes sense to me. IIUC, the 'max_cswd_read_retries' was introduced mainly to cover different platforms' requirement, which could now be covered by the new self-adaptive number. If there is no concern from other developers, I will send a new version in this direction. Thanks, Feng > > Thanks, > > tglx
On 2/19/24 09:37, Feng Tang wrote: > Hi Thomas, > > On Mon, Feb 19, 2024 at 12:32:05PM +0100, Thomas Gleixner wrote: >> On Mon, Jan 29 2024 at 21:45, Feng Tang wrote: >>> +static inline long clocksource_max_watchdog_read_retries(void) >>> +{ >>> + long max_retries = max_cswd_read_retries; >>> + >>> + if (max_cswd_read_retries <= 0) { >>> + /* santity check for user input value */ >>> + if (max_cswd_read_retries != -1) >>> + pr_warn_once("max_cswd_read_retries was set with an invalid number: %ld\n", >>> + max_cswd_read_retries); >>> + >>> + max_retries = ilog2(num_online_cpus()) + 1; >> I'm getting tired of these knobs and the horrors behind them. Why not >> simply doing the obvious: >> >> retries = ilog2(num_online_cpus()) + 1; >> >> and remove the knob alltogether? > Thanks for the suggestion! Yes, this makes sense to me. IIUC, the > 'max_cswd_read_retries' was introduced mainly to cover different > platforms' requirement, which could now be covered by the new > self-adaptive number. > > If there is no concern from other developers, I will send a new > version in this direction. I see no problem simplifying it. Cheers, Longman
On Mon, Feb 19, 2024 at 09:20:31PM -0500, Waiman Long wrote: > > On 2/19/24 09:37, Feng Tang wrote: > > Hi Thomas, > > > > On Mon, Feb 19, 2024 at 12:32:05PM +0100, Thomas Gleixner wrote: > > > On Mon, Jan 29 2024 at 21:45, Feng Tang wrote: > > > > +static inline long clocksource_max_watchdog_read_retries(void) > > > > +{ > > > > + long max_retries = max_cswd_read_retries; > > > > + > > > > + if (max_cswd_read_retries <= 0) { > > > > + /* santity check for user input value */ > > > > + if (max_cswd_read_retries != -1) > > > > + pr_warn_once("max_cswd_read_retries was set with an invalid number: %ld\n", > > > > + max_cswd_read_retries); > > > > + > > > > + max_retries = ilog2(num_online_cpus()) + 1; > > > I'm getting tired of these knobs and the horrors behind them. Why not > > > simply doing the obvious: > > > > > > retries = ilog2(num_online_cpus()) + 1; > > > > > > and remove the knob alltogether? > > Thanks for the suggestion! Yes, this makes sense to me. IIUC, the > > 'max_cswd_read_retries' was introduced mainly to cover different > > platforms' requirement, which could now be covered by the new > > self-adaptive number. > > > > If there is no concern from other developers, I will send a new > > version in this direction. > > I see no problem simplifying it. My guess is that we will eventually end up with something like this: retries = ilog2(num_online_cpus()) / 2 + 1; but I am not at all opposed to starting without the division by 2. Thanx, Paul
On Tue, Feb 20, 2024 at 07:24:27AM -0800, Paul E. McKenney wrote: > On Mon, Feb 19, 2024 at 09:20:31PM -0500, Waiman Long wrote: > > > > On 2/19/24 09:37, Feng Tang wrote: > > > Hi Thomas, > > > > > > On Mon, Feb 19, 2024 at 12:32:05PM +0100, Thomas Gleixner wrote: > > > > On Mon, Jan 29 2024 at 21:45, Feng Tang wrote: > > > > > +static inline long clocksource_max_watchdog_read_retries(void) > > > > > +{ > > > > > + long max_retries = max_cswd_read_retries; > > > > > + > > > > > + if (max_cswd_read_retries <= 0) { > > > > > + /* santity check for user input value */ > > > > > + if (max_cswd_read_retries != -1) > > > > > + pr_warn_once("max_cswd_read_retries was set with an invalid number: %ld\n", > > > > > + max_cswd_read_retries); > > > > > + > > > > > + max_retries = ilog2(num_online_cpus()) + 1; > > > > I'm getting tired of these knobs and the horrors behind them. Why not > > > > simply doing the obvious: > > > > > > > > retries = ilog2(num_online_cpus()) + 1; > > > > > > > > and remove the knob alltogether? > > > Thanks for the suggestion! Yes, this makes sense to me. IIUC, the > > > 'max_cswd_read_retries' was introduced mainly to cover different > > > platforms' requirement, which could now be covered by the new > > > self-adaptive number. > > > > > > If there is no concern from other developers, I will send a new > > > version in this direction. > > > > I see no problem simplifying it. > > My guess is that we will eventually end up with something like this: > > retries = ilog2(num_online_cpus()) / 2 + 1; Good point! Initially when writing the patch, I did try to search if there is a 'ilog4' api :) as the ilog2 of that 8-socket machine is about 10, which is more than enough. Thanks, Feng > > but I am not at all opposed to starting without the division by 2. > > Thanx, Paul
On Tue, Feb 20, 2024 at 11:27:21PM +0800, Feng Tang wrote: > On Tue, Feb 20, 2024 at 07:24:27AM -0800, Paul E. McKenney wrote: > > On Mon, Feb 19, 2024 at 09:20:31PM -0500, Waiman Long wrote: > > > > > > On 2/19/24 09:37, Feng Tang wrote: > > > > Hi Thomas, > > > > > > > > On Mon, Feb 19, 2024 at 12:32:05PM +0100, Thomas Gleixner wrote: > > > > > On Mon, Jan 29 2024 at 21:45, Feng Tang wrote: > > > > > > +static inline long clocksource_max_watchdog_read_retries(void) > > > > > > +{ > > > > > > + long max_retries = max_cswd_read_retries; > > > > > > + > > > > > > + if (max_cswd_read_retries <= 0) { > > > > > > + /* santity check for user input value */ > > > > > > + if (max_cswd_read_retries != -1) > > > > > > + pr_warn_once("max_cswd_read_retries was set with an invalid number: %ld\n", > > > > > > + max_cswd_read_retries); > > > > > > + > > > > > > + max_retries = ilog2(num_online_cpus()) + 1; > > > > > I'm getting tired of these knobs and the horrors behind them. Why not > > > > > simply doing the obvious: > > > > > > > > > > retries = ilog2(num_online_cpus()) + 1; > > > > > > > > > > and remove the knob alltogether? > > > > Thanks for the suggestion! Yes, this makes sense to me. IIUC, the > > > > 'max_cswd_read_retries' was introduced mainly to cover different > > > > platforms' requirement, which could now be covered by the new > > > > self-adaptive number. > > > > > > > > If there is no concern from other developers, I will send a new > > > > version in this direction. > > > > > > I see no problem simplifying it. > > > > My guess is that we will eventually end up with something like this: > > > > retries = ilog2(num_online_cpus()) / 2 + 1; > > Good point! Initially when writing the patch, I did try to search if > there is a 'ilog4' api :) as the ilog2 of that 8-socket machine is > about 10, which is more than enough. I am also not averse to starting with the above, either. ;-) Thanx, Paul > Thanks, > Feng > > > > > but I am not at all opposed to starting without the division by 2. > > > > Thanx, Paul
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 1d42d4b17327..0483f7dd66a3 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -291,7 +291,23 @@ static inline void timer_probe(void) {} #define TIMER_ACPI_DECLARE(name, table_id, fn) \ ACPI_DECLARE_PROBE_ENTRY(timer, name, table_id, 0, NULL, 0, fn) -extern ulong max_cswd_read_retries; +extern long max_cswd_read_retries; + +static inline long clocksource_max_watchdog_read_retries(void) +{ + long max_retries = max_cswd_read_retries; + + if (max_cswd_read_retries <= 0) { + /* santity check for user input value */ + if (max_cswd_read_retries != -1) + pr_warn_once("max_cswd_read_retries was set with an invalid number: %ld\n", + max_cswd_read_retries); + + max_retries = ilog2(num_online_cpus()) + 1; + } + return max_retries; +} + void clocksource_verify_percpu(struct clocksource *cs); #endif /* _LINUX_CLOCKSOURCE_H */ diff --git a/kernel/time/clocksource-wdtest.c b/kernel/time/clocksource-wdtest.c index df922f49d171..c70cea3c44a1 100644 --- a/kernel/time/clocksource-wdtest.c +++ b/kernel/time/clocksource-wdtest.c @@ -106,6 +106,7 @@ static int wdtest_func(void *arg) unsigned long j1, j2; char *s; int i; + long max_retries; schedule_timeout_uninterruptible(holdoff * HZ); @@ -139,18 +140,19 @@ static int wdtest_func(void *arg) WARN_ON_ONCE(time_before(j2, j1 + NSEC_PER_USEC)); /* Verify tsc-like stability with various numbers of errors injected. */ - for (i = 0; i <= max_cswd_read_retries + 1; i++) { - if (i <= 1 && i < max_cswd_read_retries) + max_retries = clocksource_max_watchdog_read_retries(); + for (i = 0; i <= max_retries + 1; i++) { + if (i <= 1 && i < max_retries) s = ""; - else if (i <= max_cswd_read_retries) + else if (i <= max_retries) s = ", expect message"; else s = ", expect clock skew"; - pr_info("--- Watchdog with %dx error injection, %lu retries%s.\n", i, max_cswd_read_retries, s); + pr_info("--- Watchdog with %dx error injection, %ld retries%s.\n", i, max_retries, s); WRITE_ONCE(wdtest_ktime_read_ndelays, i); schedule_timeout_uninterruptible(2 * HZ); WARN_ON_ONCE(READ_ONCE(wdtest_ktime_read_ndelays)); - WARN_ON_ONCE((i <= max_cswd_read_retries) != + WARN_ON_ONCE((i <= max_retries) != !(clocksource_wdtest_ktime.flags & CLOCK_SOURCE_UNSTABLE)); wdtest_ktime_clocksource_reset(); } diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index c108ed8a9804..2e5a1d6c6712 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -208,8 +208,8 @@ void clocksource_mark_unstable(struct clocksource *cs) spin_unlock_irqrestore(&watchdog_lock, flags); } -ulong max_cswd_read_retries = 2; -module_param(max_cswd_read_retries, ulong, 0644); +long max_cswd_read_retries = -1; +module_param(max_cswd_read_retries, long, 0644); EXPORT_SYMBOL_GPL(max_cswd_read_retries); static int verify_n_cpus = 8; module_param(verify_n_cpus, int, 0644); @@ -225,8 +225,10 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow, unsigned int nretries; u64 wd_end, wd_end2, wd_delta; int64_t wd_delay, wd_seq_delay; + long max_retries; - for (nretries = 0; nretries <= max_cswd_read_retries; nretries++) { + max_retries = clocksource_max_watchdog_read_retries(); + for (nretries = 0; nretries <= max_retries; nretries++) { local_irq_disable(); *wdnow = watchdog->read(watchdog); *csnow = cs->read(cs); @@ -238,7 +240,7 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow, wd_delay = clocksource_cyc2ns(wd_delta, watchdog->mult, watchdog->shift); if (wd_delay <= WATCHDOG_MAX_SKEW) { - if (nretries > 1 || nretries >= max_cswd_read_retries) { + if (nretries > 1 || nretries >= max_retries) { pr_warn("timekeeping watchdog on CPU%d: %s retried %d times before success\n", smp_processor_id(), watchdog->name, nretries); }