From patchwork Mon Jan 22 17:23:50 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiri Wiesner
X-Patchwork-Id: 190300
Date: Mon, 22 Jan 2024 18:23:50 +0100
From: Jiri Wiesner
To: linux-kernel@vger.kernel.org
Cc: John Stultz, Thomas Gleixner, Stephen Boyd, "Paul E. McKenney", Feng Tang
Subject: [PATCH v3] clocksource: Skip watchdog check for large watchdog intervals
Message-ID: <20240122172350.GA740@incl>

There have been reports of the watchdog marking clocksources unstable on
machines with 8 NUMA nodes:

> clocksource: timekeeping watchdog on CPU373: Marking clocksource 'tsc' as unstable because the skew is too large:
> clocksource: 'hpet' wd_nsec: 14523447520 wd_now: 5a749706 wd_last: 45adf1e0 mask: ffffffff
> clocksource: 'tsc' cs_nsec: 14524115132 cs_now: 515ce2c5a96caa cs_last: 515cd9a9d83918 mask: ffffffffffffffff
> clocksource: 'tsc' is current clocksource.
> tsc: Marking TSC unstable due to clocksource watchdog
> TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
> sched_clock: Marking unstable (1950347883333462, 79649632569)<-(1950428279338308, -745776594)
> clocksource: Checking clocksource tsc synchronization from CPU 400 to CPUs 0,46,52,54,138,208,392,397.
> clocksource: Switched to clocksource hpet

The measured clocksource skew - the absolute difference between cs_nsec
and wd_nsec - was 668 microseconds:

> cs_nsec - wd_nsec = 14524115132 - 14523447520 = 667612

The kernel (based on 5.14.21) used 200 microseconds for the
uncertainty_margin of both the clocksource and watchdog, resulting in a
threshold of 400 microseconds (the md variable). Both the cs_nsec and the
wd_nsec value indicate that the readout interval was circa 14.5 seconds.
The observed behaviour is that watchdog checks failed for large readout
intervals on 8 NUMA node machines. This indicates that the size of the
skew was directly proportional to the length of the readout interval on
those machines. The measured clocksource skew, 668 microseconds, was
evaluated against a threshold (the md variable) that is suited for
readout intervals of roughly WATCHDOG_INTERVAL, i.e. HZ >> 1, which is
0.5 seconds.

The intention of 2e27e793e280 ("clocksource: Reduce clocksource-skew
threshold") was to tighten the threshold for evaluating skew and set the
lower bound for the uncertainty_margin of clocksources to twice
WATCHDOG_MAX_SKEW. Later in c37e85c135ce ("clocksource: Loosen
clocksource watchdog constraints"), the WATCHDOG_MAX_SKEW constant was
increased to 125 microseconds to fit the limit of NTP, which is able to
use a clocksource that suffers from up to 500 microseconds of skew per
second. Both the TSC and the HPET use the default uncertainty_margin.
When the readout interval gets stretched, the default uncertainty_margin
is no longer a suitable lower bound for evaluating skew - it imposes a
limit that is far stricter than the skew with which NTP can deal.

The root causes of the skew being directly proportional to the length of
the readout interval are
 * the inaccuracy of the shift/mult pairs of clocksources and the
   watchdog
 * the imprecision of the conversion to nanoseconds for large readout
   intervals

Prevent this by skipping the current watchdog check if the readout
interval exceeds 2 * WATCHDOG_INTERVAL. Considering the maximum readout
interval of 2 * WATCHDOG_INTERVAL, the current default uncertainty margin
(of the TSC and HPET) corresponds to a limit on clocksource skew of 250
ppm (microseconds of skew per second). To keep the limit imposed by NTP
(500 microseconds of skew per second) for all possible readout intervals,
the margins would have to be scaled so that the threshold value is
proportional to the length of the actual readout interval.

As for why the readout interval may get stretched: Since the watchdog is
executed in softirq context, the expiration of the watchdog timer can get
severely delayed on account of a ksoftirqd thread not getting to run in a
timely manner. Surely, a system with such belated softirq execution is
not working well, and the scheduling issue should be looked into, but the
clocksource watchdog should be able to deal with it accordingly.

Fixes: 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold")
Suggested-by: Feng Tang
Reviewed-by: Feng Tang
Tested-by: Paul E. McKenney
Signed-off-by: Jiri Wiesner
---
v2: fixed integer overflow in WATCHDOG_INTR_MAX_NS on i386
v3: variable renaming, threshold adjusted, message and log changes

 kernel/time/clocksource.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index c108ed8a9804..3052b1f1168e 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -99,6 +99,7 @@ static u64 suspend_start;
  * Interval: 0.5sec.
  */
 #define WATCHDOG_INTERVAL (HZ >> 1)
+#define WATCHDOG_INTERVAL_MAX_NS ((2 * WATCHDOG_INTERVAL) * (NSEC_PER_SEC / HZ))
 
 /*
  * Threshold: 0.0312s, when doubled: 0.0625s.
@@ -134,6 +135,7 @@ static DECLARE_WORK(watchdog_work, clocksource_watchdog_work);
 static DEFINE_SPINLOCK(watchdog_lock);
 static int watchdog_running;
 static atomic_t watchdog_reset_pending;
+static int64_t watchdog_max_interval;
 
 static inline void clocksource_watchdog_lock(unsigned long *flags)
 {
@@ -399,8 +401,8 @@ static inline void clocksource_reset_watchdog(void)
 static void clocksource_watchdog(struct timer_list *unused)
 {
 	u64 csnow, wdnow, cslast, wdlast, delta;
+	int64_t wd_nsec, cs_nsec, interval;
 	int next_cpu, reset_pending;
-	int64_t wd_nsec, cs_nsec;
 	struct clocksource *cs;
 	enum wd_read_status read_ret;
 	unsigned long extra_wait = 0;
@@ -470,6 +472,27 @@ static void clocksource_watchdog(struct timer_list *unused)
 		if (atomic_read(&watchdog_reset_pending))
 			continue;
 
+		/*
+		 * The processing of timer softirqs can get delayed (usually
+		 * on account of ksoftirqd not getting to run in a timely
+		 * manner), which causes the watchdog interval to stretch.
+		 * Skew detection may fail for longer watchdog intervals
+		 * on account of fixed margins being used.
+		 * Some clocksources, e.g. acpi_pm, cannot tolerate
+		 * watchdog intervals longer than a few seconds.
+		 */
+		interval = max(cs_nsec, wd_nsec);
+		if (unlikely(interval > WATCHDOG_INTERVAL_MAX_NS)) {
+			if (system_state > SYSTEM_SCHEDULING &&
+			    interval > 2 * watchdog_max_interval) {
+				watchdog_max_interval = interval;
+				pr_warn("Long readout interval, skipping watchdog check: cs_nsec: %lld wd_nsec: %lld\n",
+					cs_nsec, wd_nsec);
+			}
+			watchdog_timer.expires = jiffies;
+			continue;
+		}
+
 		/* Check the deviation from the watchdog clocksource. */
 		md = cs->uncertainty_margin + watchdog->uncertainty_margin;
 		if (abs(cs_nsec - wd_nsec) > md) {