Message ID | 20221021062131.1826810-2-feng.tang@intel.com |
---|---|
State | New |
Headers | From: Feng Tang <feng.tang@intel.com> To: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@intel.com>, "H . Peter Anvin" <hpa@zytor.com>, Peter Zijlstra <peterz@infradead.org>, x86@kernel.org, linux-kernel@vger.kernel.org Cc: rui.zhang@intel.com, tim.c.chen@intel.com, Xiongfeng Wang <wangxiongfeng2@huawei.com>, liaoyu15@huawei.com, Feng Tang <feng.tang@intel.com> Subject: [PATCH v1 2/2] x86/tsc: Extend watchdog check exemption to 4-Sockets platform Date: Fri, 21 Oct 2022 14:21:31 +0800 Message-Id: <20221021062131.1826810-2-feng.tang@intel.com> In-Reply-To: <20221021062131.1826810-1-feng.tang@intel.com> References: <20221021062131.1826810-1-feng.tang@intel.com> |
Series | [v1,1/2] x86/tsc: use logical_package as a better estimation of socket numbers |
Commit Message
Feng Tang
Oct. 21, 2022, 6:21 a.m. UTC
There is another report that the tsc clocksource on a 4-socket x86
Skylake server was wrongly judged 'unstable' by the 'jiffies' watchdog
and disabled [1].
Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
on qualified platorms") was introduced to deal with these false
alarms of an unstable tsc, covering qualified platforms with 2
sockets or fewer.
Extend the exemption to 4 sockets to fix the issue.
We also got similar reports on 8-socket platforms from internal testing,
but as Peter pointed out, there were tsc sync issues on 8-socket
platforms, and they are better handled architecture by architecture
instead of directly changing the threshold to 8 here.
Rui also proposed another way, disabling 'jiffies' as a clocksource
watchdog [2], which can also solve this specific problem in an
architecture-independent way. Its one limitation is that some tsc
false alarms were also reported by other hardware watchdogs like
HPET/PMTIMER, while the 'jiffies' watchdog is mostly used in the
kernel boot phase.
[1]. https://lore.kernel.org/all/9d3bf570-3108-0336-9c52-9bee15767d29@huawei.com/
[2]. https://lore.kernel.org/all/bd5b97f89ab2887543fc262348d1c7cafcaae536.camel@intel.com/
Reported-by: Yu Liao <liaoyu15@huawei.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
arch/x86/kernel/tsc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Comments
On Fri, Oct 21, 2022 at 02:21:31PM +0800, Feng Tang wrote:
> There is report again that the tsc clocksource on a 4 sockets x86
> Skylake server was wrongly judged as 'unstable' by 'jiffies' watchdog,
> and disabled [1].
>
> Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
> on qualified platorms") was introduce to deal with these false
> alarms of tsc unstable issues, covering qualified platforms for 2
> sockets or smaller ones.
>
> Extend the exemption to 4 sockets to fix the issue.
>
> We also got similar reports on 8 sockets platform from internal test,
> but as Peter pointed out, there was tsc sync issues for 8-sockets
> platform, and it'd better be handled architecture by architecture,
> instead of directly changing the threshold to 8 here.
>
> Rui also proposed another way to disable 'jiffies' as clocksource
> watchdog [2], which can also solve this specific problem in an
> architecture independent way, with one limitation that there are
> also some tsc false alarms which were reported by other hardware
> watchdogs like HPET/PMTIMER, while 'jiffies' watchdog is mostly
> used in kernel boot phase.
>
> [1]. https://lore.kernel.org/all/9d3bf570-3108-0336-9c52-9bee15767d29@huawei.com/
> [2]. https://lore.kernel.org/all/bd5b97f89ab2887543fc262348d1c7cafcaae536.camel@intel.com/
>
> Reported-by: Yu Liao <liaoyu15@huawei.com>
> Signed-off-by: Feng Tang <feng.tang@intel.com>

We have a number of four-socket systems whose TSCs seem to be reliable.
We do see issues where high memory load forces the TSC to be marked
unstable, but that is because those systems are using an older kernel.

If the TSCs do start to misbehave, I will of course let you all know.
But in the meantime:

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

The previous patch that changes the definition of "socket" I have no
opinion on. I must let you guys work that out. However, I do note that
this patch can be rebased so as to no longer depend on that patch.

Thanx, Paul

> ---
> arch/x86/kernel/tsc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 178448ef00c7..356f06287034 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1400,7 +1400,7 @@ static int __init init_tsc_clocksource(void)
>  	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
>  	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
>  	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> -	    logical_packages <= 2)
> +	    logical_packages <= 4)
>  		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
>
>  	/*
> --
> 2.34.1
>
On Fri, Jun 02, 2023 at 11:00:55AM -0700, Paul E. McKenney wrote:
> On Fri, Oct 21, 2022 at 02:21:31PM +0800, Feng Tang wrote:
> > There is report again that the tsc clocksource on a 4 sockets x86
> > Skylake server was wrongly judged as 'unstable' by 'jiffies' watchdog,
> > and disabled [1].
> >
> > Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
> > on qualified platorms") was introduce to deal with these false
> > alarms of tsc unstable issues, covering qualified platforms for 2
> > sockets or smaller ones.
> >
> > Extend the exemption to 4 sockets to fix the issue.
> >
> > We also got similar reports on 8 sockets platform from internal test,
> > but as Peter pointed out, there was tsc sync issues for 8-sockets
> > platform, and it'd better be handled architecture by architecture,
> > instead of directly changing the threshold to 8 here.
> >
> > Rui also proposed another way to disable 'jiffies' as clocksource
> > watchdog [2], which can also solve this specific problem in an
> > architecture independent way, with one limitation that there are
> > also some tsc false alarms which were reported by other hardware
> > watchdogs like HPET/PMTIMER, while 'jiffies' watchdog is mostly
> > used in kernel boot phase.
> >
> > [1]. https://lore.kernel.org/all/9d3bf570-3108-0336-9c52-9bee15767d29@huawei.com/
> > [2]. https://lore.kernel.org/all/bd5b97f89ab2887543fc262348d1c7cafcaae536.camel@intel.com/
> >
> > Reported-by: Yu Liao <liaoyu15@huawei.com>
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
>
> We have a number of four-socket systems whose TSCs seem to be reliable.
> We do see issues where high memory load forces the TSC to be marked
> unstable, but that is because those systems are using an older kernel.

Thanks for sharing the info.

>
> If the TSCs do start to misbehave, I will of course let you all know.

That will be very helpful! I don't have much access to 4 socket machines.

> But in the meantime:
>
> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
>
> The previous patch that changes the definition of "socket" I have no
> opinion on. I must let you guys work that out. However, I do note that
> this patch can be rebased so as to no longer depend on that patch.

During previous discussion, Thomas and Peter mentioned they only saw
real TSC synchronization issue on some old generation 8 socket/package
machine. I can separate this and send out for review. Meanwhile I'll
rework on the 1/2 patch and test more.

Thanks,
Feng

> Thanx, Paul
>
> > ---
> > arch/x86/kernel/tsc.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index 178448ef00c7..356f06287034 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1400,7 +1400,7 @@ static int __init init_tsc_clocksource(void)
> >  	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> >  	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> >  	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> > -	    logical_packages <= 2)
> > +	    logical_packages <= 4)
> >  		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> >
> >  	/*
> > --
> > 2.34.1
> >
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 178448ef00c7..356f06287034 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1400,7 +1400,7 @@ static int __init init_tsc_clocksource(void)
 	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
 	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
 	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
-	    logical_packages <= 2)
+	    logical_packages <= 4)
 		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
 
 	/*
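For readers who want to reason about the condition outside the kernel, the check touched by this diff can be modelled as a small userspace predicate. This is only an illustrative sketch, not kernel code: `struct tsc_features`, `tsc_skips_watchdog()`, `MAX_PKGS_BEFORE` and `MAX_PKGS_AFTER` are hypothetical names invented here, standing in for the `boot_cpu_has()` tests and the `logical_packages` comparison in `init_tsc_clocksource()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the CPU feature bits tested in the diff. */
struct tsc_features {
	bool constant_tsc;	/* models X86_FEATURE_CONSTANT_TSC */
	bool nonstop_tsc;	/* models X86_FEATURE_NONSTOP_TSC */
	bool tsc_adjust;	/* models X86_FEATURE_TSC_ADJUST */
};

/* Package-count ceiling: 2 before this patch, 4 with it applied. */
enum { MAX_PKGS_BEFORE = 2, MAX_PKGS_AFTER = 4 };

/*
 * Returns true when this model would exempt the TSC from watchdog
 * verification, i.e. the case where the kernel code in the diff clears
 * CLOCK_SOURCE_MUST_VERIFY.
 */
static bool tsc_skips_watchdog(const struct tsc_features *f,
			       unsigned int packages,
			       unsigned int max_packages)
{
	assert(packages >= 1);	/* a running system has at least one package */

	return f->constant_tsc && f->nonstop_tsc && f->tsc_adjust &&
	       packages <= max_packages;
}
```

Under this model, a 4-package machine with all three features is still verified with the old ceiling (`MAX_PKGS_BEFORE`) and is exempted with the new one (`MAX_PKGS_AFTER`), while an 8-package machine stays under watchdog verification in both cases.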