From patchwork Wed Jun 7 07:54:33 2023
X-Patchwork-Submitter: Feng Tang
X-Patchwork-Id: 104325
From: Feng Tang
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    "H. Peter Anvin", Peter Zijlstra, x86@kernel.org,
    linux-kernel@vger.kernel.org, paulmck@kernel.org
Cc: rui.zhang@intel.com, Feng Tang, Yu Liao
Subject: [PATCH] x86/tsc: Extend watchdog check exemption to 4-Sockets platform
Date: Wed, 7 Jun 2023 15:54:33 +0800
Message-Id: <20230607075433.387075-1-feng.tang@intel.com>

There were reports again that the TSC clocksource on 4-socket x86
servers was wrongly judged 'unstable' by 'jiffies' and other
watchdogs, and was disabled [1][2].

Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
on qualified platorms") was introduced to deal with these false alarms
of TSC instability, covering qualified platforms with 2 sockets or
fewer. From the history of chasing TSC issues, Thomas and Peter have
only seen real TSC synchronization issues on 8-socket machines.

So extend the exemption to 4 sockets to fix the issue.

Rui also proposed another approach, disabling 'jiffies' as a
clocksource watchdog [3], which solves the problem in [1] in an
architecture-independent way. It cannot cure the problem in [2],
however, where the watchdog is HPET or PMTIMER; 'jiffies' is mostly
used as the watchdog only during the boot phase.

'nr_online_nodes' is known to be inaccurate as a socket count in cases
such as platforms with CPU-less memory nodes, sub-NUMA clustering
enabled, fake NUMA, the kernel command-line parameter 'maxcpus=', etc.
The harmful case is 'maxcpus=', which can underestimate the package
count and so disable the watchdog when it should stay enabled; the
bright side is that it is mostly used for debugging. All of these will
be addressed in other patches, as discussed in thread [4].

[1]. https://lore.kernel.org/all/9d3bf570-3108-0336-9c52-9bee15767d29@huawei.com/
[2]. https://lore.kernel.org/lkml/06df410c-2177-4671-832f-339cff05b1d9@paulmck-laptop/
[3]. https://lore.kernel.org/all/bd5b97f89ab2887543fc262348d1c7cafcaae536.camel@intel.com/
[4]. https://lore.kernel.org/all/20221021062131.1826810-1-feng.tang@intel.com/

Reported-by: Yu Liao
Reported-by: Paul E. McKenney
Signed-off-by: Feng Tang
Reviewed-by: Paul E. McKenney
---
 arch/x86/kernel/tsc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 344698852146..f15066a1d473 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1238,7 +1238,7 @@ static void __init check_system_tsc_reliable(void)
 	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
 	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
 	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
-	    nr_online_nodes <= 2)
+	    nr_online_nodes <= 4)
 		tsc_disable_clocksource_watchdog();
 }
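
For readers who want to reason about the condition outside the kernel, the check changed by this patch can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: 'struct tsc_features', 'TSC_WD_EXEMPT_MAX_NODES' and 'tsc_watchdog_exempt()' are hypothetical names invented for illustration and do not exist in arch/x86/kernel/tsc.c; the three booleans stand in for the boot_cpu_has() tests in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-ins for the three CPU feature bits that
 * check_system_tsc_reliable() tests with boot_cpu_has().
 */
struct tsc_features {
	bool constant_tsc;	/* models X86_FEATURE_CONSTANT_TSC */
	bool nonstop_tsc;	/* models X86_FEATURE_NONSTOP_TSC */
	bool tsc_adjust;	/* models X86_FEATURE_TSC_ADJUST */
};

/* Node-count limit after this patch; the limit was 2 before it. */
#define TSC_WD_EXEMPT_MAX_NODES 4

/*
 * Returns true when the modelled platform would skip the clocksource
 * watchdog for TSC: all three features present and few enough nodes.
 */
static bool tsc_watchdog_exempt(const struct tsc_features *f,
				int nr_online_nodes)
{
	assert(f != NULL);
	return f->constant_tsc && f->nonstop_tsc && f->tsc_adjust &&
	       nr_online_nodes <= TSC_WD_EXEMPT_MAX_NODES;
}
```

With all three features set, the predicate holds for 4 online nodes and fails for 8. It also shows why the 'maxcpus=' case is the harmful one: if an 8-socket machine boots with only 4 nodes online, the same predicate holds and the watchdog would be disabled.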