Message ID | 20230522033018.1276836-1-feng.tang@intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp1185821vqo; Sun, 21 May 2023 20:46:06 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5bdFl0ovGWl+kMaRaAHekQkStrGPcIwGyn4OQ7mSRss6hYOV77tEKY3ody5zvapQCuHMxS X-Received: by 2002:a17:90a:9514:b0:24e:1f06:4d32 with SMTP id t20-20020a17090a951400b0024e1f064d32mr9064689pjo.2.1684727166557; Sun, 21 May 2023 20:46:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1684727166; cv=none; d=google.com; s=arc-20160816; b=OXEcCIJ+J0peIp0X90WNmDLYCQGMus07KxkCthCO2zLcg1Ax8uNuJZH/1Mxh2KwXim TrjojpZR5CwdYSH2lV/g+6meOLvXwuA3YLYgmN0rpIg1hLWrjhRY+c+H9DeIu5bVs3+h tv34NNe0GXuvzzk6wvOhL/eJsO61Psi3aEMNyR0TC1scgL8X8qET9G7toaKc6JZ3SGnZ X0SDnjO5QELfqfIxLnVwDKAB8FxSVKPP42ElaGamEbZP+Ox2Wu6mHO+pxmpREjzQFCK6 O//GOB6MmFoctHdgfuMWK6PHv8Nn/c/PrJurSdP8W9IFcVEarO/ULP31qW58SSpF3tJp +l2A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=/SoaAZneV/IWUj4Nwp+KpLLGoV0BDyB38DSmShyTM9E=; b=aaX5NpnbwhW+039fTpWSX3On7wd6YH42ROPlvU99aiKbnBx1Sdl2UZSfPPT4xXNPl8 G7qlqKGRRoqt0cUDPUhj0tS4fpW4jCGzOWA1C1IDMXV0svbrLIFGbuw3wVoZIy1uIew6 7wEzSWe3Az36HtBbnXruHcqJpV3JbUeHRwdUxCfkXCAyol9y4Oblii1Y1vLuiW180oOX LCm1m5tknFzxAfWwkwxJ1wf5pQb41l+nye5wD9oNyydXqvyiVkXHznmGiBh+9r95cAmJ liYPXAUfABVGRY/VVOcoOlTXAZWLFhjFxUQb1sG+Bz6YyGt8JQh5haJBto907b6nHfMV x+6A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=k5R3ktOX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q194-20020a632acb000000b0052d7238ad55si4022292pgq.793.2023.05.21.20.45.52; Sun, 21 May 2023 20:46:06 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=k5R3ktOX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229819AbjEVDhS (ORCPT <rfc822;cscallsign@gmail.com> + 99 others); Sun, 21 May 2023 23:37:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56370 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229501AbjEVDhC (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Sun, 21 May 2023 23:37:02 -0400 Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EBB7DA2 for <linux-kernel@vger.kernel.org>; Sun, 21 May 2023 20:37:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1684726621; x=1716262621; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=P59aXtyfm/ISccYK7l1gh94sKmTWDcB1nL/Li5f4TEA=; b=k5R3ktOXkdkt0+QT6+0OUnnjFlCtu1g5BcSiKlgOnMr8fWofsbXdACSw 2JM03bymWJiWZ/FDw9y9IKYZmafvavn0fkU7gt+bDOEjE1WfFRCxCXVCA QayN3eETImf5BnjwyPN4Nf8EWQEsQazCW2NJ/lgaX2AbeYys3O85MHbps Hy4Gw7MAVhHh4NT1axLbn+TSYreprOi10XyxozdEi33VNt8a9TdN7jjpm 7nkqtTxkMVk2U+yxFIzjRoYzo/zhcPAeS7Gv3g7bew+sByTY6eMe/4Pn2 LnET4VN1atNRHmo70FimFJx8UWrcm0avaeSHLsmcsMNtg3EJHjyDoCG3q g==; X-IronPort-AV: E=McAfee;i="6600,9927,10717"; a="350327417" X-IronPort-AV: E=Sophos;i="6.00,183,1681196400"; d="scan'208";a="350327417" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2023 20:37:01 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10717"; a="815499748" X-IronPort-AV: E=Sophos;i="6.00,183,1681196400"; d="scan'208";a="815499748" Received: from feng-clx.sh.intel.com ([10.238.200.228]) by fmsmga002.fm.intel.com with ESMTP; 21 May 2023 20:36:59 -0700 From: Feng Tang <feng.tang@intel.com> To: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@intel.com>, "H . Peter Anvin" <hpa@zytor.com>, Peter Zijlstra <peterz@infradead.org>, paulmck@kernel.org, rui.zhang@intel.com, x86@kernel.org, linux-kernel@vger.kernel.org Cc: Feng Tang <feng.tang@intel.com> Subject: [PATCH RFC] x86/tsc: Make recalibration default on for TSC_KNOWN_FREQ cases Date: Mon, 22 May 2023 11:30:18 +0800 Message-Id: <20230522033018.1276836-1-feng.tang@intel.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1766564473797044670?= X-GMAIL-MSGID: =?utf-8?q?1766564473797044670?= |
Series |
[RFC] x86/tsc: Make recalibration default on for TSC_KNOWN_FREQ cases
|
|
Commit Message
Feng Tang
May 22, 2023, 3:30 a.m. UTC
Commit a7ec817d5542 ("x86/tsc: Add option to force frequency
recalibration with HW timer") was added to handle cases that the
firmware has bug and provides a wrong TSC frequency number, and it
is optional given that this kind of firmware issue rarely happens
(Paul reported once [1]).
But Rui reported that some Sapphire Rapids platform met this issue
again recently, and as firmware is also a kind of 'software' which
can't be bug free, make the recalibration default on. When the
values from firmware and HW timer's calibration have big gap,
raise a warning and let vendor to check which side is broken.
One downside is, many VMs also has X86_FEATURE_TSC_KNOWN_FREQ set,
and they will also do this recalibration.
[1]. https://lore.kernel.org/lkml/20221117230910.GI4001@paulmck-ThinkPad-P17-Gen-1/
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
Documentation/admin-guide/kernel-parameters.txt | 4 ----
arch/x86/kernel/tsc.c | 14 ++++----------
2 files changed, 4 insertions(+), 14 deletions(-)
Comments
On Mon, May 22 2023 at 11:30, Feng Tang wrote: > Commit a7ec817d5542 ("x86/tsc: Add option to force frequency > recalibration with HW timer") was added to handle cases that the > firmware has bug and provides a wrong TSC frequency number, and it > is optional given that this kind of firmware issue rarely happens > (Paul reported once [1]). > > But Rui reported that some Sapphire Rapids platform met this issue > again recently, and as firmware is also a kind of 'software' which > can't be bug free, make the recalibration default on. When the > values from firmware and HW timer's calibration have big gap, > raise a warning and let vendor to check which side is broken. Sure firmware can have bugs, but if firmware validation does not even catch such a trivially to detect bug, then their validation is nothing else than rubber stamping. Seriously. Are any of these affected platforms shipping already or is this just Intel internal muck? > One downside is, many VMs also has X86_FEATURE_TSC_KNOWN_FREQ set, > and they will also do this recalibration. It's also pointless for those SoCs which lack legacy hardware. So why do you force this on everyone? Thanks, tglx
Hi Thomas, Thanks for the review! On Mon, May 22, 2023 at 10:14:08AM +0200, Thomas Gleixner wrote: > On Mon, May 22 2023 at 11:30, Feng Tang wrote: > > Commit a7ec817d5542 ("x86/tsc: Add option to force frequency > > recalibration with HW timer") was added to handle cases that the > > firmware has bug and provides a wrong TSC frequency number, and it > > is optional given that this kind of firmware issue rarely happens > > (Paul reported once [1]). > > > > But Rui reported that some Sapphire Rapids platform met this issue > > again recently, and as firmware is also a kind of 'software' which > > can't be bug free, make the recalibration default on. When the > > values from firmware and HW timer's calibration have big gap, > > raise a warning and let vendor to check which side is broken. > > Sure firmware can have bugs, but if firmware validation does not even > catch such a trivially to detect bug, then their validation is nothing > else than rubber stamping. Seriously. Yes, agree. > Are any of these affected platforms shipping already or is this just > Intel internal muck? Paul and Rui can provide more info. AFAIK, those problems were raised by external customers, so the platform were already shipped from Intel. But I'm not sure they are commercial versions or early engineering drops. > > > One downside is, many VMs also has X86_FEATURE_TSC_KNOWN_FREQ set, > > and they will also do this recalibration. > > It's also pointless for those SoCs which lack legacy hardware. Yes. > So why do you force this on everyone? How about we keep the optional parameter, and enforce the check for bare metal platforms which got TSC frequency info from CPUID(0x15), like: --- diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index 344698852146..ec1ff6aaf5a9 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -670,8 +670,10 @@ unsigned long native_calibrate_tsc(void) * frequency and is the most accurate one so far we have. This * is considered a known frequency. */ - if (crystal_khz != 0) + if (crystal_khz != 0) { setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ); + tsc_force_recalibrate = 1; + } /* * Some Intel SoCs like Skylake and Kabylake don't report the crystal Thanks, Feng > Thanks, > > tglx
On Mon, May 22 2023 at 16:47, Feng Tang wrote: > On Mon, May 22, 2023 at 10:14:08AM +0200, Thomas Gleixner wrote: >> On Mon, May 22 2023 at 11:30, Feng Tang wrote: >> Are any of these affected platforms shipping already or is this just >> Intel internal muck? > > Paul and Rui can provide more info. AFAIK, those problems were raised > by external customers, so the platform were already shipped from > Intel. But I'm not sure they are commercial versions or early > engineering drops. So its at a company which knows how to update firmware, right? >> So why do you force this on everyone? > > How about we keep the optional parameter, and enforce the check for > bare metal platforms which got TSC frequency info from CPUID(0x15), > like: What prevents a hypervisor from providing this info in CPUID(0x15)? > @@ -670,8 +670,10 @@ unsigned long native_calibrate_tsc(void) > * frequency and is the most accurate one so far we have. This > * is considered a known frequency. > */ > - if (crystal_khz != 0) > + if (crystal_khz != 0) { > setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ); > + tsc_force_recalibrate = 1; > + } > > /* > * Some Intel SoCs like Skylake and Kabylake don't report the crystal and five lines further down: /* * For Atom SoCs TSC is the only reliable clocksource. * Mark TSC reliable so no watchdog on it. */ if (boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT) setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE); So its reliable and needs recalibration against hardware which does not exist. Thanks, tglx
On Mon, May 22, 2023 at 01:49:53PM +0200, Thomas Gleixner wrote: > On Mon, May 22 2023 at 16:47, Feng Tang wrote: > > On Mon, May 22, 2023 at 10:14:08AM +0200, Thomas Gleixner wrote: > >> On Mon, May 22 2023 at 11:30, Feng Tang wrote: > >> Are any of these affected platforms shipping already or is this just > >> Intel internal muck? > > > > Paul and Rui can provide more info. AFAIK, those problems were raised > > by external customers, so the platform were already shipped from > > Intel. But I'm not sure they are commercial versions or early > > engineering drops. > > So its at a company which knows how to update firmware, right? Yes. And the recalibration may help to exposed the bug quickly. > >> So why do you force this on everyone? > > > > How about we keep the optional parameter, and enforce the check for > > bare metal platforms which got TSC frequency info from CPUID(0x15), > > like: > > What prevents a hypervisor from providing this info in CPUID(0x15)? > > > @@ -670,8 +670,10 @@ unsigned long native_calibrate_tsc(void) > > * frequency and is the most accurate one so far we have. This > > * is considered a known frequency. > > */ > > - if (crystal_khz != 0) > > + if (crystal_khz != 0) { > > setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ); > > + tsc_force_recalibrate = 1; > > + } > > > > /* > > * Some Intel SoCs like Skylake and Kabylake don't report the crystal > > and five lines further down: > > /* > * For Atom SoCs TSC is the only reliable clocksource. > * Mark TSC reliable so no watchdog on it. > */ > if (boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT) > setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE); > > So its reliable and needs recalibration against hardware which does not > exist. I misunderstood it. When you said 'SOCs which lack legacy hardware', I thought you were referring those old Merrifield/Medfield things, which may have no HPET/ACPI PM_Timer but an APB timer, and mainly go through MSR way (tsc_msr.c) for TSC frequency. In this native_calibrate_tsc(), which touches the INTEL_FAM6_ATOM_GOLDMONT and INTEL_FAM6_ATOM_GOLDMONT_D, I dug out one Apollo Lake and one Denverton platform (which comply to those GOLDMNOT model), and they both have 'hpet' and 'acpi_pm' clocksource registered. Thanks, Feng > > Thanks, > > tglx
On Mon, May 22 2023 at 21:00, Feng Tang wrote: > On Mon, May 22, 2023 at 01:49:53PM +0200, Thomas Gleixner wrote: >> > Paul and Rui can provide more info. AFAIK, those problems were raised >> > by external customers, so the platform were already shipped from >> > Intel. But I'm not sure they are commercial versions or early >> > engineering drops. >> >> So its at a company which knows how to update firmware, right? > > Yes. And the recalibration may help to exposed the bug quickly. That should be exposed _before_ crappy firmware is shipped and validation can use the command line parameter. I'm tired of this constant source of embarrassing stupidity. It's not rocket science to catch this before shipping. And guess what. Making this easy to recover from is just not making the situation any better because firmware people will even care less. >> and five lines further down: >> >> /* >> * For Atom SoCs TSC is the only reliable clocksource. >> * Mark TSC reliable so no watchdog on it. >> */ >> if (boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT) >> setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE); >> >> So its reliable and needs recalibration against hardware which does not >> exist. > > I misunderstood it. When you said 'SOCs which lack legacy hardware', > I thought you were referring those old Merrifield/Medfield things, > which may have no HPET/ACPI PM_Timer but an APB timer, and mainly go > through MSR way (tsc_msr.c) for TSC frequency. > > In this native_calibrate_tsc(), which touches the INTEL_FAM6_ATOM_GOLDMONT > and INTEL_FAM6_ATOM_GOLDMONT_D, I dug out one Apollo Lake and one > Denverton platform (which comply to those GOLDMNOT model), and they > both have 'hpet' and 'acpi_pm' clocksource registered. So that comment is wrong and that commit log is from fantasy land? http://lkml.kernel.org/r/1479241644-234277-4-git-send-email-bin.gao@linux.intel.com Clearly the left hand is not knowing what the right hand is doing. Thanks, tglx
On Mon, May 22, 2023 at 04:31:28PM +0200, Thomas Gleixner wrote: > On Mon, May 22 2023 at 21:00, Feng Tang wrote: > > On Mon, May 22, 2023 at 01:49:53PM +0200, Thomas Gleixner wrote: > >> > Paul and Rui can provide more info. AFAIK, those problems were raised > >> > by external customers, so the platform were already shipped from > >> > Intel. But I'm not sure they are commercial versions or early > >> > engineering drops. > >> > >> So its at a company which knows how to update firmware, right? > > > > Yes. And the recalibration may help to exposed the bug quickly. > > That should be exposed _before_ crappy firmware is shipped and > validation can use the command line parameter. I'm tired of this > constant source of embarrassing stupidity. It's not rocket science to > catch this before shipping. > > And guess what. Making this easy to recover from is just not making the > situation any better because firmware people will even care less. I can't argue with that :) > >> and five lines further down: > >> > >> /* > >> * For Atom SoCs TSC is the only reliable clocksource. > >> * Mark TSC reliable so no watchdog on it. > >> */ > >> if (boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT) > >> setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE); > >> > >> So its reliable and needs recalibration against hardware which does not > >> exist. > > > > I misunderstood it. When you said 'SOCs which lack legacy hardware', > > I thought you were referring those old Merrifield/Medfield things, > > which may have no HPET/ACPI PM_Timer but an APB timer, and mainly go > > through MSR way (tsc_msr.c) for TSC frequency. > > > > In this native_calibrate_tsc(), which touches the INTEL_FAM6_ATOM_GOLDMONT > > and INTEL_FAM6_ATOM_GOLDMONT_D, I dug out one Apollo Lake and one > > Denverton platform (which comply to those GOLDMNOT model), and they > > both have 'hpet' and 'acpi_pm' clocksource registered. > > So that comment is wrong and that commit log is from fantasy land? > > http://lkml.kernel.org/r/1479241644-234277-4-git-send-email-bin.gao@linux.intel.com > > Clearly the left hand is not knowing what the right hand is doing. I started working on Atom (Moorestown) in about 2008, and moved to other platforms before the time of the patch. And I don't understand the commit log: "On Intel GOLDMONT Atom SoC TSC is the only reliable clocksource. We mark TSC reliable to avoid watchdog on it." Clearly the Denventon I found today has both HPET and ACPI_PM timer: [root@dnv0 ~]# grep . /sys/devices/system/clocksource/clocksource0/* /sys/devices/system/clocksource/clocksource0/available_clocksource:tsc hpet acpi_pm /sys/devices/system/clocksource/clocksource0/current_clocksource:tsc The lscpu info is: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 12 On-line CPU(s) list: 0-11 Vendor ID: GenuineIntel BIOS Vendor ID: Intel(R) Corporation Model name: Intel(R) Atom(TM) CPU C3850 @ 2.10GHz BIOS Model name: Intel(R) Atom(TM) CPU C3850 @ 2.10GHz CPU @ 2.1GHz BIOS CPU family: 43 CPU family: 6 Model: 95 Thread(s) per core: 1 Core(s) per socket: 12 Socket(s): 1 Stepping: 1 Maybe this cpu model (0x5F) has been used by some type of platforms which has met the false alarm watchdog issue. Thanks, Feng > Thanks, > > tglx
On Mon, May 22, 2023 at 11:20:15PM +0800, Feng Tang wrote: > And I don't understand the commit log: "On Intel GOLDMONT Atom SoC > TSC is the only reliable clocksource. We mark TSC reliable to avoid > watchdog on it." > > Clearly the Denventon I found today has both HPET and ACPI_PM timer: > > [root@dnv0 ~]# grep . /sys/devices/system/clocksource/clocksource0/* > /sys/devices/system/clocksource/clocksource0/available_clocksource:tsc hpet acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource:tsc > > The lscpu info is: > > Architecture: x86_64 > CPU op-mode(s): 32-bit, 64-bit > Address sizes: 39 bits physical, 48 bits virtual > Byte Order: Little Endian > CPU(s): 12 > On-line CPU(s) list: 0-11 > Vendor ID: GenuineIntel > BIOS Vendor ID: Intel(R) Corporation > Model name: Intel(R) Atom(TM) CPU C3850 @ 2.10GHz > BIOS Model name: Intel(R) Atom(TM) CPU C3850 @ 2.10GHz CPU @ 2.1GHz > BIOS CPU family: 43 > CPU family: 6 > Model: 95 > Thread(s) per core: 1 > Core(s) per socket: 12 > Socket(s): 1 > Stepping: 1 > > Maybe this cpu model (0x5F) has been used by some type of platforms > which has met the false alarm watchdog issue. It has them; but they are not *reliable*.
On Mon, May 22, 2023 at 06:11:58PM +0200, Peter Zijlstra wrote: > On Mon, May 22, 2023 at 11:20:15PM +0800, Feng Tang wrote: > > > And I don't understand the commit log: "On Intel GOLDMONT Atom SoC > > TSC is the only reliable clocksource. We mark TSC reliable to avoid > > watchdog on it." > > > > Clearly the Denventon I found today has both HPET and ACPI_PM timer: > > > > [root@dnv0 ~]# grep . /sys/devices/system/clocksource/clocksource0/* > > /sys/devices/system/clocksource/clocksource0/available_clocksource:tsc hpet acpi_pm > > /sys/devices/system/clocksource/clocksource0/current_clocksource:tsc > > > > The lscpu info is: > > > > Architecture: x86_64 > > CPU op-mode(s): 32-bit, 64-bit > > Address sizes: 39 bits physical, 48 bits virtual > > Byte Order: Little Endian > > CPU(s): 12 > > On-line CPU(s) list: 0-11 > > Vendor ID: GenuineIntel > > BIOS Vendor ID: Intel(R) Corporation > > Model name: Intel(R) Atom(TM) CPU C3850 @ 2.10GHz > > BIOS Model name: Intel(R) Atom(TM) CPU C3850 @ 2.10GHz CPU @ 2.1GHz > > BIOS CPU family: 43 > > CPU family: 6 > > Model: 95 > > Thread(s) per core: 1 > > Core(s) per socket: 12 > > Socket(s): 1 > > Stepping: 1 > > > > Maybe this cpu model (0x5F) has been used by some type of platforms > > which has met the false alarm watchdog issue. > > It has them; but they are not *reliable*. Yes, that's possible. I tried to CC the author Bin in case he can provide more background or information for his statement, but his email address is unreachable now. Thanks, Feng
On Tue, May 23, 2023 at 09:18:23AM +0800, Feng Tang wrote: > On Mon, May 22, 2023 at 06:11:58PM +0200, Peter Zijlstra wrote: > > On Mon, May 22, 2023 at 11:20:15PM +0800, Feng Tang wrote: > > > > > And I don't understand the commit log: "On Intel GOLDMONT Atom SoC > > > TSC is the only reliable clocksource. We mark TSC reliable to avoid > > > watchdog on it." > > > > > > Clearly the Denventon I found today has both HPET and ACPI_PM timer: > > > > > > [root@dnv0 ~]# grep . /sys/devices/system/clocksource/clocksource0/* > > > /sys/devices/system/clocksource/clocksource0/available_clocksource:tsc hpet acpi_pm > > > /sys/devices/system/clocksource/clocksource0/current_clocksource:tsc > > > > > > The lscpu info is: > > > > > > Architecture: x86_64 > > > CPU op-mode(s): 32-bit, 64-bit > > > Address sizes: 39 bits physical, 48 bits virtual > > > Byte Order: Little Endian > > > CPU(s): 12 > > > On-line CPU(s) list: 0-11 > > > Vendor ID: GenuineIntel > > > BIOS Vendor ID: Intel(R) Corporation > > > Model name: Intel(R) Atom(TM) CPU C3850 @ 2.10GHz > > > BIOS Model name: Intel(R) Atom(TM) CPU C3850 @ 2.10GHz CPU @ 2.1GHz > > > BIOS CPU family: 43 > > > CPU family: 6 > > > Model: 95 > > > Thread(s) per core: 1 > > > Core(s) per socket: 12 > > > Socket(s): 1 > > > Stepping: 1 > > > > > > Maybe this cpu model (0x5F) has been used by some type of platforms > > > which has met the false alarm watchdog issue. > > > > It has them; but they are not *reliable*. > > Yes, that's possible. I tried to CC the author Bin in case he can > provide more background or information for his statement, but his > email address is unreachable now. IIRC HPET stops in C10 or something stupid like that, I forgot what the problem with ACPI_PM is.
On Tue, May 23, 2023 at 10:11:15AM +0200, Peter Zijlstra wrote: > On Tue, May 23, 2023 at 09:18:23AM +0800, Feng Tang wrote: > > On Mon, May 22, 2023 at 06:11:58PM +0200, Peter Zijlstra wrote: > > > On Mon, May 22, 2023 at 11:20:15PM +0800, Feng Tang wrote: > > > > > > > And I don't understand the commit log: "On Intel GOLDMONT Atom SoC > > > > TSC is the only reliable clocksource. We mark TSC reliable to avoid > > > > watchdog on it." > > > > > > > > Clearly the Denventon I found today has both HPET and ACPI_PM timer: > > > > > > > > [root@dnv0 ~]# grep . /sys/devices/system/clocksource/clocksource0/* > > > > /sys/devices/system/clocksource/clocksource0/available_clocksource:tsc hpet acpi_pm > > > > /sys/devices/system/clocksource/clocksource0/current_clocksource:tsc > > > > > > > > The lscpu info is: > > > > > > > > Architecture: x86_64 > > > > CPU op-mode(s): 32-bit, 64-bit > > > > Address sizes: 39 bits physical, 48 bits virtual > > > > Byte Order: Little Endian > > > > CPU(s): 12 > > > > On-line CPU(s) list: 0-11 > > > > Vendor ID: GenuineIntel > > > > BIOS Vendor ID: Intel(R) Corporation > > > > Model name: Intel(R) Atom(TM) CPU C3850 @ 2.10GHz > > > > BIOS Model name: Intel(R) Atom(TM) CPU C3850 @ 2.10GHz CPU @ 2.1GHz > > > > BIOS CPU family: 43 > > > > CPU family: 6 > > > > Model: 95 > > > > Thread(s) per core: 1 > > > > Core(s) per socket: 12 > > > > Socket(s): 1 > > > > Stepping: 1 > > > > > > > > Maybe this cpu model (0x5F) has been used by some type of platforms > > > > which has met the false alarm watchdog issue. > > > > > > It has them; but they are not *reliable*. > > > > Yes, that's possible. I tried to CC the author Bin in case he can > > provide more background or information for his statement, but his > > email address is unreachable now. > > IIRC HPET stops in C10 or something stupid like that, I forgot what the > problem with ACPI_PM is. Yes, that HPET issue is a common one for several generations of client platforms like Baytrail, Coffee Lake etc.
On Mon, May 22, 2023 at 10:14:08AM +0200, Thomas Gleixner wrote: > On Mon, May 22 2023 at 11:30, Feng Tang wrote: > > Commit a7ec817d5542 ("x86/tsc: Add option to force frequency > > recalibration with HW timer") was added to handle cases that the > > firmware has bug and provides a wrong TSC frequency number, and it > > is optional given that this kind of firmware issue rarely happens > > (Paul reported once [1]). > > > > But Rui reported that some Sapphire Rapids platform met this issue > > again recently, and as firmware is also a kind of 'software' which > > can't be bug free, make the recalibration default on. When the > > values from firmware and HW timer's calibration have big gap, > > raise a warning and let vendor to check which side is broken. > > Sure firmware can have bugs, but if firmware validation does not even > catch such a trivially to detect bug, then their validation is nothing > else than rubber stamping. Seriously. > > Are any of these affected platforms shipping already or is this just > Intel internal muck? > > > One downside is, many VMs also has X86_FEATURE_TSC_KNOWN_FREQ set, > > and they will also do this recalibration. > > It's also pointless for those SoCs which lack legacy hardware. > > So why do you force this on everyone? Just for the record, this patch could be helpful in allowing victims of TSC mis-synchronization to more easily provide a more complete bug report to the firmware people. There is of course no point if there is already a fix available. But it is not all that hard to work around not having this patch upstream. This can be hand-applied as needed, NTP drift rates can be pressed into service for those of us having atomic clocks near all our servers, or the firmware guys can be tasked with figuring it out. So this patch would be nice to have, but we could live without it. Thanx, Paul
On 6/2/23 11:29, Paul E. McKenney wrote: >>> One downside is, many VMs also has X86_FEATURE_TSC_KNOWN_FREQ set, >>> and they will also do this recalibration. >> It's also pointless for those SoCs which lack legacy hardware. >> >> So why do you force this on everyone? > Just for the record, this patch could be helpful in allowing victims > of TSC mis-synchronization to more easily provide a more complete bug > report to the firmware people. There is of course no point if there is > already a fix available. > > But it is not all that hard to work around not having this patch upstream. > This can be hand-applied as needed, NTP drift rates can be pressed > into service for those of us having atomic clocks near all our servers, > or the firmware guys can be tasked with figuring it out. > > So this patch would be nice to have, but we could live without it. Is this the kind of thing we could relegate to a kernel unit test? Like make the recalibration logic _available_, but don't have it affect the rest of the system. I love patching my kernel as much as the next guy. But, you know what I *don't* love? Explaining how to patch kernels to other people. ;)
On Fri, Jun 02, 2023 at 11:36:54AM -0700, Dave Hansen wrote: > On 6/2/23 11:29, Paul E. McKenney wrote: > >>> One downside is, many VMs also has X86_FEATURE_TSC_KNOWN_FREQ set, > >>> and they will also do this recalibration. > >> It's also pointless for those SoCs which lack legacy hardware. > >> > >> So why do you force this on everyone? > > Just for the record, this patch could be helpful in allowing victims > > of TSC mis-synchronization to more easily provide a more complete bug > > report to the firmware people. There is of course no point if there is > > already a fix available. > > > > But it is not all that hard to work around not having this patch upstream. > > This can be hand-applied as needed, NTP drift rates can be pressed > > into service for those of us having atomic clocks near all our servers, > > or the firmware guys can be tasked with figuring it out. > > > > So this patch would be nice to have, but we could live without it. > > Is this the kind of thing we could relegate to a kernel unit test? Like > make the recalibration logic _available_, but don't have it affect the > rest of the system. > > I love patching my kernel as much as the next guy. But, you know what I > *don't* love? Explaining how to patch kernels to other people. ;) One could argue that we already have the TSC equivalent of a kernel unit test with the tsc=recalibrate kernel boot parameter. So, would it make sense to have something like tsc=recalibrate (already present) in the guise of something like hpet=recalibrate and/or pmtmr=recalibrate in order to allow people to opt into recalibrating whatever timer is marked unstable? Thanx, Paul
On Fri, Jun 02, 2023 at 12:08:10PM -0700, Paul E. McKenney wrote: > On Fri, Jun 02, 2023 at 11:36:54AM -0700, Dave Hansen wrote: > > On 6/2/23 11:29, Paul E. McKenney wrote: > > >>> One downside is, many VMs also has X86_FEATURE_TSC_KNOWN_FREQ set, > > >>> and they will also do this recalibration. > > >> It's also pointless for those SoCs which lack legacy hardware. > > >> > > >> So why do you force this on everyone? > > > Just for the record, this patch could be helpful in allowing victims > > > of TSC mis-synchronization to more easily provide a more complete bug > > > report to the firmware people. There is of course no point if there is > > > already a fix available. > > > > > > But it is not all that hard to work around not having this patch upstream. > > > This can be hand-applied as needed, NTP drift rates can be pressed > > > into service for those of us having atomic clocks near all our servers, > > > or the firmware guys can be tasked with figuring it out. > > > > > > So this patch would be nice to have, but we could live without it. > > > > Is this the kind of thing we could relegate to a kernel unit test? Like > > make the recalibration logic _available_, but don't have it affect the > > rest of the system. > > > > I love patching my kernel as much as the next guy. But, you know what I > > *don't* love? Explaining how to patch kernels to other people. ;) > > One could argue that we already have the TSC equivalent of a kernel unit > test with the tsc=recalibrate kernel boot parameter. > > So, would it make sense to have something like tsc=recalibrate (already > present) in the guise of something like hpet=recalibrate and/or > pmtmr=recalibrate in order to allow people to opt into recalibrating > whatever timer is marked unstable? This kind of hint parsing should be in tsc.c, so some name like 'tsc_recal=hpet/pmtmr' ? As hpet and pmtimer are the only 2 choices and hpet is the default one, if people want to use pmtimer, they can use combined parameter "tsc=recalibrate nohpet". Also, thanks for sharing your thoughts form a victim's viewpoint in the other mail! :) Thanks, Feng > Thanx, Paul
On Mon, Jun 05, 2023 at 09:04:45AM +0800, Feng Tang wrote: > On Fri, Jun 02, 2023 at 12:08:10PM -0700, Paul E. McKenney wrote: > > On Fri, Jun 02, 2023 at 11:36:54AM -0700, Dave Hansen wrote: > > > On 6/2/23 11:29, Paul E. McKenney wrote: > > > >>> One downside is, many VMs also has X86_FEATURE_TSC_KNOWN_FREQ set, > > > >>> and they will also do this recalibration. > > > >> It's also pointless for those SoCs which lack legacy hardware. > > > >> > > > >> So why do you force this on everyone? > > > > Just for the record, this patch could be helpful in allowing victims > > > > of TSC mis-synchronization to more easily provide a more complete bug > > > > report to the firmware people. There is of course no point if there is > > > > already a fix available. > > > > > > > > But it is not all that hard to work around not having this patch upstream. > > > > This can be hand-applied as needed, NTP drift rates can be pressed > > > > into service for those of us having atomic clocks near all our servers, > > > > or the firmware guys can be tasked with figuring it out. > > > > > > > > So this patch would be nice to have, but we could live without it. > > > > > > Is this the kind of thing we could relegate to a kernel unit test? Like > > > make the recalibration logic _available_, but don't have it affect the > > > rest of the system. > > > > > > I love patching my kernel as much as the next guy. But, you know what I > > > *don't* love? Explaining how to patch kernels to other people. ;) > > > > One could argue that we already have the TSC equivalent of a kernel unit > > test with the tsc=recalibrate kernel boot parameter. > > > > So, would it make sense to have something like tsc=recalibrate (already > > present) in the guise of something like hpet=recalibrate and/or > > pmtmr=recalibrate in order to allow people to opt into recalibrating > > whatever timer is marked unstable? > > This kind of hint parsing should be in tsc.c, so some name like > 'tsc_recal=hpet/pmtmr' ? > > As hpet and pmtimer are the only 2 choices and hpet is the default > one, if people want to use pmtimer, they can use combined parameter > "tsc=recalibrate nohpet". Or something like "tsc=recalibrateother"? I am not all that worried about what the boot parameters look like, as long as we have a way of telling the kernel to recalibrate whatever clocksource just got marked unstable. ;-) > Also, thanks for sharing your thoughts form a victim's viewpoint > in the other mail! :) Glad it helped! ;-) Thanx, Paul
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 9e5bab29685f..a826ab3b5dfb 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -6451,10 +6451,6 @@ in situations with strict latency requirements (where interruptions from clocksource watchdog are not acceptable). - [x86] recalibrate: force recalibration against a HW timer - (HPET or PM timer) on systems whose TSC frequency was - obtained from HW or FW using either an MSR or CPUID(0x15). - Warn if the difference is more than 500 ppm. [x86] watchdog: Use TSC as the watchdog clocksource with which to check other HW timers (HPET or PM timer), but only on systems where TSC has been deemed trustworthy. diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index 344698852146..b77c5b1ad304 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -48,8 +48,6 @@ static DEFINE_STATIC_KEY_FALSE(__use_tsc); int tsc_clocksource_reliable; -static int __read_mostly tsc_force_recalibrate; - static u32 art_to_tsc_numerator; static u32 art_to_tsc_denominator; static u64 art_to_tsc_offset; @@ -310,8 +308,6 @@ static int __init tsc_setup(char *str) __func__); tsc_as_watchdog = 0; } - if (!strcmp(str, "recalibrate")) - tsc_force_recalibrate = 1; if (!strcmp(str, "watchdog")) { if (no_tsc_watchdog) pr_alert("%s: tsc=watchdog overridden by earlier tsc=nowatchdog\n", @@ -1395,7 +1391,6 @@ static void tsc_refine_calibration_work(struct work_struct *work) else freq = calc_pmtimer_ref(delta, ref_start, ref_stop); - /* Will hit this only if tsc_force_recalibrate has been set */ if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) { /* Warn if the deviation exceeds 500 ppm */ @@ -1456,17 +1451,16 @@ static int __init init_tsc_clocksource(void) clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP; /* - * When TSC frequency is known (retrieved via MSR or CPUID), we skip - * the refined calibration and directly register it as a clocksource. + * When TSC frequency is known (retrieved via MSR or CPUID), we + * directly register it as a clocksource. As the firmware could + * be wrong (very unlikely) about the values, the recalibration + * by hardware timer is kept just as a sanity check. */ if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) { if (boot_cpu_has(X86_FEATURE_ART)) art_related_clocksource = &clocksource_tsc; clocksource_register_khz(&clocksource_tsc, tsc_khz); clocksource_unregister(&clocksource_tsc_early); - - if (!tsc_force_recalibrate) - return 0; } schedule_delayed_work(&tsc_irqwork, 0);