Message ID | 20221116082221.25243-1-yunying.sun@intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp16295wru; Wed, 16 Nov 2022 00:08:11 -0800 (PST) X-Google-Smtp-Source: AA0mqf6ZXoYDXorkviA1BQjympJJXZxRkC1tKer0ZZSbCiQbrbYo8cIT8NgfUTpuDyqCQW750326 X-Received: by 2002:a65:63c2:0:b0:46b:2753:2a60 with SMTP id n2-20020a6563c2000000b0046b27532a60mr19412200pgv.192.1668586091231; Wed, 16 Nov 2022 00:08:11 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1668586091; cv=none; d=google.com; s=arc-20160816; b=iGsUX8WA3VXMTe2mZBbAV9f3rrZnVlMFclqAYBS54K2VyEnY0nJX/YrU3hw3jsVbvO WQ/JobpNxUFItHy4DDITAA290paTZIB57GUjTTken2e6AbKDsp4M0j9qG/oK93rrPi6g qvihELGy04AHfKyyRodwixI6DfdfWEctovAnSw0IBdx0vyTjFfKvcBZpV5mDfb4sNXIm ncQmOBS8cHCCJFbN40TPwwFzhZFsZpVW3N1oftSvB40ql4frjqKPP8tUHUtLqvPKiyT8 yMNk/pquEHpoFQG3wY4cXbAiaeKOs4hCdZbzVFhfqJXYzcLO1NBOSNdMMgUPaTh14zkw 5Rjw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:message-id:date:subject:cc:to:from :dkim-signature; bh=aNLoYamOaOxKQT27HqH3T6zvdW0Rq/i2g2I7MJMmJNM=; b=pinoKYCXSdsRGxy4YL3OGO1njrGpfilBEdZqqnc3qxDZ/w5s+W4ivQk688vABBbS8+ 5ny9DwF4rMxdprwxx6L6e0JTI8GvHUwvxSNU156X/et0xF/fTEncmDDbqbOIBgBPX4TE f2U8EmRuqQ2gPwZ2O69Fm57Ksf1kfBJWNkhNsQivtKG9we76wgQo6U047HNxZFH0X82b 5Hcj4bFg3ICS6Oyx8k/AQq9B1VI6rhXMWCh8OO55D1DdD8tnWPmv1EKiz7KwgERdTRuQ ZkWY+XvNWXjg//zJIYN5yR3TFgBKMGNgQNE0HcpIuOC2zL5ebGAreI85p6A437jcyPLm tSDg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=d1l8zT6g; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id mw7-20020a17090b4d0700b00210767d910asi1636884pjb.36.2022.11.16.00.07.35; Wed, 16 Nov 2022 00:08:11 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=d1l8zT6g; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233066AbiKPIHQ (ORCPT <rfc822;maxim.cournoyer@gmail.com> + 99 others); Wed, 16 Nov 2022 03:07:16 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40334 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233077AbiKPIHN (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 16 Nov 2022 03:07:13 -0500 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D243A6583 for <linux-kernel@vger.kernel.org>; Wed, 16 Nov 2022 00:07:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668586032; x=1700122032; h=from:to:cc:subject:date:message-id; bh=3mwv0wPuZe8XW9kvluU9XKECvz486EpvYsA7IYgIBqs=; b=d1l8zT6goVnqEeNHhwxn+vpHNxxxmxjNNlPD+grj6kOfwadYXKFuQn2s dqXfU+VR2+WRfdh/YxTLWIumKIWO8eXMouAr4t71FWuRePp7HFTL7ESHZ gFkaapwJ+nPIBxt7zT3yljUb7SxyZG3rRKKWzUJ1Pyo1WqiHgamY4KPk6 hoXQXAlcuitpksWb/sctbXyc1vMLAD4rWAEk/IZrhvaD8GnjtCGXGUORX jB9Alcun9YuoUcE50U9PEXhzHmP9BEYWiYw+SrZtAOiEHG6gUPjc4WJ9u yvQ7xoEz9z8DcWPqf9w6DKREKSB3UsYGP6d3bEydpHwIOI+M3aZ3pYuaE A==; X-IronPort-AV: E=McAfee;i="6500,9779,10532"; a="314295979" X-IronPort-AV: E=Sophos;i="5.96,167,1665471600"; d="scan'208";a="314295979" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Nov 2022 00:07:12 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10532"; a="884299967" X-IronPort-AV: E=Sophos;i="5.96,167,1665471600"; d="scan'208";a="884299967" Received: from ubuntu.bj.intel.com ([10.238.154.182]) by fmsmga006.fm.intel.com with ESMTP; 16 Nov 2022 00:07:10 -0800 From: Yunying Sun <yunying.sun@intel.com> To: tglx@linutronix.de, peterz@infradead.org, paulmck@kernel.org, longman@redhat.com, x86@kernel.org, linux-kernel@vger.kernel.org Cc: feng.tang@intel.com, yunying.sun@intel.com Subject: [PATCH] clocksource: Print clocksource name when clocksource is tested unstable Date: Wed, 16 Nov 2022 16:22:21 +0800 Message-Id: <20221116082221.25243-1-yunying.sun@intel.com> X-Mailer: git-send-email 2.17.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1749639329500172157?= X-GMAIL-MSGID: =?utf-8?q?1749639329500172157?= |
Series |
clocksource: Print clocksource name when clocksource is tested unstable
|
|
Commit Message
Sun, Yunying
Nov. 16, 2022, 8:22 a.m. UTC
Some "TSC fall back to HPET" cases are seen on systems that have more
than 2 numa nodes. When this happens, in kernel log it has:
clocksource: timekeeping watchdog on CPU168: hpet read-back delay of 4296200ns, attempt 4, marking unstable
The "hpet" here is misleading since it prints only the name of watchdog,
where actually it's measuring the delay of 3 reads: wd-clocksource-wd.
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
---
kernel/time/clocksource.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Comments
On Wed, Nov 16, 2022 at 04:22:21PM +0800, Yunying Sun wrote: > Some "TSC fall back to HPET" cases are seen on systems that have more > than 2 numa nodes. When this happens, in kernel log it has: > > clocksource: timekeeping watchdog on CPU168: hpet read-back delay of 4296200ns, attempt 4, marking unstable > > The "hpet" here is misleading since it prints only the name of watchdog, > where actually it's measuring the delay of 3 reads: wd-clocksource-wd. > > Signed-off-by: Yunying Sun <yunying.sun@intel.com> Good eyes! I queued this with wordsmithing as shown below. Please let me know if I messed something up. Thanx, Paul ------------------------------------------------------------------------ commit 72ff63e2d3f19963a4ef401e5f8c1bd16646b587 Author: Yunying Sun <yunying.sun@intel.com> Date: Wed Nov 16 16:22:21 2022 +0800 clocksource: Print clocksource name when clocksource is tested unstable Some "TSC fall back to HPET" messages appear on systems having more than 2 NUMA nodes: clocksource: timekeeping watchdog on CPU168: hpet read-back delay of 4296200ns, attempt 4, marking unstable The "hpet" here is misleading the clocksource watchdog is really doing repeated reads of "hpet" in order to check for unrelated delays. Therefore, print the name of the clocksource under test, prefixed by "wd-" and suffixed by "-wd", for example, "wd-tsc-wd". Signed-off-by: Yunying Sun <yunying.sun@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 4015ec6503a52..6f681d56a843f 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -257,8 +257,8 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow, goto skip_test; } - pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, marking unstable\n", - smp_processor_id(), watchdog->name, wd_delay, nretries); + pr_warn("timekeeping watchdog on CPU%d: wd-%s-wd read-back delay of %lldns, attempt %d, marking unstable\n", + smp_processor_id(), cs->name, wd_delay, nretries); return WD_READ_UNSTABLE; skip_test:
Hi Paul, Thank you for the quick review. The commit message looks more accurate and clear after your wordsmithing. -Yunying -----Original Message----- From: Paul E. McKenney <paulmck@kernel.org> Sent: Thursday, 17 November, 2022 04:31 To: Sun, Yunying <yunying.sun@intel.com> Cc: tglx@linutronix.de; peterz@infradead.org; longman@redhat.com; x86@kernel.org; linux-kernel@vger.kernel.org; Tang, Feng <feng.tang@intel.com> Subject: Re: [PATCH] clocksource: Print clocksource name when clocksource is tested unstable On Wed, Nov 16, 2022 at 04:22:21PM +0800, Yunying Sun wrote: > Some "TSC fall back to HPET" cases are seen on systems that have more > than 2 numa nodes. When this happens, in kernel log it has: > > clocksource: timekeeping watchdog on CPU168: hpet read-back delay of > 4296200ns, attempt 4, marking unstable > > The "hpet" here is misleading since it prints only the name of > watchdog, where actually it's measuring the delay of 3 reads: wd-clocksource-wd. > > Signed-off-by: Yunying Sun <yunying.sun@intel.com> Good eyes! I queued this with wordsmithing as shown below. Please let me know if I messed something up. Thanx, Paul ------------------------------------------------------------------------ commit 72ff63e2d3f19963a4ef401e5f8c1bd16646b587 Author: Yunying Sun <yunying.sun@intel.com> Date: Wed Nov 16 16:22:21 2022 +0800 clocksource: Print clocksource name when clocksource is tested unstable Some "TSC fall back to HPET" messages appear on systems having more than 2 NUMA nodes: clocksource: timekeeping watchdog on CPU168: hpet read-back delay of 4296200ns, attempt 4, marking unstable The "hpet" here is misleading the clocksource watchdog is really doing repeated reads of "hpet" in order to check for unrelated delays. Therefore, print the name of the clocksource under test, prefixed by "wd-" and suffixed by "-wd", for example, "wd-tsc-wd". Signed-off-by: Yunying Sun <yunying.sun@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 4015ec6503a52..6f681d56a843f 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -257,8 +257,8 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow, goto skip_test; } - pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, marking unstable\n", - smp_processor_id(), watchdog->name, wd_delay, nretries); + pr_warn("timekeeping watchdog on CPU%d: wd-%s-wd read-back delay of %lldns, attempt %d, marking unstable\n", + smp_processor_id(), cs->name, wd_delay, nretries); return WD_READ_UNSTABLE; skip_test:
On Wed, Nov 16, 2022 at 04:22:21PM +0800, Yunying Sun wrote: > Some "TSC fall back to HPET" cases are seen on systems that have more > than 2 numa nodes. When this happens, in kernel log it has: Hi Yunying, If the system's physical sockests number <= 2, I would suggest you to try the patch https://lore.kernel.org/lkml/20221021062131.1826810-1-feng.tang@intel.com/, which is still under review and discussion. Thanks, Feng > clocksource: timekeeping watchdog on CPU168: hpet read-back delay of 4296200ns, attempt 4, marking unstable > > The "hpet" here is misleading since it prints only the name of watchdog, > where actually it's measuring the delay of 3 reads: wd-clocksource-wd. > > Signed-off-by: Yunying Sun <yunying.sun@intel.com> > --- > kernel/time/clocksource.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c > index 8058bec87ace..fac8c0d90e61 100644 > --- a/kernel/time/clocksource.c > +++ b/kernel/time/clocksource.c > @@ -257,8 +257,8 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow, > goto skip_test; > } > > - pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, marking unstable\n", > - smp_processor_id(), watchdog->name, wd_delay, nretries); > + pr_warn("timekeeping watchdog on CPU%d: wd-%s-wd read-back delay of %lldns, attempt %d, marking unstable\n", > + smp_processor_id(), cs->name, wd_delay, nretries); > return WD_READ_UNSTABLE; > > skip_test: > -- > 2.17.0 >
Hi Feng, For the "tsc fall back to hpet" issue, thanks a lot for pointing me to the fix proposal patch earlier: https://lore.kernel.org/lkml/20221021062131.1826810-1-feng.tang@intel.com/ . I shared this patch set to our customer yesterday. Just got feedback that customer has verified the "tsc fall back to hpet" issue does not appear anymore with this patch set, after 24hrs stress test on 8 servers. -Yunying -----Original Message----- From: Tang, Feng <feng.tang@intel.com> Sent: Thursday, 17 November, 2022 14:47 To: Sun, Yunying <yunying.sun@intel.com> Cc: tglx@linutronix.de; peterz@infradead.org; paulmck@kernel.org; longman@redhat.com; x86@kernel.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] clocksource: Print clocksource name when clocksource is tested unstable On Wed, Nov 16, 2022 at 04:22:21PM +0800, Yunying Sun wrote: > Some "TSC fall back to HPET" cases are seen on systems that have more > than 2 numa nodes. When this happens, in kernel log it has: Hi Yunying, If the system's physical sockests number <= 2, I would suggest you to try the patch https://lore.kernel.org/lkml/20221021062131.1826810-1-feng.tang@intel.com/, which is still under review and discussion. Thanks, Feng > clocksource: timekeeping watchdog on CPU168: hpet read-back delay of > 4296200ns, attempt 4, marking unstable > > The "hpet" here is misleading since it prints only the name of > watchdog, where actually it's measuring the delay of 3 reads: wd-clocksource-wd. > > Signed-off-by: Yunying Sun <yunying.sun@intel.com> > --- > kernel/time/clocksource.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c > index 8058bec87ace..fac8c0d90e61 100644 > --- a/kernel/time/clocksource.c > +++ b/kernel/time/clocksource.c > @@ -257,8 +257,8 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow, > goto skip_test; > } > > - pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, marking unstable\n", > - smp_processor_id(), watchdog->name, wd_delay, nretries); > + pr_warn("timekeeping watchdog on CPU%d: wd-%s-wd read-back delay of %lldns, attempt %d, marking unstable\n", > + smp_processor_id(), cs->name, wd_delay, nretries); > return WD_READ_UNSTABLE; > > skip_test: > -- > 2.17.0 >
On Thu, Nov 17, 2022 at 04:03:07PM +0800, Sun, Yunying wrote: > Hi Feng, > > For the "tsc fall back to hpet" issue, thanks a lot for pointing me to the fix proposal patch earlier: > https://lore.kernel.org/lkml/20221021062131.1826810-1-feng.tang@intel.com/ . > > I shared this patch set to our customer yesterday. > Just got feedback that customer has verified the "tsc fall back to hpet" issue does not appear anymore with this patch set, after 24hrs stress test on 8 servers. Thanks for the testing! So your case is a real case that current 'nr_online_nodes' check cannot cover, which Dave concerned about. Thanks, Feng > -Yunying > > -----Original Message----- > From: Tang, Feng <feng.tang@intel.com> > Sent: Thursday, 17 November, 2022 14:47 > To: Sun, Yunying <yunying.sun@intel.com> > Cc: tglx@linutronix.de; peterz@infradead.org; paulmck@kernel.org; longman@redhat.com; x86@kernel.org; linux-kernel@vger.kernel.org > Subject: Re: [PATCH] clocksource: Print clocksource name when clocksource is tested unstable > > On Wed, Nov 16, 2022 at 04:22:21PM +0800, Yunying Sun wrote: > > Some "TSC fall back to HPET" cases are seen on systems that have more > > than 2 numa nodes. When this happens, in kernel log it has: > > Hi Yunying, > > If the system's physical sockests number <= 2, I would suggest you to try the patch https://lore.kernel.org/lkml/20221021062131.1826810-1-feng.tang@intel.com/, > which is still under review and discussion. > > Thanks, > Feng > > > clocksource: timekeeping watchdog on CPU168: hpet read-back delay of > > 4296200ns, attempt 4, marking unstable > > > > The "hpet" here is misleading since it prints only the name of > > watchdog, where actually it's measuring the delay of 3 reads: wd-clocksource-wd. > > > > Signed-off-by: Yunying Sun <yunying.sun@intel.com> > > --- > > kernel/time/clocksource.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c > > index 8058bec87ace..fac8c0d90e61 100644 > > --- a/kernel/time/clocksource.c > > +++ b/kernel/time/clocksource.c > > @@ -257,8 +257,8 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow, > > goto skip_test; > > } > > > > - pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, marking unstable\n", > > - smp_processor_id(), watchdog->name, wd_delay, nretries); > > + pr_warn("timekeeping watchdog on CPU%d: wd-%s-wd read-back delay of %lldns, attempt %d, marking unstable\n", > > + smp_processor_id(), cs->name, wd_delay, nretries); > > return WD_READ_UNSTABLE; > > > > skip_test: > > -- > > 2.17.0 > >
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index 8058bec87ace..fac8c0d90e61 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -257,8 +257,8 @@ static enum wd_read_status cs_watchdog_read(struct clocksource *cs, u64 *csnow, goto skip_test; } - pr_warn("timekeeping watchdog on CPU%d: %s read-back delay of %lldns, attempt %d, marking unstable\n", - smp_processor_id(), watchdog->name, wd_delay, nretries); + pr_warn("timekeeping watchdog on CPU%d: wd-%s-wd read-back delay of %lldns, attempt %d, marking unstable\n", + smp_processor_id(), cs->name, wd_delay, nretries); return WD_READ_UNSTABLE; skip_test: