Message ID | 20231024075748.1675382-1-dapeng1.mi@linux.intel.com |
---|---|
Headers |
From: Dapeng Mi <dapeng1.mi@linux.intel.com>
To: Sean Christopherson <seanjc@google.com>, Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Zhenyu Wang <zhenyuw@linux.intel.com>, Zhang Xiong <xiong.y.zhang@intel.com>, Jim Mattson <jmattson@google.com>, Mingwei Zhang <mizhang@google.com>, Like Xu <like.xu.linux@gmail.com>, Dapeng Mi <dapeng1.mi@intel.com>, Dapeng Mi <dapeng1.mi@linux.intel.com>
Subject: [kvm-unit-tests Patch 0/5] Fix PMU test failures on Sapphire Rapids
Date: Tue, 24 Oct 2023 15:57:43 +0800
Message-Id: <20231024075748.1675382-1-dapeng1.mi@linux.intel.com>
X-Mailer: git-send-email 2.34.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
Fix PMU test failures on Sapphire Rapids
|
|
Message
Mi, Dapeng
Oct. 24, 2023, 7:57 a.m. UTC
When running the pmu test on Intel Sapphire Rapids, we found several
failures, such as the "llc misses" failure, the "all counters" failure
and the "fixed counter 3" failure.

Intel Sapphire Rapids introduces a new fixed counter 3, increases the
total number of PMU counters (GP plus fixed) to 12, and also optimizes
the cache subsystem. These changes invalidate the original assumptions
of the pmu test on Sapphire Rapids. Patches 2-4 fix these failures,
patch 1 removes the duplicate code and patch 5 adds asserts to ensure
the predefined fixed events match the HW fixed counters.

Dapeng Mi (4):
  x86: pmu: Change the minimum value of llc_misses event to 0
  x86: pmu: Enlarge cnt array length to 64 in check_counters_many()
  x86: pmu: Support validation for Intel PMU fixed counter 3
  x86: pmu: Add asserts to warn inconsistent fixed events and counters

Xiong Zhang (1):
  x86: pmu: Remove duplicate code in pmu_init()

 lib/x86/pmu.c |  5 -----
 x86/pmu.c     | 17 ++++++++++++-----
 2 files changed, 12 insertions(+), 10 deletions(-)

base-commit: bfe5d7d0e14c8199d134df84d6ae8487a9772c48
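To illustrate why lowering the minimum of the llc_misses event to 0 matters, here is a minimal sketch of a range check on a measured event count. The struct and function names are hypothetical and are not taken from x86/pmu.c; the only assumption drawn from the series is that each event has an accepted [min, max] range and that the old minimum for "llc misses" was nonzero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event descriptor: a name plus the accepted count range. */
struct event_range {
    const char *name;
    uint64_t min;
    uint64_t max;
};

/* Returns true when the measured count lies inside [min, max]. */
static bool count_in_range(const struct event_range *ev, uint64_t count)
{
    assert(ev->min <= ev->max);   /* a descriptor with min > max is a bug */
    return count >= ev->min && count <= ev->max;
}
```

With a minimum of 1, a run in which the measured loop happens to cause no LLC misses is reported as a failure; with a minimum of 0 the same run passes, at the cost of no longer detecting a counter that never counts.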
Comments
On Tue, Oct 24, 2023, Dapeng Mi wrote:
> When running the pmu test on Intel Sapphire Rapids, we found several
> failures, such as the "llc misses" failure, the "all counters" failure
> and the "fixed counter 3" failure.

Hmm, I have tested your series on an SPR machine. It looks like all "llc
misses" tests already pass on my side. "all counters" always fails, with
or without your patches. "fixed counter 3" never exists; I only have
"fixed cntr-{0,1,2}" and "fixed-{0,1,2}".

You may want to double check the requirements of your series, rather
than testing only under your own setup without explaining that setup in
detail.

Maybe what I am missing is your topdown series? If so, I don't see value
in this series until your topdown series is checked in.

Thanks.
-Mingwei

>
> Intel Sapphire Rapids introduces a new fixed counter 3, increases the
> total number of PMU counters (GP plus fixed) to 12, and also optimizes
> the cache subsystem. These changes invalidate the original assumptions
> of the pmu test on Sapphire Rapids. Patches 2-4 fix these failures,
> patch 1 removes the duplicate code and patch 5 adds asserts to ensure
> the predefined fixed events match the HW fixed counters.
>
> Dapeng Mi (4):
>   x86: pmu: Change the minimum value of llc_misses event to 0
>   x86: pmu: Enlarge cnt array length to 64 in check_counters_many()
>   x86: pmu: Support validation for Intel PMU fixed counter 3
>   x86: pmu: Add asserts to warn inconsistent fixed events and counters
>
> Xiong Zhang (1):
>   x86: pmu: Remove duplicate code in pmu_init()
>
>  lib/x86/pmu.c |  5 -----
>  x86/pmu.c     | 17 ++++++++++++-----
>  2 files changed, 12 insertions(+), 10 deletions(-)
>
>
> base-commit: bfe5d7d0e14c8199d134df84d6ae8487a9772c48
> --
> 2.34.1
>
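Whether a guest sees "fixed counter 3" at all can be checked from the counter counts that CPUID leaf 0xA enumerates. The following is a rough sketch, not code from kvm-unit-tests: the struct and function names are hypothetical, and the caller is assumed to have executed CPUID itself and to pass in the raw EAX and EDX values. It only looks at the counter-count fields and ignores every other field of the leaf.

```c
#include <assert.h>
#include <stdint.h>

/* Counter counts decoded from CPUID leaf 0xA (architectural PMU). */
struct pmu_caps {
    unsigned int version;   /* EAX[7:0]:  architectural PMU version */
    unsigned int nr_gp;     /* EAX[15:8]: general-purpose counters  */
    unsigned int nr_fixed;  /* EDX[4:0]:  fixed-function counters   */
};

/* Decode raw register values; running CPUID is left to the caller. */
static struct pmu_caps decode_pmu_leaf(uint32_t eax, uint32_t edx)
{
    struct pmu_caps caps = {
        .version  = eax & 0xff,
        .nr_gp    = (eax >> 8) & 0xff,
        .nr_fixed = edx & 0x1f,
    };
    return caps;
}

/* Fixed counter `idx` is only worth testing if it is enumerated. */
static int has_fixed_counter(const struct pmu_caps *caps, unsigned int idx)
{
    assert(caps);
    return idx < caps->nr_fixed;
}
```

A guest that enumerates three fixed counters reports only indices 0 to 2, which matches the "fixed cntr-{0,1,2}" output described above; a test for index 3 makes sense only when the count is at least four.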
On 10/26/2023 7:47 AM, Mingwei Zhang wrote:
> On Tue, Oct 24, 2023, Dapeng Mi wrote:
>> When running the pmu test on Intel Sapphire Rapids, we found several
>> failures, such as the "llc misses" failure, the "all counters" failure
>> and the "fixed counter 3" failure.
> Hmm, I have tested your series on an SPR machine. It looks like all "llc
> misses" tests already pass on my side. "all counters" always fails, with
> or without your patches. "fixed counter 3" never exists; I only have
> "fixed cntr-{0,1,2}" and "fixed-{0,1,2}".

1. "LLC misses" failure

Yeah, the "LLC misses" failure is not always seen. I see it 2 to 3 times
out of 10 runs of the standalone PMU test, and you are more likely to
see it if you run the full kvm-unit-tests. I think whether you see the
"LLC misses" failure really depends on the current cache state of your
system, that is, how much cache is consumed by other programs. If there
are lots of free cache lines on the system when the pmu test runs, you
are more likely to see LLC misses failures like the ones below.

PASS: Intel: llc references-7
*FAIL*: Intel: llc misses-0
PASS: Intel: llc misses-1
PASS: Intel: llc misses-2

2. "all counters" failure

Actually the "all counters" failure is not always seen either, but that
doesn't mean the current code is correct. In the current code, the
"cnt[10]" array in check_counters_many() has a length of 10, but SPR
supports at least 11 counters (8 GP counters + 3 fixed counters), even
though fixed counter 3 is not supported in the current upstream code.
Obviously this leads to an out-of-range memory access in
check_counters_many().

>
> You may want to double check the requirements of your series, rather
> than testing only under your own setup without explaining that setup in
> detail.
>
> Maybe what I am missing is your topdown series? If so, I don't see value
> in this series until your topdown series is checked in.

3. "fixed counter 3" failure

Yeah, after Jim's reminder I realized that I was using a kernel that
includes the vtopdown support patches. As my reply to Jim's comments
says, the patches that add slots event support are still valuable for
the current emulation framework, so I will split them out of the
original vtopdown patchset and resend them as an independent patchset.
Anyway, even without slots event support in the kernel, only patch 4/5
is affected; the other patches are still valuable.

>
> Thanks.
> -Mingwei
>> Intel Sapphire Rapids introduces a new fixed counter 3, increases the
>> total number of PMU counters (GP plus fixed) to 12, and also optimizes
>> the cache subsystem. These changes invalidate the original assumptions
>> of the pmu test on Sapphire Rapids. Patches 2-4 fix these failures,
>> patch 1 removes the duplicate code and patch 5 adds asserts to ensure
>> the predefined fixed events match the HW fixed counters.
>>
>> Dapeng Mi (4):
>>   x86: pmu: Change the minimum value of llc_misses event to 0
>>   x86: pmu: Enlarge cnt array length to 64 in check_counters_many()
>>   x86: pmu: Support validation for Intel PMU fixed counter 3
>>   x86: pmu: Add asserts to warn inconsistent fixed events and counters
>>
>> Xiong Zhang (1):
>>   x86: pmu: Remove duplicate code in pmu_init()
>>
>>  lib/x86/pmu.c |  5 -----
>>  x86/pmu.c     | 17 ++++++++++++-----
>>  2 files changed, 12 insertions(+), 10 deletions(-)
>>
>>
>> base-commit: bfe5d7d0e14c8199d134df84d6ae8487a9772c48
>> --
>> 2.34.1
>>
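The arithmetic behind the "all counters" point can be sketched as follows. This is an illustration only: the helper names and the two length macros are hypothetical, with 10 taken from the array length described above and 64 from the patch title, and nothing here is copied from check_counters_many().

```c
#include <assert.h>
#include <stddef.h>

#define OLD_CNT_LEN 10   /* array length described in the thread */
#define NEW_CNT_LEN 64   /* length named in the patch title */

/* Number of slots the test would fill: every GP and fixed counter. */
static size_t counters_needed(size_t nr_gp, size_t nr_fixed)
{
    return nr_gp + nr_fixed;
}

/* Returns 1 if an array of `len` slots holds them all without overflow. */
static int array_is_large_enough(size_t nr_gp, size_t nr_fixed, size_t len)
{
    return counters_needed(nr_gp, nr_fixed) <= len;
}

/* Hypothetical guard a test could run before filling the array. */
static void check_capacity(size_t nr_gp, size_t nr_fixed, size_t len)
{
    assert(array_is_large_enough(nr_gp, nr_fixed, len));
}
```

With 8 GP and 3 fixed counters the test needs 11 slots, one more than a 10-entry array provides, so the last write lands past the end of the array. A guard of this kind turns that silent overflow into an immediate, visible failure.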
On Thu, Oct 26, 2023, Mi, Dapeng wrote:
> On 10/26/2023 7:47 AM, Mingwei Zhang wrote:
> > On Tue, Oct 24, 2023, Dapeng Mi wrote:
> > > When running the pmu test on Intel Sapphire Rapids, we found several
> > > failures, such as the "llc misses" failure, the "all counters" failure
> > > and the "fixed counter 3" failure.
> > Hmm, I have tested your series on an SPR machine. It looks like all "llc
> > misses" tests already pass on my side. "all counters" always fails, with
> > or without your patches. "fixed counter 3" never exists; I only have
> > "fixed cntr-{0,1,2}" and "fixed-{0,1,2}".
>
> 1. "LLC misses" failure
>
> Yeah, the "LLC misses" failure is not always seen. I see it 2 to 3 times
> out of 10 runs of the standalone PMU test, and you are more likely to
> see it if you run the full kvm-unit-tests. I think whether you see the
> "LLC misses" failure really depends on the current cache state of your
> system, that is, how much cache is consumed by other programs. If there
> are lots of free cache lines on the system when the pmu test runs, you
> are more likely to see LLC misses failures like the ones below.
>
> PASS: Intel: llc references-7
> *FAIL*: Intel: llc misses-0
> PASS: Intel: llc misses-1
> PASS: Intel: llc misses-2
>
> 2. "all counters" failure
>
> Actually the "all counters" failure is not always seen either, but that
> doesn't mean the current code is correct. In the current code, the
> "cnt[10]" array in check_counters_many() has a length of 10, but SPR
> supports at least 11 counters (8 GP counters + 3 fixed counters), even
> though fixed counter 3 is not supported in the current upstream code.
> Obviously this leads to an out-of-range memory access in
> check_counters_many().
>

OK, I will double check these. Thanks.

> >
> > You may want to double check the requirements of your series, rather
> > than testing only under your own setup without explaining that setup in
> > detail.
> >
> > Maybe what I am missing is your topdown series? If so, I don't see value
> > in this series until your topdown series is checked in.
>
> 3. "fixed counter 3" failure
>
> Yeah, after Jim's reminder I realized that I was using a kernel that
> includes the vtopdown support patches. As my reply to Jim's comments
> says, the patches that add slots event support are still valuable for
> the current emulation framework, so I will split them out of the
> original vtopdown patchset and resend them as an independent patchset.
> Anyway, even without slots event support in the kernel, only patch 4/5
> is affected; the other patches are still valuable.
>
>
> >
> > Thanks.
> > -Mingwei
> > > Intel Sapphire Rapids introduces a new fixed counter 3, increases the
> > > total number of PMU counters (GP plus fixed) to 12, and also optimizes
> > > the cache subsystem. These changes invalidate the original assumptions
> > > of the pmu test on Sapphire Rapids. Patches 2-4 fix these failures,
> > > patch 1 removes the duplicate code and patch 5 adds asserts to ensure
> > > the predefined fixed events match the HW fixed counters.
> > >
> > > Dapeng Mi (4):
> > >   x86: pmu: Change the minimum value of llc_misses event to 0
> > >   x86: pmu: Enlarge cnt array length to 64 in check_counters_many()
> > >   x86: pmu: Support validation for Intel PMU fixed counter 3
> > >   x86: pmu: Add asserts to warn inconsistent fixed events and counters
> > >
> > > Xiong Zhang (1):
> > >   x86: pmu: Remove duplicate code in pmu_init()
> > >
> > >  lib/x86/pmu.c |  5 -----
> > >  x86/pmu.c     | 17 ++++++++++++-----
> > >  2 files changed, 12 insertions(+), 10 deletions(-)
> > >
> > >
> > > base-commit: bfe5d7d0e14c8199d134df84d6ae8487a9772c48
> > > --
> > > 2.34.1
> > >