Message ID | 20240201061505.2027804-1-dapeng1.mi@linux.intel.com |
---|---|
State | New |
Headers |
From: Dapeng Mi <dapeng1.mi@linux.intel.com>
To: Sean Christopherson <seanjc@google.com>, Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Kan Liang <kan.liang@linux.intel.com>, Jim Mattson <jmattson@google.com>, Jinrong Liang <cloudliang@tencent.com>, Aaron Lewis <aaronlewis@google.com>, Dapeng Mi <dapeng1.mi@intel.com>, Dapeng Mi <dapeng1.mi@linux.intel.com>
Subject: [PATCH] KVM: selftests: Test top-down slots event
Date: Thu, 1 Feb 2024 14:15:05 +0800
Message-Id: <20240201061505.2027804-1-dapeng1.mi@linux.intel.com>
X-Mailer: git-send-email 2.34.1
X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
KVM: selftests: Test top-down slots event
|
|
Commit Message
Mi, Dapeng
Feb. 1, 2024, 6:15 a.m. UTC
Although the fixed counter 3 and the exclusive pseudo slots events are
not supported by KVM yet, the architectural slots event is supported by
KVM and can be programmed on any GP counter. Thus add validation for this
architectural slots event.
Top-down slots event "counts the total number of available slots for an
unhalted logical processor, and increments by machine-width of the
narrowest pipeline as employed by the Top-down Microarchitecture
Analysis method." So the measured count of the slots event should
always be larger than 0.
pmu_counters_test passed with this patch on Intel Sapphire Rapids.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
tools/testing/selftests/kvm/x86_64/pmu_counters_test.c | 1 +
1 file changed, 1 insertion(+)
base-commit: f0f3b810edda57f317d79f452056786257089667
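
For readers unfamiliar with how an architectural event ends up on a GP counter, the following is a minimal sketch of the event-select value involved. It assumes the SDM encoding of the architectural Topdown Slots event (event select 0xA4, unit mask 0x01) and the standard IA32_PERFEVTSELx bit layout. All macro and function names below are illustrative and are not the identifiers used by the selftest.

```c
#include <assert.h>
#include <stdint.h>

/* IA32_PERFEVTSELx fields (architectural perfmon layout). */
#define EVTSEL_EVENT(e)	((uint64_t)(e) & 0xffULL)		/* bits 7:0  */
#define EVTSEL_UMASK(u)	(((uint64_t)(u) & 0xffULL) << 8)	/* bits 15:8 */
#define EVTSEL_OS	(1ULL << 17)	/* count while at CPL0 */
#define EVTSEL_EN	(1ULL << 22)	/* enable the counter */

/* Assumed encoding of the architectural Topdown Slots event. */
#define TOPDOWN_SLOTS_EVENT	0xa4
#define TOPDOWN_SLOTS_UMASK	0x01

/* Value a guest could write to IA32_PERFEVTSELx to count slots in ring 0. */
static inline uint64_t topdown_slots_eventsel(void)
{
	return EVTSEL_EVENT(TOPDOWN_SLOTS_EVENT) |
	       EVTSEL_UMASK(TOPDOWN_SLOTS_UMASK) |
	       EVTSEL_OS | EVTSEL_EN;
}

/* Sanity check of the arithmetic: EN | OS | umask 0x01 | event 0xa4. */
static inline void check_topdown_slots_eventsel(void)
{
	assert(topdown_slots_eventsel() == 0x4201a4ULL);
}
```

Because the event is architectural, the same value can be written to any GP counter's event-select MSR, which is why the patch only needs to add a case label for the new event index.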
Comments
On Thu, Feb 01, 2024, Dapeng Mi wrote:
> Although the fixed counter 3 and the exclusive pseudo slots events are
> not supported by KVM yet, the architectural slots event is supported by
> KVM and can be programmed on any GP counter. Thus add validation for this
> architectural slots event.
>
> Top-down slots event "counts the total number of available slots for an
> unhalted logical processor, and increments by machine-width of the
> narrowest pipeline as employed by the Top-down Microarchitecture
> Analysis method." So the measured count of the slots event should
> always be larger than 0.

Please translate that into something non-perf folks can understand. I know what
a pipeline slot is, and I know what a dictionary's definition of "available" is,
but I still have no idea what this event actually counts. In other words, I want a
precise definition of exactly what constitutes an "available slot", in verbiage
that anyone with basic understanding of x86 architectures can follow after reading
the whitepaper[*], which is helpful for understanding the concepts, but doesn't
crisply explain what this event counts.

Examples of when a slot is available vs. unavailable would be extremely helpful.

[*] https://www.intel.com/content/www/us/en/docs/vtune-profiler/cookbook/2023-0/top-down-microarchitecture-analysis-method.html
On 2/2/2024 2:02 AM, Sean Christopherson wrote:
> On Thu, Feb 01, 2024, Dapeng Mi wrote:
>> Although the fixed counter 3 and the exclusive pseudo slots events are
>> not supported by KVM yet, the architectural slots event is supported by
>> KVM and can be programmed on any GP counter. Thus add validation for this
>> architectural slots event.
>>
>> Top-down slots event "counts the total number of available slots for an
>> unhalted logical processor, and increments by machine-width of the
>> narrowest pipeline as employed by the Top-down Microarchitecture
>> Analysis method." So the measured count of the slots event should
>> always be larger than 0.
> Please translate that into something non-perf folks can understand. I know what
> a pipeline slot is, and I know what a dictionary's definition of "available" is,
> but I still have no idea what this event actually counts. In other words, I want a
> precise definition of exactly what constitutes an "available slot", in verbiage
> that anyone with basic understanding of x86 architectures can follow after reading
> the whitepaper[*], which is helpful for understanding the concepts, but doesn't
> crisply explain what this event counts.
>
> Examples of when a slot is available vs. unavailable would be extremely helpful.
>
> [*] https://www.intel.com/content/www/us/en/docs/vtune-profiler/cookbook/2023-0/top-down-microarchitecture-analysis-method.html

Yeah, indeed, 'slots' is not easily understood from its literal meaning. I
also took some time to understand it when I looked at this event for the first
time. Simply speaking, slots is an abstract concept which indicates how many
uops (decoded from instructions) can be processed simultaneously (per cycle)
on HW. Assume there is a classic 5-stage pipeline: fetch, decode,
execute, memory access and register writeback. In the topdown
micro-architectural analysis method, the first two stages (fetch/decode) are
called the front-end and the last three stages are called the back-end.

In modern Intel processors, a complicated instruction could be decoded into
several uops (micro-operations), and these uops can be processed
simultaneously to improve performance. Thus, assume a processor
can decode and dispatch 4 uops in the front-end and execute 4 uops in the
back-end simultaneously (per cycle); we would then say this processor has 4
topdown slots per cycle. If a slot is spare and can be used to process a new
uop, we say it's available, but if a slot is occupied by a uop for several
cycles and not retired (maybe blocked by memory access), we say this slot is
stalled and unavailable.

OK, I will rewrite the commit description and add more explanation there.
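
The description above reduces to simple arithmetic. The sketch below is a simplified model, assuming the counter advances by the machine width on every unhalted cycle, as the quoted definition ("increments by machine-width of the narrowest pipeline") suggests. The 4-wide figure is only the example used in the reply, not a property of any particular CPU, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model: the slots event advances by the pipeline width on
 * every unhalted cycle, whether or not each slot ends up doing useful work.
 */
static inline uint64_t topdown_slots(uint64_t unhalted_cycles, unsigned int width)
{
	return unhalted_cycles * width;
}

/*
 * Slots that did not retire a uop in this model, e.g. because the slot was
 * stalled behind a memory access or the front-end delivered nothing.
 */
static inline uint64_t unused_slots(uint64_t slots, uint64_t uops_retired)
{
	assert(uops_retired <= slots);
	return slots - uops_retired;
}
```

With a width of 4, 100 unhalted cycles give 400 slots; if 250 uops retired in that window, the remaining 150 slots were stalled or wasted.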
On Fri, Feb 02, 2024, Dapeng Mi wrote:
>
> On 2/2/2024 2:02 AM, Sean Christopherson wrote:
> > On Thu, Feb 01, 2024, Dapeng Mi wrote:
> > > Although the fixed counter 3 and the exclusive pseudo slots events are
> > > not supported by KVM yet, the architectural slots event is supported by
> > > KVM and can be programmed on any GP counter. Thus add validation for this
> > > architectural slots event.
> > >
> > > Top-down slots event "counts the total number of available slots for an
> > > unhalted logical processor, and increments by machine-width of the
> > > narrowest pipeline as employed by the Top-down Microarchitecture
> > > Analysis method." So the measured count of the slots event should
> > > always be larger than 0.
> > Please translate that into something non-perf folks can understand. I know what
> > a pipeline slot is, and I know what a dictionary's definition of "available" is,
> > but I still have no idea what this event actually counts. In other words, I want a
> > precise definition of exactly what constitutes an "available slot", in verbiage
> > that anyone with basic understanding of x86 architectures can follow after reading
> > the whitepaper[*], which is helpful for understanding the concepts, but doesn't
> > crisply explain what this event counts.
> >
> > Examples of when a slot is available vs. unavailable would be extremely helpful.
> >
> > [*] https://www.intel.com/content/www/us/en/docs/vtune-profiler/cookbook/2023-0/top-down-microarchitecture-analysis-method.html
>
> Yeah, indeed, 'slots' is not easily understood from its literal meaning. I
> also took some time to understand it when I looked at this event for the first
> time. Simply speaking, slots is an abstract concept which indicates how many
> uops (decoded from instructions) can be processed simultaneously (per cycle)
> on HW. Assume there is a classic 5-stage pipeline: fetch, decode,
> execute, memory access and register writeback. In the topdown
> micro-architectural analysis method, the first two stages (fetch/decode) are
> called the front-end and the last three stages are called the back-end.
>
> In modern Intel processors, a complicated instruction could be decoded into
> several uops (micro-operations), and these uops can be processed
> simultaneously to improve performance. Thus, assume a processor
> can decode and dispatch 4 uops in the front-end and execute 4 uops in the
> back-end simultaneously (per cycle); we would then say this processor has 4
> topdown slots per cycle. If a slot is spare and can be used to process a new
> uop, we say it's available, but if a slot is occupied by a uop for several
> cycles and not retired (maybe blocked by memory access), we say this slot is
> stalled and unavailable.

In that case, can't the test assert that the count is at least NUM_INSNS_RETIRED?
AFAIK, none of the sequences in the measured code can be fused, i.e. the test can
assert that every instruction requires at least one uop, and IIUC, actually
executing a uop requires an available slot at _some_ time.

diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index ae5f6042f1e8..29609b52f8fa 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -119,6 +119,9 @@ static void guest_assert_event_count(uint8_t idx,
 	case INTEL_ARCH_REFERENCE_CYCLES_INDEX:
 		GUEST_ASSERT_NE(count, 0);
 		break;
+	case INTEL_ARCH_TOPDOWN_SLOTS_INDEX:
+		GUEST_ASSERT(count >= NUM_INSNS_RETIRED);
+		break;
 	default:
 		break;
 	}
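
The reasoning behind the suggested assertion can be sketched outside the selftest harness. This is a minimal sketch under the assumptions stated in the reply: no instruction in the measured sequence is fused, so each retired instruction needs at least one uop, and each uop needs at least one slot. `NUM_INSNS_RETIRED_EXAMPLE` is a hypothetical placeholder, not the value defined in pmu_counters_test.c, and the helper names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the selftest's NUM_INSNS_RETIRED. */
#define NUM_INSNS_RETIRED_EXAMPLE	100ULL

/*
 * Lower bound: without fusion, slots >= uops >= retired instructions,
 * so a slots count below the instruction count indicates a bug.
 */
static inline bool slots_count_is_plausible(uint64_t slots, uint64_t insns_retired)
{
	return slots >= insns_retired;
}

/* Plain-C analogue of GUEST_ASSERT(count >= NUM_INSNS_RETIRED). */
static inline void assert_slots_count(uint64_t slots)
{
	assert(slots_count_is_plausible(slots, NUM_INSNS_RETIRED_EXAMPLE));
}
```

Compared with the non-zero check in the original patch, this bound also rejects a counter that ticks but undercounts, at the cost of depending on the measured sequence staying free of fused instruction pairs.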
On 2/3/2024 1:24 AM, Sean Christopherson wrote:
> On Fri, Feb 02, 2024, Dapeng Mi wrote:
>> On 2/2/2024 2:02 AM, Sean Christopherson wrote:
>>> On Thu, Feb 01, 2024, Dapeng Mi wrote:
>>>> Although the fixed counter 3 and the exclusive pseudo slots events are
>>>> not supported by KVM yet, the architectural slots event is supported by
>>>> KVM and can be programmed on any GP counter. Thus add validation for this
>>>> architectural slots event.
>>>>
>>>> Top-down slots event "counts the total number of available slots for an
>>>> unhalted logical processor, and increments by machine-width of the
>>>> narrowest pipeline as employed by the Top-down Microarchitecture
>>>> Analysis method." So the measured count of the slots event should
>>>> always be larger than 0.
>>> Please translate that into something non-perf folks can understand. I know what
>>> a pipeline slot is, and I know what a dictionary's definition of "available" is,
>>> but I still have no idea what this event actually counts. In other words, I want a
>>> precise definition of exactly what constitutes an "available slot", in verbiage
>>> that anyone with basic understanding of x86 architectures can follow after reading
>>> the whitepaper[*], which is helpful for understanding the concepts, but doesn't
>>> crisply explain what this event counts.
>>>
>>> Examples of when a slot is available vs. unavailable would be extremely helpful.
>>>
>>> [*] https://www.intel.com/content/www/us/en/docs/vtune-profiler/cookbook/2023-0/top-down-microarchitecture-analysis-method.html
>> Yeah, indeed, 'slots' is not easily understood from its literal meaning. I
>> also took some time to understand it when I looked at this event for the first
>> time. Simply speaking, slots is an abstract concept which indicates how many
>> uops (decoded from instructions) can be processed simultaneously (per cycle)
>> on HW. Assume there is a classic 5-stage pipeline: fetch, decode,
>> execute, memory access and register writeback. In the topdown
>> micro-architectural analysis method, the first two stages (fetch/decode) are
>> called the front-end and the last three stages are called the back-end.
>>
>> In modern Intel processors, a complicated instruction could be decoded into
>> several uops (micro-operations), and these uops can be processed
>> simultaneously to improve performance. Thus, assume a processor
>> can decode and dispatch 4 uops in the front-end and execute 4 uops in the
>> back-end simultaneously (per cycle); we would then say this processor has 4
>> topdown slots per cycle. If a slot is spare and can be used to process a new
>> uop, we say it's available, but if a slot is occupied by a uop for several
>> cycles and not retired (maybe blocked by memory access), we say this slot is
>> stalled and unavailable.
> In that case, can't the test assert that the count is at least NUM_INSNS_RETIRED?
> AFAIK, none of the sequences in the measured code can be fused, i.e. the test can
> assert that every instruction requires at least one uop, and IIUC, actually
> executing a uop requires an available slot at _some_ time.

Yeah, it looks like the instruction sequence can't be macro-fused on x86
platforms, so the slots count should be equal to or larger than
NUM_INSNS_RETIRED.

>
> diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> index ae5f6042f1e8..29609b52f8fa 100644
> --- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
> @@ -119,6 +119,9 @@ static void guest_assert_event_count(uint8_t idx,
>  	case INTEL_ARCH_REFERENCE_CYCLES_INDEX:
>  		GUEST_ASSERT_NE(count, 0);
>  		break;
> +	case INTEL_ARCH_TOPDOWN_SLOTS_INDEX:
> +		GUEST_ASSERT(count >= NUM_INSNS_RETIRED);
> +		break;
>  	default:
>  		break;
>  	}
diff --git a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
index ae5f6042f1e8..99bcb619b861 100644
--- a/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
@@ -117,6 +117,7 @@ static void guest_assert_event_count(uint8_t idx,
 		fallthrough;
 	case INTEL_ARCH_CPU_CYCLES_INDEX:
 	case INTEL_ARCH_REFERENCE_CYCLES_INDEX:
+	case INTEL_ARCH_TOPDOWN_SLOTS_INDEX:
 		GUEST_ASSERT_NE(count, 0);
 		break;
 	default: