Message ID | 20221118141509.489359-2-jiaxi.chen@linux.intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp218582wrr; Fri, 18 Nov 2022 06:16:01 -0800 (PST) X-Google-Smtp-Source: AA0mqf54IkIt9TwvPLbYJPbA6sxwW1+BFq5QMKXC2z6POnbYZ654ax3Hpd0En/kll72EdJZMDzGt X-Received: by 2002:a17:902:bf06:b0:17b:d6ad:94c8 with SMTP id bi6-20020a170902bf0600b0017bd6ad94c8mr7807473plb.110.1668780961498; Fri, 18 Nov 2022 06:16:01 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1668780961; cv=none; d=google.com; s=arc-20160816; b=aEj7gGelD5EYTW0oee57DKO5aWb1FMQQxiZC76n4NMpmUgownIU/megTS+CgYO2i4S SEl3dVBc+j3TKbaX9FQCJzzkmZRrH3ttMLuxQO8NhkCnW27wHD13zlTlShRC82Dt96Zu iT4XpvWx2rzUuDrlCvIrAVw47uPUSdc1O4w49h8B71Ry5ZxhB0IdZWGTp6Ej5xiYr6kc LirdLFCc800J9kRdzCSUOLwB9MxBToc+SxoiBpKBeYHxYa+bfcDwIB1bURya4orHW2ug xjY+oeHJthxdKA+CSlfibQPo7KyK90R0lkHHw0JAk1kU/WouqTZxFD8x486+As1ge49s z17A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=Kltk2nD3Mrcnmy3G4qc76Cjad5bdzhp6gl4wQ9oDRTc=; b=kVZLt29WhzLdUDkIY38OQgc4qp78QtHAOGVggfYBsfmlu3Rw98Sd9C1Ij/Ajs01iJd g6Fe+fvg1qfpAyqk4UXjO84EQ+sZnBq9jBx+JzWMcXoo7gDC3br51oDpCk6VTZGKULSW K9blOVMIFEydsnR32kCK4uqxTWWXlxpJVKZee6odElXlOcte1hAE/HyW4dLnykE9CyuY 8T/oJ/blIZWWOZFsyCNYZp1lDwLae08dAcwL3yuImVRbCg6HINvOQRXvTfMEvw4/Sv25 oO2ExZGriHxg9kv6QIKaxpFT8jleoaO/74+B1auJ3lc+3nk46HMXCnNa2UcqeS+kvGH4 YI3w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=mfu4VgRd; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s7-20020a639247000000b0046ebaed8e45si3849057pgn.62.2022.11.18.06.15.46; Fri, 18 Nov 2022 06:16:01 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=mfu4VgRd; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242234AbiKROPT (ORCPT <rfc822;kkmonlee@gmail.com> + 99 others); Fri, 18 Nov 2022 09:15:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48888 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242216AbiKROPM (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 18 Nov 2022 09:15:12 -0500 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 27DB266C9C; Fri, 18 Nov 2022 06:15:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1668780912; x=1700316912; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=18jsSFYGJy6gbnsbxxDweHKGvPhAtMZ4eTFHplPUl9k=; b=mfu4VgRdZBG0idDlssInTmD6n3wPVWo0NT2uAT/dMnHEg1mIJ2GK6dJk MNQb0fCQiHQHn2ka6reQvwx2B7Szng+aO8Aa95KcgAUaUkjoDc789E7vy mUfHnF33lpK6AnDCg4Q6IG0yCO83cVctLCvvKETspi5W0jjC+ihb4LiHK Rc2GkShALVKCDswZMRJWtCmWwMGw0sW3/UC8CsAzA9uI8YBxEcQIgN5Kq OK6MSD3DESyE5RGbIcKZ8WUYJjWAQnBCuHAhqEZy+QbC6Ip7G8i4db5fW u4qXfnUi1ABoSZ3zuH/BgqWUZUsVAv6GTcRf84dNR2tp2DkWdPp4WBtjC Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10535"; a="292842728" X-IronPort-AV: E=Sophos;i="5.96,174,1665471600"; d="scan'208";a="292842728" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Nov 2022 06:15:11 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10535"; a="673204436" X-IronPort-AV: E=Sophos;i="5.96,174,1665471600"; d="scan'208";a="673204436" Received: from jiaxichen-precision-3650-tower.sh.intel.com ([10.239.159.75]) by orsmga001.jf.intel.com with ESMTP; 18 Nov 2022 06:15:06 -0800 From: Jiaxi Chen <jiaxi.chen@linux.intel.com> To: kvm@vger.kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, seanjc@google.com, pbonzini@redhat.com, ndesaulniers@google.com, alexandre.belloni@bootlin.com, peterz@infradead.org, jpoimboe@kernel.org, chang.seok.bae@intel.com, pawan.kumar.gupta@linux.intel.com, babu.moger@amd.com, jmattson@google.com, sandipan.das@amd.com, tony.luck@intel.com, sathyanarayanan.kuppuswamy@linux.intel.com, fenghua.yu@intel.com, keescook@chromium.org, nathan@kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v4 1/6] x86: KVM: Advertise CMPccXADD CPUID to user space Date: Fri, 18 Nov 2022 22:15:04 +0800 Message-Id: <20221118141509.489359-2-jiaxi.chen@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221118141509.489359-1-jiaxi.chen@linux.intel.com> References: <20221118141509.489359-1-jiaxi.chen@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_PASS, SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1749843665455639061?= X-GMAIL-MSGID: =?utf-8?q?1749843665455639061?= |
Series |
x86: KVM: Advertise CPUID of new Intel platform instructions to user space
|
|
Commit Message
Jiaxi Chen
Nov. 18, 2022, 2:15 p.m. UTC
CMPccXADD is a new set of instructions in the latest Intel platform
Sierra Forest. This new instruction set includes a semaphore operation
that can compare and add the operands if condition is met, which can
improve database performance.
The bit definition:
CPUID.(EAX=7,ECX=1):EAX[bit 7]
This CPUID is exposed to userspace. Besides, there is no other VMX
control for this instruction.
Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kvm/cpuid.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
Comments
On 11/18/22 06:15, Jiaxi Chen wrote: > CMPccXADD is a new set of instructions in the latest Intel platform > Sierra Forest. This new instruction set includes a semaphore operation > that can compare and add the operands if condition is met, which can > improve database performance. > > The bit definition: > CPUID.(EAX=7,ECX=1):EAX[bit 7] > > This CPUID is exposed to userspace. Besides, there is no other VMX > control for this instruction. The last time you posted these, I asked: > Intel folks, when you add these bits, can you please include information > about the "vetting" that you performed? I think you're alluding to that in your comment about VMX contols. Could you be more explicit here and include *all* of your logic about why this feature is OK to pass through to guests? Also, do we *want* this showing up in /proc/cpuinfo? There are also two distinct kinds of features that you're adding here. These: > +#define X86_FEATURE_CMPCCXADD (12*32+ 7) /* CMPccXADD instructions */ and these: +#define X86_FEATURE_PREFETCHITI KVM_X86_FEATURE(CPUID_7_1_EDX, 14) Could you also please include a sentence or two about why the feature was treated on way versus another? That's frankly a lot more important than telling us which random Intel codename this shows up on first, or wasting space on telling us what the CPUID bit definition is. We can kinda get that from the patch. Another nit on these: > This CPUID is exposed to userspace. Besides, there is no other VMX > control for this instruction. Please remember to use imperative voice when describing what the patch in question does. Using passive voice like that makes it seem like you're describing the state of the art rather than the patch. For example, that should probably be: Expose CMPCCXADD to KVM userspace. This is safe because there are no new VMX controls or host enabling required for guests to use this feature. See how that first sentence is giving orders? It's *telling* you what to do. That's imperative voice and that's what you use to describe the actions of *this* patch.
On Fri, Nov 18, 2022 at 08:47:55AM -0800, Dave Hansen wrote:
> Also, do we *want* this showing up in /proc/cpuinfo?
Yeah, I was wondering about that. Currently, we try to add feature bits
to /proc/cpuinfo only when the kernel has done any enablement for them.
For other needs, people should use tools/arch/x86/kcpuid/
If we say that adding those bits so that guests can export them doesn't
count as a real enablement, then sure we can hide them initially.
I wanna say yes here...
On 11/19/2022 12:47 AM, Dave Hansen wrote: > On 11/18/22 06:15, Jiaxi Chen wrote: >> CMPccXADD is a new set of instructions in the latest Intel platform >> Sierra Forest. This new instruction set includes a semaphore operation >> that can compare and add the operands if condition is met, which can >> improve database performance. >> >> The bit definition: >> CPUID.(EAX=7,ECX=1):EAX[bit 7] >> >> This CPUID is exposed to userspace. Besides, there is no other VMX >> control for this instruction. > > The last time you posted these, I asked: > >> Intel folks, when you add these bits, can you please include information >> about the "vetting" that you performed? > > I think you're alluding to that in your comment about VMX contols. > Could you be more explicit here and include *all* of your logic about > why this feature is OK to pass through to guests? > Yes, that's very rigorous. Will check and add these for all features in this patch series. > Also, do we *want* this showing up in /proc/cpuinfo? > > There are also two distinct kinds of features that you're adding here. > These: > >> +#define X86_FEATURE_CMPCCXADD (12*32+ 7) /* CMPccXADD instructions */ > > and these: > > +#define X86_FEATURE_PREFETCHITI KVM_X86_FEATURE(CPUID_7_1_EDX, 14) > > Could you also please include a sentence or two about why the feature > was treated on way versus another? That's frankly a lot more important > than telling us which random Intel codename this shows up on first, or > wasting space on telling us what the CPUID bit definition is. We can > kinda get that from the patch. Yes. A few words of description is necessary here. Features which has been enabled in kernel usually should be added to /proc/cpuinfo. The first way is often used for bit whose leaf has many other bits in use. It's very simple to do, just adding one line for each feature based on existing words in can get the effect. For those bits whose leaf has just a few bits in use, they should be defined in a 'scattered' way. However, this kind of features in this patch series have no other kernel usage and they just need to be advertised to kvm userspace. Therefore, define them in a kvm-only way is more explicit. > > Another nit on these: > >> This CPUID is exposed to userspace. Besides, there is no other VMX >> control for this instruction. > > Please remember to use imperative voice when describing what the patch > in question does. Using passive voice like that makes it seem like > you're describing the state of the art rather than the patch. > > For example, that should probably be: > > Expose CMPCCXADD to KVM userspace. This is safe because there > are no new VMX controls or host enabling required for guests to > use this feature. > > See how that first sentence is giving orders? It's *telling* you what > to do. That's imperative voice and that's what you use to describe the > actions of *this* patch. Appreciate your very detailed suggestions. Thanks very much!
On 11/21/22 06:46, Jiaxi Chen wrote: > Features which has been enabled in kernel usually should be added to > /proc/cpuinfo. Features that the kernel *itself* is actually using always get in there. Things like "smep". But, things that the kernel "enables" but that only get used by userspace don't generally show up in /proc/cpuinfo. KVM is kinda a weird case. The kernel is making the feature available to guests, but it's not _using_ it in any meaningful way. To me, this seems much more akin to the features that are just available to userspace than something that the kernel is truly using. Also, these feature names are just long and ugly, and the "flags" line is already a human-*un*readable mess. I think we should just leave them out.
On Mon, Nov 21, 2022 at 10:46:21PM +0800, Jiaxi Chen wrote: > Features which has been enabled in kernel usually should be added to > /proc/cpuinfo. No, pls read this first: Documentation/x86/cpuinfo.rst If something's not clear, we will extend it so that it is. /proc/cpuinfo - a user ABI - is not a dumping ground for CPUID bits.
On Mon, Nov 21, 2022, Dave Hansen wrote: > On 11/21/22 06:46, Jiaxi Chen wrote: > > Features which has been enabled in kernel usually should be added to > > /proc/cpuinfo. > > Features that the kernel *itself* is actually using always get in there. > Things like "smep". > > But, things that the kernel "enables" but that only get used by > userspace don't generally show up in /proc/cpuinfo. > > KVM is kinda a weird case. The kernel is making the feature available > to guests, but it's not _using_ it in any meaningful way. To me, this > seems much more akin to the features that are just available to > userspace than something that the kernel is truly using. Actually, for these features that don't require additional KVM enabling, KVM isn't making the feature avaiable to the guest. KVM is just advertising to userspace that KVM "supports" these features. Userspace ultimately controls guest CPUID; outside of a few special cases, KVM neither rejects nor filters unsupported bits in CPUID. Most VMMs will only enable features that KVM "officially" supports, but for these features, nothing stops userspace from enumerating support to the guest without waiting for KVM. > Also, these feature names are just long and ugly, and the "flags" line > is already a human-*un*readable mess. I think we should just leave them > out. I have no strong opinion, either way works for me.
On Mon, Nov 21, 2022 at 03:48:06PM +0000, Sean Christopherson wrote: > Actually, for these features that don't require additional KVM enabling, KVM isn't > making the feature avaiable to the guest. KVM is just advertising to userspace > that KVM "supports" these features. Userspace ultimately controls guest CPUID; > outside of a few special cases, KVM neither rejects nor filters unsupported bits > in CPUID. So is there any point to those "enable it in KVM" patches streaming constantly? AFAICT, the only reason is to "pass through" the CPUID bit to the guest in case KVM is not showing it in the intercepted CPUID...
On Mon, Nov 21, 2022, Borislav Petkov wrote: > On Mon, Nov 21, 2022 at 03:48:06PM +0000, Sean Christopherson wrote: > > Actually, for these features that don't require additional KVM enabling, KVM isn't > > making the feature avaiable to the guest. KVM is just advertising to userspace > > that KVM "supports" these features. Userspace ultimately controls guest CPUID; > > outside of a few special cases, KVM neither rejects nor filters unsupported bits > > in CPUID. > > So is there any point to those "enable it in KVM" patches streaming constantly? Yes. Most userspace VMMs sanitize their CPUID models based on KVM_GET_SUPPORTED_CPUID, e.g. by default, QEMU will refuse to enable features in guest CPUID that aren't reported as supported by KVM. Another use case is for userspace to blindly use the result of KVM_GET_SUPPORTED_CPUID as the guest's CPUID model, e.g. for using KVM to isolate code as opposed to standing up a traditional virtual machine. For that use case, userspace again relies on KVM to enumerate support. What I was trying to call out in the above is that the KVM "enabling" technically doesn't expose the feature to the guest. E.g. a clever guest could ignore CPUID and probe the relevant instructions manually by seeing whether or not they #UD.
On Mon, Nov 21, 2022 at 05:28:39PM +0000, Sean Christopherson wrote: > Yes. Most userspace VMMs sanitize their CPUID models based on KVM_GET_SUPPORTED_CPUID, > e.g. by default, QEMU will refuse to enable features in guest CPUID that aren't > reported as supported by KVM. > > Another use case is for userspace to blindly use the result of KVM_GET_SUPPORTED_CPUID > as the guest's CPUID model, e.g. for using KVM to isolate code as opposed to standing > up a traditional virtual machine. For that use case, userspace again relies on KVM to > enumerate support. Ah ok, thx. /me makes a mental note. > What I was trying to call out in the above is that the KVM "enabling" technically > doesn't expose the feature to the guest. E.g. a clever guest could ignore CPUID > and probe the relevant instructions manually by seeing whether or not they #UD. As can a nasty userspace on baremetal. That's why /proc/cpuinfo is not really the authority of what's supported and we're going away from treating it that way.
On 11/21/2022 11:29 PM, Dave Hansen wrote: > On 11/21/22 06:46, Jiaxi Chen wrote: >> Features which has been enabled in kernel usually should be added to >> /proc/cpuinfo. > > Features that the kernel *itself* is actually using always get in there. > Things like "smep". > > But, things that the kernel "enables" but that only get used by > userspace don't generally show up in /proc/cpuinfo. > > KVM is kinda a weird case. The kernel is making the feature available > to guests, but it's not _using_ it in any meaningful way. To me, this > seems much more akin to the features that are just available to > userspace than something that the kernel is truly using. > > Also, these feature names are just long and ugly, and the "flags" line > is already a human-*un*readable mess. I think we should just leave them > out. True and agree. As for these cpuids are not truly used by kernel except for advertising to kvm userspace, we can hide them in /proc/cpuinfo by overriding their name with "".
On 11/21/2022 11:38 PM, Borislav Petkov wrote: > On Mon, Nov 21, 2022 at 10:46:21PM +0800, Jiaxi Chen wrote: >> Features which has been enabled in kernel usually should be added to >> /proc/cpuinfo. > > No, pls read this first: Documentation/x86/cpuinfo.rst > > If something's not clear, we will extend it so that it is. > > /proc/cpuinfo - a user ABI - is not a dumping ground for CPUID bits. > Thanks. Sorry for the miss understanding. For those feature bits who have truly kernel usage, their flags should appear in /proc/cpuinfo. For others, they are not generally show up here, it depends. As for features in this patch series: The first-way defined bits are on an expected-dense cpuid leaf[1] and some of their siblings have kernel usages[2]. Given that, define them like X86_FEATURE_* in arch/x86/include/asm/cpufeatures.h. But due to their complicated and unreadable feature name[3], prefer to hide them in /proc/cpuinfo. The second-way defined bits are on a new and sparse cpuid leaf. Besides, these bits have no turly kernel use case. Therefore, move these new bits to kvm-only leaves to achieve the purpose for advertising these bits to kvm userspace[4]. Then of course they will not show up in /proc/cpuinfo. [1] https://lore.kernel.org/all/Y3O7UYWfOLfJkwM%2F@zn.tnic/ [2] https://lore.kernel.org/all/f8607d23-afaa-2670-dd03-2ae8ec1e79a0@intel.com/ [3] https://lore.kernel.org/all/6d7fae50-ef3c-dc1e-336c-691095007117@intel.com/ [4] https://lore.kernel.org/all/Y1ATKF2xjERFbspn@google.com/
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index b71f4f2ecdd5..19db3940f262 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -308,6 +308,7 @@ /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */ #define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */ #define X86_FEATURE_AVX512_BF16 (12*32+ 5) /* AVX512 BFLOAT16 instructions */ +#define X86_FEATURE_CMPCCXADD (12*32+ 7) /* CMPccXADD instructions */ /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */ #define X86_FEATURE_CLZERO (13*32+ 0) /* CLZERO instruction */ diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 62bc7a01cecc..7ab7cc717b1c 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -657,7 +657,7 @@ void kvm_set_cpu_caps(void) kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD); kvm_cpu_cap_mask(CPUID_7_1_EAX, - F(AVX_VNNI) | F(AVX512_BF16) + F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) ); kvm_cpu_cap_mask(CPUID_D_1_EAX,