[v4,1/6] x86: KVM: Advertise CMPccXADD CPUID to user space

Message ID 20221118141509.489359-2-jiaxi.chen@linux.intel.com
State New
Headers
Series x86: KVM: Advertise CPUID of new Intel platform instructions to user space |

Commit Message

Jiaxi Chen Nov. 18, 2022, 2:15 p.m. UTC
  CMPccXADD is a new set of instructions in the latest Intel platform
Sierra Forest. This new instruction set includes a semaphore operation
that can compare and add the operands if condition is met, which can
improve database performance.

The bit definition:
CPUID.(EAX=7,ECX=1):EAX[bit 7]

This CPUID is exposed to userspace. Besides, there is no other VMX
control for this instruction.

Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kvm/cpuid.c               | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)
  

Comments

Dave Hansen Nov. 18, 2022, 4:47 p.m. UTC | #1
On 11/18/22 06:15, Jiaxi Chen wrote:
> CMPccXADD is a new set of instructions in the latest Intel platform
> Sierra Forest. This new instruction set includes a semaphore operation
> that can compare and add the operands if condition is met, which can
> improve database performance.
> 
> The bit definition:
> CPUID.(EAX=7,ECX=1):EAX[bit 7]
> 
> This CPUID is exposed to userspace. Besides, there is no other VMX
> control for this instruction.

The last time you posted these, I asked:

> Intel folks, when you add these bits, can you please include information
> about the "vetting" that you performed?

I think you're alluding to that in your comment about VMX contols.
Could you be more explicit here and include *all* of your logic about
why this feature is OK to pass through to guests?

Also, do we *want* this showing up in /proc/cpuinfo?

There are also two distinct kinds of features that you're adding here.
These:

> +#define X86_FEATURE_CMPCCXADD           (12*32+ 7) /* CMPccXADD instructions */

and these:

+#define X86_FEATURE_PREFETCHITI         KVM_X86_FEATURE(CPUID_7_1_EDX, 14)

Could you also please include a sentence or two about why the feature
was treated on way versus another?  That's frankly a lot more important
than telling us which random Intel codename this shows up on first, or
wasting space on telling us what the CPUID bit definition is.  We can
kinda get that from the patch.

Another nit on these:

> This CPUID is exposed to userspace. Besides, there is no other VMX
> control for this instruction.

Please remember to use imperative voice when describing what the patch
in question does.  Using passive voice like that makes it seem like
you're describing the state of the art rather than the patch.

For example, that should probably be:

	Expose CMPCCXADD to KVM userspace.  This is safe because there
	are no new VMX controls or host enabling required for guests to
	use this feature.

See how that first sentence is giving orders?  It's *telling* you what
to do.  That's imperative voice and that's what you use to describe the
actions of *this* patch.
  
Borislav Petkov Nov. 18, 2022, 6:34 p.m. UTC | #2
On Fri, Nov 18, 2022 at 08:47:55AM -0800, Dave Hansen wrote:
> Also, do we *want* this showing up in /proc/cpuinfo?

Yeah, I was wondering about that. Currently, we try to add feature bits
to /proc/cpuinfo only when the kernel has done any enablement for them.
For other needs, people should use tools/arch/x86/kcpuid/

If we say that adding those bits so that guests can export them doesn't
count as a real enablement, then sure we can hide them initially.

I wanna say yes here...
  
Jiaxi Chen Nov. 21, 2022, 2:46 p.m. UTC | #3
On 11/19/2022 12:47 AM, Dave Hansen wrote:
> On 11/18/22 06:15, Jiaxi Chen wrote:
>> CMPccXADD is a new set of instructions in the latest Intel platform
>> Sierra Forest. This new instruction set includes a semaphore operation
>> that can compare and add the operands if condition is met, which can
>> improve database performance.
>>
>> The bit definition:
>> CPUID.(EAX=7,ECX=1):EAX[bit 7]
>>
>> This CPUID is exposed to userspace. Besides, there is no other VMX
>> control for this instruction.
> 
> The last time you posted these, I asked:
> 
>> Intel folks, when you add these bits, can you please include information
>> about the "vetting" that you performed?
> 
> I think you're alluding to that in your comment about VMX contols.
> Could you be more explicit here and include *all* of your logic about
> why this feature is OK to pass through to guests?
> 
Yes, that's very rigorous. Will check and add these for all features in
this patch series.

> Also, do we *want* this showing up in /proc/cpuinfo?
>
> There are also two distinct kinds of features that you're adding here.
> These:
> 
>> +#define X86_FEATURE_CMPCCXADD           (12*32+ 7) /* CMPccXADD instructions */
> 
> and these:
> 
> +#define X86_FEATURE_PREFETCHITI         KVM_X86_FEATURE(CPUID_7_1_EDX, 14)
> 
> Could you also please include a sentence or two about why the feature
> was treated on way versus another?  That's frankly a lot more important
> than telling us which random Intel codename this shows up on first, or
> wasting space on telling us what the CPUID bit definition is.  We can
> kinda get that from the patch.
Yes. A few words of description is necessary here.

Features which has been enabled in kernel usually should be added to
/proc/cpuinfo.

The first way is often used for bit whose leaf has many other bits in
use. It's very simple to do, just adding one line for each feature based
on existing words in can get the effect.

For those bits whose leaf has just a few bits in use, they should be
defined in a 'scattered' way. However, this kind of features in this
patch series have no other kernel usage and they just need to be
advertised to kvm userspace. Therefore, define them in a kvm-only way is
more explicit.

> 
> Another nit on these:
> 
>> This CPUID is exposed to userspace. Besides, there is no other VMX
>> control for this instruction.
> 
> Please remember to use imperative voice when describing what the patch
> in question does.  Using passive voice like that makes it seem like
> you're describing the state of the art rather than the patch.
> 
> For example, that should probably be:
> 
> 	Expose CMPCCXADD to KVM userspace.  This is safe because there
> 	are no new VMX controls or host enabling required for guests to
> 	use this feature.
> 
> See how that first sentence is giving orders?  It's *telling* you what
> to do.  That's imperative voice and that's what you use to describe the
> actions of *this* patch.

Appreciate your very detailed suggestions. Thanks very much!
  
Dave Hansen Nov. 21, 2022, 3:29 p.m. UTC | #4
On 11/21/22 06:46, Jiaxi Chen wrote:
> Features which has been enabled in kernel usually should be added to
> /proc/cpuinfo.

Features that the kernel *itself* is actually using always get in there.
 Things like "smep".

But, things that the kernel "enables" but that only get used by
userspace don't generally show up in /proc/cpuinfo.

KVM is kinda a weird case.  The kernel is making the feature available
to guests, but it's not _using_ it in any meaningful way.  To me, this
seems much more akin to the features that are just available to
userspace than something that the kernel is truly using.

Also, these feature names are just long and ugly, and the "flags" line
is already a human-*un*readable mess.  I think we should just leave them
out.
  
Borislav Petkov Nov. 21, 2022, 3:38 p.m. UTC | #5
On Mon, Nov 21, 2022 at 10:46:21PM +0800, Jiaxi Chen wrote:
> Features which has been enabled in kernel usually should be added to
> /proc/cpuinfo.

No, pls read this first: Documentation/x86/cpuinfo.rst

If something's not clear, we will extend it so that it is.

/proc/cpuinfo - a user ABI - is not a dumping ground for CPUID bits.
  
Sean Christopherson Nov. 21, 2022, 3:48 p.m. UTC | #6
On Mon, Nov 21, 2022, Dave Hansen wrote:
> On 11/21/22 06:46, Jiaxi Chen wrote:
> > Features which has been enabled in kernel usually should be added to
> > /proc/cpuinfo.
> 
> Features that the kernel *itself* is actually using always get in there.
>  Things like "smep".
> 
> But, things that the kernel "enables" but that only get used by
> userspace don't generally show up in /proc/cpuinfo.
> 
> KVM is kinda a weird case.  The kernel is making the feature available
> to guests, but it's not _using_ it in any meaningful way.  To me, this
> seems much more akin to the features that are just available to
> userspace than something that the kernel is truly using.

Actually, for these features that don't require additional KVM enabling, KVM isn't
making the feature avaiable to the guest.  KVM is just advertising to userspace
that KVM "supports" these features.  Userspace ultimately controls guest CPUID;
outside of a few special cases, KVM neither rejects nor filters unsupported bits
in CPUID.

Most VMMs will only enable features that KVM "officially" supports, but for these
features, nothing stops userspace from enumerating support to the guest without
waiting for KVM.

> Also, these feature names are just long and ugly, and the "flags" line
> is already a human-*un*readable mess.  I think we should just leave them
> out.

I have no strong opinion, either way works for me.
  
Borislav Petkov Nov. 21, 2022, 3:53 p.m. UTC | #7
On Mon, Nov 21, 2022 at 03:48:06PM +0000, Sean Christopherson wrote:
> Actually, for these features that don't require additional KVM enabling, KVM isn't
> making the feature avaiable to the guest.  KVM is just advertising to userspace
> that KVM "supports" these features.  Userspace ultimately controls guest CPUID;
> outside of a few special cases, KVM neither rejects nor filters unsupported bits
> in CPUID.

So is there any point to those "enable it in KVM" patches streaming constantly?

AFAICT, the only reason is to "pass through" the CPUID bit to the guest
in case KVM is not showing it in the intercepted CPUID...
  
Sean Christopherson Nov. 21, 2022, 5:28 p.m. UTC | #8
On Mon, Nov 21, 2022, Borislav Petkov wrote:
> On Mon, Nov 21, 2022 at 03:48:06PM +0000, Sean Christopherson wrote:
> > Actually, for these features that don't require additional KVM enabling, KVM isn't
> > making the feature avaiable to the guest.  KVM is just advertising to userspace
> > that KVM "supports" these features.  Userspace ultimately controls guest CPUID;
> > outside of a few special cases, KVM neither rejects nor filters unsupported bits
> > in CPUID.
> 
> So is there any point to those "enable it in KVM" patches streaming constantly?

Yes.  Most userspace VMMs sanitize their CPUID models based on KVM_GET_SUPPORTED_CPUID,
e.g. by default, QEMU will refuse to enable features in guest CPUID that aren't
reported as supported by KVM.

Another use case is for userspace to blindly use the result of KVM_GET_SUPPORTED_CPUID
as the guest's CPUID model, e.g. for using KVM to isolate code as opposed to standing
up a traditional virtual machine.  For that use case, userspace again relies on KVM to
enumerate support.

What I was trying to call out in the above is that the KVM "enabling" technically
doesn't expose the feature to the guest.  E.g. a clever guest could ignore CPUID
and probe the relevant instructions manually by seeing whether or not they #UD.
  
Borislav Petkov Nov. 21, 2022, 7:50 p.m. UTC | #9
On Mon, Nov 21, 2022 at 05:28:39PM +0000, Sean Christopherson wrote:
> Yes.  Most userspace VMMs sanitize their CPUID models based on KVM_GET_SUPPORTED_CPUID,
> e.g. by default, QEMU will refuse to enable features in guest CPUID that aren't
> reported as supported by KVM.
> 
> Another use case is for userspace to blindly use the result of KVM_GET_SUPPORTED_CPUID
> as the guest's CPUID model, e.g. for using KVM to isolate code as opposed to standing
> up a traditional virtual machine.  For that use case, userspace again relies on KVM to
> enumerate support.

Ah ok, thx.

/me makes a mental note.

> What I was trying to call out in the above is that the KVM "enabling" technically
> doesn't expose the feature to the guest.  E.g. a clever guest could ignore CPUID
> and probe the relevant instructions manually by seeing whether or not they #UD.

As can a nasty userspace on baremetal. That's why /proc/cpuinfo is not
really the authority of what's supported and we're going away from
treating it that way.
  
Jiaxi Chen Nov. 23, 2022, 6:33 a.m. UTC | #10
On 11/21/2022 11:29 PM, Dave Hansen wrote:
> On 11/21/22 06:46, Jiaxi Chen wrote:
>> Features which has been enabled in kernel usually should be added to
>> /proc/cpuinfo.
> 
> Features that the kernel *itself* is actually using always get in there.
>  Things like "smep".
> 
> But, things that the kernel "enables" but that only get used by
> userspace don't generally show up in /proc/cpuinfo.
> 
> KVM is kinda a weird case.  The kernel is making the feature available
> to guests, but it's not _using_ it in any meaningful way.  To me, this
> seems much more akin to the features that are just available to
> userspace than something that the kernel is truly using.
> 
> Also, these feature names are just long and ugly, and the "flags" line
> is already a human-*un*readable mess.  I think we should just leave them
> out.

True and agree. As for these cpuids are not truly used by kernel except
for advertising to kvm userspace, we can hide them in /proc/cpuinfo by
overriding their name with "".
  
Jiaxi Chen Nov. 23, 2022, 7:46 a.m. UTC | #11
On 11/21/2022 11:38 PM, Borislav Petkov wrote:
> On Mon, Nov 21, 2022 at 10:46:21PM +0800, Jiaxi Chen wrote:
>> Features which has been enabled in kernel usually should be added to
>> /proc/cpuinfo.
> 
> No, pls read this first: Documentation/x86/cpuinfo.rst
> 
> If something's not clear, we will extend it so that it is.
> 
> /proc/cpuinfo - a user ABI - is not a dumping ground for CPUID bits.
> 

Thanks. Sorry for the miss understanding.

For those feature bits who have truly kernel usage, their flags should
appear in /proc/cpuinfo. For others, they are not generally show up
here, it depends.

As for features in this patch series:

The first-way defined bits are on an expected-dense cpuid leaf[1] and
some of their siblings have kernel usages[2]. Given that, define them
like X86_FEATURE_* in arch/x86/include/asm/cpufeatures.h. But due to
their complicated and unreadable feature name[3], prefer to hide them in
/proc/cpuinfo.

The second-way defined bits are on a new and sparse cpuid leaf. Besides,
these bits have no turly kernel use case. Therefore, move these new bits
to kvm-only leaves to achieve the purpose for advertising these bits to
kvm userspace[4]. Then of course they will not show up in /proc/cpuinfo.

[1] https://lore.kernel.org/all/Y3O7UYWfOLfJkwM%2F@zn.tnic/
[2]
https://lore.kernel.org/all/f8607d23-afaa-2670-dd03-2ae8ec1e79a0@intel.com/
[3]
https://lore.kernel.org/all/6d7fae50-ef3c-dc1e-336c-691095007117@intel.com/
[4] https://lore.kernel.org/all/Y1ATKF2xjERFbspn@google.com/
  

Patch

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index b71f4f2ecdd5..19db3940f262 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -308,6 +308,7 @@ 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
+#define X86_FEATURE_CMPCCXADD           (12*32+ 7) /* CMPccXADD instructions */
 
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO		(13*32+ 0) /* CLZERO instruction */
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 62bc7a01cecc..7ab7cc717b1c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -657,7 +657,7 @@  void kvm_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD);
 
 	kvm_cpu_cap_mask(CPUID_7_1_EAX,
-		F(AVX_VNNI) | F(AVX512_BF16)
+		F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD)
 	);
 
 	kvm_cpu_cap_mask(CPUID_D_1_EAX,