Message ID | 20231004002038.907778-1-jmattson@google.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:612c:2a8e:b0:403:3b70:6f57 with SMTP id in14csp2433536vqb; Tue, 3 Oct 2023 17:21:43 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGlSseDeFaTuqwMMBEXKCT/xFWVREIB9gkPDZI4tqrrK/uuMFRwYXaeyUTdDXd99bIqxNsS X-Received: by 2002:a05:6808:1411:b0:3ae:2b08:549d with SMTP id w17-20020a056808141100b003ae2b08549dmr1222177oiv.37.1696378903417; Tue, 03 Oct 2023 17:21:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1696378903; cv=none; d=google.com; s=arc-20160816; b=YKfyGgOoRYVUO+NLbuGJ6esTpqGkVrwHcbc3cHK7tUb18A0S7uXsoGLvm8DbUMWJvp IM/hEpbBJ6zRaruqxtQn67FDOi2Ds2zCxm/GaWOzSQ7Hg9t390AcjcFh+cgjPzyi/YCm VOhj+XfJmva8x74Uz4By93Oa31diSFopDjehyhigjR9qoajh/zeSk42muwbNp5yAzIg9 1Y3+TCD/9tnnNVVdv0dkAgoppsAf6SqD8TPBISzbDrl+VUzbGZmXC+nQ/3/QmoxzK/nd luAD1SJISLa9JzxHvJPYfFK6fcImnYtwb1YYcueU6P3zSVkT21ADwKVgvY/B2/hWqDiL LGNg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:mime-version:date :dkim-signature; bh=xgPwohGXCtmI6X+8NYz58Cu3MtfPGyJKCVYsJcgu7r0=; fh=rjpEzXaGRx7eyD6h8nBtz7sAfuqib16qPeK99DzopzM=; b=Dkduh+h55+uUIb2kViABE7ewlzPQhm2Z9QLIU1MCUUxbEl1ZCRkKwFC03YX9T+azL7 eTC1dOEVANoxoIUXUCG6NgF8p8dtDTpqwxvqmbfitcg2R11dyS9Pi6IpiZit+siToNOP wgDT1QJXyebl50xCHizfUvHq/BAg5soA6M9Zhw2cc0jzWMmG+FT1R4Ln94yfQDZ1GRnz QMhycOj27GC+cyKw53NPsDMrvB2XXO6iGR0Vjg2Awx9jlJftssJlpOXbmvCjeoGHO+m1 W/NHyJMWcAzBOq/x0atLS4MGT3Ey0TMDQPTdSGAlOu1xsh4iWlwxi0UbHqrgatz71Til wQpg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=skz6FcOu; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:4 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from howler.vger.email (howler.vger.email. [2620:137:e000::3:4]) by mx.google.com with ESMTPS id r65-20020a632b44000000b0056fc3ceaba4si2709127pgr.432.2023.10.03.17.21.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 03 Oct 2023 17:21:43 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:4 as permitted sender) client-ip=2620:137:e000::3:4; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=skz6FcOu; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:4 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by howler.vger.email (Postfix) with ESMTP id B6C21809AF97; Tue, 3 Oct 2023 17:21:17 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at howler.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238812AbjJDAVP (ORCPT <rfc822;chrisfriedt@gmail.com> + 17 others); Tue, 3 Oct 2023 20:21:15 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58448 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238557AbjJDAVO (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 3 Oct 2023 20:21:14 -0400 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 38988A7 for <linux-kernel@vger.kernel.org>; Tue, 3 Oct 2023 17:21:11 -0700 (PDT) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-5a22eb73cb3so23503757b3.3 for <linux-kernel@vger.kernel.org>; Tue, 03 Oct 2023 17:21:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1696378870; x=1696983670; darn=vger.kernel.org; h=cc:to:from:subject:message-id:mime-version:date:from:to:cc:subject :date:message-id:reply-to; bh=xgPwohGXCtmI6X+8NYz58Cu3MtfPGyJKCVYsJcgu7r0=; b=skz6FcOuiY5MvlXPfa2GeYSxiEPaFAB1bY874odJJtLPDCWj4dYojpOoz8V4014leb 6ZkGgJdM0zt18nVMIi1e49HBVb4WbxZap9s/dbwKUH3y7MMQgyZdJSMTe70V41B0MCFK /AqsKl3Ega/WGWdXyZ1kxBjBO3xHEk8WsanhCHIH1GFUJuHj+w+eiGgHacRHVzJ2a6ne 87WkCk6oAtigIZycEgoolmcd5tgtGRjMyJpHqH1MEeAP9OYuMgMwfY+fO1lg21HiPGCd vgZt4d0LcW6tqKQmoIzxg+aPf2LZRWmciMF25PRPM6eOmid0gWuPtENQfMk/5nWDI6JJ 1ADA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1696378870; x=1696983670; h=cc:to:from:subject:message-id:mime-version:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=xgPwohGXCtmI6X+8NYz58Cu3MtfPGyJKCVYsJcgu7r0=; b=hqVbycEbpzfgEctaUK0eXRF3VtmbgYaRahL2kIvyzoUiyhoHKPGWlzGisNIxOO+8g6 CZrU/0uKRKr0E34yQvPn3WpSnfmWR5pKb83olQ0FxujKfatdXMLR9oGLLo1X0sGkg/3l VWoNIxpnoeyqjiOtYPirGmvuzDhxX3xnqwFT2Rnyv8b62Mk0KAvT1V68j1GT0jogyizr hDbAhDNtQVz6YJoNzKcgN/8/AylYv8QcYlzaZy0o1+nHt8O8GLa/zXkyNUnvhle1lPIk 9AApalyg+kpIrK3Lw48hGOxKEUvtUTCO4uPGXIuUMMGDMjhaLKgNQiSk+vvCmXZphogN dBew== X-Gm-Message-State: AOJu0YzRedjTjPNEvphyZsyW1DA61hkMMGZstaIOkHX50MS7grU9MUaD J5+zSXCAyqN2qaj1S7PDzHo8GLXUwtySGA== X-Received: from loggerhead.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:29a]) (user=jmattson job=sendgmr) by 2002:a81:7609:0:b0:59b:f3a2:cd79 with SMTP id r9-20020a817609000000b0059bf3a2cd79mr18329ywc.8.1696378870385; Tue, 03 Oct 2023 17:21:10 -0700 (PDT) Date: Tue, 3 Oct 2023 17:20:38 -0700 Mime-Version: 1.0 X-Mailer: git-send-email 2.42.0.582.g8ccd20d70d-goog Message-ID: <20231004002038.907778-1-jmattson@google.com> Subject: [PATCH] x86: KVM: Add feature flag for AMD's FsGsKernelGsBaseNonSerializing From: Jim Mattson <jmattson@google.com> To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Pawan Gupta <pawan.kumar.gupta@linux.intel.com>, Jiaxi Chen <jiaxi.chen@linux.intel.com>, Kim Phillips <kim.phillips@amd.com>, Paolo Bonzini <pbonzini@redhat.com>, Sean Christopherson <seanjc@google.com>, "H. Peter Anvin" <hpa@zytor.com>, x86@kernel.org, Dave Hansen <dave.hansen@linux.intel.com>, Borislav Petkov <bp@alien8.de>, Ingo Molnar <mingo@redhat.com>, Thomas Gleixner <tglx@linutronix.de> Cc: Jim Mattson <jmattson@google.com> Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (howler.vger.email [0.0.0.0]); Tue, 03 Oct 2023 17:21:17 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1778782204806756792 X-GMAIL-MSGID: 1778782204806756792 |
Series |
x86: KVM: Add feature flag for AMD's FsGsKernelGsBaseNonSerializing
|
|
Commit Message
Jim Mattson
Oct. 4, 2023, 12:20 a.m. UTC
Define an X86_FEATURE_* flag for
CPUID.80000021H:EAX.FsGsKernelGsBaseNonSerializing[bit 1], and
advertise the feature to userspace via KVM_GET_SUPPORTED_CPUID.
This feature is not yet documented in the APM. See AMD's "Processor
Programming Reference (PPR) for AMD Family 19h Model 61h, Revision B1
Processors (56713-B1-PUB)."
Signed-off-by: Jim Mattson <jmattson@google.com>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kvm/cpuid.c | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)
Comments
On 10/3/23 17:20, Jim Mattson wrote: > Define an X86_FEATURE_* flag for > CPUID.80000021H:EAX.FsGsKernelGsBaseNonSerializing[bit 1], and > advertise the feature to userspace via KVM_GET_SUPPORTED_CPUID. ... > +#define X86_FEATURE_BASES_NON_SERIAL (20*32+ 1) /* "" FSBASE, GSBASE, and KERNELGSBASE are non-serializing */ This is failing to differentiate two *VERY* different things. FSBASE, GSBASE, and KERNELGSBASE themselves are registers. They have *NOTHING* to do with serialization. WRFSBASE, for instance is not serializing. Reading (with RDMSR) or using any of those three registers is not serializing. The *ONLY* thing that relates them to serialization is the WRMSR instruction which itself is (mostly) architecturally serializing and the fact that WRMSR has historically been the main way to write those three registers. The AMD docs call this out, which helps. But the changelog, comments and probably the feature naming need some work. Why does this matter, btw? Why do guests need this bit passed through?
On Tue, Oct 3, 2023 at 5:57 PM Dave Hansen <dave.hansen@intel.com> wrote: > > On 10/3/23 17:20, Jim Mattson wrote: > > Define an X86_FEATURE_* flag for > > CPUID.80000021H:EAX.FsGsKernelGsBaseNonSerializing[bit 1], and > > advertise the feature to userspace via KVM_GET_SUPPORTED_CPUID. > ... > > +#define X86_FEATURE_BASES_NON_SERIAL (20*32+ 1) /* "" FSBASE, GSBASE, and KERNELGSBASE are non-serializing */ > > This is failing to differentiate two *VERY* different things. > > FSBASE, GSBASE, and KERNELGSBASE themselves are registers. They have > *NOTHING* to do with serialization. WRFSBASE, for instance is not > serializing. Reading (with RDMSR) or using any of those three registers > is not serializing. > > The *ONLY* thing that relates them to serialization is the WRMSR > instruction which itself is (mostly) architecturally serializing and the > fact that WRMSR has historically been the main way to write those three > registers. > > The AMD docs call this out, which helps. But the changelog, comments > and probably the feature naming need some work. You're right; I was overly terse. I'll elucidate in v2. > Why does this matter, btw? Why do guests need this bit passed through? The business of declaring breaking changes to the architectural specification in a CPUID bit has never made much sense to me. Legacy software that depends on the original architectural specification isn't going to query the CPUID bit, because the CPUID bit didn't exist when it was written. New software probably isn't going to query the CPUID bit, either, because it has to have an implementation that works on newer processors regardless. Why, then, would a developer bother to provide an implementation that only works on older processors *and* the code to select an implementation based on a CPUID bit? Take, for example, CPUID.(EAX=7,ECX=0):EBX[bit 13], which, IIRC, was the first CPUID bit of the "Ha ha; we're changing the architectural specification" category. When Intel introduced this new behavior in Haswell, they broke WIN87EM.DLL in Windows XP (see https://communities.vmware.com/t5/Legacy-User-Blogs/General-Protection-Fault-in-module-WIN87EM-DLL-at-0001-02C6/ta-p/2770422). I know of at least three software packages commonly running in VMs that were broken as a result. The CPUID bit didn't solve any problems, and I doubt that any software queries that bit today. As a hypervisor developer, however, it's not up to me to make value judgments on individual CPUID bits. If a bit indicates an innate characteristic of the hardware, it should be passed through. No one is likely to query CPUID.80000021H.EAX[bit 21] today, but if someone does query the bit in the future, they can reasonably expect that WRMSR({FS,GS,KERNELGS}_BASE) is a serializing operation whenever this bit is clear. Therefore, any hypervisor that doesn't pass the bit through is broken. Sadly, this also means that for a heterogenous migration pool, the hypervisor must set this bit in the guest CPUID if it is set on any host in the pool. Yes, that means that the legacy behavior may sometimes be present in a VM that enumerates the CPUID bit, but that's the best we can do. I'm a little surprised at the pushback, TBH. Are you implying that there is some advantage to *not* passing this bit through?
On 10/3/23 19:44, Jim Mattson wrote: > I'm a little surprised at the pushback, TBH. Are you implying that > there is some advantage to *not* passing this bit through? I'm not really trying to push back. I'm honestly just curious. Linux obviously doesn't cat about the bit. So is this for some future Linux or some other OS?
On Tue, Oct 3, 2023 at 8:27 PM Dave Hansen <dave.hansen@intel.com> wrote: > > On 10/3/23 19:44, Jim Mattson wrote: > > I'm a little surprised at the pushback, TBH. Are you implying that > > there is some advantage to *not* passing this bit through? > > I'm not really trying to push back. I'm honestly just curious. Linux > obviously doesn't cat about the bit. So is this for some future Linux > or some other OS? It's not for any particular guest OS. It's just for correctness of the virtual CPU. Pedantically, hardware that enumerates this bit cannot run a guest that doesn't. Pragmatically, it almost certainly doesn't matter. Getting it right is trivial and has no impact on performance or code size, so why not just do it?
On Tue, Oct 03, 2023 at 07:44:51PM -0700, Jim Mattson wrote: > The business of declaring breaking changes to the architectural > specification in a CPUID bit has never made much sense to me. How else should they be expressed then? In some flaky PDF which changes URLs whenever the new corporate CMS gets installed? Or we should do f/m/s matching which doesn't make any sense for VMs? When you think about it, CPUID is the best thing we have. > No one is likely to query CPUID.80000021H.EAX[bit 21] today, but if > someone does query the bit in the future, they can reasonably expect > that WRMSR({FS,GS,KERNELGS}_BASE) is a serializing operation whenever > this bit is clear. Therefore, any hypervisor that doesn't pass the bit > through is broken. Sadly, this also means that for a heterogenous > migration pool, the hypervisor must set this bit in the guest CPUID if > it is set on any host in the pool. Yes, that means that the legacy > behavior may sometimes be present in a VM that enumerates the CPUID > bit, but that's the best we can do. Yes, add this to your commit message. > I'm a little surprised at the pushback, TBH. Are you implying that > there is some advantage to *not* passing this bit through? We don't add stuff which is not worth adding. There has to be *at* *least* some justification for it. Thx.
On Wed, Oct 4, 2023 at 12:59 AM Borislav Petkov <bp@alien8.de> wrote: > > On Tue, Oct 03, 2023 at 07:44:51PM -0700, Jim Mattson wrote: > > The business of declaring breaking changes to the architectural > > specification in a CPUID bit has never made much sense to me. > > How else should they be expressed then? > > In some flaky PDF which changes URLs whenever the new corporate CMS gets > installed? > > Or we should do f/m/s matching which doesn't make any sense for VMs? > > When you think about it, CPUID is the best thing we have. > > > No one is likely to query CPUID.80000021H.EAX[bit 21] today, but if > > someone does query the bit in the future, they can reasonably expect > > that WRMSR({FS,GS,KERNELGS}_BASE) is a serializing operation whenever > > this bit is clear. Therefore, any hypervisor that doesn't pass the bit > > through is broken. Sadly, this also means that for a heterogenous > > migration pool, the hypervisor must set this bit in the guest CPUID if > > it is set on any host in the pool. Yes, that means that the legacy > > behavior may sometimes be present in a VM that enumerates the CPUID > > bit, but that's the best we can do. > > Yes, add this to your commit message. > > > I'm a little surprised at the pushback, TBH. Are you implying that > > there is some advantage to *not* passing this bit through? > > We don't add stuff which is not worth adding. There has to be *at* > *least* some justification for it. Let me propose the following axiom as justification: KVM_GET_SUPPORTED_CPUID must pass through any defeature bits that are set on the host, unless KVM is prepared to emulate the missing feature. Here, a defeature bit is any CPUID bit where a value of '1' indicates the absence of a feature. > Thx. > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 4, 2023 at 12:59 AM Borislav Petkov <bp@alien8.de> wrote: > > On Tue, Oct 03, 2023 at 07:44:51PM -0700, Jim Mattson wrote: > > The business of declaring breaking changes to the architectural > > specification in a CPUID bit has never made much sense to me. > > How else should they be expressed then? > > In some flaky PDF which changes URLs whenever the new corporate CMS gets > installed? > > Or we should do f/m/s matching which doesn't make any sense for VMs? > > When you think about it, CPUID is the best thing we have. Every time a new defeature bit is introduced, it breaks existing hypervisors, because no one can predict ahead of time that these bits have to be passed through. I wonder if we could convince x86 CPU vendors to put all defeature bits under a single leaf, so that we can just set the entire leaf to all 1's in KVM_GET_SUPPORTED_CPUID.
On 10/5/23 09:22, Jim Mattson wrote: > On Wed, Oct 4, 2023 at 12:59 AM Borislav Petkov <bp@alien8.de> wrote: >> On Tue, Oct 03, 2023 at 07:44:51PM -0700, Jim Mattson wrote: >>> The business of declaring breaking changes to the architectural >>> specification in a CPUID bit has never made much sense to me. >> How else should they be expressed then? >> >> In some flaky PDF which changes URLs whenever the new corporate CMS gets >> installed? >> >> Or we should do f/m/s matching which doesn't make any sense for VMs? >> >> When you think about it, CPUID is the best thing we have. > Every time a new defeature bit is introduced, it breaks existing > hypervisors, because no one can predict ahead of time that these bits > have to be passed through. > > I wonder if we could convince x86 CPU vendors to put all defeature > bits under a single leaf, so that we can just set the entire leaf to > all 1's in KVM_GET_SUPPORTED_CPUID. I hope I'm not throwing stones from a glass house here... But I'm struggling to think of cases where Intel has read-only "defeature bits" like this one. There are certainly things like MSR_IA32_MISC_ENABLE_FAST_STRING that can be toggled, but read-only indicators of a departure from established architecture seems ... suboptimal. It's arguable that TDX changed a bunch of architecture like causing exceptions on CPUID and MSRs that never caused exceptions before and _that_ constitutes a defeature. But that's the least of the problems for a TDX VM. :) (Seriously, I'm not trying to shame Intel's x86 fellow travelers here, just trying to make sure I'm not missing something).
On 10/4/23 09:58, Borislav Petkov wrote: > On Tue, Oct 03, 2023 at 07:44:51PM -0700, Jim Mattson wrote: >> The business of declaring breaking changes to the architectural >> specification in a CPUID bit has never made much sense to me. > How else should they be expressed then? > > In some flaky PDF which changes URLs whenever the new corporate CMS gets > installed? > > Or we should do f/m/s matching which doesn't make any sense for VMs? Nothing *needs* to be done other than documenting this retroactive change to what constitutes architectural behavior. It's not a CPUID that can be queried to change behavior; the user can use CPUID to diagnose that something has broken, but the broken program cannot know in the first place that the CPUID bit exists. I agree with Jim that it would be nice to have some bits from Intel, and some bits from AMD, that current processors always return as 1. Future processors can change those to 0 as desired. Intel did something similar with VMX. They have a bunch of bits for which we don't know the meaning, but we know it is something that "right now always causes vmexits". Even if in the future you might be able to disable it, the polarity of the bit is the same as for all other vmexit controls. Paolo
On Thu, Oct 5, 2023 at 9:35 AM Dave Hansen <dave.hansen@intel.com> wrote: > > On 10/5/23 09:22, Jim Mattson wrote: > > On Wed, Oct 4, 2023 at 12:59 AM Borislav Petkov <bp@alien8.de> wrote: > >> On Tue, Oct 03, 2023 at 07:44:51PM -0700, Jim Mattson wrote: > >>> The business of declaring breaking changes to the architectural > >>> specification in a CPUID bit has never made much sense to me. > >> How else should they be expressed then? > >> > >> In some flaky PDF which changes URLs whenever the new corporate CMS gets > >> installed? > >> > >> Or we should do f/m/s matching which doesn't make any sense for VMs? > >> > >> When you think about it, CPUID is the best thing we have. > > Every time a new defeature bit is introduced, it breaks existing > > hypervisors, because no one can predict ahead of time that these bits > > have to be passed through. > > > > I wonder if we could convince x86 CPU vendors to put all defeature > > bits under a single leaf, so that we can just set the entire leaf to > > all 1's in KVM_GET_SUPPORTED_CPUID. > > I hope I'm not throwing stones from a glass house here... > > But I'm struggling to think of cases where Intel has read-only > "defeature bits" like this one. There are certainly things like > MSR_IA32_MISC_ENABLE_FAST_STRING that can be toggled, but read-only > indicators of a departure from established architecture seems ... > suboptimal. > > It's arguable that TDX changed a bunch of architecture like causing > exceptions on CPUID and MSRs that never caused exceptions before and > _that_ constitutes a defeature. But that's the least of the problems > for a TDX VM. :) > > (Seriously, I'm not trying to shame Intel's x86 fellow travelers here, > just trying to make sure I'm not missing something). Intel's defeature bits that I know of are: CPUID.(EAX=7,ECX=0):EBX[bit 13] (Haswell) - "Deprecates FPU CS and FPU DS values if 1." CPUID.(EAX=7,ECX=0):EBX[bit 6] (Skylake) - "FDP_EXCPTN_ONLY. x87 FPU Data Pointer updated only on x87 exceptions if 1."
On 10/5/23 18:41, Jim Mattson wrote: >> I hope I'm not throwing stones from a glass house here... >> >> But I'm struggling to think of cases where Intel has read-only >> "defeature bits" like this one. There are certainly things like >> MSR_IA32_MISC_ENABLE_FAST_STRING that can be toggled, but read-only >> indicators of a departure from established architecture seems ... >> suboptimal. >> >> It's arguable that TDX changed a bunch of architecture like causing >> exceptions on CPUID and MSRs that never caused exceptions before and >> _that_ constitutes a defeature. But that's the least of the problems >> for a TDX VM. 😄 >> >> (Seriously, I'm not trying to shame Intel's x86 fellow travelers here, >> just trying to make sure I'm not missing something). > Intel's defeature bits that I know of are: > > CPUID.(EAX=7,ECX=0):EBX[bit 13] (Haswell) - "Deprecates FPU CS and FPU > DS values if 1." > CPUID.(EAX=7,ECX=0):EBX[bit 6] (Skylake) - "FDP_EXCPTN_ONLY. x87 FPU > Data Pointer updated only on x87 exceptions if 1." And for AMD, to get the full landscape: - CPUID(EAX=8000_0021h).EAX[0], "Processor ignores nested data breakpoints" - CPUID(EAX=8000_0021h).EAX[9], "SMM_CTL MSR is not present" (the MSR used to be always present if SVM is available) AMD had a few processors without X86_BUG_NULL_SEG that do not expose X86_FEATURE_NULL_SEL_CLR_BASE, but that's conservative so not a big deal. Paolo
On Thu, Oct 5, 2023 at 9:39 AM Paolo Bonzini <pbonzini@redhat.com> wrote: > I agree with Jim that it would be nice to have some bits from Intel, and > some bits from AMD, that current processors always return as 1. Future > processors can change those to 0 as desired. That's not quite what I meant. Today, hypervisors will not pass through a non-zero CPUID bit that they don't know the definition of. This makes sense for positive features, and for multi-bit fields. I'm suggesting a leaf devoted to single bit negative features. If a bit is set in hardware, it means that something has been taken away. Hypervisors don't need to know exactly what was taken away. For this leaf only, hypervisors will always pass through a non-zero bit, even if they have know idea what it means.
On 10/5/23 19:06, Jim Mattson wrote: > On Thu, Oct 5, 2023 at 9:39 AM Paolo Bonzini<pbonzini@redhat.com> wrote: > >> I agree with Jim that it would be nice to have some bits from Intel, and >> some bits from AMD, that current processors always return as 1. Future >> processors can change those to 0 as desired. > That's not quite what I meant. > > I'm suggesting a leaf devoted to single bit negative features. If a > bit is set in hardware, it means that something has been taken away. > Hypervisors don't need to know exactly what was taken away. For this > leaf only, hypervisors will always pass through a non-zero bit, even > if they have know idea what it means. Understood, but I'm suggesting that these might even have the right polarity: if a bit is set it means that something is there and might not in the future, even if we don't know exactly what. We can pass through the bit, we can AND bits across the migration pool to define what to pass to the guest, we can force-set the leaves to zero (feature removed). Either way, the point is to group future defeatures together. That said, these bits are only for documentation/debugging purposes anyway. So I like the idea because it would educate the architects about this issue, more than because it is actually useful... Paolo
On Thu, Oct 5, 2023 at 10:14 AM Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 10/5/23 19:06, Jim Mattson wrote: > > On Thu, Oct 5, 2023 at 9:39 AM Paolo Bonzini<pbonzini@redhat.com> wrote: > > > >> I agree with Jim that it would be nice to have some bits from Intel, and > >> some bits from AMD, that current processors always return as 1. Future > >> processors can change those to 0 as desired. > > That's not quite what I meant. > > > > I'm suggesting a leaf devoted to single bit negative features. If a > > bit is set in hardware, it means that something has been taken away. > > Hypervisors don't need to know exactly what was taken away. For this > > leaf only, hypervisors will always pass through a non-zero bit, even > > if they have know idea what it means. > > Understood, but I'm suggesting that these might even have the right > polarity: if a bit is set it means that something is there and might not > in the future, even if we don't know exactly what. We can pass through > the bit, we can AND bits across the migration pool to define what to > pass to the guest, we can force-set the leaves to zero (feature > removed). Either way, the point is to group future defeatures together. Oh, yeah. Your suggestion is better. :)
On 10/5/23 09:41, Jim Mattson wrote: >> >> But I'm struggling to think of cases where Intel has read-only >> "defeature bits" like this one. There are certainly things like >> MSR_IA32_MISC_ENABLE_FAST_STRING that can be toggled, but read-only >> indicators of a departure from established architecture seems ... >> suboptimal. ... > CPUID.(EAX=7,ECX=0):EBX[bit 13] (Haswell) - "Deprecates FPU CS and FPU > DS values if 1." > CPUID.(EAX=7,ECX=0):EBX[bit 6] (Skylake) - "FDP_EXCPTN_ONLY. x87 FPU > Data Pointer updated only on x87 exceptions if 1." Thanks! Trying to group these does make sense to me. I don't think people take architecture breakage lightly, but I certainly never considered that it might, for instance, be important enough to create a new VM migration pool. I'll try to keep an eye out for these.
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 58cb9495e40f..b53951c83d1d 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -443,6 +443,7 @@ /* AMD-defined Extended Feature 2 EAX, CPUID level 0x80000021 (EAX), word 20 */ #define X86_FEATURE_NO_NESTED_DATA_BP (20*32+ 0) /* "" No Nested Data Breakpoints */ +#define X86_FEATURE_BASES_NON_SERIAL (20*32+ 1) /* "" FSBASE, GSBASE, and KERNELGSBASE are non-serializing */ #define X86_FEATURE_LFENCE_RDTSC (20*32+ 2) /* "" LFENCE always serializing / synchronizes RDTSC */ #define X86_FEATURE_NULL_SEL_CLR_BASE (20*32+ 6) /* "" Null Selector Clears Base */ #define X86_FEATURE_AUTOIBRS (20*32+ 8) /* "" Automatic IBRS */ diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 0544e30b4946..5e776e8619be 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -761,7 +761,8 @@ void kvm_set_cpu_caps(void) kvm_cpu_cap_mask(CPUID_8000_0021_EAX, F(NO_NESTED_DATA_BP) | F(LFENCE_RDTSC) | 0 /* SmmPgCfgLock */ | - F(NULL_SEL_CLR_BASE) | F(AUTOIBRS) | 0 /* PrefetchCtlMsr */ + F(NULL_SEL_CLR_BASE) | F(AUTOIBRS) | 0 /* PrefetchCtlMsr */ | + F(BASES_NON_SERIAL) ); if (cpu_feature_enabled(X86_FEATURE_SRSO_NO))