Message ID | 20231020-delay-verw-v1-6-cff54096326d@linux.intel.com |
---|---|
State | New |
Series | Delay VERW |
Subject | [PATCH 6/6] KVM: VMX: Move VERW closer to VMentry for MDS mitigation |
From | Pawan Gupta <pawan.kumar.gupta@linux.intel.com> |
Date | Fri, 20 Oct 2023 13:45:29 -0700 |
Commit Message
Pawan Gupta
Oct. 20, 2023, 8:45 p.m. UTC
During VMentry, VERW is executed to mitigate MDS. After VERW, any memory
access, such as a register push onto the stack, may put host data back into
MDS-affected CPU buffers. A guest can then use MDS to sample that host data.
Although the likelihood of secrets surviving in registers at the current VERW
callsite is low, it can't be ruled out. Harden the MDS mitigation by moving
VERW later in the VMentry path.
Note that the VERW for the MMIO Stale Data mitigation is unchanged, because
per-guest conditional VERW is complex and not easy to handle that late in asm
with no GPRs available. If the CPU is also affected by MDS, VERW is executed
unconditionally late in asm, regardless of whether the guest has MMIO access.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
arch/x86/kvm/vmx/vmenter.S | 9 +++++++++
arch/x86/kvm/vmx/vmx.c | 10 +++++++---
2 files changed, 16 insertions(+), 3 deletions(-)
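For background on the mechanism: VERW architecturally verifies that the segment selector given by its memory operand is writable, and its only architectural flag output is EFLAGS.ZF. On CPUs with the MD_CLEAR microcode update, VERW additionally overwrites the MDS-affected CPU buffers as a side effect. A minimal sketch of the ordering this patch is after, assuming a hypothetical mds_sel symbol holding a readable selector such as __KERNEL_DS (the series wraps this detail in its CLEAR_CPU_BUFFERS-style macros):

	/*
	 * Sketch only, not the actual patch: all host register and stack
	 * accesses happen before this point, so nothing refills the CPU
	 * buffers between the VERW and the guest entry.
	 */
	verw	mds_sel(%rip)	/* microcode side effect: clear CPU buffers */
	vmresume		/* enter the guest immediately afterwards */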
Comments
On Fri, Oct 20, 2023, Pawan Gupta wrote:
> @@ -177,10 +180,16 @@ SYM_FUNC_START(__vmx_vcpu_run)
>  	 * the 'vmx_vmexit' label below.
>  	 */
>  .Lvmresume:
> +	/* Mitigate CPU data sampling attacks, e.g. MDS */
> +	GUEST_CLEAR_CPU_BUFFERS

I have a very hard time believing that it's worth duplicating the mitigation
for VMRESUME vs. VMLAUNCH just to land it after a Jcc.

  3b1:	48 8b 00             	mov    (%rax),%rax
  3b4:	74 18                	je     3ce <__vmx_vcpu_run+0x9e>
  3b6:	eb 0e                	jmp    3c6 <__vmx_vcpu_run+0x96>
  3b8:	0f 00 2d 05 00 00 00 	verw   0x5(%rip)        # 3c4 <__vmx_vcpu_run+0x94>
  3bf:	0f 1f 80 00 00 18 00 	nopl   0x180000(%rax)
  3c6:	0f 01 c3             	vmresume
  3c9:	e9 c9 00 00 00       	jmp    497 <vmx_vmexit+0xa7>
  3ce:	eb 0e                	jmp    3de <__vmx_vcpu_run+0xae>
  3d0:	0f 00 2d 05 00 00 00 	verw   0x5(%rip)        # 3dc <__vmx_vcpu_run+0xac>
  3d7:	0f 1f 80 00 00 18 00 	nopl   0x180000(%rax)
  3de:	0f 01 c2             	vmlaunch

Also, would it be better to put the NOP first? Or even better, out of line?
It'd be quite hilarious if the CPU pulled a stupid and speculated on the
operand of the NOP, i.e. if the user/guest controlled RAX allowed for pulling
in data after the VERW.
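A note on the listing above: the VERW operand is not arbitrary. verw 0x5(%rip) resolves into the displacement bytes of the following nopl, which embed the selector word (0x18, i.e. __KERNEL_DS, visible in the bytes 00 00 18 00). A reconstruction of that layout, inferred from the disassembly rather than copied from the series:

	verw	1f(%rip)		/* operand resolves into the NOP below */
	.byte	0x0f, 0x1f, 0x80	/* nopl opcode + ModRM, disp32 form */
	.word	0x0000			/* low displacement bytes */
1:	.word	0x0018			/* high disp bytes double as a __KERNEL_DS selector */

This is also why the speculation question matters: the NOP's nominal operand, 0x180000(%rax), involves RAX, which holds guest-controlled data at this point.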
On Fri, Oct 20, 2023 at 03:55:07PM -0700, Sean Christopherson wrote:
> On Fri, Oct 20, 2023, Pawan Gupta wrote:
> > .Lvmresume:
> > +	/* Mitigate CPU data sampling attacks, e.g. MDS */
> > +	GUEST_CLEAR_CPU_BUFFERS
>
> I have a very hard time believing that it's worth duplicating the mitigation
> for VMRESUME vs. VMLAUNCH just to land it after a Jcc.

VERW modifies the flags, so it either needs to be after the Jcc or we
push/pop flags, which adds 2 extra memory operations. Please let me know
if there is a better option.

> Also, would it be better to put the NOP first? Or even better, out of line?
> It'd be quite hilarious if the CPU pulled a stupid and speculated on the
> operand of the NOP, i.e. if the user/guest controlled RAX allowed for pulling
> in data after the VERW.

I did confirm with CPU architects that the NOP operand won't be dereferenced,
even speculatively. But yes, even if it were, moving the NOP first would take
care of it.
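The two extra memory operations referred to above would come from saving and restoring EFLAGS around VERW. A rough sketch of that rejected alternative, again with a hypothetical mds_sel selector symbol:

	test	$VMX_RUN_VMRESUME, %ebx	/* sets ZF */
	...				/* load guest GPRs without clobbering flags */
	pushf				/* extra memory write: save EFLAGS */
	verw	mds_sel(%rip)		/* clobbers ZF */
	popf				/* extra memory read: restore EFLAGS */
	jz	.Lvmlaunch		/* ZF from the original TEST */

Note that the POPF would itself be a host memory access after VERW, which arguably runs against the goal of the patch.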
On Fri, Oct 20, 2023, Pawan Gupta wrote:
> On Fri, Oct 20, 2023 at 03:55:07PM -0700, Sean Christopherson wrote:
> > I have a very hard time believing that it's worth duplicating the mitigation
> > for VMRESUME vs. VMLAUNCH just to land it after a Jcc.
>
> VERW modifies the flags, so it either needs to be after the Jcc or we
> push/pop flags, which adds 2 extra memory operations. Please let me know
> if there is a better option.

Ugh, I assumed that piggybacking VERW overrode the original behavior entirely;
I didn't realize it sacrifices EFLAGS.ZF on the altar of mitigations.

Luckily, this is easy to solve now that VMRESUME vs. VMLAUNCH uses a flag
instead of a dedicated bool.

From: Sean Christopherson <seanjc@google.com>
Date: Mon, 23 Oct 2023 07:44:35 -0700
Subject: [PATCH] KVM: VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs.
 VMLAUNCH

Use EFLAGS.CF instead of EFLAGS.ZF to track whether to use VMRESUME versus
VMLAUNCH.  Freeing up EFLAGS.ZF will allow doing VERW, which clobbers ZF,
for MDS mitigations as late as possible without needing to duplicate VERW
for both paths.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/run_flags.h | 7 +++++--
 arch/x86/kvm/vmx/vmenter.S   | 6 +++---
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/run_flags.h b/arch/x86/kvm/vmx/run_flags.h
index edc3f16cc189..6a9bfdfbb6e5 100644
--- a/arch/x86/kvm/vmx/run_flags.h
+++ b/arch/x86/kvm/vmx/run_flags.h
@@ -2,7 +2,10 @@
 #ifndef __KVM_X86_VMX_RUN_FLAGS_H
 #define __KVM_X86_VMX_RUN_FLAGS_H
 
-#define VMX_RUN_VMRESUME		(1 << 0)
-#define VMX_RUN_SAVE_SPEC_CTRL		(1 << 1)
+#define VMX_RUN_VMRESUME_SHIFT		0
+#define VMX_RUN_SAVE_SPEC_CTRL_SHIFT	1
+
+#define VMX_RUN_VMRESUME		BIT(VMX_RUN_VMRESUME_SHIFT)
+#define VMX_RUN_SAVE_SPEC_CTRL		BIT(VMX_RUN_SAVE_SPEC_CTRL_SHIFT)
 
 #endif /* __KVM_X86_VMX_RUN_FLAGS_H */
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index be275a0410a8..b3b13ec04bac 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -139,7 +139,7 @@ SYM_FUNC_START(__vmx_vcpu_run)
 	mov (%_ASM_SP), %_ASM_AX
 
 	/* Check if vmlaunch or vmresume is needed */
-	test $VMX_RUN_VMRESUME, %ebx
+	bt   $VMX_RUN_VMRESUME_SHIFT, %ebx
 
 	/* Load guest registers.  Don't clobber flags. */
 	mov VCPU_RCX(%_ASM_AX), %_ASM_CX
@@ -161,8 +161,8 @@ SYM_FUNC_START(__vmx_vcpu_run)
 	/* Load guest RAX.  This kills the @regs pointer! */
 	mov VCPU_RAX(%_ASM_AX), %_ASM_AX
 
-	/* Check EFLAGS.ZF from 'test VMX_RUN_VMRESUME' above */
-	jz .Lvmlaunch
+	/* Check EFLAGS.CF from the VMX_RUN_VMRESUME bit test above. */
+	jnc .Lvmlaunch
 
 	/*
 	 * After a successful VMRESUME/VMLAUNCH, control flow "magically"

base-commit: ec2f1daad460c6201338dae606466220ccaa96d5
--
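The reason this works: BT reports the tested bit in EFLAGS.CF, while VERW's only architectural flag output is EFLAGS.ZF, so the CF result survives the buffer clear. A sketch of the sequence the two patches combine to allow, with a single VERW serving both paths (hypothetical mds_sel symbol as before):

	bt	$VMX_RUN_VMRESUME_SHIFT, %ebx	/* CF = VMRESUME bit; ZF unused */
	...					/* load guest registers, no flag clobbers */
	verw	mds_sel(%rip)			/* clobbers ZF; CF survives */
	jnc	.Lvmlaunch			/* CF=0: VMLAUNCH path */
	vmresume				/* CF=1: VMRESUME path */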
On Mon, Oct 23, 2023 at 07:58:57AM -0700, Sean Christopherson wrote:
> Ugh, I assumed that piggybacking VERW overrode the original behavior entirely;
> I didn't realize it sacrifices EFLAGS.ZF on the altar of mitigations.
>
> Luckily, this is easy to solve now that VMRESUME vs. VMLAUNCH uses a flag
> instead of a dedicated bool.

That's great.

> From: Sean Christopherson <seanjc@google.com>
> Date: Mon, 23 Oct 2023 07:44:35 -0700
> Subject: [PATCH] KVM: VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs.
>  VMLAUNCH
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Thanks for the patch, I will include it in the next revision.
On Fri, Oct 20, 2023 at 01:45:29PM -0700, Pawan Gupta wrote:
> @@ -31,6 +32,8 @@
>  #define VCPU_R15	__VCPU_REGS_R15 * WORD_SIZE
>  #endif
> 
> +#define GUEST_CLEAR_CPU_BUFFERS	USER_CLEAR_CPU_BUFFERS

I don't think the extra macro buys anything here.
On Mon, Oct 23, 2023 at 11:56:43AM -0700, Josh Poimboeuf wrote:
> On Fri, Oct 20, 2023 at 01:45:29PM -0700, Pawan Gupta wrote:
> > +#define GUEST_CLEAR_CPU_BUFFERS	USER_CLEAR_CPU_BUFFERS
>
> I don't think the extra macro buys anything here.

Using USER_CLEAR_CPU_BUFFERS in the VMentry path didn't feel right. But,
after "USER_" is gone per your comment on the 2/6 patch,
GUEST_CLEAR_CPU_BUFFERS can also go away.
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index be275a0410a8..efa716cf4727 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #include <linux/linkage.h>
 #include <asm/asm.h>
+#include <asm/segment.h>
 #include <asm/bitsperlong.h>
 #include <asm/kvm_vcpu_regs.h>
 #include <asm/nospec-branch.h>
@@ -31,6 +32,8 @@
 #define VCPU_R15	__VCPU_REGS_R15 * WORD_SIZE
 #endif
 
+#define GUEST_CLEAR_CPU_BUFFERS	USER_CLEAR_CPU_BUFFERS
+
 .macro VMX_DO_EVENT_IRQOFF call_insn call_target
 	/*
 	 * Unconditionally create a stack frame, getting the correct RSP on the
@@ -177,10 +180,16 @@ SYM_FUNC_START(__vmx_vcpu_run)
 	 * the 'vmx_vmexit' label below.
 	 */
 .Lvmresume:
+	/* Mitigate CPU data sampling attacks, e.g. MDS */
+	GUEST_CLEAR_CPU_BUFFERS
+
 	vmresume
 	jmp .Lvmfail
 
 .Lvmlaunch:
+	/* Mitigate CPU data sampling attacks, e.g. MDS */
+	GUEST_CLEAR_CPU_BUFFERS
+
 	vmlaunch
 	jmp .Lvmfail
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c16297a49e4d..e3d0eda292c3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7226,13 +7226,17 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 
 	guest_state_enter_irqoff();
 
-	/* L1D Flush includes CPU buffer clear to mitigate MDS */
+	/*
+	 * L1D Flush includes CPU buffer clear to mitigate MDS, but the VERW
+	 * mitigation for MDS is done late in VMentry and is still executed
+	 * in spite of the L1D Flush. This is because an extra VERW should
+	 * not matter much after the big hammer L1D Flush.
+	 */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
 		vmx_l1d_flush(vcpu);
-	else if (cpu_feature_enabled(X86_FEATURE_USER_CLEAR_CPU_BUF))
-		mds_clear_cpu_buffers();
 	else if (static_branch_unlikely(&mmio_stale_data_clear) &&
 		 kvm_arch_has_assigned_device(vcpu->kvm))
+		/* MMIO mitigation is mutually exclusive with the MDS mitigation in asm */
 		mds_clear_cpu_buffers();
 
 	vmx_disable_fb_clear(vmx);