Message ID | 20231025-delay-verw-v3-1-52663677ee35@linux.intel.com |
---|---|
State | New |
Series | Delay VERW |
Commit Message
Pawan Gupta
Oct. 25, 2023, 8:52 p.m. UTC
MDS mitigation requires clearing the CPU buffers before returning to
user. This needs to be done late in the exit-to-user path. The current
location of VERW leaves a possibility of kernel data ending up in CPU
buffers for memory accesses done after VERW, such as:

1. Kernel data accessed by an NMI between VERW and return-to-user can
   remain in CPU buffers (since an NMI returning to the kernel does not
   execute VERW to clear CPU buffers).

2. Alyssa reported that after VERW is executed,
   CONFIG_GCC_PLUGIN_STACKLEAK=y scrubs the stack used by a system
   call. Memory accesses during stack scrubbing can move kernel stack
   contents into CPU buffers.

3. When caller-saved registers are restored after a return from a
   function executing VERW, the kernel stack accesses can remain in
   CPU buffers (since they occur after VERW).

To fix this, VERW needs to be moved very late in the exit-to-user path.

In preparation for moving VERW to entry/exit asm code, create macros
that can be used in asm. Also make them depend on a new feature flag
X86_FEATURE_CLEAR_CPU_BUF.
Reported-by: Alyssa Milburn <alyssa.milburn@intel.com>
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
arch/x86/entry/entry.S | 16 ++++++++++++++++
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/nospec-branch.h | 15 +++++++++++++++
3 files changed, 32 insertions(+), 1 deletion(-)
Comments
On 25/10/2023 9:52 pm, Pawan Gupta wrote:
> diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
> index bfb7bcb362bc..f8ba0c0b6e60 100644
> --- a/arch/x86/entry/entry.S
> +++ b/arch/x86/entry/entry.S
> @@ -20,3 +23,16 @@ SYM_FUNC_END(entry_ibpb)
> EXPORT_SYMBOL_GPL(entry_ibpb);
>
> .popsection
> +
> +.pushsection .entry.text, "ax"
> +
> +.align L1_CACHE_BYTES, 0xcc
> +SYM_CODE_START_NOALIGN(mds_verw_sel)
> +	UNWIND_HINT_UNDEFINED
> +	ANNOTATE_NOENDBR
> +	.word __KERNEL_DS

You need another .align here.  Otherwise subsequent code will still
start in this cacheline and defeat the purpose of trying to keep it
separate.

> +SYM_CODE_END(mds_verw_sel);

Thinking about it, should this really be CODE and not a data entry?

It lives in .entry.text but it really is data and objtool shouldn't be
writing ORC data for it at all.

(Not to mention that if it's marked as STT_OBJECT, objdump -d will do
the sensible thing and not even try to disassemble it).

~Andrew

P.S. Please CC on the full series.  Far less effort than fishing the
rest off lore.
On Wed, Oct 25, 2023 at 10:10:41PM +0100, Andrew Cooper wrote:
> On 25/10/2023 9:52 pm, Pawan Gupta wrote:
> > diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
> > index bfb7bcb362bc..f8ba0c0b6e60 100644
> > --- a/arch/x86/entry/entry.S
> > +++ b/arch/x86/entry/entry.S
> > @@ -20,3 +23,16 @@ SYM_FUNC_END(entry_ibpb)
> > EXPORT_SYMBOL_GPL(entry_ibpb);
> >
> > .popsection
> > +
> > +.pushsection .entry.text, "ax"
> > +
> > +.align L1_CACHE_BYTES, 0xcc
> > +SYM_CODE_START_NOALIGN(mds_verw_sel)
> > +	UNWIND_HINT_UNDEFINED
> > +	ANNOTATE_NOENDBR
> > +	.word __KERNEL_DS
>
> You need another .align here.  Otherwise subsequent code will still
> start in this cacheline and defeat the purpose of trying to keep it
> separate.
>
> > +SYM_CODE_END(mds_verw_sel);
>
> Thinking about it, should this really be CODE and not a data entry?
>
> It lives in .entry.text but it really is data and objtool shouldn't be
> writing ORC data for it at all.
>
> (Not to mention that if it's marked as STT_OBJECT, objdump -d will do
> the sensible thing and not even try to disassemble it).
>
> ~Andrew
>
> P.S. Please CC on the full series.  Far less effort than fishing the
> rest off lore.

+1 to putting it in .rodata or so.
On 25/10/2023 10:28 pm, Josh Poimboeuf wrote:
> On Wed, Oct 25, 2023 at 10:10:41PM +0100, Andrew Cooper wrote:
>> On 25/10/2023 9:52 pm, Pawan Gupta wrote:
>>> diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
>>> index bfb7bcb362bc..f8ba0c0b6e60 100644
>>> --- a/arch/x86/entry/entry.S
>>> +++ b/arch/x86/entry/entry.S
>>> @@ -20,3 +23,16 @@ SYM_FUNC_END(entry_ibpb)
>>> EXPORT_SYMBOL_GPL(entry_ibpb);
>>>
>>> .popsection
>>> +
>>> +.pushsection .entry.text, "ax"
>>> +
>>> +.align L1_CACHE_BYTES, 0xcc
>>> +SYM_CODE_START_NOALIGN(mds_verw_sel)
>>> +	UNWIND_HINT_UNDEFINED
>>> +	ANNOTATE_NOENDBR
>>> +	.word __KERNEL_DS
>> You need another .align here.  Otherwise subsequent code will still
>> start in this cacheline and defeat the purpose of trying to keep it
>> separate.
>>
>>> +SYM_CODE_END(mds_verw_sel);
>> Thinking about it, should this really be CODE and not a data entry?
>>
>> It lives in .entry.text but it really is data and objtool shouldn't be
>> writing ORC data for it at all.
>>
>> (Not to mention that if it's marked as STT_OBJECT, objdump -d will do
>> the sensible thing and not even try to disassemble it).
>>
>> ~Andrew
>>
>> P.S. Please CC on the full series.  Far less effort than fishing the
>> rest off lore.
> +1 to putting it in .rodata or so.

It's necessarily in .entry.text so it doesn't explode with KPTI active.

~Andrew
On Wed, Oct 25, 2023 at 10:30:52PM +0100, Andrew Cooper wrote:
> On 25/10/2023 10:28 pm, Josh Poimboeuf wrote:
> > On Wed, Oct 25, 2023 at 10:10:41PM +0100, Andrew Cooper wrote:
> >> On 25/10/2023 9:52 pm, Pawan Gupta wrote:
> >>> diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
> >>> index bfb7bcb362bc..f8ba0c0b6e60 100644
> >>> --- a/arch/x86/entry/entry.S
> >>> +++ b/arch/x86/entry/entry.S
> >>> @@ -20,3 +23,16 @@ SYM_FUNC_END(entry_ibpb)
> >>> EXPORT_SYMBOL_GPL(entry_ibpb);
> >>>
> >>> .popsection
> >>> +
> >>> +.pushsection .entry.text, "ax"
> >>> +
> >>> +.align L1_CACHE_BYTES, 0xcc
> >>> +SYM_CODE_START_NOALIGN(mds_verw_sel)
> >>> +	UNWIND_HINT_UNDEFINED
> >>> +	ANNOTATE_NOENDBR
> >>> +	.word __KERNEL_DS
> >> You need another .align here.  Otherwise subsequent code will still
> >> start in this cacheline and defeat the purpose of trying to keep it
> >> separate.
> >>
> >>> +SYM_CODE_END(mds_verw_sel);
> >> Thinking about it, should this really be CODE and not a data entry?
> >>
> >> It lives in .entry.text but it really is data and objtool shouldn't be
> >> writing ORC data for it at all.
> >>
> >> (Not to mention that if it's marked as STT_OBJECT, objdump -d will do
> >> the sensible thing and not even try to disassemble it).
> >>
> >> ~Andrew
> >>
> >> P.S. Please CC on the full series.  Far less effort than fishing the
> >> rest off lore.
> > +1 to putting it in .rodata or so.
>
> It's necessarily in .entry.text so it doesn't explode with KPTI active.

Ah, right.  In general tooling doesn't take too kindly to putting data
in a text section.  But it might be ok.
On Wed, Oct 25, 2023 at 10:10:41PM +0100, Andrew Cooper wrote:
> > +.align L1_CACHE_BYTES, 0xcc
> > +SYM_CODE_START_NOALIGN(mds_verw_sel)
> > +	UNWIND_HINT_UNDEFINED
> > +	ANNOTATE_NOENDBR
> > +	.word __KERNEL_DS
>
> You need another .align here.  Otherwise subsequent code will still
> start in this cacheline and defeat the purpose of trying to keep it
> separate.

Right.

> > +SYM_CODE_END(mds_verw_sel);
>
> Thinking about it, should this really be CODE and not a data entry?

Would that require adding a data equivalent of .entry.text and update
KPTI to keep it mapped? Or is there an easier option?

> P.S. Please CC on the full series.  Far less effort than fishing the
> rest off lore.

I didn't realize get_maintainer.pl isn't doing that already. Proposing
below update to MAINTAINERS:

---
From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Date: Wed, 25 Oct 2023 14:50:41 -0700
Subject: [PATCH] MAINTAINERS: Update entry for X86 HARDWARE VULNERABILITIES

Add Andrew Cooper to maintainers of hardware vulnerabilities
mitigations.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2894f0777537..bf8c8707b8f8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23382,6 +23382,7 @@ M:	Thomas Gleixner <tglx@linutronix.de>
 M:	Borislav Petkov <bp@alien8.de>
 M:	Peter Zijlstra <peterz@infradead.org>
 M:	Josh Poimboeuf <jpoimboe@kernel.org>
+M:	Andrew Cooper <andrew.cooper3@citrix.com>
 R:	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
 S:	Maintained
 F:	Documentation/admin-guide/hw-vuln/
On 25/10/2023 11:07 pm, Pawan Gupta wrote:
> On Wed, Oct 25, 2023 at 10:10:41PM +0100, Andrew Cooper wrote:
>>> +.align L1_CACHE_BYTES, 0xcc
>>> +SYM_CODE_START_NOALIGN(mds_verw_sel)
>>> +	UNWIND_HINT_UNDEFINED
>>> +	ANNOTATE_NOENDBR
>>> +	.word __KERNEL_DS
>> You need another .align here.  Otherwise subsequent code will still
>> start in this cacheline and defeat the purpose of trying to keep it
>> separate.
> Right.
>
>>> +SYM_CODE_END(mds_verw_sel);
>> Thinking about it, should this really be CODE and not a data entry?
> Would that require adding a data equivalent of .entry.text and update
> KPTI to keep it mapped? Or is there an easier option?

Leave it right here in .entry.text , but try using SYM_DATA() and
friends.  See whether objtool vomits over the result or not.

And if objtool does vomit over the result, then leaving it as it is in
this patch with SYM_CODE() is good enough.

>> P.S. Please CC on the full series.  Far less effort than fishing the
>> rest off lore.
> I didn't realize get_maintainer.pl isn't doing that already. Proposing
> below update to MAINTAINERS:
>
> ---
> From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Date: Wed, 25 Oct 2023 14:50:41 -0700
> Subject: [PATCH] MAINTAINERS: Update entry for X86 HARDWARE VULNERABILITIES
>
> Add Andrew Cooper to maintainers of hardware vulnerabilities
> mitigations.
>
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 2894f0777537..bf8c8707b8f8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -23382,6 +23382,7 @@ M:	Thomas Gleixner <tglx@linutronix.de>
>  M:	Borislav Petkov <bp@alien8.de>
>  M:	Peter Zijlstra <peterz@infradead.org>
>  M:	Josh Poimboeuf <jpoimboe@kernel.org>
> +M:	Andrew Cooper <andrew.cooper3@citrix.com>

Oh, right.  Perhaps R rather than M seeing as I can't make any time
commitments, but sure.

~Andrew
<snip>
> +
> +.pushsection .entry.text, "ax"
> +
> +.align L1_CACHE_BYTES, 0xcc
> +SYM_CODE_START_NOALIGN(mds_verw_sel)
> +	UNWIND_HINT_UNDEFINED
> +	ANNOTATE_NOENDBR
> +	.word __KERNEL_DS
> +SYM_CODE_END(mds_verw_sel);
> +/* For KVM */
> +EXPORT_SYMBOL_GPL(mds_verw_sel);
> +
> +.popsection

<snip>

> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index c55cc243592e..005e69f93115 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -329,6 +329,21 @@
>  #endif
>  .endm
>
> +/*
> + * Macros to execute VERW instruction that mitigate transient data sampling
> + * attacks such as MDS. On affected systems a microcode update overloaded VERW
> + * instruction to also clear the CPU buffers. VERW clobbers CFLAGS.ZF.
> + *
> + * Note: Only the memory operand variant of VERW clears the CPU buffers.
> + */
> +.macro EXEC_VERW
> +	verw _ASM_RIP(mds_verw_sel)
> +.endm
> +
> +.macro CLEAR_CPU_BUFFERS
> +	ALTERNATIVE "", __stringify(EXEC_VERW), X86_FEATURE_CLEAR_CPU_BUF
> +.endm

What happened with the first 5 bytes of a 7 byte nop being complemented
by __KERNEL_DS in order to handle VERW being executed after user
registers are restored and having its memory operand?

> +
> #else /* __ASSEMBLY__ */
>
> #define ANNOTATE_RETPOLINE_SAFE \
>
On 26/10/2023 2:44 pm, Nikolay Borisov wrote:
>
> <snip>
>> +
>> +.pushsection .entry.text, "ax"
>> +
>> +.align L1_CACHE_BYTES, 0xcc
>> +SYM_CODE_START_NOALIGN(mds_verw_sel)
>> +	UNWIND_HINT_UNDEFINED
>> +	ANNOTATE_NOENDBR
>> +	.word __KERNEL_DS
>> +SYM_CODE_END(mds_verw_sel);
>> +/* For KVM */
>> +EXPORT_SYMBOL_GPL(mds_verw_sel);
>> +
>> +.popsection
>
> <snip>
>
>> diff --git a/arch/x86/include/asm/nospec-branch.h
>> b/arch/x86/include/asm/nospec-branch.h
>> index c55cc243592e..005e69f93115 100644
>> --- a/arch/x86/include/asm/nospec-branch.h
>> +++ b/arch/x86/include/asm/nospec-branch.h
>> @@ -329,6 +329,21 @@
>>  #endif
>>  .endm
>> +/*
>> + * Macros to execute VERW instruction that mitigate transient data
>> sampling
>> + * attacks such as MDS. On affected systems a microcode update
>> overloaded VERW
>> + * instruction to also clear the CPU buffers. VERW clobbers CFLAGS.ZF.
>> + *
>> + * Note: Only the memory operand variant of VERW clears the CPU
>> buffers.
>> + */
>> +.macro EXEC_VERW
>> +	verw _ASM_RIP(mds_verw_sel)
>> +.endm
>> +
>> +.macro CLEAR_CPU_BUFFERS
>> +	ALTERNATIVE "", __stringify(EXEC_VERW), X86_FEATURE_CLEAR_CPU_BUF
>> +.endm
>
> What happened with the first 5 bytes of a 7 byte nop being
> complemented by __KERNEL_DS in order to handle VERW being executed
> after user registers are restored and having its memory operand ?

It was moved out of line (so no need to hide a constant in a nop),
deduped, and renamed to mds_verw_sel.

verw _ASM_RIP(mds_verw_sel) *is* the memory form.

~Andrew
On Wed, Oct 25, 2023 at 11:13:46PM +0100, Andrew Cooper wrote:
> On 25/10/2023 11:07 pm, Pawan Gupta wrote:
> > On Wed, Oct 25, 2023 at 10:10:41PM +0100, Andrew Cooper wrote:
> >>> +.align L1_CACHE_BYTES, 0xcc
> >>> +SYM_CODE_START_NOALIGN(mds_verw_sel)
> >>> +	UNWIND_HINT_UNDEFINED
> >>> +	ANNOTATE_NOENDBR
> >>> +	.word __KERNEL_DS
> >> You need another .align here.  Otherwise subsequent code will still
> >> start in this cacheline and defeat the purpose of trying to keep it
> >> separate.
> > Right.
> >
> >>> +SYM_CODE_END(mds_verw_sel);
> >> Thinking about it, should this really be CODE and not a data entry?
> > Would that require adding a data equivalent of .entry.text and update
> > KPTI to keep it mapped? Or is there an easier option?
>
> Leave it right here in .entry.text , but try using SYM_DATA() and
> friends.  See whether objtool vomits over the result or not.

objtool still complains when using SYM_DATA*() without the annotations:

  vmlinux.o: warning: objtool: mds_verw_sel+0x0: unreachable instruction
  vmlinux.o: warning: objtool: .altinstr_replacement+0x2c: relocation to !ENDBR: mds_verw_sel+0x0

> And if objtool does vomit over the result, then leaving it as it is in
> this patch with SYM_CODE() is good enough.

Settling with SYM_CODE().

On the bright side, I am seeing even better perf with the VERW operand
out-of-line:

Baseline: v6.6-rc5

| Test               | Configuration          | v1   | v3   |
| ------------------ | ---------------------- | ---- | ---- |
| build-linux-kernel | defconfig              | 1.00 | 1.00 |
| hackbench          | 32 - Process           | 1.02 | 1.06 |
| nginx              | Short Connection - 500 | 1.01 | 1.04 |

Disclaimer: These are collected by a stupid dev who knows nothing about
perf, please take this with a grain of salt.

I will be sending v4 soon.
On 27/10/2023 2:48 pm, Pawan Gupta wrote:
> On the bright side, I am seeing even better perf with the VERW operand
> out-of-line:
>
> Baseline: v6.6-rc5
>
> | Test               | Configuration          | v1   | v3   |
> | ------------------ | ---------------------- | ---- | ---- |
> | build-linux-kernel | defconfig              | 1.00 | 1.00 |
> | hackbench          | 32 - Process           | 1.02 | 1.06 |
> | nginx              | Short Connection - 500 | 1.01 | 1.04 |
>
> Disclaimer: These are collected by a stupid dev who knows nothing about
> perf, please take this with a grain of salt.

:)

Almost as if it's a good idea to follow the advice of the Optimisation
Guide on mixing code and data, which is "don't".

~Andrew
On Fri, Oct 27, 2023 at 03:12:45PM +0100, Andrew Cooper wrote:
> Almost as if it's a good idea to follow the advice of the Optimisation
> Guide on mixing code and data, which is "don't".

Thanks a lot Andrew and Peter for shepherding me this way.
diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
index bfb7bcb362bc..f8ba0c0b6e60 100644
--- a/arch/x86/entry/entry.S
+++ b/arch/x86/entry/entry.S
@@ -6,6 +6,9 @@
 #include <linux/linkage.h>
 #include <asm/export.h>
 #include <asm/msr-index.h>
+#include <asm/unwind_hints.h>
+#include <asm/segment.h>
+#include <asm/cache.h>
 
 .pushsection .noinstr.text, "ax"
 
@@ -20,3 +23,16 @@ SYM_FUNC_END(entry_ibpb)
 EXPORT_SYMBOL_GPL(entry_ibpb);
 
 .popsection
+
+.pushsection .entry.text, "ax"
+
+.align L1_CACHE_BYTES, 0xcc
+SYM_CODE_START_NOALIGN(mds_verw_sel)
+	UNWIND_HINT_UNDEFINED
+	ANNOTATE_NOENDBR
+	.word __KERNEL_DS
+SYM_CODE_END(mds_verw_sel);
+/* For KVM */
+EXPORT_SYMBOL_GPL(mds_verw_sel);
+
+.popsection
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 58cb9495e40f..f21fc0f12737 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -308,10 +308,10 @@
 #define X86_FEATURE_SMBA		(11*32+21) /* "" Slow Memory Bandwidth Allocation */
 #define X86_FEATURE_BMEC		(11*32+22) /* "" Bandwidth Monitoring Event Configuration */
 #define X86_FEATURE_USER_SHSTK		(11*32+23) /* Shadow stack support for user mode applications */
-
 #define X86_FEATURE_SRSO		(11*32+24) /* "" AMD BTB untrain RETs */
 #define X86_FEATURE_SRSO_ALIAS		(11*32+25) /* "" AMD BTB untrain RETs through aliasing */
 #define X86_FEATURE_IBPB_ON_VMEXIT	(11*32+26) /* "" Issue an IBPB only on VMEXIT */
+#define X86_FEATURE_CLEAR_CPU_BUF	(11*32+27) /* "" Clear CPU buffers */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI		(12*32+ 4) /* AVX VNNI instructions */
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index c55cc243592e..005e69f93115 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -329,6 +329,21 @@
 #endif
 .endm
 
+/*
+ * Macros to execute VERW instruction that mitigate transient data sampling
+ * attacks such as MDS. On affected systems a microcode update overloaded VERW
+ * instruction to also clear the CPU buffers. VERW clobbers CFLAGS.ZF.
+ *
+ * Note: Only the memory operand variant of VERW clears the CPU buffers.
+ */
+.macro EXEC_VERW
+	verw _ASM_RIP(mds_verw_sel)
+.endm
+
+.macro CLEAR_CPU_BUFFERS
+	ALTERNATIVE "", __stringify(EXEC_VERW), X86_FEATURE_CLEAR_CPU_BUF
+.endm
+
 #else /* __ASSEMBLY__ */
 
 #define ANNOTATE_RETPOLINE_SAFE					\