Message ID | 20230809072200.543939260@infradead.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org>
Message-ID: <20230809072200.543939260@infradead.org>
Date: Wed, 09 Aug 2023 09:12:20 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: x86@kernel.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, David.Kaplan@amd.com, Andrew.Cooper3@citrix.com, jpoimboe@kernel.org, gregkh@linuxfoundation.org
Subject: [RFC][PATCH 02/17] x86/cpu: Clean up SRSO return thunk mess
List-ID: <linux-kernel.vger.kernel.org> |
Series | Fix up the recent SRSO patches |
Commit Message
Peter Zijlstra
Aug. 9, 2023, 7:12 a.m. UTC
Use the existing configurable return thunk. There is absolutely no
justification for having created this __x86_return_thunk alternative.
To clarify, the whole thing looks like:
Zen3/4 does:
srso_alias_untrain_ret:
nop2
lfence
jmp srso_alias_return_thunk
int3
srso_alias_safe_ret: // aliases srso_alias_untrain_ret just so
add $8, %rsp
ret
int3
srso_alias_return_thunk:
call srso_alias_safe_ret
ud2
While Zen1/2 does:
srso_untrain_ret:
movabs $foo, %rax
lfence
call srso_safe_ret (jmp srso_return_thunk ?)
int3
srso_safe_ret: // embedded in movabs immediate
add $8,%rsp
ret
int3
srso_return_thunk:
call srso_safe_ret
ud2
While retbleed does:
zen_untrain_ret:
test $0xcc, %bl
lfence
jmp zen_return_thunk
int3
zen_return_thunk: // embedded in the test instruction
ret
int3
Where Zen1/2 flush the BTB using the instruction decoder trick
(test, movabs), Zen3/4 use instruction aliasing. SRSO adds RSB (RAP in
AMD speak) stuffing to force a return mis-predict.
That is, the AMD retbleed is a form of Speculative-Type-Confusion
where the branch predictor is trained to use the BTB to predict the
RET address, while AMD inception/SRSO is a form of
Speculative-Type-Confusion where another instruction is trained to be
treated like a CALL instruction and poison the RSB (RAP).
Pick one of three options at boot.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/include/asm/nospec-branch.h | 4 +++
arch/x86/kernel/cpu/bugs.c | 7 ++++--
arch/x86/kernel/vmlinux.lds.S | 2 -
arch/x86/lib/retpoline.S | 37 ++++++++++++++++++++++++-----------
4 files changed, 36 insertions(+), 14 deletions(-)
Comments
On 9.08.23, 10:12, Peter Zijlstra wrote:
> Use the existing configurable return thunk. There is absolutely no
> justification for having created this __x86_return_thunk alternative.
>
> [...]
>
> Pick one of three options at boot.

So this boils down to simply removing one level of indirection: instead of
patching the body of __x86_return_thunk, you directly patch the return
sites with the correct thunk.

Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
On Wed, Aug 09, 2023 at 09:12:20AM +0200, Peter Zijlstra wrote:
> Where Zen1/2 flush the BTB using the instruction decoder trick
> (test,movabs) Zen3/4 use instruction aliasing. SRSO adds RSB (RAP in

BTB aliasing.

> AMD speak) stuffing to force a return mis-predict.

No it doesn't. It causes BTB aliasing which evicts any potentially
poisoned entries.

> That is; the AMD retbleed is a form of Speculative-Type-Confusion
> where the branch predictor is trained to use the BTB to predict the
> RET address, while AMD inception/SRSO is a form of
> Speculative-Type-Confusion where another instruction is trained to be
> treated like a CALL instruction and poison the RSB (RAP).

Nope, Andy explained it already in the 0th message.

> Pick one of three options at boot.

Yes, provided microarchitecturally that works, I'm all for removing the
__ret alternative.

Thx.
On Thu, Aug 10, 2023 at 01:51:48PM +0200, Borislav Petkov wrote:
> On Wed, Aug 09, 2023 at 09:12:20AM +0200, Peter Zijlstra wrote:
> > Where Zen1/2 flush the BTB using the instruction decoder trick
> > (test,movabs) Zen3/4 use instruction aliasing. SRSO adds RSB (RAP in
>
> BTB aliasing.
>
> > AMD speak) stuffing to force a return mis-predict.
>
> No it doesn't. It causes BTB aliasing which evicts any potentially
> poisoned entries.

It does; zen1/2 use the decoder thing to flush the BTB entry of the RET,
both retbleed and srso do. Then zen3/4 use the aliasing trick to flush
the BTB entry of the RET.

Then both srso options use RSB/RAP stuffing to force a mispredict there.
Retbleed doesn't do this.

retbleed is about BTB, srso does both BTB and RSB/RAP.

> > That is; the AMD retbleed is a form of Speculative-Type-Confusion
> > where the branch predictor is trained to use the BTB to predict the
> > RET address, while AMD inception/SRSO is a form of
> > Speculative-Type-Confusion where another instruction is trained to be
> > treated like a CALL instruction and poison the RSB (RAP).
>
> Nope, Andy explained it already in the 0th message.

I'm still of the opinion that branch-type-confusion is an integral part of
setting up the srso RSB/RAP trickery. It just targets a different
predictor, RSB/RAP vs BTB.

> > Pick one of three options at boot.
>
> Yes, provided microarchitecturally that works, I'm all for removing the
> __ret alternative.

So this patch doesn't actually change anything except one layer of
indirection. Your thing does:

  SYM_FUNC_START(foo)
	...
	ALTERNATIVE "ret; int3", "jmp __x86_return_thunk", X86_FEATURE_RETHUNK
  SYM_FUNC_END(foo)

  SYM_FUNC_START(__x86_return_thunk)
	ALTERNATIVE "jmp __ret",
		    "call srso_safe_ret", X86_FEATURE_SRSO,
		    "call srso_alias_safe_ret", X86_FEATURE_SRSO_ALIAS
	int3
  SYM_FUNC_END(__x86_return_thunk)

So what was RET, jumps to __x86_return_thunk, which then jumps to the
actual return thunk.

After this patch things look equivalent to:

  SYM_FUNC_START(foo)
	...
	ALTERNATIVE "ret; int3",
		    "jmp __x86_return_thunk", X86_FEATURE_RETHUNK,
		    "jmp srso_return_thunk", X86_FEATURE_SRSO,
		    "jmp srso_alias_return_thunk", X86_FEATURE_SRSO_ALIAS
  SYM_FUNC_END(foo)

  SYM_CODE_START(srso_return_thunk)
	UNWIND_HINT_FUNC
	ANNOTATE_NOENDBR
	call srso_safe_ret
	ud2
  SYM_CODE_END(srso_return_thunk)

  SYM_CODE_START(srso_alias_return_thunk)
	UNWIND_HINT_FUNC
	ANNOTATE_NOENDBR
	call srso_alias_safe_ret
	ud2
  SYM_CODE_END(srso_alias_return_thunk)

Except of course we don't have an actual ALTERNATIVE at the ret site, but
.return_sites and rewriting things to either "ret; int3" or whatever
function is in x86_return_thunk.

Before this patch, only one ret thunk is used at any one time; after this
patch still only one ret thunk is used. Fundamentally, you can only ever
use one ret.

IOW this patch changes nothing for SRSO, it still does a jump to a call.
But it does clean up retbleed, which you had as a jump to a jump, back to
just a jump, and it gets rid of that extra alternative layer you had by
using the one we already have at the .return_sites rewrite.
On Thu, Aug 10, 2023 at 02:37:56PM +0200, Peter Zijlstra wrote:
> It does; zen1/2 use the decoder thing to flush the BTB entry of the RET,
> both retbleed and srso do.
>
> Then zen3/4 use the aliasing trick to flush the BTB entry of the RET.

Yes, I was correcting your "instruction aliasing". It is "BTB aliasing",
caused by making those bits in the VAs XOR.

> Then both srso options use RSB/RAP stuffing to force a mispredict there.

They cause the RETs to mispredict; no stuffing. That's the add $8, %rsp
in the zen3/4 case which causes the RET to mispredict. There's no doing a
bunch of CALLs to stuff something.

> Retbleed doesn't do this.
>
> retbleed is about BTB, srso does both BTB and RSB/RAP.

Yes.

> So this patch doesn't actually change anything except one layer of
> indirection.

I agree with everything from here on to the end. Provided we can do that
and there's no microarchitectural catch there, I'm all for removing the
__ret alternative.

Thx.
On Thu, Aug 10, 2023 at 02:56:31PM +0200, Borislav Petkov wrote:
> > Then both srso options use RSB/RAP stuffing to force a mispredict there.
>
> They cause the RETs to mispredict; no stuffing. That's the add $8,
> %rsp in the zen3/4 case which causes the RET to mispredict. There's no
> doing a bunch of CALLs to stuff something.

This is what is called RSB stuffing; we've been doing it for ages on the
Intel side, and code in nospec-branch.h has a number of variants of this.

	CALL srso_safe_ret	// push addr of UD2 into RSB, aka 'stuff'
	UD2
  srso_safe_ret:
	ADD $8, %RSP		// skip over the return to UD2
	RET			// pop RSB, speculate into UD2, miss like a beast

Now compare to __FILL_ONE_RETURN, which has the comment 'Stuff a single
RSB slot.' That expands to:

	call 772f
	int3
  772:	add $8, %rsp
	lfence

Which is the same sequence and causes the next RET to speculate into that
int3.

So RSB stuffing is sticking addresses of traps in the RSB so that
subsequent predictions go into said traps instead of potentially
user-controlled targets.
On Thu, Aug 10, 2023 at 02:37:56PM +0200, Peter Zijlstra wrote:
> After this patch things look equivalent to:
>
> [...]

So it looks like the compilers are still not emitting int3 after jmp, even
with the SLS options enabled :/

This means the tail end of functions compiled with:

	-mharden-sls=all -mfunction-return=thunk-extern

is still a regular "jmp __x86_return_thunk", no trailing trap.

https://godbolt.org/z/Ecqv76YbE

If we all could please finally fix that, then I can rewrite the above to
effectively be:

  SYM_FUNC_START(foo)
	...
	ALTERNATIVE "ret; int3",
		    "jmp __x86_return_thunk", X86_FEATURE_RETHUNK,
		    "call srso_safe_ret", X86_FEATURE_SRSO,
		    "call srso_alias_safe_ret", X86_FEATURE_SRSO_ALIAS
	int3	// <--- *MISSING*
  SYM_FUNC_END(foo)

Bonus points if I can tell at compile time whether a compiler DTRT; a
feature flag or what have you in the preprocessor would be awesome.
On Fri, Aug 11, 2023 at 12:01 AM Peter Zijlstra <peterz@infradead.org> wrote:
> So it looks like the compilers are still not emitting int3 after jmp,
> even with the SLS options enabled :/
>
> This means the tail end of functions compiled with:
>
>	-mharden-sls=all -mfunction-return=thunk-extern
>
> Is still a regular: jmp __x86_return_thunk, no trailing trap.
>
> https://godbolt.org/z/Ecqv76YbE

I don't have time to finish this today, but https://reviews.llvm.org/D157734
should do what you're looking for, I think.

> Bonus points if I can compile time tell if a compiler DTRT, feature flag
> or what have you in the preprocessor would be awesome.

Probably not a preprocessor token; in the past I have made that suggestion
and the old guard informed me "no, too many preprocessor tokens to lex, no
more!" I still disagree, but that is a viewpoint I can sympathize with,
slightly.

Probably version checks for now on the SLS config (or version checks on a
new kconfig CONFIG_IMPROVED_SLS).
On Fri, Aug 11, 2023 at 10:00:31AM -0700, Nick Desaulniers wrote:
> On Fri, Aug 11, 2023 at 12:01 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > So it looks like the compilers are still not emitting int3 after jmp,
> > even with the SLS options enabled :/
> >
> > [...]
>
> I don't have time to finish this today, but
> https://reviews.llvm.org/D157734 should do what you're looking for, I
> think.

Hmm, so your wording seems to imply regular SLS would already emit INT3
after jmp, but I'm not seeing that in clang-16 output. Should I upgrade my
llvm?

[[edit]] Oooh, now I see: regular SLS would emit RET; INT3, but what I was
alluding to is that sls=all should also emit INT3 after every JMP due to
AMD BTC. This is an SLS option that seems to have gone missing in both
compilers for a long while. And yesterday I only quickly looked at bigger
gcc output and not clang.
But when I look at clang-16 output I see things like:

    1053: 2e e8 00 00 00 00	cs call 1059 <yield_to+0xe9>
			1055: R_X86_64_PLT32 __x86_indirect_thunk_r11-0x4
    1059: 84 c0		test   %al,%al
    105b: 74 1c		je     1079 <yield_to+0x109>
    105d: eb 6e		jmp    10cd <yield_to+0x15d>

No INT3.

    105f: 41 bc 01 00 00 00	mov    $0x1,%r12d
    1065: 80 7c 24 04 00	cmpb   $0x0,0x4(%rsp)
    106a: 74 0d		je     1079 <yield_to+0x109>
    106c: 4d 39 fe		cmp    %r15,%r14
    106f: 74 08		je     1079 <yield_to+0x109>
    1071: 4c 89 ff		mov    %r15,%rdi
    1074: e8 00 00 00 00	call   1079 <yield_to+0x109>
			1075: R_X86_64_PLT32 resched_curr-0x4
    1079: 4d 39 fe		cmp    %r15,%r14
    107c: 74 08		je     1086 <yield_to+0x116>
    107e: 4c 89 ff		mov    %r15,%rdi
    1081: e8 00 00 00 00	call   1086 <yield_to+0x116>
			1082: R_X86_64_PLT32 _raw_spin_unlock-0x4
    1086: 4c 89 f7		mov    %r14,%rdi
    1089: e8 00 00 00 00	call   108e <yield_to+0x11e>
			108a: R_X86_64_PLT32 _raw_spin_unlock-0x4
    108e: f7 c3 00 02 00 00	test   $0x200,%ebx
    1094: 74 06		je     109c <yield_to+0x12c>
    1096: ff 15 00 00 00 00	call   *0x0(%rip)	# 109c <yield_to+0x12c>
			1098: R_X86_64_PC32 pv_ops+0xfc
    109c: 45 85 e4		test   %r12d,%r12d
    109f: 7e 05		jle    10a6 <yield_to+0x136>
    10a1: e8 00 00 00 00	call   10a6 <yield_to+0x136>
			10a2: R_X86_64_PLT32 schedule-0x4
    10a6: 44 89 e0		mov    %r12d,%eax
    10a9: 48 83 c4 08	add    $0x8,%rsp
    10ad: 5b			pop    %rbx
    10ae: 41 5c		pop    %r12
    10b0: 41 5d		pop    %r13
    10b2: 41 5e		pop    %r14
    10b4: 41 5f		pop    %r15
    10b6: 5d			pop    %rbp
    10b7: 2e e9 00 00 00 00	cs jmp 10bd <yield_to+0x14d>
			10b9: R_X86_64_PLT32 __x86_return_thunk-0x4

CS padding!!

    10bd: 41 bc fd ff ff ff	mov    $0xfffffffd,%r12d
    10c3: f7 c3 00 02 00 00	test   $0x200,%ebx

So since you (surprisingly!) CS pad the return thunk, I *could* pull it off
there; 6 bytes is enough space to write "CALL foo; INT3".

But really SLS *should* put INT3 after every JMP instruction, of course
including the return thunk one.
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -342,9 +342,13 @@ extern retpoline_thunk_t __x86_indirect_
 extern retpoline_thunk_t __x86_indirect_jump_thunk_array[];
 
 extern void __x86_return_thunk(void);
+extern void srso_return_thunk(void);
+extern void srso_alias_return_thunk(void);
+
 extern void zen_untrain_ret(void);
 extern void srso_untrain_ret(void);
 extern void srso_untrain_ret_alias(void);
+
 extern void entry_ibpb(void);
 
 extern void (*x86_return_thunk)(void);
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2305,10 +2305,13 @@ static void __init srso_select_mitigatio
 			 */
 			setup_force_cpu_cap(X86_FEATURE_RETHUNK);
 
-			if (boot_cpu_data.x86 == 0x19)
+			if (boot_cpu_data.x86 == 0x19) {
 				setup_force_cpu_cap(X86_FEATURE_SRSO_ALIAS);
-			else
+				x86_return_thunk = srso_alias_return_thunk;
+			} else {
 				setup_force_cpu_cap(X86_FEATURE_SRSO);
+				x86_return_thunk = srso_return_thunk;
+			}
 			srso_mitigation = SRSO_MITIGATION_SAFE_RET;
 		} else {
 			pr_err("WARNING: kernel not compiled with CPU_SRSO.\n");
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -523,7 +523,7 @@ INIT_PER_CPU(irq_stack_backing_store);
 #endif
 
 #ifdef CONFIG_RETHUNK
-. = ASSERT((__ret & 0x3f) == 0, "__ret not cacheline-aligned");
+. = ASSERT((__x86_return_thunk & 0x3f) == 0, "__x86_return_thunk not cacheline-aligned");
 . = ASSERT((srso_safe_ret & 0x3f) == 0, "srso_safe_ret not cacheline-aligned");
 #endif
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -151,10 +151,11 @@ SYM_CODE_END(__x86_indirect_jump_thunk_a
 	.section .text.__x86.rethunk_untrain
 
 SYM_START(srso_untrain_ret_alias, SYM_L_GLOBAL, SYM_A_NONE)
+	UNWIND_HINT_FUNC
 	ANNOTATE_NOENDBR
 	ASM_NOP2
 	lfence
-	jmp __x86_return_thunk
+	jmp srso_alias_return_thunk
 SYM_FUNC_END(srso_untrain_ret_alias)
 __EXPORT_THUNK(srso_untrain_ret_alias)
 
@@ -184,7 +185,7 @@ SYM_FUNC_END(srso_safe_ret_alias)
  * from re-poisioning the BTB prediction.
  */
 	.align 64
-	.skip 64 - (__ret - zen_untrain_ret), 0xcc
+	.skip 64 - (__x86_return_thunk - zen_untrain_ret), 0xcc
 SYM_START(zen_untrain_ret, SYM_L_GLOBAL, SYM_A_NONE)
 	ANNOTATE_NOENDBR
 	/*
@@ -216,10 +217,10 @@ SYM_START(zen_untrain_ret, SYM_L_GLOBAL,
  * evicted, __x86_return_thunk will suffer Straight Line Speculation
  * which will be contained safely by the INT3.
  */
-SYM_INNER_LABEL(__ret, SYM_L_GLOBAL)
+SYM_INNER_LABEL(__x86_return_thunk, SYM_L_GLOBAL)
 	ret
 	int3
-SYM_CODE_END(__ret)
+SYM_CODE_END(__x86_return_thunk)
 
 /*
  * Ensure the TEST decoding / BTB invalidation is complete.
@@ -230,11 +231,13 @@ SYM_CODE_END(__ret)
  * Jump back and execute the RET in the middle of the TEST instruction.
  * INT3 is for SLS protection.
  */
-	jmp __ret
+	jmp __x86_return_thunk
 	int3
 SYM_FUNC_END(zen_untrain_ret)
 __EXPORT_THUNK(zen_untrain_ret)
 
+EXPORT_SYMBOL(__x86_return_thunk)
+
 /*
  * SRSO untraining sequence for Zen1/2, similar to zen_untrain_ret()
  * above. On kernel entry, srso_untrain_ret() is executed which is a
@@ -257,6 +260,7 @@ SYM_INNER_LABEL(srso_safe_ret, SYM_L_GLO
 	int3
 	int3
 	int3
+	/* end of movabs */
 	lfence
 	call srso_safe_ret
 	int3
@@ -264,12 +268,23 @@ SYM_CODE_END(srso_safe_ret)
 SYM_FUNC_END(srso_untrain_ret)
 __EXPORT_THUNK(srso_untrain_ret)
 
-SYM_FUNC_START(__x86_return_thunk)
-	ALTERNATIVE_2 "jmp __ret", "call srso_safe_ret", X86_FEATURE_SRSO, \
-		      "call srso_safe_ret_alias", X86_FEATURE_SRSO_ALIAS
-	int3
-SYM_CODE_END(__x86_return_thunk)
-EXPORT_SYMBOL(__x86_return_thunk)
+/*
+ * Both these do an unbalanced CALL to mess up the RSB, terminate with UD2
+ * to indicate noreturn.
+ */
+SYM_CODE_START(srso_return_thunk)
+	UNWIND_HINT_FUNC
+	ANNOTATE_NOENDBR
+	call srso_safe_ret
+	ud2
+SYM_CODE_END(srso_return_thunk)
+
+SYM_CODE_START(srso_alias_return_thunk)
+	UNWIND_HINT_FUNC
+	ANNOTATE_NOENDBR
+	call srso_safe_ret_alias
+	ud2
+SYM_CODE_END(srso_alias_return_thunk)
 
 #endif /* CONFIG_RETHUNK */