Message ID | 20230607072936.3766231-4-nik.borisov@suse.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp76611vqr; Wed, 7 Jun 2023 00:40:12 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5uop5kRscrpDBHpskw2IfhndqZaVnJ7pTJ2PHGZQr3wneDyVgRaGD7djUETYPUuI9M358q X-Received: by 2002:a05:6a00:2195:b0:639:a518:3842 with SMTP id h21-20020a056a00219500b00639a5183842mr5764306pfi.7.1686123612414; Wed, 07 Jun 2023 00:40:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686123612; cv=none; d=google.com; s=arc-20160816; b=Cj+SwvHSiuIousnCHzWYxQ1pMZAzRBEmfMMDwkNVwpLD2Blpg62bxDQ3Fhjyusmabk kShBUTHdWeYZgax4XsDka1guwRXjMptEpnQbwI9CqbSWro8NkgjfhWAFE0Negj/0RDvD Yy6vtCUt7Bqo+ucsmiHNewzNoYtfrKv2dEYT2f8BIQgzX7z8t2IhmDPjOJXeCtHXROum 1tUGuF5IzgDM6cIYU1mdEdz0JATRcVC7iySmlRXDr/i42zXw4Fg2sW/6Qv3aeJyTdzuQ mAQM57Y9WEjTFIZ7G8CSdI79naaQecyZJCPekObf42vDU2BwNvqoBcPoE59kJxij3ONX hTHA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=y8Pjt8EA8tFinVOCzCHnLFdQUw7jcSWs6svSbR9DVSQ=; b=KTUZt+n/fCTJ90AskZhAa22hyvXvF4T5MHC1sVhdWpqM2BegSGi2h0o3gmV71QYkHU UA9ytXQf6PSTs7/iBTNcEb8Y5ep5mHWx34XgM+jQ9WbisZ8l91l7zj2FfInTfMclZ0GG ohq6P/U/YGVExlfHpLun2FFswdI1R3GDtYIRfDfPst1bDrWeR0l3Tzz0ir98hXbPBHw4 OUQsslPl9zQ8/IIh7ezyqgos9xxgjUo52asGVjolCfAXxK8Kzq8DqN4nRTGJSsj+kLl5 trDVawoPYqiiuWcnqUaHoW5eqDLI/0DL6961ag6aQksBFQA7JYNp0ysk9MS+ChTnlN0K MNfw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=tNQhbwuz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=suse.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id r14-20020aa7988e000000b0066172edecccsi1228434pfl.145.2023.06.07.00.39.58; Wed, 07 Jun 2023 00:40:12 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=tNQhbwuz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=suse.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238879AbjFGHfy (ORCPT <rfc822;literming00@gmail.com> + 99 others); Wed, 7 Jun 2023 03:35:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33564 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239174AbjFGHeb (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 7 Jun 2023 03:34:31 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 144911BEF for <linux-kernel@vger.kernel.org>; Wed, 7 Jun 2023 00:29:44 -0700 (PDT) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id B296B219EC; Wed, 7 Jun 2023 07:29:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1686122982; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=y8Pjt8EA8tFinVOCzCHnLFdQUw7jcSWs6svSbR9DVSQ=; b=tNQhbwuzxd5SYqDHVX1o1k0QPg1PyklcAk0KnWACq32tGICNwRNqxEcpfISX3DoiwspJma InYIIsN2JAZuvWOhF/lywInWaSzYm6Vlgg6fNqZxDHqt1tBGevmgEIzlefQ8vnHNXqlMvk RmV2oovEiZq5IZC61VRKr1+GcSKlyus= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 610D413776; Wed, 7 Jun 2023 07:29:42 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id OMwTFeYxgGSUIQAAMHmgww (envelope-from <nik.borisov@suse.com>); Wed, 07 Jun 2023 07:29:42 +0000 From: Nikolay Borisov <nik.borisov@suse.com> To: x86@kernel.org Cc: linux-kernel@vger.kernel.org, mhocko@suse.com, jslaby@suse.cz, Nikolay Borisov <nik.borisov@suse.com> Subject: [PATCH 3/3] x86: Disable running 32bit processes if ia32_disabled is passed Date: Wed, 7 Jun 2023 10:29:36 +0300 Message-Id: <20230607072936.3766231-4-nik.borisov@suse.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230607072936.3766231-1-nik.borisov@suse.com> References: <20230607072936.3766231-1-nik.borisov@suse.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1768028753236071642?= X-GMAIL-MSGID: =?utf-8?q?1768028753236071642?= |
Series |
Add ability to disable ia32 at boot time
|
|
Commit Message
Nikolay Borisov
June 7, 2023, 7:29 a.m. UTC
In addition to disabling 32bit syscall interface let's also disable the
ability to run 32bit processes altogether. This is achieved by setting
the GDT_ENTRY_DEFAULT_USER32_CS descriptor to not present which would
cause 32 bit processes to trap with a #NP exception. Furthermore,
forbid loading compat processes as well.
Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
---
arch/x86/include/asm/elf.h | 5 +++--
arch/x86/kernel/cpu/common.c | 8 ++++++++
2 files changed, 11 insertions(+), 2 deletions(-)
Comments
On Wed, Jun 07 2023 at 10:29, Nikolay Borisov wrote: > In addition to disabling 32bit syscall interface let's also disable the > ability to run 32bit processes altogether. This is achieved by setting > the GDT_ENTRY_DEFAULT_USER32_CS descriptor to not present which would > cause 32 bit processes to trap with a #NP exception. Furthermore, > forbid loading compat processes as well. This is obviously the wrong order of things. Prevent loading of compat processes is the first step, no? > > +extern bool ia32_disabled; > #define compat_elf_check_arch(x) \ So in patch 1 you add the declaration with #ifdef guards and now you add another one without. Fortunately this is the last patch otherwise we'd might end up with another incarnation in the next header file. > - (elf_check_arch_ia32(x) || \ > - (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64)) > + (!ia32_disabled && (elf_check_arch_ia32(x) || \ > + (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64))) If I'm reading this correctly then ia32_disabled also prevents binaries with X32 ABI to be loaded. That might be intentional but I'm failing to find any explanation for this in the changelog. X32_ABI != IA32_EMULATION > static inline void elf_common_init(struct thread_struct *t, > struct pt_regs *regs, const u16 ds) > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c > index 71f8b55f70c9..ddc301c09419 100644 > --- a/arch/x86/kernel/cpu/common.c > +++ b/arch/x86/kernel/cpu/common.c > @@ -2359,6 +2359,11 @@ void microcode_check(struct cpuinfo_x86 *prev_info) > } > #endif > > +static void remove_user32cs_from_gdt(void * __unused) > +{ > + get_current_gdt_rw()[GDT_ENTRY_DEFAULT_USER32_CS].p = 0; > +} > + > /* > * Invoked from core CPU hotplug code after hotplug operations > */ > @@ -2368,4 +2373,7 @@ void arch_smt_update(void) > cpu_bugs_smt_update(); > /* Check whether IPI broadcasting can be enabled */ > apic_smt_update(); > + if (ia32_disabled) > + on_each_cpu(remove_user32cs_from_gdt, NULL, 1); > + > } This issues a SMP function call on all online CPUs to set these entries to 0 on _every_ CPU hotplug operation. I'm sure there is a reason why these bits need to be cleared over and over. It's just not obvious to me why clearing them _ONCE_ per boot is not sufficient. It's neither clear to me why CPU0 must do that ($NCPUS - 1) times, but for the last CPU it's enough to do it once. That aside, what's the justification for doing this in the first place and why is this again inconsistent vs. CONFIG_IA32_EMULATION=n? The changelog is full of void in that aspect. Thanks, tglx
On 7.06.23 г. 15:01 ч., Thomas Gleixner wrote: > On Wed, Jun 07 2023 at 10:29, Nikolay Borisov wrote: >> In addition to disabling 32bit syscall interface let's also disable the >> ability to run 32bit processes altogether. This is achieved by setting >> the GDT_ENTRY_DEFAULT_USER32_CS descriptor to not present which would >> cause 32 bit processes to trap with a #NP exception. Furthermore, >> forbid loading compat processes as well. > > This is obviously the wrong order of things. Prevent loading of compat > processes is the first step, no? You mean to change the sequence in which those things are mentioned in the log? > >> >> +extern bool ia32_disabled; >> #define compat_elf_check_arch(x) \ > > So in patch 1 you add the declaration with #ifdef guards and now you add > another one without. Fortunately this is the last patch otherwise we'd > might end up with another incarnation in the next header file. My bad, will fix it. > >> - (elf_check_arch_ia32(x) || \ >> - (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64)) >> + (!ia32_disabled && (elf_check_arch_ia32(x) || \ >> + (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64))) > > If I'm reading this correctly then ia32_disabled also prevents binaries > with X32 ABI to be loaded. > > That might be intentional but I'm failing to find any explanation for > this in the changelog. > > X32_ABI != IA32_EMULATION Right, however given the other changes (i.e disabling sysenter/int 0x80) can we really have a working X32 abi when ia32_disabled is true? Now I'm thinking can we really have IA32_EMULATION && X32_ABI && ia32_disabled, I guess the answer is no? > >> static inline void elf_common_init(struct thread_struct *t, >> struct pt_regs *regs, const u16 ds) >> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c >> index 71f8b55f70c9..ddc301c09419 100644 >> --- a/arch/x86/kernel/cpu/common.c >> +++ b/arch/x86/kernel/cpu/common.c >> @@ -2359,6 +2359,11 @@ void microcode_check(struct cpuinfo_x86 *prev_info) >> } >> #endif >> >> +static void remove_user32cs_from_gdt(void * __unused) >> +{ >> + get_current_gdt_rw()[GDT_ENTRY_DEFAULT_USER32_CS].p = 0; >> +} >> + >> /* >> * Invoked from core CPU hotplug code after hotplug operations >> */ >> @@ -2368,4 +2373,7 @@ void arch_smt_update(void) >> cpu_bugs_smt_update(); >> /* Check whether IPI broadcasting can be enabled */ >> apic_smt_update(); >> + if (ia32_disabled) >> + on_each_cpu(remove_user32cs_from_gdt, NULL, 1); >> + >> } > > This issues a SMP function call on all online CPUs to set these entries > to 0 on _every_ CPU hotplug operation. > > I'm sure there is a reason why these bits need to be cleared over and > over. It's just not obvious to me why clearing them _ONCE_ per boot is > not sufficient. It's neither clear to me why CPU0 must do that ($NCPUS - > 1) times, but for the last CPU it's enough to do it once. Actually clearing them once per-cpu is perfectly fine. Looking around the code i saw arch_smt_update() to be the only place where a on_each_cpu() call is being made hence I put the code there. Another aspect I was thinking of is what if a cpu gets onlined at a later stage and a 32bit process is scheduled on that cpu, if the gdt entry wasn't cleared on that CPU then it would be possible to run 32bit processes on it. I guess a better alternative is to use arch_initcall() ? > > That aside, what's the justification for doing this in the first place > and why is this again inconsistent vs. CONFIG_IA32_EMULATION=n? I'll put it under an ifdef CONFIG_IA32_EMULATION, unfortunately the traps.h header can't be included in elf.h without causing build breakage. > > The changelog is full of void in that aspect. > > Thanks, > > tglx >
On Wed, Jun 07 2023 at 15:19, Nikolay Borisov wrote: > On 7.06.23 г. 15:01 ч., Thomas Gleixner wrote: >> >>> - (elf_check_arch_ia32(x) || \ >>> - (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64)) >>> + (!ia32_disabled && (elf_check_arch_ia32(x) || \ >>> + (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64))) >> >> If I'm reading this correctly then ia32_disabled also prevents binaries >> with X32 ABI to be loaded. >> >> That might be intentional but I'm failing to find any explanation for >> this in the changelog. >> >> X32_ABI != IA32_EMULATION > > Right, however given the other changes (i.e disabling sysenter/int 0x80) > can we really have a working X32 abi when ia32_disabled is true? Now I'm > thinking can we really have IA32_EMULATION && X32_ABI && ia32_disabled, > I guess the answer is no? X32_ABI is completely _independent_ from IA32_EMULATION. It just shares some of the required compat code, but it does not use sysenter/int 0x80 at all. It uses the regular 64bit system call. You can build a working kernel with X32_ABI=y and IA32_EMULATION=n. So why would boottime disabling of IA32_EMULATION affect X32_ABI in any way? >> >> This issues a SMP function call on all online CPUs to set these entries >> to 0 on _every_ CPU hotplug operation. >> >> I'm sure there is a reason why these bits need to be cleared over and >> over. It's just not obvious to me why clearing them _ONCE_ per boot is >> not sufficient. It's neither clear to me why CPU0 must do that ($NCPUS - >> 1) times, but for the last CPU it's enough to do it once. > > Actually clearing them once per-cpu is perfectly fine. Looking around > the code i saw arch_smt_update() to be the only place where a > on_each_cpu() call is being made hence I put the code there. Another > aspect I was thinking of is what if a cpu gets onlined at a later stage > and a 32bit process is scheduled on that cpu, if the gdt entry wasn't > cleared on that CPU then it would be possible to run 32bit processes on > it. I guess a better alternative is to use arch_initcall() ? Why do you need an on_each_cpu() function call at all? Why would you need an extra arch_initcall()? The obvious place to clear this is when a CPU is initialized, no? >> That aside, what's the justification for doing this in the first place >> and why is this again inconsistent vs. CONFIG_IA32_EMULATION=n? > > I'll put it under an ifdef CONFIG_IA32_EMULATION, unfortunately the > traps.h header can't be included in elf.h without causing build breakage. You are not answering my question at all and neither the elf nor the traps header have anything to do with it. I'm happy to rephrase it: 1) What is the justification for setting the 'present' bit of GDT_ENTRY_DEFAULT_USER32_CS to 0? 2) Why is #1 inconsistent with CONFIG_IA32_EMULATION=n? Thanks, tglx
On Wed, Jun 07 2023 at 14:01, Thomas Gleixner wrote: > On Wed, Jun 07 2023 at 10:29, Nikolay Borisov wrote: >> @@ -2368,4 +2373,7 @@ void arch_smt_update(void) >> cpu_bugs_smt_update(); >> /* Check whether IPI broadcasting can be enabled */ >> apic_smt_update(); >> + if (ia32_disabled) >> + on_each_cpu(remove_user32cs_from_gdt, NULL, 1); My brain based compiler tells me, that this breaks the 32bit build and the 64bit build with CONFIG_IA32_EMULATION=n. I'm pretty confident that a real compiler will agree.
On 7.06.23 г. 15:53 ч., Thomas Gleixner wrote: > On Wed, Jun 07 2023 at 15:19, Nikolay Borisov wrote: >> On 7.06.23 г. 15:01 ч., Thomas Gleixner wrote: >>> >>>> - (elf_check_arch_ia32(x) || \ >>>> - (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64)) >>>> + (!ia32_disabled && (elf_check_arch_ia32(x) || \ >>>> + (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64))) >>> >>> If I'm reading this correctly then ia32_disabled also prevents binaries >>> with X32 ABI to be loaded. >>> >>> That might be intentional but I'm failing to find any explanation for >>> this in the changelog. >>> >>> X32_ABI != IA32_EMULATION >> >> Right, however given the other changes (i.e disabling sysenter/int 0x80) >> can we really have a working X32 abi when ia32_disabled is true? Now I'm >> thinking can we really have IA32_EMULATION && X32_ABI && ia32_disabled, >> I guess the answer is no? > > X32_ABI is completely _independent_ from IA32_EMULATION. > > It just shares some of the required compat code, but it does not use > sysenter/int 0x80 at all. It uses the regular 64bit system call. > > You can build a working kernel with X32_ABI=y and IA32_EMULATION=n. > > So why would boottime disabling of IA32_EMULATION affect X32_ABI in any > way? In this case it shouldn't affect it and the check should be ((elf_check_arch_ia32(x) && !ia32_disabled) || (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64)). > >>> >>> This issues a SMP function call on all online CPUs to set these entries >>> to 0 on _every_ CPU hotplug operation. >>> >>> I'm sure there is a reason why these bits need to be cleared over and >>> over. It's just not obvious to me why clearing them _ONCE_ per boot is >>> not sufficient. It's neither clear to me why CPU0 must do that ($NCPUS - >>> 1) times, but for the last CPU it's enough to do it once. >> >> Actually clearing them once per-cpu is perfectly fine. Looking around >> the code i saw arch_smt_update() to be the only place where a >> on_each_cpu() call is being made hence I put the code there. Another >> aspect I was thinking of is what if a cpu gets onlined at a later stage >> and a 32bit process is scheduled on that cpu, if the gdt entry wasn't >> cleared on that CPU then it would be possible to run 32bit processes on >> it. I guess a better alternative is to use arch_initcall() ? > > Why do you need an on_each_cpu() function call at all? Why would you > need an extra arch_initcall()? > > The obvious place to clear this is when a CPU is initialized, no? > >>> That aside, what's the justification for doing this in the first place >>> and why is this again inconsistent vs. CONFIG_IA32_EMULATION=n? >> >> I'll put it under an ifdef CONFIG_IA32_EMULATION, unfortunately the >> traps.h header can't be included in elf.h without causing build breakage. > > You are not answering my question at all and neither the elf nor the > traps header have anything to do with it. I'm happy to rephrase it: > > 1) What is the justification for setting the 'present' bit of > GDT_ENTRY_DEFAULT_USER32_CS to 0? This was something which was suggested by Andrew Cooper on irc, to my understanding the idea is that by not having a 32bit capable descriptor it's impossible to run a 32bit code. I guess the scenario where it might be relevant if someone starts a 64bit process and with inline assembly tries to run 32bit code somehow, though it might be a far fetched example and also the fact that the compat_elf_check_arch() forbids loading 32bit processes might be sufficient. > > 2) Why is #1 inconsistent with CONFIG_IA32_EMULATION=n? Because I forgot doing it. > > Thanks, > > tglx > > >
On Wed, Jun 07 2023 at 16:38, Nikolay Borisov wrote: > On 7.06.23 г. 15:53 ч., Thomas Gleixner wrote: >> So why would boottime disabling of IA32_EMULATION affect X32_ABI in any >> way? > > In this case it shouldn't affect it and the check should be > > ((elf_check_arch_ia32(x) && !ia32_disabled) || > (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64)). Correct. >> 1) What is the justification for setting the 'present' bit of >> GDT_ENTRY_DEFAULT_USER32_CS to 0? > > This was something which was suggested by Andrew Cooper on irc, to my > understanding the idea is that by not having a 32bit capable descriptor > it's impossible to run a 32bit code. Right, but that's a completely separate change. If it is agreed on then it needs to be consistent and not depend on this command line parameter. > I guess the scenario where it might be relevant if someone starts a > 64bit process and with inline assembly tries to run 32bit code > somehow, though it might be a far fetched example and also the fact > that the compat_elf_check_arch() forbids loading 32bit processes might > be sufficient. Guesswork is not helpful. Facts matter. Fact is that today a 64bit application can do a far jump of far call into a 32bit code segment based on the default descriptors or by setting them up via LDT. That 32bit code obviously can't do compat syscalls if IA32_EMULATION is disabled, but other than that it just works. Whether that makes sense or not is a completely different question. Ergo fact is that clearing the present bit is a user space visible change, which can't be done nilly willy and burried into a patch which is about making CONFIG_IA32_EMULATION a boot time switch. Thanks, tglx
On Wed, Jun 07 2023 at 18:25, Andrew Cooper wrote: > On 07/06/2023 3:49 pm, Thomas Gleixner wrote: >> Ergo fact is that clearing the present bit is a user space visible >> change, which can't be done nilly willy and burried into a patch >> which is about making CONFIG_IA32_EMULATION a boot time switch. > > Removing GDT_ENTRY_DEFAULT_USER32_CS is necessary but not sufficient to > block userspace getting into 32bit mode. Correct. > You also have to block Linux from taking any SYSRETL or SYSEXITL path > out of the kernel, as these will load fixed 32bit mode attributes into > %cs without reference to the GDT. That's non-trivial as there is no way to disable 32bit SYSCALL on AMD (Intel does not support 32bit syscall and you get #UD if CS.L != 1). So to be safe you'd need to make ignore_sysret() kill the process w/o returning to user space. Though arguably if GDT does not have USER32_CS and LDT is disabled (or the creation of code segments is blocked) then invoking SYSCALL from compat mode requires quite some advanced magic (assumed there are no CPU and kernel bugs), no? > And you need to prevent any userspace use of the LDT, which might be as > simple as just blocking SYS_modify_ldt, but it's been a while since I > last looked. CONFIG_MODIFY_LDT_SYSCALL=n is the only in kernel option right now, but that could be made boottime disabled trivially. Extending LDT to reject the creation of code segments is not rocket science either. Though the real question is: What is the benefit of such a change? So far I haven't seen any argument for that. Maybe there is none :) Thanks, tglx
On Thu, Jun 08 2023 at 00:43, Andrew Cooper wrote: > On 07/06/2023 10:52 pm, Thomas Gleixner wrote: >> On Wed, Jun 07 2023 at 18:25, Andrew Cooper wrote: >>> You also have to block Linux from taking any SYSRETL or SYSEXITL path >>> out of the kernel, as these will load fixed 32bit mode attributes into >>> %cs without reference to the GDT. >> That's non-trivial as there is no way to disable 32bit SYSCALL on AMD >> (Intel does not support 32bit syscall and you get #UD if CS.L != 1). So >> to be safe you'd need to make ignore_sysret() kill the process w/o >> returning to user space. > > ignore_sysret() desperately needs renaming to entry_SYSCALL32_ignore() > or similar. No argument about that. > And yes, wiring this into SIGSEGV/etc would be a sensible move. The only option is to wire it into die_hard_crash_and_burn(). There is no sane way to deliver a signal to the process which managed to run into this. Appropriate info to parent or ptracer will still be delivered. > The same applies to 32bit SYSENTER if configured. (Linux doesn't, but > other OSes do.) Why the heck are they doing that? I really wish that we could disable syscall32 reliably on AMD and make it raise #UD as it does on Intal. >> Though the real question is: >> >> What is the benefit of such a change? >> >> So far I haven't seen any argument for that. Maybe there is none :) > > Hardening. The general purpose distros definitely won't care, but > special purpose ones will. > > An x86 bytestream is decoded differently in different modes, and malware > can hide in the differences. Standard tooling can't cope with > multi-mode binaries, and if it happens by accident you tend get very > obscure crash to diagnose. IOW, you are talking about defense in depth, right? I can buy that one. > Furthermore, despite CET-SS explicitly trying to account for and protect > against accidental mismatches, there are errata in some parts which let > userspace forge legal return addresses on the shadow stack by dropping > into 32bit mode because, there's a #GP check missing in a microflow. Didn't we assume that there are no CPU bugs? :) > For usecases where there ought not to be any 32bit code at all (and > there absolutely are), it would be lovely if this could be enforced, > rather than relying on blind hope that it doesn't happen. I think it's rather clear what needs to be done here to achieve that, but that's completely orthogonal to the intent of the patch series in question which aims to make CONFIG_IA32_EMULATION a boot time decision. Thanks, tglx
On Wed, Jun 7, 2023 at 3:41 AM Nikolay Borisov <nik.borisov@suse.com> wrote: > > In addition to disabling 32bit syscall interface let's also disable the > ability to run 32bit processes altogether. This is achieved by setting > the GDT_ENTRY_DEFAULT_USER32_CS descriptor to not present which would > cause 32 bit processes to trap with a #NP exception. Furthermore, > forbid loading compat processes as well. > > Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> > --- > arch/x86/include/asm/elf.h | 5 +++-- > arch/x86/kernel/cpu/common.c | 8 ++++++++ > 2 files changed, 11 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h > index 18fd06f7936a..406245bc0fb0 100644 > --- a/arch/x86/include/asm/elf.h > +++ b/arch/x86/include/asm/elf.h > @@ -148,9 +148,10 @@ do { \ > #define elf_check_arch(x) \ > ((x)->e_machine == EM_X86_64) > > +extern bool ia32_disabled; > #define compat_elf_check_arch(x) \ > - (elf_check_arch_ia32(x) || \ > - (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64)) > + (!ia32_disabled && (elf_check_arch_ia32(x) || \ > + (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64))) > > static inline void elf_common_init(struct thread_struct *t, > struct pt_regs *regs, const u16 ds) > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c > index 71f8b55f70c9..ddc301c09419 100644 > --- a/arch/x86/kernel/cpu/common.c > +++ b/arch/x86/kernel/cpu/common.c > @@ -2359,6 +2359,11 @@ void microcode_check(struct cpuinfo_x86 *prev_info) > } > #endif > > +static void remove_user32cs_from_gdt(void * __unused) > +{ > + get_current_gdt_rw()[GDT_ENTRY_DEFAULT_USER32_CS].p = 0; > +} > + > /* > * Invoked from core CPU hotplug code after hotplug operations > */ > @@ -2368,4 +2373,7 @@ void arch_smt_update(void) > cpu_bugs_smt_update(); > /* Check whether IPI broadcasting can be enabled */ > apic_smt_update(); > + if (ia32_disabled) > + on_each_cpu(remove_user32cs_from_gdt, NULL, 1); > + > } Disabling USER32_CS isn't really necessary, as simply running 32-bit user code (in a normally 64-bit process) doesn't pose much of a threat to the kernel with 32-bit syscalls disabled. Wine, for example, is moving to a model where the main code runs in 64-bit mode, uses only 64-bit unix libraries, and the 32->64 bit transitions are done entirely in userspace. It will still need the ability to use 32-bit code segments, but won't need 32-bit unix libraries or syscalls to run 32-bit Windows code. Brian Gerst
On 08. 06. 23, 2:25, Thomas Gleixner wrote: > I really wish that we could disable syscall32 reliably on AMD and make > it raise #UD as it does on Intal. Sorry, I am likely missing something, but why is not #GP enough when we set CSTAR = 0? It's of course not as good as Intel's *defined* #UD, but why is not the above sufficient/reliable? thanks,
On 08. 06. 23, 2:25, Thomas Gleixner wrote: >> For usecases where there ought not to be any 32bit code at all (and >> there absolutely are), it would be lovely if this could be enforced, >> rather than relying on blind hope that it doesn't happen. > > I think it's rather clear what needs to be done here to achieve that, > but that's completely orthogonal to the intent of the patch series in > question which aims to make CONFIG_IA32_EMULATION a boot time decision. Agreed. The original intent was only to disable the 32bit paths in the kernel. I.e. set CONFIG_IA32_EMULATION=n by a runtime switch. Then we came up with idea to disallow loading 32bit binaries to WARN people as the bins won't (mostly) work anyway. (We are/were aware this doesn't forbid running 32bit code.) But now, when we are doing that, I think we should disable 32 bits completely by the switch. I.e. don't provide 32bit segments and whatever. And make that clear and documented in the series. Because as it stands now, it's half-way. Agreed? This whole attempt served as a call for opinions which we got and which is perfect. thanks,
On 08. 06. 23, 8:16, Jiri Slaby wrote: > On 08. 06. 23, 2:25, Thomas Gleixner wrote: >> I really wish that we could disable syscall32 reliably on AMD and make >> it raise #UD as it does on Intal. > > Sorry, I am likely missing something, but why is not #GP enough when we > set CSTAR = 0? Or rather to some hole (to avoid mappings when mmap_min_addr=0) or to something like entry_SYSCALL32_kill which you suggested elsewhere. But that is maybe what you consider not being "reliable". > It's of course not as good as Intel's *defined* #UD, but > why is not the above sufficient/reliable? > > thanks,
On Thu, Jun 08 2023 at 08:16, Jiri Slaby wrote: > On 08. 06. 23, 2:25, Thomas Gleixner wrote: >> I really wish that we could disable syscall32 reliably on AMD and make >> it raise #UD as it does on Intal. > > Sorry, I am likely missing something, but why is not #GP enough when we > set CSTAR = 0? Because you are not getting a #GP. It will try to execute from virtual address 0 in CPL 0 and with RSP still pointing to the user space stack. So you have several possibilities: 1) 0 is mapped in user space and SMEP/SMAP is off. Attacker won 2) 0 is not mapped or SMEP is on. You get #PF from CPL0 and RSP is still pointing to the user space stack. If SMAP is on this results in #DF If SMAP is off, kernel uses an attacker controlled stack... Similar sillies when you set it to a valid kernel address which is not mapped or lacks X or contains invalid opcode .... So no. CSTAR _must_ be a valid kernel text address which handles the 32bit syscall. Right now all it does is SYSRETL when IA32_EMULATION is disabled. So the only way to handle that is to have proper entry code which switches to kernel context and then runs "syscall32_kill_myself()" which kills the process hard and it exits without the chance to attempt a return to user. Anything else wont work. Bah. Was it really necessary to bring this up so I hade to page in the gory details of this hardware insanity again? Thanks, tglx
On Thu, Jun 08 2023 at 12:25, Andrew Cooper wrote: > On 08/06/2023 1:25 am, Thomas Gleixner wrote: >> I really wish that we could disable syscall32 reliably on AMD and make >> it raise #UD as it does on Intal. > > You know that's changing in FRED, and will follow the AMD model? > > The SYSCALL instruction is lower latency if it doesn't need to check %cs > to conditionally #UD. Yes, but with FRED the CPL0 context is fully consistent. There are no CPL3 leftovers. >> Didn't we assume that there are no CPU bugs? :) > > Tell me, is such a CPU delivered with or without a complimentary unicorn? :) It's deliverd by a fairytale princess :) Thanks, tglx
On 8.06.23 г. 14:25 ч., Andrew Cooper wrote: > On 08/06/2023 1:25 am, Thomas Gleixner wrote: >> On Thu, Jun 08 2023 at 00:43, Andrew Cooper wrote: >>> And yes, wiring this into SIGSEGV/etc would be a sensible move. >> The only option is to wire it into die_hard_crash_and_burn(). There is >> no sane way to deliver a signal to the process which managed to run into >> this. Appropriate info to parent or ptracer will still be delivered. > > Hmm yeah. Something has already gone seriously wrong to end up here, so > terminating it is probably best. > >> I really wish that we could disable syscall32 reliably on AMD and make >> it raise #UD as it does on Intal. > > You know that's changing in FRED, and will follow the AMD model? > > The SYSCALL instruction is lower latency if it doesn't need to check %cs > to conditionally #UD. > >>> Furthermore, despite CET-SS explicitly trying to account for and protect >>> against accidental mismatches, there are errata in some parts which let >>> userspace forge legal return addresses on the shadow stack by dropping >>> into 32bit mode because, there's a #GP check missing in a microflow. >> Didn't we assume that there are no CPU bugs? :) > > Tell me, is such a CPU delivered with or without a complimentary unicorn? :) > >>> For usecases where there ought not to be any 32bit code at all (and >>> there absolutely are), it would be lovely if this could be enforced, >>> rather than relying on blind hope that it doesn't happen. >> I think it's rather clear what needs to be done here to achieve that, >> but that's completely orthogonal to the intent of the patch series in >> question which aims to make CONFIG_IA32_EMULATION a boot time decision. > > Fully inhibiting 32bit mode is stronger, because it impacts code which > would otherwise operate in CONFIG_IA32_EMULATION=n configuration. > > Which is fine, but I agree that it doesn't want to be confused with > being a "runtime CONFIG_IA32_EMULATION" knob. > > If the runtime form could also come with an equivalent Kconfig form, > that would be awesome too. Actually that's something I'm working on. I.e be able to set the default state of IA32_EMULATION at compile time (i.e disabled or enabled) and provide a boottime switch to override this. That way upstream can retain the current behavior, while distros can set the "default disabled" config-time switch and finally users will have the ability to override the config-switch at boottime. > > ~Andrew
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h index 18fd06f7936a..406245bc0fb0 100644 --- a/arch/x86/include/asm/elf.h +++ b/arch/x86/include/asm/elf.h @@ -148,9 +148,10 @@ do { \ #define elf_check_arch(x) \ ((x)->e_machine == EM_X86_64) +extern bool ia32_disabled; #define compat_elf_check_arch(x) \ - (elf_check_arch_ia32(x) || \ - (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64)) + (!ia32_disabled && (elf_check_arch_ia32(x) || \ + (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64))) static inline void elf_common_init(struct thread_struct *t, struct pt_regs *regs, const u16 ds) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 71f8b55f70c9..ddc301c09419 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -2359,6 +2359,11 @@ void microcode_check(struct cpuinfo_x86 *prev_info) } #endif +static void remove_user32cs_from_gdt(void * __unused) +{ + get_current_gdt_rw()[GDT_ENTRY_DEFAULT_USER32_CS].p = 0; +} + /* * Invoked from core CPU hotplug code after hotplug operations */ @@ -2368,4 +2373,7 @@ void arch_smt_update(void) cpu_bugs_smt_update(); /* Check whether IPI broadcasting can be enabled */ apic_smt_update(); + if (ia32_disabled) + on_each_cpu(remove_user32cs_from_gdt, NULL, 1); + }