Message ID | 20231213163443.70490-2-brgerst@gmail.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:3b04:b0:fb:cd0c:d3e with SMTP id c4csp7903100dys; Wed, 13 Dec 2023 08:35:02 -0800 (PST) X-Google-Smtp-Source: AGHT+IF5unaNtwL7EgyA/5oa8kK7aCMPn72nTgt9xGJCOQK2/SzZ3w3TZYAeB2f1CzNqSG06OJAj X-Received: by 2002:a17:902:e752:b0:1d3:45c9:534c with SMTP id p18-20020a170902e75200b001d345c9534cmr2441352plf.29.1702485301992; Wed, 13 Dec 2023 08:35:01 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702485301; cv=none; d=google.com; s=arc-20160816; b=p+RmZYEuXXkwKfCZi8zLVEy6JJ/U3RFMnQLVucL1+tL66QEEUmqmFp5fBm3ECaQ5lj xMx1KlrxS73LToBameHYMjJ4yir34/lMg5TF36HDM0ahxy+/RPz53lYkqPyU67YZVBL5 B2r0EWQA3MyUN/dfolTAvM9cm7bUcstrhDa9lQf7QhYTUSfqipMeMv2BoZcleXO8o4XN rC/lFXgDxGHxctK54BzggirK2WpP46n/9z02gsuylh03+29IFfQBBWN4oAcjnL6RheTJ Jx6aYv8xXMrBcHoSbSFPTOJNXChMCc4a52IPzE9oeiSYTZ+7/nQceeAU8IEkkAdg5OX+ cqwg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=zJ6yzzdHCK6etXayum+0eGdCGB2aFZEnSrrZblNix5w=; fh=hGVwEbR+KjbERSNTtzxXLjTmhPUyOh65sEze+4fUHLA=; b=bDH3JDz1Uh5ZWpvVNt/VK5T8GkRp7OdGitQmycTCSx0wGzoXetWzuUxu9C3niKAppQ +uS/oi64RvLglMLcN/uQEdujUo2dpXrzLCTAsm3kiFrb7396HJ0uRfQqFXASmqrYes2V 4PAixFJfphRgD0lHsJeopcQxaQkS14xCVmSwzXX1GjJH1e7yXHy/IFO5nJxTF7r1/pA0 670cxh2q0uNhV2/FAurV74s8JDp73idRCbVLamVJhCiNWkXvkHzxACKGLkrPtDrRu3EY u+AGZ/elaqvFlsu0uFr8YffVGmwRdKgRO4BOd6cpTZ5JaYqwgxeaXaD+bhLMRVgulaZ5 q/Mw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=R+YhY2IB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:2 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from agentk.vger.email (agentk.vger.email. [2620:137:e000::3:2]) by mx.google.com with ESMTPS id t6-20020a170902b20600b001cfe0284177si9554724plr.102.2023.12.13.08.35.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 13 Dec 2023 08:35:01 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:2 as permitted sender) client-ip=2620:137:e000::3:2; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=R+YhY2IB; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:2 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by agentk.vger.email (Postfix) with ESMTP id 5B38480A8537; Wed, 13 Dec 2023 08:34:57 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at agentk.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232874AbjLMQeo (ORCPT <rfc822;dexuan.linux@gmail.com> + 99 others); Wed, 13 Dec 2023 11:34:44 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60226 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232663AbjLMQem (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 13 Dec 2023 11:34:42 -0500 Received: from mail-oo1-xc2e.google.com (mail-oo1-xc2e.google.com [IPv6:2607:f8b0:4864:20::c2e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 38C1998 for <linux-kernel@vger.kernel.org>; Wed, 13 Dec 2023 08:34:49 -0800 (PST) Received: by mail-oo1-xc2e.google.com with SMTP id 006d021491bc7-59093f6c94aso2375406eaf.0 for <linux-kernel@vger.kernel.org>; Wed, 13 Dec 2023 08:34:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1702485287; x=1703090087; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=zJ6yzzdHCK6etXayum+0eGdCGB2aFZEnSrrZblNix5w=; b=R+YhY2IBlfBgXuGL1bpFveRtSQR9n0PiMGYM4wqdsvdj38EURvCi5SE2KyX7pLol1V Ux74x1K4AUNUKCgGB6WESiM+54J14rLQmvG4l0pqIEHqsSFDrf3IljRU2kfc7SKuIER3 yYIcbJp0DjUKVeFE689yIz32ESMhle7NPXbzOdiBq/oFk2DuOkmbqnk/5LfNPEK7XfAV /GWzoXGYMRD0T6psf3cZFiYarwpKR39/4X3eVd6SLVWsdKmCPB0exQQrjzL7v8zOZiTi 856ydvvV06Dh0PPi5jPsujAhasB1itWXPeIP5PNslI43KOssBndqjG2Ql49PZTExKdY+ kqxg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702485287; x=1703090087; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=zJ6yzzdHCK6etXayum+0eGdCGB2aFZEnSrrZblNix5w=; b=kcllg9wLL8G7GxQhWPp0tvUUIcgyHr9nctEHDonTbjtElWdnO82p2i057YGJgURvkY k9aQy6ZLv2cg2eQVclX5hL10/6qexyxAGhk/9QgqPSkKwc4PtGUpXkNSFiJr2ty75Bqf 7wcjbZkvvPUZddwv6+WqkXcfnsTIjNzrmhZZ08ED5xT8+5idqp4w+N0ITf+HJXkujWKA 6NjV97YtLfM3hcx35Q6PdIcq1p+TAg8kdfT2JNFnOe7txk0KgURkYIbMT4p0NSUHhmmN uuDCX8zq7WSwjlBblwCqaW6ncCbU3EQH/fJsQqRiVtO1jgCwReNGThuyDXSi6qhFs4z8 M6cA== X-Gm-Message-State: AOJu0Yw5WNEHboq9s01IaLhhinsCSoUOhVg/MfKOfiK8vPB8XvinrYeZ g+abiKtzkXsJOHr0THKlcGYTrDc9rQ== X-Received: by 2002:a4a:e825:0:b0:590:e78e:3e37 with SMTP id d5-20020a4ae825000000b00590e78e3e37mr2884222ood.1.1702485287640; Wed, 13 Dec 2023 08:34:47 -0800 (PST) Received: from citadel.lan ([2600:6c4a:4d3f:6d5c::1019]) by smtp.gmail.com with ESMTPSA id j11-20020a4ad2cb000000b005907ad9f302sm3104901oos.37.2023.12.13.08.34.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 13 Dec 2023 08:34:47 -0800 (PST) From: Brian Gerst <brgerst@gmail.com> To: linux-kernel@vger.kernel.org, x86@kernel.org Cc: Ingo Molnar <mingo@kernel.org>, Thomas Gleixner <tglx@linutronix.de>, Borislav Petkov <bp@alien8.de>, "H . Peter Anvin" <hpa@zytor.com>, Peter Zijlstra <peterz@infradead.org>, Linus Torvalds <torvalds@linuxfoundation.org>, Brian Gerst <brgerst@gmail.com> Subject: [PATCH 1/3] x86: Move TSS and LDT to end of the GDT Date: Wed, 13 Dec 2023 11:34:41 -0500 Message-ID: <20231213163443.70490-2-brgerst@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20231213163443.70490-1-brgerst@gmail.com> References: <20231213163443.70490-1-brgerst@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.6 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on agentk.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (agentk.vger.email [0.0.0.0]); Wed, 13 Dec 2023 08:34:57 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785185228291407500 X-GMAIL-MSGID: 1785185228291407500 |
Series |
Reject setting system segments from userspace
|
|
Commit Message
Brian Gerst
Dec. 13, 2023, 4:34 p.m. UTC
This will make testing for system segments easier.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
arch/x86/include/asm/segment.h | 33 +++++++++++++++++++++------------
1 file changed, 21 insertions(+), 12 deletions(-)
Comments
On Wed, 13 Dec 2023 at 08:34, Brian Gerst <brgerst@gmail.com> wrote: > > This will make testing for system segments easier. It seems to make more sense organizationally too, with the special non-data/code segments clearly separate at the end. So I think this is fine conceptually. HOWEVER, I think that you might want to expand on this a bit more, because there are other special segments selectors that might not be thing you want to expose to user space. We have GDT_ENTRY_PERCPU for example, which is a kernel-only segment. It also happens to be 32-bit only, it doesn't matter for the thing you're trying to fix, but that valid_user_selector() thing is then used on x86-32 too. So the ESPFIX and per-cpu segments are kernel-only, but then the VDSO getcpu one is a user segment. And the PnP and APM BIOS segments are similarly kernel-only. But then the VDSO getcpu segment is user-visible, in the middle, and again, it's 32-bit only but that whole GDT_SYSTEM_START thing is supposed to work there too. End result: this seems incomplete and not really fully baked. I wonder if instead of GDT_SYSTEM_START, you'd be better off just making a trivial constant bitmap of "these are user visible segments in the GDT". No need to re-order things, just have something like #define USER_SEGMENTS_MASK \ ((1ul << GDT_ENTRY_DEFAULT_USER_CS) | ,,,, and use that for the test (remember to check for GDT_ENTRIES as the max). Hmm? Linus
On Wed, 13 Dec 2023 at 10:51, Linus Torvalds <torvalds@linuxfoundation.org> wrote: > > We have GDT_ENTRY_PERCPU for example, which is a kernel-only segment. > It also happens to be 32-bit only, it doesn't matter for the thing > you're trying to fix, but that valid_user_selector() thing is then > used on x86-32 too. > > So the ESPFIX and per-cpu segments are kernel-only, but then the VDSO > getcpu one is a user segment. > > And the PnP and APM BIOS segments are similarly kernel-only. Final (?) note: when looking at this, I have to say that our GDT_ENTRY_INIT() and GDT_ENTRY() macros are horrendous. I know exactly *why* they are horrendous, with all the history of passing in raw flags values, etc, and you can most certainly see that whole thing in the GDT_ENTRY() macro. It's used in assembly code in a couple of cases too. But then you look at GDT_ENTRY_INIT(), and it turns that illegible "flags" value into (slightly more) legible S/DPL/etc values. So it literally makes people use those odd "this is how this is encoded" values even when the code actually wants to use a structure definition that has the flags split out. I guess it's much too much work to really fix things, but maybe we could at least add #defines and comments for the special values. So instead of GDT_ENTRY_INIT(0xc093, 0, 0xfffff) we could maybe have #define GDT_ENTRY_FLAGS(type,s,dpl,p,avl,l,d,g) \ ((type) | (s)<<4) | \ (dpl) << 5) | .... and have #defines for those 0xc093 values (with comments), so that we'd have GDT_ENTRY_INIT(KERNEL_DATA_FLAGS, 0, 0xffff) instead of a magic 0xc093 number. This would require some nit-picky "read all those values and know the crazy descriptor table layout" thing. Maybe somebody has a serious case of insomnia and boredom? Linus
On 13/12/2023 20:08, Linus Torvalds wrote: > I guess it's much too much work to really fix things, but maybe we > could at least add #defines and comments for the special values. > > So instead of > > GDT_ENTRY_INIT(0xc093, 0, 0xfffff) > > we could maybe have > > #define GDT_ENTRY_FLAGS(type,s,dpl,p,avl,l,d,g) \ > ((type) | > (s)<<4) | \ > (dpl) << 5) | .... > > and have #defines for those 0xc093 values (with comments), so that we'd have > > GDT_ENTRY_INIT(KERNEL_DATA_FLAGS, 0, 0xffff) > > instead of a magic 0xc093 number. > > This would require some nit-picky "read all those values and know the > crazy descriptor table layout" thing. Maybe somebody has a serious > case of insomnia and boredom? I took a stab at this, see attached RFC patch. Maybe this does too much, though. I did basic build and boot tests on both 64- and 32-bit, but I would also try a binary diff before/after just to verify nothing changed by accident, as well as making sure all the code is actually compiled (some of the BIOS stuff only gets used on 32-bit with ISA/PNP enabled, for example). While preparing the patch I also came across some things that are unclear to me: - why do we want some segments with the A (accessed) bit set and some with it cleared -- is there an actual reason for the difference, or could we just set it for all of them? - why does setup_percpu_segment() want the DB (size) flag clear? This seems to indicate that it's a 16-bit segment -- is this correct? Vegard
On Sat, 16 Dec 2023 at 10:25, Vegard Nossum <vegard.nossum@oracle.com> wrote: > > While preparing the patch I also came across some things that are > unclear to me: > > - why do we want some segments with the A (accessed) bit set and some > with it cleared -- is there an actual reason for the difference, or > could we just set it for all of them? I think it's random, and an effect of just having hardcoded numbers and not having any structure to it. But I do think you're right that we should just start with all kernel-created segment descriptors marked as accessed. I do not believe that we have any actual *use* for the descriptor access bit. > - why does setup_percpu_segment() want the DB (size) flag clear? This > seems to indicate that it's a 16-bit segment -- is this correct? I think it's nonsensical and doesn't matter, and is another mistake from us just having random numbers. I don't think the DB bit matters except for when it's used for the code or stack segment (or, apparently, if it's a grow-down segment). So I think your patch looks good, and I would keep it in that form if it makes it easier to just verify that it generates an identical kernel image. And then as a separate patch, I would remove that DB bit clear thing. Anyway, I do like your patch, and I think the fact that you found those oddities is a good argument *for* the patch, but at the same time I think I'll just bow to the x86 maintainers who may think that this is churn in an area that they'd rather not touch any more. So consider that an "ack" from me, but with that caveat of yes, I think a binary diff would be a good thing because this is *so* odd and low-level and maybe people just think it's not worth it. Thanks, Linus
On December 13, 2023 10:51:11 AM PST, Linus Torvalds <torvalds@linuxfoundation.org> wrote: >On Wed, 13 Dec 2023 at 08:34, Brian Gerst <brgerst@gmail.com> wrote: >> >> This will make testing for system segments easier. > >It seems to make more sense organizationally too, with the special >non-data/code segments clearly separate at the end. > >So I think this is fine conceptually. > >HOWEVER, I think that you might want to expand on this a bit more, >because there are other special segments selectors that might not be >thing you want to expose to user space. > >We have GDT_ENTRY_PERCPU for example, which is a kernel-only segment. >It also happens to be 32-bit only, it doesn't matter for the thing >you're trying to fix, but that valid_user_selector() thing is then >used on x86-32 too. > >So the ESPFIX and per-cpu segments are kernel-only, but then the VDSO >getcpu one is a user segment. > >And the PnP and APM BIOS segments are similarly kernel-only. > >But then the VDSO getcpu segment is user-visible, in the middle, and >again, it's 32-bit only but that whole GDT_SYSTEM_START thing is >supposed to work there too. > >End result: this seems incomplete and not really fully baked. > >I wonder if instead of GDT_SYSTEM_START, you'd be better off just >making a trivial constant bitmap of "these are user visible segments >in the GDT". No need to re-order things, just have something like > > #define USER_SEGMENTS_MASK \ > ((1ul << GDT_ENTRY_DEFAULT_USER_CS) | > ,,,, > >and use that for the test (remember to check for GDT_ENTRIES as the max). > >Hmm? > > Linus Somewhat unrelated: X86_BUG_ESPFIX should just be deleted, as we aren't actually ever cleaning it. All current x86 processors have that problem (until FRED).
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h index 9d6411c65920..a155843d0c37 100644 --- a/arch/x86/include/asm/segment.h +++ b/arch/x86/include/asm/segment.h @@ -83,8 +83,8 @@ * 13 - kernel data segment * 14 - default user CS * 15 - default user DS - * 16 - TSS <=== cacheline #5 - * 17 - LDT + * 16 - unused <=== cacheline #5 + * 17 - unused * 18 - PNPBIOS support (16->32 gate) * 19 - PNPBIOS support * 20 - PNPBIOS support <=== cacheline #6 @@ -97,8 +97,11 @@ * 26 - ESPFIX small SS * 27 - per-cpu [ offset to per-cpu data area ] * 28 - VDSO getcpu - * 29 - unused - * 30 - unused + * + * ------- start of system segments: + * + * 29 - TSS + * 30 - LDT * 31 - TSS for double fault handler */ #define GDT_ENTRY_TLS_MIN 6 @@ -108,8 +111,6 @@ #define GDT_ENTRY_KERNEL_DS 13 #define GDT_ENTRY_DEFAULT_USER_CS 14 #define GDT_ENTRY_DEFAULT_USER_DS 15 -#define GDT_ENTRY_TSS 16 -#define GDT_ENTRY_LDT 17 #define GDT_ENTRY_PNPBIOS_CS32 18 #define GDT_ENTRY_PNPBIOS_CS16 19 #define GDT_ENTRY_PNPBIOS_DS 20 @@ -121,6 +122,10 @@ #define GDT_ENTRY_PERCPU 27 #define GDT_ENTRY_CPUNODE 28 +/* Start of system segments */ + +#define GDT_ENTRY_TSS 29 +#define GDT_ENTRY_LDT 30 #define GDT_ENTRY_DOUBLEFAULT_TSS 31 /* @@ -188,20 +193,22 @@ #define GDT_ENTRY_DEFAULT_USER_DS 5 #define GDT_ENTRY_DEFAULT_USER_CS 6 -/* Needs two entries */ -#define GDT_ENTRY_TSS 8 -/* Needs two entries */ -#define GDT_ENTRY_LDT 10 - #define GDT_ENTRY_TLS_MIN 12 #define GDT_ENTRY_TLS_MAX 14 #define GDT_ENTRY_CPUNODE 15 +/* Start of system segments */ + +/* Needs two entries */ +#define GDT_ENTRY_TSS 16 +/* Needs two entries */ +#define GDT_ENTRY_LDT 18 + /* * Number of entries in the GDT table: */ -#define GDT_ENTRIES 16 +#define GDT_ENTRIES 20 /* * Segment selector values corresponding to the above entries: @@ -219,6 +226,8 @@ #endif +#define GDT_SYSTEM_START GDT_ENTRY_TSS + #define IDT_ENTRIES 256 #define NUM_EXCEPTION_VECTORS 32