[1/3] x86: Move TSS and LDT to end of the GDT

Message ID 20231213163443.70490-2-brgerst@gmail.com
State New
Headers
Series Reject setting system segments from userspace |

Commit Message

Brian Gerst Dec. 13, 2023, 4:34 p.m. UTC
  This will make testing for system segments easier.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
---
 arch/x86/include/asm/segment.h | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)
  

Comments

Linus Torvalds Dec. 13, 2023, 6:51 p.m. UTC | #1
On Wed, 13 Dec 2023 at 08:34, Brian Gerst <brgerst@gmail.com> wrote:
>
> This will make testing for system segments easier.

It seems to make more sense organizationally too, with the special
non-data/code segments clearly separate at the end.

So I think this is fine conceptually.

HOWEVER, I think that you might want to expand on this a bit more,
because there are other special segments selectors that might not be
thing you want to expose to user space.

We have GDT_ENTRY_PERCPU for example, which is a kernel-only segment.
It also happens to be 32-bit only, it doesn't matter for the thing
you're trying to fix, but that valid_user_selector() thing is then
used on x86-32 too.

So the ESPFIX and per-cpu segments are kernel-only, but then the VDSO
getcpu one is a user segment.

And the PnP and APM BIOS segments are similarly kernel-only.

But then the VDSO getcpu segment is user-visible, in the middle, and
again, it's 32-bit only but that whole GDT_SYSTEM_START thing is
supposed to work there too.

End result: this seems incomplete and not really fully baked.

I wonder if instead of GDT_SYSTEM_START, you'd be better off just
making a trivial constant bitmap of "these are user visible segments
in the GDT". No need to re-order things, just have something like

   #define USER_SEGMENTS_MASK \
        ((1ul << GDT_ENTRY_DEFAULT_USER_CS) |
         ,,,,

and use that for the test (remember to check for GDT_ENTRIES as the max).

Hmm?

             Linus
  
Linus Torvalds Dec. 13, 2023, 7:08 p.m. UTC | #2
On Wed, 13 Dec 2023 at 10:51, Linus Torvalds
<torvalds@linuxfoundation.org> wrote:
>
> We have GDT_ENTRY_PERCPU for example, which is a kernel-only segment.
> It also happens to be 32-bit only, it doesn't matter for the thing
> you're trying to fix, but that valid_user_selector() thing is then
> used on x86-32 too.
>
> So the ESPFIX and per-cpu segments are kernel-only, but then the VDSO
> getcpu one is a user segment.
>
> And the PnP and APM BIOS segments are similarly kernel-only.

Final (?) note: when looking at this, I have to say that our
GDT_ENTRY_INIT() and GDT_ENTRY() macros are horrendous.

I know exactly *why* they are horrendous, with all the history of
passing in raw flags values, etc, and you can most certainly see that
whole thing in the GDT_ENTRY() macro. It's used in assembly code in a
couple of cases too.

But then you look at GDT_ENTRY_INIT(), and it turns that illegible
"flags" value into (slightly more) legible S/DPL/etc values. So it
literally makes people use those odd "this is how this is encoded"
values even when the code actually wants to use a structure definition
that has the flags split out.

I guess it's much too much work to really fix things, but maybe we
could at least add #defines and comments for the special values.

So instead of

        GDT_ENTRY_INIT(0xc093, 0, 0xfffff)

we could maybe have

       #define GDT_ENTRY_FLAGS(type,s,dpl,p,avl,l,d,g) \
                ((type) |
                 (s)<<4) | \
                (dpl) << 5) | ....

and have #defines for those 0xc093 values (with comments), so that we'd have

        GDT_ENTRY_INIT(KERNEL_DATA_FLAGS, 0, 0xffff)

instead of a magic 0xc093 number.

This would require some nit-picky "read all those values and know the
crazy descriptor table layout" thing. Maybe somebody has a serious
case of insomnia and boredom?

           Linus
  
Vegard Nossum Dec. 16, 2023, 6:24 p.m. UTC | #3
On 13/12/2023 20:08, Linus Torvalds wrote:
> I guess it's much too much work to really fix things, but maybe we
> could at least add #defines and comments for the special values.
> 
> So instead of
> 
>          GDT_ENTRY_INIT(0xc093, 0, 0xfffff)
> 
> we could maybe have
> 
>         #define GDT_ENTRY_FLAGS(type,s,dpl,p,avl,l,d,g) \
>                  ((type) |
>                   (s)<<4) | \
>                  (dpl) << 5) | ....
> 
> and have #defines for those 0xc093 values (with comments), so that we'd have
> 
>          GDT_ENTRY_INIT(KERNEL_DATA_FLAGS, 0, 0xffff)
> 
> instead of a magic 0xc093 number.
> 
> This would require some nit-picky "read all those values and know the
> crazy descriptor table layout" thing. Maybe somebody has a serious
> case of insomnia and boredom?

I took a stab at this, see attached RFC patch. Maybe this does too much,
though.

I did basic build and boot tests on both 64- and 32-bit, but I would
also try a binary diff before/after just to verify nothing changed by
accident, as well as making sure all the code is actually compiled (some
of the BIOS stuff only gets used on 32-bit with ISA/PNP enabled, for
example).

While preparing the patch I also came across some things that are
unclear to me:

- why do we want some segments with the A (accessed) bit set and some
with it cleared -- is there an actual reason for the difference, or
could we just set it for all of them?

- why does setup_percpu_segment() want the DB (size) flag clear? This
seems to indicate that it's a 16-bit segment -- is this correct?


Vegard
  
Linus Torvalds Dec. 16, 2023, 6:40 p.m. UTC | #4
On Sat, 16 Dec 2023 at 10:25, Vegard Nossum <vegard.nossum@oracle.com> wrote:
>
> While preparing the patch I also came across some things that are
> unclear to me:
>
> - why do we want some segments with the A (accessed) bit set and some
> with it cleared -- is there an actual reason for the difference, or
> could we just set it for all of them?

I think it's random, and an effect of just having hardcoded numbers
and not having any structure to it.

But I do think you're right that we should just start with all
kernel-created segment descriptors marked as accessed. I do not
believe that we have any actual *use* for the descriptor access bit.

> - why does setup_percpu_segment() want the DB (size) flag clear? This
> seems to indicate that it's a 16-bit segment -- is this correct?

I think it's nonsensical and doesn't matter, and is another mistake
from us just having random numbers.

I don't think the DB bit matters except for when it's used for the
code or stack segment (or, apparently, if it's a grow-down segment).

So I think your patch looks good, and I would keep it in that form if
it makes it easier to just verify that it generates an identical
kernel image.

And then as a separate patch, I would remove that DB bit clear thing.

Anyway, I do like your patch, and I think the fact that you found
those oddities is a good argument *for* the patch, but at the same
time I think I'll just bow to the x86 maintainers who may think that
this is churn in an area that they'd rather not touch any more.

So consider that an "ack" from me, but with that caveat of yes, I
think a binary diff would be a good thing because this is *so* odd and
low-level and maybe people just think it's not worth it.

Thanks,

                  Linus
  
H. Peter Anvin Dec. 17, 2023, 9:09 p.m. UTC | #5
On December 13, 2023 10:51:11 AM PST, Linus Torvalds <torvalds@linuxfoundation.org> wrote:
>On Wed, 13 Dec 2023 at 08:34, Brian Gerst <brgerst@gmail.com> wrote:
>>
>> This will make testing for system segments easier.
>
>It seems to make more sense organizationally too, with the special
>non-data/code segments clearly separate at the end.
>
>So I think this is fine conceptually.
>
>HOWEVER, I think that you might want to expand on this a bit more,
>because there are other special segments selectors that might not be
>thing you want to expose to user space.
>
>We have GDT_ENTRY_PERCPU for example, which is a kernel-only segment.
>It also happens to be 32-bit only, it doesn't matter for the thing
>you're trying to fix, but that valid_user_selector() thing is then
>used on x86-32 too.
>
>So the ESPFIX and per-cpu segments are kernel-only, but then the VDSO
>getcpu one is a user segment.
>
>And the PnP and APM BIOS segments are similarly kernel-only.
>
>But then the VDSO getcpu segment is user-visible, in the middle, and
>again, it's 32-bit only but that whole GDT_SYSTEM_START thing is
>supposed to work there too.
>
>End result: this seems incomplete and not really fully baked.
>
>I wonder if instead of GDT_SYSTEM_START, you'd be better off just
>making a trivial constant bitmap of "these are user visible segments
>in the GDT". No need to re-order things, just have something like
>
>   #define USER_SEGMENTS_MASK \
>        ((1ul << GDT_ENTRY_DEFAULT_USER_CS) |
>         ,,,,
>
>and use that for the test (remember to check for GDT_ENTRIES as the max).
>
>Hmm?
>
>             Linus

Somewhat unrelated: X86_BUG_ESPFIX should just be deleted, as we aren't actually ever cleaning it. All current x86 processors have that problem (until FRED).
  

Patch

diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index 9d6411c65920..a155843d0c37 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -83,8 +83,8 @@ 
  *  13 - kernel data segment
  *  14 - default user CS
  *  15 - default user DS
- *  16 - TSS								<=== cacheline #5
- *  17 - LDT
+ *  16 - unused								<=== cacheline #5
+ *  17 - unused
  *  18 - PNPBIOS support (16->32 gate)
  *  19 - PNPBIOS support
  *  20 - PNPBIOS support						<=== cacheline #6
@@ -97,8 +97,11 @@ 
  *  26 - ESPFIX small SS
  *  27 - per-cpu			[ offset to per-cpu data area ]
  *  28 - VDSO getcpu
- *  29 - unused
- *  30 - unused
+ *
+ *  ------- start of system segments:
+ *
+ *  29 - TSS
+ *  30 - LDT
  *  31 - TSS for double fault handler
  */
 #define GDT_ENTRY_TLS_MIN		6
@@ -108,8 +111,6 @@ 
 #define GDT_ENTRY_KERNEL_DS		13
 #define GDT_ENTRY_DEFAULT_USER_CS	14
 #define GDT_ENTRY_DEFAULT_USER_DS	15
-#define GDT_ENTRY_TSS			16
-#define GDT_ENTRY_LDT			17
 #define GDT_ENTRY_PNPBIOS_CS32		18
 #define GDT_ENTRY_PNPBIOS_CS16		19
 #define GDT_ENTRY_PNPBIOS_DS		20
@@ -121,6 +122,10 @@ 
 #define GDT_ENTRY_PERCPU		27
 #define GDT_ENTRY_CPUNODE		28
 
+/* Start of system segments */
+
+#define GDT_ENTRY_TSS			29
+#define GDT_ENTRY_LDT			30
 #define GDT_ENTRY_DOUBLEFAULT_TSS	31
 
 /*
@@ -188,20 +193,22 @@ 
 #define GDT_ENTRY_DEFAULT_USER_DS	5
 #define GDT_ENTRY_DEFAULT_USER_CS	6
 
-/* Needs two entries */
-#define GDT_ENTRY_TSS			8
-/* Needs two entries */
-#define GDT_ENTRY_LDT			10
-
 #define GDT_ENTRY_TLS_MIN		12
 #define GDT_ENTRY_TLS_MAX		14
 
 #define GDT_ENTRY_CPUNODE		15
 
+/* Start of system segments */
+
+/* Needs two entries */
+#define GDT_ENTRY_TSS			16
+/* Needs two entries */
+#define GDT_ENTRY_LDT			18
+
 /*
  * Number of entries in the GDT table:
  */
-#define GDT_ENTRIES			16
+#define GDT_ENTRIES			20
 
 /*
  * Segment selector values corresponding to the above entries:
@@ -219,6 +226,8 @@ 
 
 #endif
 
+#define GDT_SYSTEM_START		GDT_ENTRY_TSS
+
 #define IDT_ENTRIES			256
 #define NUM_EXCEPTION_VECTORS		32