x86/ia32: Do not modify the DPL bits for a null selector

Message ID 20230706052231.2183-1-xin3.li@intel.com
State New
Headers
Series x86/ia32: Do not modify the DPL bits for a null selector |

Commit Message

Li, Xin3 July 6, 2023, 5:22 a.m. UTC
  When a null selector is to be loaded into a segment register,
reload_segments() sets its DPL bits to 3. Later when an IRET
instruction loads it, it zeros the segment register. The two
operations offset each other to actually effect a nop.

Fix it by not modifying the DPL bits for a null selector.

Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/kernel/signal_32.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)
  

Comments

Eric W. Biederman July 6, 2023, 2:05 p.m. UTC | #1
Xin Li <xin3.li@intel.com> writes:

> When a null selector is to be loaded into a segment register,
> reload_segments() sets its DPL bits to 3. Later when an IRET
> instruction loads it, it zeros the segment register. The two
> operations offset each other to actually effect a nop.
>
> Fix it by not modifying the DPL bits for a null selector.

Maybe this is the right thing but this needs some serious comments
about what is going on.

In particular how does sel <= 3 equate to a null selector?  Is that
defined somewhere?  At a minimum you should have static asserts to make
certain no one redefines the first 4 segment selectors as anything else,
if you want to refer to them by number instead of testing for specific
properties.

As written this looks like it requires an enormous amount of knowledge
about how other parts of the code works, to be comprehensible or to
change safely.  That level of non-local knowledge should be unnecessary.

Eric


> Signed-off-by: Xin Li <xin3.li@intel.com>
> ---
>  arch/x86/kernel/signal_32.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kernel/signal_32.c b/arch/x86/kernel/signal_32.c
> index 9027fc088f97..7796cf84fca2 100644
> --- a/arch/x86/kernel/signal_32.c
> +++ b/arch/x86/kernel/signal_32.c
> @@ -36,22 +36,27 @@
>  #ifdef CONFIG_IA32_EMULATION
>  #include <asm/ia32_unistd.h>
>  
> +static inline u16 usrseg(u16 sel)
> +{
> +	return sel <= 3 ? sel : sel | 3;
> +}
> +
>  static inline void reload_segments(struct sigcontext_32 *sc)
>  {
>  	unsigned int cur;
>  
>  	savesegment(gs, cur);
> -	if ((sc->gs | 0x03) != cur)
> -		load_gs_index(sc->gs | 0x03);
> +	if (usrseg(sc->gs) != cur)
> +		load_gs_index(usrseg(sc->gs));
>  	savesegment(fs, cur);
> -	if ((sc->fs | 0x03) != cur)
> -		loadsegment(fs, sc->fs | 0x03);
> +	if (usrseg(sc->fs) != cur)
> +		loadsegment(fs, usrseg(sc->fs));
>  	savesegment(ds, cur);
> -	if ((sc->ds | 0x03) != cur)
> -		loadsegment(ds, sc->ds | 0x03);
> +	if (usrseg(sc->ds) != cur)
> +		loadsegment(ds, usrseg(sc->ds));
>  	savesegment(es, cur);
> -	if ((sc->es | 0x03) != cur)
> -		loadsegment(es, sc->es | 0x03);
> +	if (usrseg(sc->es) != cur)
> +		loadsegment(es, usrseg(sc->es));
>  }
>  
>  #define sigset32_t			compat_sigset_t
  
Dave Hansen July 6, 2023, 3:28 p.m. UTC | #2
On 7/5/23 22:22, Xin Li wrote:
> When a null selector is to be loaded into a segment register,
> reload_segments() sets its DPL bits to 3. Later when an IRET
> instruction loads it, it zeros the segment register. The two
> operations offset each other to actually effect a nop.
> 
> Fix it by not modifying the DPL bits for a null selector.

So in the end, this is an optimization attempt?  Is there any other benefit?
  
Li, Xin3 July 7, 2023, 2:29 a.m. UTC | #3
> > When a null selector is to be loaded into a segment register,
> > reload_segments() sets its DPL bits to 3. Later when an IRET
> > instruction loads it, it zeros the segment register. The two
> > operations offset each other to actually effect a nop.
> >
> > Fix it by not modifying the DPL bits for a null selector.
> 
> So in the end, this is an optimization attempt?  Is there any other benefit?

You asked but I think you have the answer 😉, but want me to give the
details in the commit message, which probably I should have done.

Unlike IRET, ERETU, introduced with FRED to return to ring 3 from ring
0, does not make any of DS, ES, FS, or GS null if it is found to have
DPL < 3. It is expected that a FRED-enabled operating system will return
to ring 3 (in compatibility mode) only when those non-null selectors
all have DPL = 3.

Thus when FRED is enabled, because reload_segments() sets the DPL bits
of null selector to 3, we end up with having 3 in a segment register
even when it is initially set to 0. As a result, the sigreturn selftest
fails as it sets DS to 0 and then checks if it's still 0 after sigreturn.

Thanks!
  Xin
  
Li, Xin3 July 7, 2023, 4:07 a.m. UTC | #4
> > When a null selector is to be loaded into a segment register,
> > reload_segments() sets its DPL bits to 3. Later when an IRET
> > instruction loads it, it zeros the segment register. The two
> > operations offset each other to actually effect a nop.
> >
> > Fix it by not modifying the DPL bits for a null selector.
> 
> Maybe this is the right thing but this needs some serious comments about what is
> going on.
> 
> In particular how does sel <= 3 equate to a null selector?  Is that defined
> somewhere?

In protected mode, a NULL selector (values 0000 through 0003) can be
loaded into DS, ES, FS, or GS registers without causing a protection
exception.

This can be found at the description of instruction LDS/LES/LFS/LGS/LSS,
in section 3.3 "instructions (A-L)" of Intel SDM Volume 2, Instruction Set
Reference.
 
>At a minimum you should have static asserts to make certain no
> one redefines the first 4 segment selectors as anything else, if you want to refer to
> them by number instead of testing for specific properties.

Bits 0 and 1 of a selector are NOT used to index the GDT or IDT, thus
selector values 0 ~ 3 all point to the first entry of the GDT, i.e.,
the null selector.

Thus 3 as a selector is the same as 0, and it doesn't matter to change
it or not. But when IRET sees an invalid segment register in ES, FS, GS,
and DS, it sets it to 0, making 0 a preferred null selector value.

The sigreturn selftest sets DS of its signal context to 0 and then checks
if it's still 0 after sigreturn. And it passes. However if it sets DS to
any other null selector value, e.g., 1, it fails. For FRED, if we don't
modify the DPL bits, the test passes, because ERETU doesn't change segment
registers.

> As written this looks like it requires an enormous amount of knowledge about
> how other parts of the code works, to be comprehensible or to change safely.
> That level of non-local knowledge should be unnecessary.
  
Li, Xin3 July 7, 2023, 4:16 a.m. UTC | #5
> Thus 3 as a selector is the same as 0, and it doesn't matter to change it or not. But
> when IRET sees an invalid segment register in ES, FS, GS, and DS, it sets it to 0,
> making 0 a preferred null selector value.

To clarify, an invalid segment register value includes NULL selector values.
  
Eric W. Biederman July 7, 2023, 5:28 p.m. UTC | #6
"Li, Xin3" <xin3.li@intel.com> writes:

>> Thus 3 as a selector is the same as 0, and it doesn't matter to change it or not. But
>> when IRET sees an invalid segment register in ES, FS, GS, and DS, it sets it to 0,
>> making 0 a preferred null selector value.
>
> To clarify, an invalid segment register value includes NULL selector values.

Perhaps something like patch below to make it clear that we are
normalizing the segment values and forcing that normalization.

I am a bit confused why this code is not the same for ia32 and
ia32_emulation.  I would think the normalization at least should apply
to the 32bit case as well.

Eric

diff --git a/arch/x86/kernel/signal_32.c b/arch/x86/kernel/signal_32.c
index 9027fc088f97..e5f3978388fd 100644
--- a/arch/x86/kernel/signal_32.c
+++ b/arch/x86/kernel/signal_32.c
@@ -36,22 +36,47 @@
 #ifdef CONFIG_IA32_EMULATION
 #include <asm/ia32_unistd.h>
 
+static inline unsigned int normalize_seg_index(unsigned int index)
+{
+	/*
+	 * Convert the segment index into normalized form.
+	 *
+	 * For the indexes 0,1,2,3 always use the value of 0, as IRET
+	 * forces this form for the nul segment.
+	 *
+	 * Otherwise set both DPL bits to force it to be a userspace
+	 * ring 3 segment index.
+	 */
+	return (index < 3) ? 0 : index | 3;
+}
+
 static inline void reload_segments(struct sigcontext_32 *sc)
 {
-	unsigned int cur;
+	unsigned int new, cur;
 
+	new = normalize_seg_index(sc->gs);
 	savesegment(gs, cur);
-	if ((sc->gs | 0x03) != cur)
-		load_gs_index(sc->gs | 0x03);
+	cur = normalize_seg_index(cur);
+	if (new != cur)
+		load_gs_index(new);
+
+	new = normalize_seg_index(sc->fs);
 	savesegment(fs, cur);
-	if ((sc->fs | 0x03) != cur)
-		loadsegment(fs, sc->fs | 0x03);
+	cur = normalize_seg_index(cur);
+	if (new != cur)
+		loadsegment(fs, new);
+
+	new = normalize_seg_index(sc->ds);
 	savesegment(ds, cur);
-	if ((sc->ds | 0x03) != cur)
-		loadsegment(ds, sc->ds | 0x03);
+	cur = normalize_seg_index(cur);
+	if (new != cur)
+		loadsegment(ds, new);
+
+	new = normalize_seg_index(sc->es);
 	savesegment(es, cur);
-	if ((sc->es | 0x03) != cur)
-		loadsegment(es, sc->es | 0x03);
+	cur = normalize_seg_index(cur);
+	if (new != cur)
+		loadsegment(es, new);
 }
 
 #define sigset32_t			compat_sigset_t
  
Li, Xin3 July 7, 2023, 10:17 p.m. UTC | #7
> Perhaps something like patch below to make it clear that we are
> normalizing the segment values and forcing that normalization.

"Normalizing" sounds a good term.

> I am a bit confused why this code is not the same for ia32 and
> ia32_emulation.  I would think the normalization at least should apply
> to the 32bit case as well.
> 
> Eric
> 
> diff --git a/arch/x86/kernel/signal_32.c b/arch/x86/kernel/signal_32.c
> index 9027fc088f97..e5f3978388fd 100644
> --- a/arch/x86/kernel/signal_32.c
> +++ b/arch/x86/kernel/signal_32.c
> @@ -36,22 +36,47 @@
>  #ifdef CONFIG_IA32_EMULATION
>  #include <asm/ia32_unistd.h>
> 
> +static inline unsigned int normalize_seg_index(unsigned int index)

normalize_usrseg_index?

> +{
> +	/*
> +	 * Convert the segment index into normalized form.
> +	 *
> +	 * For the indexes 0,1,2,3 always use the value of 0, as IRET
> +	 * forces this form for the nul segment.
> +	 *
> +	 * Otherwise set both DPL bits to force it to be a userspace
> +	 * ring 3 segment index.
> +	 */
> +	return (index < 3) ? 0 : index | 3;

s/</<=

no?

With this patch it looks that 1,2,3 are no longer valid selector values
in Linux, which seems the right thing to do but I don't know if there is
any side effect.

> +}
> +
>  static inline void reload_segments(struct sigcontext_32 *sc)
>  {
> -	unsigned int cur;
> +	unsigned int new, cur;
> 
> +	new = normalize_seg_index(sc->gs);
>  	savesegment(gs, cur);
> -	if ((sc->gs | 0x03) != cur)
> -		load_gs_index(sc->gs | 0x03);
> +	cur = normalize_seg_index(cur);
> +	if (new != cur)
> +		load_gs_index(new);
> +
> +	new = normalize_seg_index(sc->fs);
>  	savesegment(fs, cur);
> -	if ((sc->fs | 0x03) != cur)
> -		loadsegment(fs, sc->fs | 0x03);
> +	cur = normalize_seg_index(cur);
> +	if (new != cur)
> +		loadsegment(fs, new);
> +
> +	new = normalize_seg_index(sc->ds);
>  	savesegment(ds, cur);
> -	if ((sc->ds | 0x03) != cur)
> -		loadsegment(ds, sc->ds | 0x03);
> +	cur = normalize_seg_index(cur);
> +	if (new != cur)
> +		loadsegment(ds, new);
> +
> +	new = normalize_seg_index(sc->es);
>  	savesegment(es, cur);
> -	if ((sc->es | 0x03) != cur)
> -		loadsegment(es, sc->es | 0x03);
> +	cur = normalize_seg_index(cur);
> +	if (new != cur)
> +		loadsegment(es, new);
>  }
> 
>  #define sigset32_t			compat_sigset_t
  
Eric W. Biederman July 7, 2023, 10:51 p.m. UTC | #8
"Li, Xin3" <xin3.li@intel.com> writes:

>> Perhaps something like patch below to make it clear that we are
>> normalizing the segment values and forcing that normalization.
>
> "Normalizing" sounds a good term.
>
>> I am a bit confused why this code is not the same for ia32 and
>> ia32_emulation.  I would think the normalization at least should apply
>> to the 32bit case as well.
>> 
>> Eric
>> 
>> diff --git a/arch/x86/kernel/signal_32.c b/arch/x86/kernel/signal_32.c
>> index 9027fc088f97..e5f3978388fd 100644
>> --- a/arch/x86/kernel/signal_32.c
>> +++ b/arch/x86/kernel/signal_32.c
>> @@ -36,22 +36,47 @@
>>  #ifdef CONFIG_IA32_EMULATION
>>  #include <asm/ia32_unistd.h>
>> 
>> +static inline unsigned int normalize_seg_index(unsigned int index)
>
> normalize_usrseg_index?

Perhaps useg?  I think that is the abbreviation I have seen used
elsewhere.  I was trying to get things as close as I could to the
x86 documentation so people could figure out what is going on by
reading the documentation.

>> +{
>> +	/*
>> +	 * Convert the segment index into normalized form.
>> +	 *
>> +	 * For the indexes 0,1,2,3 always use the value of 0, as IRET
>> +	 * forces this form for the nul segment.
>> +	 *
>> +	 * Otherwise set both DPL bits to force it to be a userspace
>> +	 * ring 3 segment index.
>> +	 */
>> +	return (index < 3) ? 0 : index | 3;
>
> s/</<=
>
> no?

Yes.  My typo.

The patch was very much a suggestion on how we can perhaps write the
code to make it clearer what is happening.  Normalizing the segment
index values seems like what we have been intending to do all along.

In fact it might make sense for clarity to simply use the existing
"index | 3" logic in what I called normal_seg_index, and then just
update the normalization to add the null pointer case.

I was just spending a little time trying to make it so that the code
is readable.

> With this patch it looks that 1,2,3 are no longer valid selector values
> in Linux, which seems the right thing to do but I don't know if there is
> any side effect.

You have said that IRET always changes 1,2,3 to 0.  So I don't expect
the selector values of 1,2,3 will last very long.

If that is not the case I misunderstood, and we should consider doing
something else.

It bothers me that the existing code, and your code as well only
handles the normalization on x86_64 for ia32 mode.  Shouldn't
the same normalization logic apply in a 32bit kernel as well?
Scope creep I know but the fact the code does not match
seems concerning.

Eric


>> +}
>> +
>>  static inline void reload_segments(struct sigcontext_32 *sc)
>>  {
>> -	unsigned int cur;
>> +	unsigned int new, cur;
>> 
>> +	new = normalize_seg_index(sc->gs);
>>  	savesegment(gs, cur);
>> -	if ((sc->gs | 0x03) != cur)
>> -		load_gs_index(sc->gs | 0x03);
>> +	cur = normalize_seg_index(cur);
>> +	if (new != cur)
>> +		load_gs_index(new);
>> +
>> +	new = normalize_seg_index(sc->fs);
>>  	savesegment(fs, cur);
>> -	if ((sc->fs | 0x03) != cur)
>> -		loadsegment(fs, sc->fs | 0x03);
>> +	cur = normalize_seg_index(cur);
>> +	if (new != cur)
>> +		loadsegment(fs, new);
>> +
>> +	new = normalize_seg_index(sc->ds);
>>  	savesegment(ds, cur);
>> -	if ((sc->ds | 0x03) != cur)
>> -		loadsegment(ds, sc->ds | 0x03);
>> +	cur = normalize_seg_index(cur);
>> +	if (new != cur)
>> +		loadsegment(ds, new);
>> +
>> +	new = normalize_seg_index(sc->es);
>>  	savesegment(es, cur);
>> -	if ((sc->es | 0x03) != cur)
>> -		loadsegment(es, sc->es | 0x03);
>> +	cur = normalize_seg_index(cur);
>> +	if (new != cur)
>> +		loadsegment(es, new);
>>  }
>> 
>>  #define sigset32_t			compat_sigset_t
  
Li, Xin3 July 8, 2023, 8:44 a.m. UTC | #9
> > normalize_usrseg_index?
> 
> Perhaps useg?  I think that is the abbreviation I have seen used
> elsewhere.  I was trying to get things as close as I could to the
> x86 documentation so people could figure out what is going on by
> reading the documentation.

useg seems the right term afaict.

 
> >> +	return (index < 3) ? 0 : index | 3;
> >
> > s/</<=
> >
> > no?
> 
> Yes.  My typo.
> 
> The patch was very much a suggestion on how we can perhaps write the
> code to make it clearer what is happening.  Normalizing the segment
> index values seems like what we have been intending to do all along.
> 
> In fact it might make sense for clarity to simply use the existing
> "index | 3" logic in what I called normal_seg_index, and then just
> update the normalization to add the null pointer case.
> 
> I was just spending a little time trying to make it so that the code
> is readable.
> 
> > With this patch it looks that 1,2,3 are no longer valid selector values
> > in Linux, which seems the right thing to do but I don't know if there is
> > any side effect.
> 
> You have said that IRET always changes 1,2,3 to 0.  So I don't expect
> the selector values of 1,2,3 will last very long.
> 
> If that is not the case I misunderstood, and we should consider doing
> something else.

Your understanding is correct. And I am NOT opposed to your change,
but want to see if there is any concern from other people.

On the other side, I got to mention that when FRED is enabled, ERETU
doesn't forcibly set non-zero null selector values to 0. Then for the
sigreturn self-test, it can set DS to any of the null selector values,
and expect DS is set to the value after sigreturn.

With your proposal, 1, 2 and 3 are no longer valid selector values in
Linux, and Linux would fociably set non-zero null selector values to 0,
just like what IRET has been doing.

Then the sigreturn self-test should be changed to check any non-zero
null selector value will be set to 0 after sigreturn. And this behavior
is consistent w/ and w/o FRED.

I think we should do it.

> It bothers me that the existing code, and your code as well only
> handles the normalization on x86_64 for ia32 mode.  Shouldn't
> the same normalization logic apply in a 32bit kernel as well?
> Scope creep I know but the fact the code does not match
> seems concerning.

Agreed! We *should* fix it in the same way.

I guess people are just conservative with 32-bit kernel specific changes.

How many OSVs still release 32-bit kernel? What about the testing?
  
Li, Xin3 July 24, 2023, 8:54 a.m. UTC | #10
> > It bothers me that the existing code, and your code as well only
> > handles the normalization on x86_64 for ia32 mode.  Shouldn't the same
> > normalization logic apply in a 32bit kernel as well?
> > Scope creep I know but the fact the code does not match seems
> > concerning.
> 
> Agreed! We *should* fix it in the same way.

The fact is that the existing code unconditionally sets the DPL bits
to 3 only on x86_64 (why did we forgot to do it on 32-bit?). Thus,
there is nothing to revert for null selectors on 32-bit.

With your suggestion to normalize null selectors, we need to add the
code only when IRET is NOT used to return to user level. Fortunately,
Brian Gerst just posted a patch set
https://lore.kernel.org/lkml/20230721161018.50214-1-brgerst@gmail.com/,
which makes the case whether IRET is used or not explicit on both x86_64
and x86_32. As a result it is straightforward to add the normalization
code afterwards.

Thanks!
  Xin
  

Patch

diff --git a/arch/x86/kernel/signal_32.c b/arch/x86/kernel/signal_32.c
index 9027fc088f97..7796cf84fca2 100644
--- a/arch/x86/kernel/signal_32.c
+++ b/arch/x86/kernel/signal_32.c
@@ -36,22 +36,27 @@ 
 #ifdef CONFIG_IA32_EMULATION
 #include <asm/ia32_unistd.h>
 
+static inline u16 usrseg(u16 sel)
+{
+	return sel <= 3 ? sel : sel | 3;
+}
+
 static inline void reload_segments(struct sigcontext_32 *sc)
 {
 	unsigned int cur;
 
 	savesegment(gs, cur);
-	if ((sc->gs | 0x03) != cur)
-		load_gs_index(sc->gs | 0x03);
+	if (usrseg(sc->gs) != cur)
+		load_gs_index(usrseg(sc->gs));
 	savesegment(fs, cur);
-	if ((sc->fs | 0x03) != cur)
-		loadsegment(fs, sc->fs | 0x03);
+	if (usrseg(sc->fs) != cur)
+		loadsegment(fs, usrseg(sc->fs));
 	savesegment(ds, cur);
-	if ((sc->ds | 0x03) != cur)
-		loadsegment(ds, sc->ds | 0x03);
+	if (usrseg(sc->ds) != cur)
+		loadsegment(ds, usrseg(sc->ds));
 	savesegment(es, cur);
-	if ((sc->es | 0x03) != cur)
-		loadsegment(es, sc->es | 0x03);
+	if (usrseg(sc->es) != cur)
+		loadsegment(es, usrseg(sc->es));
 }
 
 #define sigset32_t			compat_sigset_t