[v2,0/2] x86/cpu: fix invalid MTRR mask values for SEV or TME

Message ID 20240131230902.1867092-1-pbonzini@redhat.com
Series x86/cpu: fix invalid MTRR mask values for SEV or TME

Message

Paolo Bonzini Jan. 31, 2024, 11:09 p.m. UTC
  Supersedes: <20240130180400.1698136-1-pbonzini@redhat.com>

MKTME repurposes the high bits of the physical address as a key ID for the
encryption key and, even though MAXPHYADDR in CPUID[0x80000008] remains the
same, the valid bits in the MTRR mask registers are based on the reduced
number of physical address bits.  This breaks boot on machines that have TME
enabled and do something to clean up MTRRs, unless "disable_mtrr_cleanup" is
passed on the command line.  The fix is to move the TME keyid detection to
early CPU initialization, which runs before Linux sets up MTRRs.
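
As a rough illustration of the arithmetic (a standalone sketch; the MAXPHYADDR
and key ID widths below are made-up example numbers, not taken from any of the
affected machines): the mask bits the MTRR PHYSMASK registers accept are
[phys_bits-1:12], so every bit repurposed as key ID shrinks that range.

#include <stdio.h>
#include <stdint.h>

/* Bits the MTRR PHYSMASK registers accept: [phys_bits-1:12]. */
static uint64_t physmask_valid_bits(unsigned int phys_bits)
{
        return ((UINT64_C(1) << phys_bits) - 1) & ~((UINT64_C(1) << 12) - 1);
}

int main(void)
{
        unsigned int maxphyaddr = 46;   /* as reported by CPUID[0x80000008] */
        unsigned int keyid_bits = 6;    /* hypothetical MKTME key ID width  */

        printf("mask bits without TME: %#llx\n",
               (unsigned long long)physmask_valid_bits(maxphyaddr));
        printf("mask bits with TME:    %#llx\n",
               (unsigned long long)physmask_valid_bits(maxphyaddr - keyid_bits));
        /*
         * The bits that differ between the two values are ones a mask built
         * from the unreduced MAXPHYADDR would still set, even though the CPU
         * no longer treats them as part of the physical address.
         */
        return 0;
}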

However, as noticed by Kirill, the patch I sent as v1 actually works only
until Linux 6.6.  In Linux 6.7, commit fbf6449f84bf ("x86/sev-es: Set
x86_virt_bits to the correct value straight away, instead of a two-phase
approach") reorganized the initialization of c->x86_phys_bits in a way
that broke the patch.  But even in 6.7, AMD processors, which did try to
reduce it in this_cpu->c_early_init(c), had their x86_phys_bits value
overwritten by get_cpu_address_sizes(), so that early_identify_cpu()
left the wrong value in x86_phys_bits.  This probably went unnoticed
because on AMD processors you need not apply the reduced MAXPHYADDR to
MTRR masks.

Therefore, this v2 prepends a fix for the issue introduced by commit fbf6449f84bf.
Apologies for the oversight.

Tested on an AMD Epyc machine (where I resorted to dumping mtrr_state) and
on the problematic Intel Emerald Rapids machine.

Thanks,

Paolo

Paolo Bonzini (2):
  x86/cpu: allow reducing x86_phys_bits during early_identify_cpu()
  x86/cpu/intel: Detect TME keyid bits before setting MTRR mask
    registers

 arch/x86/kernel/cpu/common.c |   4 +-
 arch/x86/kernel/cpu/intel.c  | 178 ++++++++++++++++++-----------------
 2 files changed, 93 insertions(+), 89 deletions(-)
  

Comments

Dave Hansen Feb. 1, 2024, 6:29 p.m. UTC | #1
On 1/31/24 15:09, Paolo Bonzini wrote:
> However, as noticed by Kirill, the patch I sent as v1 actually works only
> until Linux 6.6.  In Linux 6.7, commit fbf6449f84bf ("x86/sev-es: Set
> x86_virt_bits to the correct value straight away, instead of a two-phase
> approach") reorganized the initialization of c->x86_phys_bits in a way
> that broke the patch.  But even in 6.7, AMD processors, which did try to
> reduce it in this_cpu->c_early_init(c), had their x86_phys_bits value
> overwritten by get_cpu_address_sizes(), so that early_identify_cpu()
> left the wrong value in x86_phys_bits.  This probably went unnoticed
> because on AMD processors you need not apply the reduced MAXPHYADDR to
> MTRR masks.

I really wanted get_cpu_address_sizes() to be the one and only spot
where c->x86_phys_bits is established.  That way, we don't get a bunch
of code all over the place tweaking it and fighting for who "wins".

We're not there yet, but the approach in this patch moves it back in the
wrong direction because it permits the random tweaking of c->x86_phys_bits.

Could we instead do something more like the (completely untested)
attached patch?

BTW, I'm pretty sure the WARN_ON() in the patch won't actually work, but
it'd be nice to work toward a point when it does.
  
Dave Hansen Feb. 1, 2024, 9:42 p.m. UTC | #2
On 2/1/24 10:29, Dave Hansen wrote:
> Could we instead do something more like the (completely untested)
> attached patch?

.. actually attaching it here
  
Paolo Bonzini Feb. 1, 2024, 11:08 p.m. UTC | #3
On Thu, Feb 1, 2024 at 7:29 PM Dave Hansen <dave.hansen@intel.com> wrote:
> I really wanted get_cpu_address_sizes() to be the one and only spot
> where c->x86_phys_bits is established.  That way, we don't get a bunch
> of code all over the place tweaking it and fighting for who "wins".
> We're not there yet, but the approach in this patch moves it back in the
> wrong direction because it permits the random tweaking of c->x86_phys_bits.

I see your point; one of my earlier attempts added a
->c_detect_mem_encrypt() callback that basically amounted to either
amd_detect_mem_encrypt() or detect_tme(). I bailed out of it mostly
because these functions do more than adjusting phys_bits, and it
seemed arbitrary to call them from get_cpu_address_sizes(). The two
approaches share the idea of centralizing the computation of
x86_phys_bits in get_cpu_address_sizes().

There is unfortunately an important hurdle for your patch, in that
currently the BSP and AP flows are completely different. For the BSP
the flow is ->c_early_init(), then get_cpu_address_sizes(), then again
->c_early_init() called by ->c_init(), then ->c_init(). For APs it is
get_cpu_address_sizes(), then ->c_early_init() called by ->c_init(),
then the rest of ->c_init(). And let's not even look at
->c_identify().
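
As a compile-only sketch of the ordering above (the real code is in
arch/x86/kernel/cpu/common.c; the struct and stubs here are placeholders,
not the kernel's definitions):

struct cpuinfo_x86 { unsigned int x86_phys_bits; };

/* Vendor hook; on AMD it may reduce c->x86_phys_bits. */
static void c_early_init(struct cpuinfo_x86 *c) { (void)c; }
/* Re-derives the address sizes from CPUID 0x80000008. */
static void get_cpu_address_sizes(struct cpuinfo_x86 *c) { (void)c; }
/* ->c_init() starts by calling ->c_early_init() again. */
static void c_init(struct cpuinfo_x86 *c) { c_early_init(c); }

/* BSP: early_identify_cpu() followed by identify_cpu(). */
static void bsp_flow(struct cpuinfo_x86 *c)
{
        c_early_init(c);
        get_cpu_address_sizes(c);  /* may overwrite what c_early_init() reduced */
        c_init(c);                 /* ->c_early_init() again, then the rest     */
}

/* AP: no early pass at all before get_cpu_address_sizes(). */
static void ap_flow(struct cpuinfo_x86 *c)
{
        get_cpu_address_sizes(c);  /* too early to see any reduction */
        c_init(c);
}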

This difference is bad for your patch, because get_cpu_address_sizes()
is called too early to see enc_phys_bits on APs. But it was also
something that fbf6449f84bf didn't take into account, because it left
behind the tentative initialization of x86_*_bits in identify_cpu(),
while removing it from early_identify_cpu().  And

TBH my first reaction after Kirill pointed me to fbf6449f84bf was to
revert it. It's not like the code before was much less of a dumpster
fire, but that commit made the BSP vs. AP mess even worse. But it
fixed a real-world bug and it did remove most of the "first set then
adjust" logic, at least for the BSP, so a revert wasn't on the table
and patch 1 was what came out of it.

I know that in general in Linux we prefer to fix things for good.
Dancing one step behind and two ahead comes with the risk that you
only do the step behind. But in the end something like this patch 1
would have to be posted for stable branches (and Greg doesn't like
one-off patches), and I am not even sure it's a step behind because it
removes _some_ of the BSP vs. AP differences introduced by
fbf6449f84bf.

In a nutshell: I don't dislike the idea behind your patch, but the
code is just not ready for it.

I can look into cleaning up cpu/common.c though. I'm a natural
procrastinator; it's my kind of thing to embark on a series of 30
patches to clean up 20-year-old code...

Paolo

> Could we instead do something more like the (completely untested)
> attached patch?
>
> BTW, I'm pretty sure the WARN_ON() in the patch won't actually work, but
> it'd be nice to work toward a point when it does.
>
  
Paolo Bonzini Feb. 4, 2024, 5:21 p.m. UTC | #4
On Fri, Feb 2, 2024 at 12:08 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On Thu, Feb 1, 2024 at 7:29 PM Dave Hansen <dave.hansen@intel.com> wrote:
> > I really wanted get_cpu_address_sizes() to be the one and only spot
> > where c->x86_phys_bits is established.  That way, we don't get a bunch
> > of code all over the place tweaking it and fighting for who "wins".
> > We're not there yet, but the approach in this patch moves it back in the
> > wrong direction because it permits the random tweaking of c->x86_phys_bits.
>
> I see your point; one of my earlier attempts added a
> ->c_detect_mem_encrypt() callback that basically amounted to either
> amd_detect_mem_encrypt() or detect_tme(). I bailed out of it mostly
> because these functions do more than adjusting phys_bits, and it
> seemed arbitrary to call them from get_cpu_address_sizes(). The two
> approaches share the idea of centralizing the computation of
> x86_phys_bits in get_cpu_address_sizes().
>
> There is unfortunately an important hurdle for your patch, in that
> currently the BSP and AP flows are completely different. For the BSP
> the flow is ->c_early_init(), then get_cpu_address_sizes(), then again
> ->c_early_init() called by ->c_init(), then ->c_init(). For APs it is
> get_cpu_address_sizes(), then ->c_early_init() called by ->c_init(),
> then the rest of ->c_init(). And let's not even look at
> ->c_identify().
>
> This difference is bad for your patch, because get_cpu_address_sizes()
> is called too early to see enc_phys_bits on APs. But it was also
> something that fbf6449f84bf didn't take into account, because it left
> behind the tentative initialization of x86_*_bits in identify_cpu(),
> while removing it from early_identify_cpu().  And
>
> TBH my first reaction after Kirill pointed me to fbf6449f84bf was to
> revert it. It's not like the code before was much less of a dumpster
> fire, but that commit made the BSP vs. AP mess even worse. But it
> fixed a real-world bug and it did remove most of the "first set then
> adjust" logic, at least for the BSP, so a revert wasn't on the table
> and patch 1 was what came out of it.
>
> I know that in general in Linux we prefer to fix things for good.
> Dancing one step behind and two ahead comes with the risk that you
> only do the step behind. But in the end something like this patch 1
> would have to be posted for stable branches (and Greg doesn't like
> one-off patches), and I am not even sure it's a step behind because it
> removes _some_ of the BSP vs. AP differences introduced by
> fbf6449f84bf.
>
> In a nutshell: I don't dislike the idea behind your patch, but the
> code is just not ready for it.

This is the diffstat before your patch can be applied more or less as is:

$ git diffstat origin/master
 arch/x86/include/asm/processor.h |   1 +
 arch/x86/kernel/cpu/amd.c        |  12 ++--
 arch/x86/kernel/cpu/centaur.c    |  13 ++---
 arch/x86/kernel/cpu/common.c     | 103 +++++++++++++++++++----------------
 arch/x86/kernel/cpu/hygon.c      |   2 -
 arch/x86/kernel/cpu/intel.c      |   4 +-
 arch/x86/kernel/cpu/transmeta.c  |   2 -
 arch/x86/kernel/cpu/zhaoxin.c    |   1 -
 8 files changed, 69 insertions(+), 69 deletions(-)

$ git log --oneline --reverse origin/master..
d639afed02aa x86/cpu/common: move code up to early get_cpu_address_sizes() to a new function
40d34260a4ba x86/cpu/common: pull get_cpu_*() calls common to BSPs and APs to a new function
67b9839f9c38 x86/cpu/common: put all setup_force/clear capabilities together
ebeae7f91cbc x86/cpu/centaur: do everything before early_init_centaur() in early_init_centaur()
d62fa7400885 x86/cpu/cyrix: call early init function from init function
0aa0916cd7e0 x86/cpu/common: call early_init function on APs from common code
d656b651d217 (HEAD) dave


I still haven't tested it very thoroughly, but anyway, what do you think?

Paolo
  
Paolo Bonzini Feb. 13, 2024, 3:45 p.m. UTC | #5
On Sun, Feb 4, 2024 at 6:21 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> On Fri, Feb 2, 2024 at 12:08 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > On Thu, Feb 1, 2024 at 7:29 PM Dave Hansen <dave.hansen@intel.com> wrote:
> > > I really wanted get_cpu_address_sizes() to be the one and only spot
> > > where c->x86_phys_bits is established.  That way, we don't get a bunch
> > > of code all over the place tweaking it and fighting for who "wins".
> > > We're not there yet, but the approach in this patch moves it back in the
> > > wrong direction because it permits the random tweaking of c->x86_phys_bits.
> >
> > There is unfortunately an important hurdle [...] in that
> > currently the BSP and AP flows are completely different. For the BSP
> > the flow is ->c_early_init(), then get_cpu_address_sizes(), then again
> > ->c_early_init() called by ->c_init(), then ->c_init(). For APs it is
> > get_cpu_address_sizes(), then ->c_early_init() called by ->c_init(),
> > then the rest of ->c_init(). And let's not even look at
> > ->c_identify(). [...] get_cpu_address_sizes()
> > is called too early to see enc_phys_bits on APs. But it was also
> > something that fbf6449f84bf didn't take into account, because it left
> > behind the tentative initialization of x86_*_bits in identify_cpu(),
> > while removing it from early_identify_cpu().  And

Ping, either for applying the original patches or for guidance on how
to proceed.

Paolo
  
Dave Hansen Feb. 13, 2024, 5:07 p.m. UTC | #6
On 2/1/24 15:08, Paolo Bonzini wrote:
> There is unfortunately an important hurdle for your patch, in that
> currently the BSP and AP flows are completely different.

Do we even _need_ c->x86_phys_bits for APs?  I need to do a bit more
grepping, but I only see it being read in show_cpuinfo().  Everything
else seems to be boot_cpu_data.x86_phys_bits.
  
Dave Hansen Feb. 13, 2024, 10:02 p.m. UTC | #7
On 2/13/24 07:45, Paolo Bonzini wrote:
> Ping, either for applying the original patches or for guidance on how
> to proceed.

Gah, all of the gunk that get_cpu_address_sizes() touches is out of
control.

They (phys/virt_bits and clflush) need to get consolidated back to a
single copy that gets set up *once* in early boot and then read by
everyone else.  I've got a series to do that, but it's got its tentacles
in quite a few places.  They're not great backporting material.
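
Very roughly, the shape being described is something like the sketch below
(names invented here purely for illustration, not taken from that series):

/* One copy, established once during early BSP boot, read-only afterwards. */
struct addr_sizes {
        unsigned char phys_bits;
        unsigned char virt_bits;
        unsigned char clflush_size;
};

static struct addr_sizes boot_addr_sizes;

/* Called exactly once, on the BSP, after all adjustments (TME, SEV, ...). */
static void establish_addr_sizes(unsigned char phys, unsigned char virt,
                                 unsigned char clflush)
{
        boot_addr_sizes.phys_bits    = phys;
        boot_addr_sizes.virt_bits    = virt;
        boot_addr_sizes.clflush_size = clflush;
}

/* Everyone else, APs included, only ever reads the values. */
static unsigned char addr_phys_bits(void) { return boot_addr_sizes.phys_bits; }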

Your patches make things a wee bit worse in the meantime, but they pale
in comparison to the random spaghetti that we've already got.  Also, we
probably need the early TME stuff regardless.

I think I'll probably suck it up, apply them, then fix them up along
with the greater mess.

Anybody have any better ideas?
  
Paolo Bonzini Feb. 13, 2024, 10:25 p.m. UTC | #8
On Tue, Feb 13, 2024 at 11:02 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 2/13/24 07:45, Paolo Bonzini wrote:
> > Ping, either for applying the original patches or for guidance on how
> > to proceed.
>
> Gah, all of the gunk that get_cpu_address_sizes() touches is out of
> control.
>
> They (phys/virt_bits and clflush) need to get consolidated back to a
> single copy that gets set up *once* in early boot and then read by
> everyone else.
>
> I've got a series to do that, but it's got its tentacles
> in quite a few places.  They're not great backporting material.

Yes, same for mine. I probably digressed in a different direction than
you, but there will almost surely be conflicts. I can post my stuff in
the next few days.

Paolo
  
Paolo Bonzini Feb. 20, 2024, 12:22 p.m. UTC | #9
On Tue, Feb 13, 2024 at 11:02 PM Dave Hansen <dave.hansen@intel.com> wrote:
> Your patches make things a wee bit worse in the meantime, but they pale
> in comparison to the random spaghetti that we've already got.  Also, we
> probably need the early TME stuff regardless.
>
> I think I'll probably suck it up, apply them, then fix them up along
> with the greater mess.
>
> Anybody have any better ideas?

Ping, in the end are we applying these patches for either 6.8 or 6.9?

Paolo
  
Dave Hansen Feb. 22, 2024, 6:06 p.m. UTC | #10
On 2/20/24 04:22, Paolo Bonzini wrote:
> On Tue, Feb 13, 2024 at 11:02 PM Dave Hansen <dave.hansen@intel.com> wrote:
>> Your patches make things a wee bit worse in the meantime, but they pale
>> in comparison to the random spaghetti that we've already got.  Also, we
>> probably need the early TME stuff regardless.
>>
>> I think I'll probably suck it up, apply them, then fix them up along
>> with the greater mess.
>>
>> Anybody have any better ideas?
> Ping, in the end are we applying these patches for either 6.8 or 6.9?

Let me poke at them and see if we can stick them in x86/urgent early
next week.  They do fix an actual bug that's biting people, right?
  
Paolo Bonzini Feb. 22, 2024, 6:08 p.m. UTC | #11
On Thu, Feb 22, 2024 at 7:07 PM Dave Hansen <dave.hansen@intel.com> wrote:
> > Ping, in the end are we applying these patches for either 6.8 or 6.9?
>
> Let me poke at them and see if we can stick them in x86/urgent early
> next week.  They do fix an actual bug that's biting people, right?

Yes, I have gotten reports of {Sapphire,Emerald} Rapids machines that
don't boot at all without either these patches or
"disable_mtrr_cleanup".

Paolo
  
Yin Fengwei Feb. 26, 2024, 1:57 a.m. UTC | #12
On 2/23/24 02:08, Paolo Bonzini wrote:
> On Thu, Feb 22, 2024 at 7:07 PM Dave Hansen <dave.hansen@intel.com> wrote:
>>> Ping, in the end are we applying these patches for either 6.8 or 6.9?
>>
>> Let me poke at them and see if we can stick them in x86/urgent early
>> next week.  They do fix an actual bug that's biting people, right?
> 
> Yes, I have gotten reports of {Sapphire,Emerald} Rapids machines that
> don't boot at all without either these patches or
> "disable_mtrr_cleanup".
We tried a platform other than Sapphire and Emerald Rapids. This patchset
fixes the boot issue on that platform as well.


Regards
Yin, Fengwei

> 
> Paolo
> 
>
  
Dave Hansen Feb. 26, 2024, 4:21 p.m. UTC | #13
On 2/25/24 17:57, Yin Fengwei wrote:
> On 2/23/24 02:08, Paolo Bonzini wrote:
>> On Thu, Feb 22, 2024 at 7:07 PM Dave Hansen <dave.hansen@intel.com> wrote:
>>>> Ping, in the end are we applying these patches for either 6.8 or 6.9?
>>> Let me poke at them and see if we can stick them in x86/urgent early
>>> next week.  They do fix an actual bug that's biting people, right?
>> Yes, I have gotten reports of {Sapphire,Emerald} Rapids machines that
>> don't boot at all without either these patches or
>> "disable_mtrr_cleanup".
> We tried a platform other than Sapphire and Emerald Rapids. This patchset
> fixes the boot issue on that platform as well.

Fengwei, could you also test this series on the troublesome hardware,
please?

> https://lore.kernel.org/all/20240222183926.517AFCD2@davehans-spike.ostc.intel.com/

If it _also_ fixes the problem, it'll be a strong indication that it's
the right long-term approach.
  
Yin Fengwei Feb. 27, 2024, 2:02 a.m. UTC | #14
On 2/27/24 00:21, Dave Hansen wrote:
> On 2/25/24 17:57, Yin Fengwei wrote:
>> On 2/23/24 02:08, Paolo Bonzini wrote:
>>> On Thu, Feb 22, 2024 at 7:07 PM Dave Hansen <dave.hansen@intel.com> wrote:
>>>>> Ping, in the end are we applying these patches for either 6.8 or 6.9?
>>>> Let me poke at them and see if we can stick them in x86/urgent early
>>>> next week.  They do fix an actual bug that's biting people, right?
>>> Yes, I have gotten reports of {Sapphire,Emerald} Rapids machines that
>>> don't boot at all without either these patches or
>>> "disable_mtrr_cleanup".
>> We tried a platform other than Sapphire and Emerald Rapids. This patchset
>> fixes the boot issue on that platform as well.
> 
> Fengwei, could you also test this series on the troublesome hardware,
> please?
Sure. I will try it in my env. I just didn't connect your patchset to this
boot issue. :(


Regards
Yin, Fengwei

> 
>> https://lore.kernel.org/all/20240222183926.517AFCD2@davehans-spike.ostc.intel.com/
> 
> If it _also_ fixes the problem, it'll be a strong indication that it's
> the right long-term approach.
>
  
Yin Fengwei Feb. 27, 2024, 6:08 a.m. UTC | #15
Hi Dave,

On 2/27/24 00:21, Dave Hansen wrote:
> On 2/25/24 17:57, Yin Fengwei wrote:
>> On 2/23/24 02:08, Paolo Bonzini wrote:
>>> On Thu, Feb 22, 2024 at 7:07 PM Dave Hansen <dave.hansen@intel.com> wrote:
>>>>> Ping, in the end are we applying these patches for either 6.8 or 6.9?
>>>> Let me poke at them and see if we can stick them in x86/urgent early
>>>> next week.  They do fix an actual bug that's biting people, right?
>>> Yes, I have gotten reports of {Sapphire,Emerald} Rapids machines that
>>> don't boot at all without either these patches or
>>> "disable_mtrr_cleanup".
>> We tried a platform other than Sapphire and Emerald Rapids. This patchset
>> fixes the boot issue on that platform as well.
> 
> Fengwei, could you also test this series on the troublesome hardware,
> please?
> 
>> https://lore.kernel.org/all/20240222183926.517AFCD2@davehans-spike.ostc.intel.com/
> 
> If it _also_ fixes the problem, it'll be a strong indication that it's
> the right long-term approach.
I tried your patchset on a Sapphire Rapids machine, which is the only broken
machine I can access today. The base commit is 45ec2f5f6ed3 from the latest Linus tree.

Without your patchset, the system hangs at boot.
With your patchset, the system boots successfully.


Regards
Yin, Fengwei
  
Dave Hansen Feb. 27, 2024, 1:30 p.m. UTC | #16
On 2/26/24 22:08, Yin Fengwei wrote:
>>> https://lore.kernel.org/all/20240222183926.517AFCD2@davehans-spike.ostc.intel.com/
>> If it _also_ fixes the problem, it'll be a strong indication that it's
>> the right long-term approach.
> I tried your patchset on a Sapphire Rapids machine, which is the only broken
> machine I can access today. The base commit is 45ec2f5f6ed3 from the latest Linus tree.
> 
> Without your patchset, the system hangs at boot.
> With your patchset, the system boots successfully.

Yay!  Thanks for testing.
  
Yin Fengwei Feb. 28, 2024, 12:07 a.m. UTC | #17
On 2/27/24 21:30, Dave Hansen wrote:
> On 2/26/24 22:08, Yin Fengwei wrote:
>>>> https://lore.kernel.org/all/20240222183926.517AFCD2@davehans-spike.ostc.intel.com/
>>> If it _also_ fixes the problem, it'll be a strong indication that it's
>>> the right long-term approach.
>> I tried your patchset on a Sapphire Rapids machine, which is the only broken
>> machine I can access today. The base commit is 45ec2f5f6ed3 from the latest Linus tree.
>>
>> Without your patchset, the system hangs at boot.
>> With your patchset, the system boots successfully.
> 
> Yay!  Thanks for testing.
My pleasure.


Regards
Yin, Fengwei