[RESEND] x86/pat: Simplifying the PAT programming protocol

Message ID 20240124130650.496056-1-kirill.shutemov@linux.intel.com
State New
Headers
Series [RESEND] x86/pat: Simplifying the PAT programming protocol |

Commit Message

Kirill A. Shutemov Jan. 24, 2024, 1:06 p.m. UTC
  The programming protocol for the PAT MSR follows the MTRR programming
protocol. However, this protocol is cumbersome and requires disabling
caching (CR0.CD=1), which is not possible on some platforms.

Specifically, a TDX guest is not allowed to set CR0.CD. It triggers
a #VE exception.

Turned out the requirement to follow the MTRR programming protocol for
PAT programming is unnecessarily strict. The new Intel Software
Developer Manual[1] (December 2023) relaxes this requirement. Please
refer to the section titled "Programming the PAT" for more information.

The AMD documentation does not link PAT programming to MTRR.

The kernel only needs to flush the TLB after updating the PAT MSR. The
set_memory code already takes care of flushing the TLB and cache when
changing the memory type of a page.

[1] http://www.intel.com/sdm

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/kernel/cpu/cacheinfo.c | 7 ++++---
 arch/x86/mm/pat/memtype.c       | 9 +++------
 2 files changed, 7 insertions(+), 9 deletions(-)
  

Comments

Kirill A. Shutemov Jan. 31, 2024, 12:58 p.m. UTC | #1
On Wed, Jan 24, 2024 at 03:06:50PM +0200, Kirill A. Shutemov wrote:
> The programming protocol for the PAT MSR follows the MTRR programming
> protocol. However, this protocol is cumbersome and requires disabling
> caching (CR0.CD=1), which is not possible on some platforms.
> 
> Specifically, a TDX guest is not allowed to set CR0.CD. It triggers
> a #VE exception.
> 
> Turned out the requirement to follow the MTRR programming protocol for
> PAT programming is unnecessarily strict. The new Intel Software
> Developer Manual[1] (December 2023) relaxes this requirement. Please
> refer to the section titled "Programming the PAT" for more information.
> 
> The AMD documentation does not link PAT programming to MTRR.
> 
> The kernel only needs to flush the TLB after updating the PAT MSR. The
> set_memory code already takes care of flushing the TLB and cache when
> changing the memory type of a page.
> 
> [1] http://www.intel.com/sdm
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Juergen Gross <jgross@suse.com>

Any feedback?
  
Borislav Petkov Jan. 31, 2024, 5:57 p.m. UTC | #2
On Wed, Jan 24, 2024 at 03:06:50PM +0200, Kirill A. Shutemov wrote:
> The programming protocol for the PAT MSR follows the MTRR programming
> protocol. However, this protocol is cumbersome and requires disabling
> caching (CR0.CD=1), which is not possible on some platforms.
> 
> Specifically, a TDX guest is not allowed to set CR0.CD. It triggers
> a #VE exception.
> 
> Turned out the requirement to follow the MTRR programming protocol for
> PAT programming is unnecessarily strict. The new Intel Software
> Developer Manual[1] (December 2023) relaxes this requirement. Please
> refer to the section titled "Programming the PAT" for more information.

How about you state that requirement here instead of referring to that
doc which is hard to read and changes constantly?

I'd prefer to have that programming requirement spelled out here to know
in the future what that requirement was and what "variant" was added to
the kernel in case someone decides to relax it even more.

> 
> The AMD documentation does not link PAT programming to MTRR.
> 
> The kernel only needs to flush the TLB after updating the PAT MSR. The
> set_memory code already takes care of flushing the TLB and cache when
> changing the memory type of a page.

So far so good. However, what guarantees that this relaxing of the
protocol doesn't break any existing machines?

If anything, this patch needs to be tested on everything possible. I can
do that on AMD hw and some old Intels, just to be sure.

> @@ -296,13 +298,8 @@ void __init pat_bp_init(void)
>  	/*
>  	 * Xen PV doesn't allow to set PAT MSR, but all cache modes are
>  	 * supported.
> -	 * When running as TDX guest setting the PAT MSR won't work either
> -	 * due to the requirement to set CR0.CD when doing so. Rely on
> -	 * firmware to have set the PAT MSR correctly.
>  	 */
> -	if (pat_disabled ||
> -	    cpu_feature_enabled(X86_FEATURE_XENPV) ||
> -	    cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
> +	if (pat_disabled || cpu_feature_enabled(X86_FEATURE_XENPV)) {
>  		init_cache_modes(pat_msr_val);
>  		return;

What does that mean, now, practically?

That TDX guests virtualize the PAT MSR just like with any other guest or
what is going on there?

This should be explicitly stated in the commit message.

Thx.
  
Kirill A. Shutemov Jan. 31, 2024, 6:52 p.m. UTC | #3
On Wed, Jan 31, 2024 at 06:57:38PM +0100, Borislav Petkov wrote:
> On Wed, Jan 24, 2024 at 03:06:50PM +0200, Kirill A. Shutemov wrote:
> > The programming protocol for the PAT MSR follows the MTRR programming
> > protocol. However, this protocol is cumbersome and requires disabling
> > caching (CR0.CD=1), which is not possible on some platforms.
> > 
> > Specifically, a TDX guest is not allowed to set CR0.CD. It triggers
> > a #VE exception.
> > 
> > Turned out the requirement to follow the MTRR programming protocol for
> > PAT programming is unnecessarily strict. The new Intel Software
> > Developer Manual[1] (December 2023) relaxes this requirement. Please
> > refer to the section titled "Programming the PAT" for more information.
> 
> How about you state that requirement here instead of referring to that
> doc which is hard to read and changes constantly?
> 
> I'd prefer to have that programming requirement spelled out here to know
> in the future what that requirement was and what "variant" was added to
> the kernel in case someone decides to relax it even more.

I summarized the requirements below: TLB has to flashed. Here's what SDM
says:

    The operating system (OS) is responsible for ensuring that changes to a
    PAT entry occur in a manner that maintains the consistency of the
    processor caches and translation lookaside buffers (TLB). It requires the
    OS to invalidate all affected TLB entries (including global entries) and
    all entries in all paging-structure caches. It may also require flushing
    of the processor caches in certain situations.

    ...

    Example of a sequence to invalidate the processor TLBs and caches (if
    necessary):

    1. If the PCIDE or PGE flag is set in CR4, flush TLBs by clearing one of
    those flags (then restore the flag via a subsequent CR4 write).

    Otherwise, flush TLBs by executing a MOV from control register CR3 to
    another register and then a MOV from that register back to CR3.

    2. In the case that there are changes to memory-type mappings for which
    cache self-snooping behavior would be problematic given the existing
    mappings (e.g., changing a cache line's memory type from WB to UC to be
    used for memory-mapped I/O), then cache flushing is also required. This
    can be done by executing CLFLUSH operations for all affected cache lines
    or by executing the WBINVD instruction (recommended only if there are a
    large number of affected mappings or if it is unknown which mappings are
    affected)

The second step is relevant for set_memory code that already does the
flushing on changing memory type.

> > The AMD documentation does not link PAT programming to MTRR.
> > 
> > The kernel only needs to flush the TLB after updating the PAT MSR. The
> > set_memory code already takes care of flushing the TLB and cache when
> > changing the memory type of a page.
> 
> So far so good. However, what guarantees that this relaxing of the
> protocol doesn't break any existing machines?

Our HW folks confirmed that the new optimized sequence works on all past
processors that support PAT.

> If anything, this patch needs to be tested on everything possible. I can
> do that on AMD hw and some old Intels, just to be sure.

Testing wouldn't hurt, but it cannot possibly prove that the new flow is
safe by testing.

> > @@ -296,13 +298,8 @@ void __init pat_bp_init(void)
> >  	/*
> >  	 * Xen PV doesn't allow to set PAT MSR, but all cache modes are
> >  	 * supported.
> > -	 * When running as TDX guest setting the PAT MSR won't work either
> > -	 * due to the requirement to set CR0.CD when doing so. Rely on
> > -	 * firmware to have set the PAT MSR correctly.
> >  	 */
> > -	if (pat_disabled ||
> > -	    cpu_feature_enabled(X86_FEATURE_XENPV) ||
> > -	    cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
> > +	if (pat_disabled || cpu_feature_enabled(X86_FEATURE_XENPV)) {
> >  		init_cache_modes(pat_msr_val);
> >  		return;
> 
> What does that mean, now, practically?
> 
> That TDX guests virtualize the PAT MSR just like with any other guest or
> what is going on there?
> 
> This should be explicitly stated in the commit message.

PAT MST was always virtualized, but we was not able to program it as we
followed MTRR protocol that sets CR0.CD. And I covered it in the commit
message:

    Specifically, a TDX guest is not allowed to set CR0.CD. It triggers
    a #VE exception.
  
Borislav Petkov Jan. 31, 2024, 8:23 p.m. UTC | #4
On Wed, Jan 31, 2024 at 08:52:46PM +0200, Kirill A. Shutemov wrote:
> The second step is relevant for set_memory code that already does the
> flushing on changing memory type.

So the "relaxation" is the removal of that CR0.CD requirement?

> Our HW folks confirmed that the new optimized sequence works on all past
> processors that support PAT.

Your folks confirmed that for all released hw and for x86 hardware from
other vendors?

> Testing wouldn't hurt, but it cannot possibly prove that the new flow is
> safe by testing.

This basically says that I should not take this patch at all as there's
no way of even testing it?!

I hope I'm misunderstanding you.

> PAT MST was always virtualized, but we was not able to program it as we
> followed MTRR protocol that sets CR0.CD. And I covered it in the commit
> message:
> 
>     Specifically, a TDX guest is not allowed to set CR0.CD. It triggers
>     a #VE exception.

Ok, I'll extend that part once the rest has been sorted out.

Thx.
  
Kirill A. Shutemov Jan. 31, 2024, 10:17 p.m. UTC | #5
On Wed, Jan 31, 2024 at 09:23:40PM +0100, Borislav Petkov wrote:
> On Wed, Jan 31, 2024 at 08:52:46PM +0200, Kirill A. Shutemov wrote:
> > The second step is relevant for set_memory code that already does the
> > flushing on changing memory type.
> 
> So the "relaxation" is the removal of that CR0.CD requirement?

And double WBINVD if the machine has no X86_FEATURE_SELFSNOOP (before and
after TLB flush).

> > Our HW folks confirmed that the new optimized sequence works on all past
> > processors that support PAT.
> 
> Your folks confirmed that for all released hw and for x86 hardware from
> other vendors?

No. They can only talk for Intel CPUs. But AMD docs don't require MTTR
flow to begin with.

It is better to double-check on AMD side.

> > Testing wouldn't hurt, but it cannot possibly prove that the new flow is
> > safe by testing.
> 
> This basically says that I should not take this patch at all as there's
> no way of even testing it?!
> 
> I hope I'm misunderstanding you.

Testing can prove that the proposed patch is broken if a problem show ups.
But if you found no issue during testing you cannot say that the patch is
safe. You could just get lucky and don't hit pathological scenario with
broken cache/TLB aliases or something.

It is better to get confirmation from HW folks.
  
Borislav Petkov Feb. 5, 2024, 1:09 p.m. UTC | #6
On Thu, Feb 01, 2024 at 12:17:32AM +0200, Kirill A. Shutemov wrote:
> It is better to get confirmation from HW folks.

Yah, lemme see what I can do.
  
Borislav Petkov Feb. 13, 2024, 4:15 p.m. UTC | #7
On Thu, Feb 01, 2024 at 12:17:32AM +0200, Kirill A. Shutemov wrote:
> > So the "relaxation" is the removal of that CR0.CD requirement?

So I'm looking at the SDM, revisions 081US and 082US.

Section

"12.11.8 MTRR Considerations in MP Systems"

still has

"4. Enter the no-fill cache mode. (Set the CD flag in control register
CR0 to 1 and the NW flag to 0.)"

and

"4. Enter the no-fill cache mode. (Set the CD flag in control register
CR0 to 1 and the NW flag to 0.)"

so where is that relaxation written? Am I looking at the wrong place?

> And double WBINVD if the machine has no X86_FEATURE_SELFSNOOP (before and
> after TLB flush).

That's still there too. Steps 5 and 11, respectively.

Hmmm?
  
Kirill A. Shutemov Feb. 13, 2024, 4:57 p.m. UTC | #8
On Tue, Feb 13, 2024 at 05:15:14PM +0100, Borislav Petkov wrote:
> On Thu, Feb 01, 2024 at 12:17:32AM +0200, Kirill A. Shutemov wrote:
> > > So the "relaxation" is the removal of that CR0.CD requirement?
> 
> So I'm looking at the SDM, revisions 081US and 082US.
> 
> Section
> 
> "12.11.8 MTRR Considerations in MP Systems"

The point is that PAT programming doesn't need to follow MTRR
considerations anymore.

Previously "Programming the PAT" section had this:

   The operating system is responsible for ensuring that changes to a PAT
   entry occur in a manner that maintains the consistency of the processor
   caches and translation lookaside buffers (TLB). This is accomplished by
   following the procedure as specified in Section 12.11.8, “MTRR
   Considerations in MP Systems,” for changing the value of an MTRR in a
   multiple processor system. It requires a specific sequence of
   operations that includes flushing the processors caches and TLBs.

The new version points to MTTR consideration as one of possible way to
invalidate TLB and caches:

  The operating system (OS) is responsible for ensuring that changes to a
  PAT entry occur in a manner that maintains the consistency of the
  processor caches and translation lookaside buffers (TLB). It requires the
  OS to invalidate all affected TLB entries (including global entries) and
  all entries in all paging-structure caches. It may also require flushing
  of the processor caches in certain situations. This can be accomplished
  in various ways, including the sequence below or by following the
  procedure specified in Section 12.11.8, “MTRR Considerations in MP
  Systems.” (See Section 4.10.4, “Invalidation of TLBs and
  Paging-Structure Caches” for additional background information.) Also
  note that in a multi-processor environment, it is the software's
  responsibility to resolve differences in conflicting memory types across
  logical processors that may arise from changes to the PAT (e.g., if two
  logical processors map a linear address to the same physical address but
  have PATs that specify a different memory type for that physical
  address).

The new text follows with example of sequence that flushes TLB and
caches. And it doesn't touch CR0.CD.
  
Borislav Petkov Feb. 20, 2024, 1:28 p.m. UTC | #9
On Tue, Feb 13, 2024 at 06:57:19PM +0200, Kirill A. Shutemov wrote:
> The new text follows with example of sequence that flushes TLB and
> caches. And it doesn't touch CR0.CD.

Ok, sounds like AMD is fine here. I'm going to take this one and see
what explodes. If something does, we can always remove it.

Thx.
  

Patch

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index c131c412db89..78afad50a7c0 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -1118,15 +1118,16 @@  static void cache_cpu_init(void)
 	unsigned long flags;
 
 	local_irq_save(flags);
-	cache_disable();
 
-	if (memory_caching_control & CACHE_MTRR)
+	if (memory_caching_control & CACHE_MTRR) {
+		cache_disable();
 		mtrr_generic_set_state();
+		cache_enable();
+	}
 
 	if (memory_caching_control & CACHE_PAT)
 		pat_cpu_init();
 
-	cache_enable();
 	local_irq_restore(flags);
 }
 
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 0904d7e8e126..0d72183b5dd0 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -240,6 +240,8 @@  void pat_cpu_init(void)
 	}
 
 	wrmsrl(MSR_IA32_CR_PAT, pat_msr_val);
+
+	__flush_tlb_all();
 }
 
 /**
@@ -296,13 +298,8 @@  void __init pat_bp_init(void)
 	/*
 	 * Xen PV doesn't allow to set PAT MSR, but all cache modes are
 	 * supported.
-	 * When running as TDX guest setting the PAT MSR won't work either
-	 * due to the requirement to set CR0.CD when doing so. Rely on
-	 * firmware to have set the PAT MSR correctly.
 	 */
-	if (pat_disabled ||
-	    cpu_feature_enabled(X86_FEATURE_XENPV) ||
-	    cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
+	if (pat_disabled || cpu_feature_enabled(X86_FEATURE_XENPV)) {
 		init_cache_modes(pat_msr_val);
 		return;
 	}