[v2,7/8] x86/mm: only check uniform after calling mtrr_type_lookup()

Message ID 20230209072220.6836-8-jgross@suse.com
State New
Headers
Series x86/mtrr: fix handling with PAT but without MTRR |

Commit Message

Juergen Gross Feb. 9, 2023, 7:22 a.m. UTC
  Today pud_set_huge() and pmd_set_huge() test for the MTRR type to be
WB or INVALID after calling mtrr_type_lookup(). Those tests can be
dropped, as the only reason to not use a large mapping would be
uniform being 0. Any MTRR type can be accepted as long as it applies
to the whole memory range covered by the mapping, as the alternative
would only be to map the same region with smaller pages instead using
the same PAT type as for the large mapping.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/mm/pgtable.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)
  

Comments

Edgecombe, Rick P Feb. 11, 2023, 12:06 a.m. UTC | #1
On Thu, 2023-02-09 at 08:22 +0100, Juergen Gross wrote:
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index e4f499eb0f29..7b9c5443d176 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -721,8 +721,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr,
> pgprot_t prot)
>         u8 mtrr, uniform;

'mtrr' is now unused. Can it be dropped? Same for the pmd_set_huge().

>  
>         mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
> -       if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
> -           (mtrr != MTRR_TYPE_WRBACK))
> +       if (!uniform)
>                 return 0;
  
Michael Kelley (LINUX) Feb. 13, 2023, 1:08 a.m. UTC | #2
From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 8, 2023 11:22 PM
>
> Today pud_set_huge() and pmd_set_huge() test for the MTRR type to be
> WB or INVALID after calling mtrr_type_lookup(). Those tests can be
> dropped, as the only reason to not use a large mapping would be
> uniform being 0. Any MTRR type can be accepted as long as it applies
> to the whole memory range covered by the mapping, as the alternative
> would only be to map the same region with smaller pages instead using
> the same PAT type as for the large mapping.
> 
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  arch/x86/mm/pgtable.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index e4f499eb0f29..7b9c5443d176 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -721,8 +721,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
>  	u8 mtrr, uniform;
> 
>  	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
> -	    (mtrr != MTRR_TYPE_WRBACK))
> +	if (!uniform)
>  		return 0;
> 
>  	/* Bail out if we are we on a populated non-leaf entry: */
> @@ -748,8 +747,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
>  	u8 mtrr, uniform;
> 
>  	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
> -	    (mtrr != MTRR_TYPE_WRBACK)) {
> +	if (!uniform) {
>  		pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
>  			     __func__, addr, addr + PMD_SIZE);

I'm seeing this warning trigger in a normal Hyper-V guest (i.e., *not* an
SEV-SNP Confidential VM).  The original filtering here based on
MTRR_TYPE_WRBACK appears to be hiding a bug in mtrr_type_lookup_variable()
where it incorrectly thinks an address range matches two different variable
MTRRs, and hence clears "uniform".

Here are the variable MTRRs in the normal Hyper-V guest with 32 GiBytes
of memory:

[    0.043592] MTRR variable ranges enabled:
[    0.048308]   0 base 000000000000 mask FFFF00000000 write-back
[    0.057450]   1 base 000100000000 mask FFF000000000 write-back
[    0.063972]   2 disabled
[    0.066755]   3 disabled
[    0.070024]   4 disabled
[    0.072856]   5 disabled
[    0.076112]   6 disabled
[    0.078760]   7 disabled

Variable MTRR #0 covers addresses up to 4 GiByte, while #1 covers
4 GiByte to 64 GiByte.   But in mtrr_type_lookup_variable(), address
range 0xF8000000 to 0xF81FFFFF is matching both MTRRs, when it
should be matching just #0.

The problem looks to be this code in mtrr_type_lookup_variable():

              if ((start & mask) != (base & mask))
                        continue;

If the mask bits of start and base are different, then the
MTRR doesn't match, and the continue statement should be
executed.  That's correct.  But if the mask bits are the same,
that's not sufficient for the MTRR to match.  If the end
address is less than base, the MTRR doesn't match, and
the continue statement should still be executed, which
isn't happening.

But somebody please check my thinking. :-)

Michael
  
Juergen Gross Feb. 13, 2023, 6:08 a.m. UTC | #3
On 11.02.23 01:06, Edgecombe, Rick P wrote:
> On Thu, 2023-02-09 at 08:22 +0100, Juergen Gross wrote:
>> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
>> index e4f499eb0f29..7b9c5443d176 100644
>> --- a/arch/x86/mm/pgtable.c
>> +++ b/arch/x86/mm/pgtable.c
>> @@ -721,8 +721,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr,
>> pgprot_t prot)
>>          u8 mtrr, uniform;
> 
> 'mtrr' is now unused. Can it be dropped? Same for the pmd_set_huge().

I guess it will be used again, due to the comment you made for the whole
series.


Juergen
  
Juergen Gross Feb. 13, 2023, 6:35 a.m. UTC | #4
On 13.02.23 02:08, Michael Kelley (LINUX) wrote:
> From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 8, 2023 11:22 PM
>>
>> Today pud_set_huge() and pmd_set_huge() test for the MTRR type to be
>> WB or INVALID after calling mtrr_type_lookup(). Those tests can be
>> dropped, as the only reason to not use a large mapping would be
>> uniform being 0. Any MTRR type can be accepted as long as it applies
>> to the whole memory range covered by the mapping, as the alternative
>> would only be to map the same region with smaller pages instead using
>> the same PAT type as for the large mapping.
>>
>> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   arch/x86/mm/pgtable.c | 6 ++----
>>   1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
>> index e4f499eb0f29..7b9c5443d176 100644
>> --- a/arch/x86/mm/pgtable.c
>> +++ b/arch/x86/mm/pgtable.c
>> @@ -721,8 +721,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
>>   	u8 mtrr, uniform;
>>
>>   	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
>> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
>> -	    (mtrr != MTRR_TYPE_WRBACK))
>> +	if (!uniform)
>>   		return 0;
>>
>>   	/* Bail out if we are we on a populated non-leaf entry: */
>> @@ -748,8 +747,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
>>   	u8 mtrr, uniform;
>>
>>   	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
>> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
>> -	    (mtrr != MTRR_TYPE_WRBACK)) {
>> +	if (!uniform) {
>>   		pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
>>   			     __func__, addr, addr + PMD_SIZE);
> 
> I'm seeing this warning trigger in a normal Hyper-V guest (i.e., *not* an
> SEV-SNP Confidential VM).  The original filtering here based on
> MTRR_TYPE_WRBACK appears to be hiding a bug in mtrr_type_lookup_variable()
> where it incorrectly thinks an address range matches two different variable
> MTRRs, and hence clears "uniform".
> 
> Here are the variable MTRRs in the normal Hyper-V guest with 32 GiBytes
> of memory:
> 
> [    0.043592] MTRR variable ranges enabled:
> [    0.048308]   0 base 000000000000 mask FFFF00000000 write-back
> [    0.057450]   1 base 000100000000 mask FFF000000000 write-back
> [    0.063972]   2 disabled
> [    0.066755]   3 disabled
> [    0.070024]   4 disabled
> [    0.072856]   5 disabled
> [    0.076112]   6 disabled
> [    0.078760]   7 disabled
> 
> Variable MTRR #0 covers addresses up to 4 GiByte, while #1 covers
> 4 GiByte to 64 GiByte.   But in mtrr_type_lookup_variable(), address
> range 0xF8000000 to 0xF81FFFFF is matching both MTRRs, when it
> should be matching just #0.
> 
> The problem looks to be this code in mtrr_type_lookup_variable():
> 
>                if ((start & mask) != (base & mask))
>                          continue;
> 
> If the mask bits of start and base are different, then the
> MTRR doesn't match, and the continue statement should be
> executed.  That's correct.  But if the mask bits are the same,
> that's not sufficient for the MTRR to match.  If the end
> address is less than base, the MTRR doesn't match, and
> the continue statement should still be executed, which
> isn't happening.
> 
> But somebody please check my thinking. :-)

I don't see a flaw in your reasoning.

Rick mentioned a problem with this patch in a KVM guest. I'll try to
reproduce his setup for checking whether fixing mtrr_type_lookup_variable()
is enough, or if we need to keep the tests for WB in this patch.


Juergen
  
Juergen Gross Feb. 15, 2023, 1:40 p.m. UTC | #5
On 13.02.23 02:08, Michael Kelley (LINUX) wrote:
> From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 8, 2023 11:22 PM
>>
>> Today pud_set_huge() and pmd_set_huge() test for the MTRR type to be
>> WB or INVALID after calling mtrr_type_lookup(). Those tests can be
>> dropped, as the only reason to not use a large mapping would be
>> uniform being 0. Any MTRR type can be accepted as long as it applies
>> to the whole memory range covered by the mapping, as the alternative
>> would only be to map the same region with smaller pages instead using
>> the same PAT type as for the large mapping.
>>
>> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>>   arch/x86/mm/pgtable.c | 6 ++----
>>   1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
>> index e4f499eb0f29..7b9c5443d176 100644
>> --- a/arch/x86/mm/pgtable.c
>> +++ b/arch/x86/mm/pgtable.c
>> @@ -721,8 +721,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
>>   	u8 mtrr, uniform;
>>
>>   	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
>> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
>> -	    (mtrr != MTRR_TYPE_WRBACK))
>> +	if (!uniform)
>>   		return 0;
>>
>>   	/* Bail out if we are we on a populated non-leaf entry: */
>> @@ -748,8 +747,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
>>   	u8 mtrr, uniform;
>>
>>   	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
>> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
>> -	    (mtrr != MTRR_TYPE_WRBACK)) {
>> +	if (!uniform) {
>>   		pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
>>   			     __func__, addr, addr + PMD_SIZE);
> 
> I'm seeing this warning trigger in a normal Hyper-V guest (i.e., *not* an
> SEV-SNP Confidential VM).  The original filtering here based on
> MTRR_TYPE_WRBACK appears to be hiding a bug in mtrr_type_lookup_variable()
> where it incorrectly thinks an address range matches two different variable
> MTRRs, and hence clears "uniform".
> 
> Here are the variable MTRRs in the normal Hyper-V guest with 32 GiBytes
> of memory:
> 
> [    0.043592] MTRR variable ranges enabled:
> [    0.048308]   0 base 000000000000 mask FFFF00000000 write-back
> [    0.057450]   1 base 000100000000 mask FFF000000000 write-back

I've read the SDM chapter for MTRRs again. Doesn't #1 violate the requirements
for MTRR settings? The SDM says:

   For ranges greater than 4 KBytes, each range must be of length 2^n and its
   base address must be aligned on a 2^n boundary, where n is a value equal to
   or greater than 12. The base-address alignment value cannot be less than its
   length. For example, an 8-KByte range cannot be aligned on a 4-KByte boundary.
   It must be aligned on at least an 8-KByte boundary.

This makes the reasoning below wrong.

> [    0.063972]   2 disabled
> [    0.066755]   3 disabled
> [    0.070024]   4 disabled
> [    0.072856]   5 disabled
> [    0.076112]   6 disabled
> [    0.078760]   7 disabled
> 
> Variable MTRR #0 covers addresses up to 4 GiByte, while #1 covers
> 4 GiByte to 64 GiByte.   But in mtrr_type_lookup_variable(), address
> range 0xF8000000 to 0xF81FFFFF is matching both MTRRs, when it
> should be matching just #0.
> 
> The problem looks to be this code in mtrr_type_lookup_variable():
> 
>                if ((start & mask) != (base & mask))
>                          continue;
> 
> If the mask bits of start and base are different, then the
> MTRR doesn't match, and the continue statement should be
> executed.  That's correct.  But if the mask bits are the same,
> that's not sufficient for the MTRR to match.  If the end
> address is less than base, the MTRR doesn't match, and
> the continue statement should still be executed, which
> isn't happening.
> 
> But somebody please check my thinking. :-)

I think you need to correct the hypervisor.


Juergen
  
Michael Kelley (LINUX) Feb. 15, 2023, 7:38 p.m. UTC | #6
From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 15, 2023 5:40 AM
> 
> On 13.02.23 02:08, Michael Kelley (LINUX) wrote:
> > From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 8, 2023 11:22
> PM
> >>
> >> Today pud_set_huge() and pmd_set_huge() test for the MTRR type to be
> >> WB or INVALID after calling mtrr_type_lookup(). Those tests can be
> >> dropped, as the only reason to not use a large mapping would be
> >> uniform being 0. Any MTRR type can be accepted as long as it applies
> >> to the whole memory range covered by the mapping, as the alternative
> >> would only be to map the same region with smaller pages instead using
> >> the same PAT type as for the large mapping.
> >>
> >> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> >> Signed-off-by: Juergen Gross <jgross@suse.com>
> >> ---
> >>   arch/x86/mm/pgtable.c | 6 ++----
> >>   1 file changed, 2 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> >> index e4f499eb0f29..7b9c5443d176 100644
> >> --- a/arch/x86/mm/pgtable.c
> >> +++ b/arch/x86/mm/pgtable.c
> >> @@ -721,8 +721,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t
> prot)
> >>   	u8 mtrr, uniform;
> >>
> >>   	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
> >> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
> >> -	    (mtrr != MTRR_TYPE_WRBACK))
> >> +	if (!uniform)
> >>   		return 0;
> >>
> >>   	/* Bail out if we are we on a populated non-leaf entry: */
> >> @@ -748,8 +747,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr,
> pgprot_t prot)
> >>   	u8 mtrr, uniform;
> >>
> >>   	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
> >> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
> >> -	    (mtrr != MTRR_TYPE_WRBACK)) {
> >> +	if (!uniform) {
> >>   		pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a
> huge-page mapping due to MTRR override.\n",
> >>   			     __func__, addr, addr + PMD_SIZE);
> >
> > I'm seeing this warning trigger in a normal Hyper-V guest (i.e., *not* an
> > SEV-SNP Confidential VM).  The original filtering here based on
> > MTRR_TYPE_WRBACK appears to be hiding a bug in mtrr_type_lookup_variable()
> > where it incorrectly thinks an address range matches two different variable
> > MTRRs, and hence clears "uniform".
> >
> > Here are the variable MTRRs in the normal Hyper-V guest with 32 GiBytes
> > of memory:
> >
> > [    0.043592] MTRR variable ranges enabled:
> > [    0.048308]   0 base 000000000000 mask FFFF00000000 write-back
> > [    0.057450]   1 base 000100000000 mask FFF000000000 write-back
> 
> I've read the SDM chapter for MTRRs again. Doesn't #1 violate the requirements
> for MTRR settings? The SDM says:
> 
>    For ranges greater than 4 KBytes, each range must be of length 2^n and its
>    base address must be aligned on a 2^n boundary, where n is a value equal to
>    or greater than 12. The base-address alignment value cannot be less than its
>    length. For example, an 8-KByte range cannot be aligned on a 4-KByte boundary.
>    It must be aligned on at least an 8-KByte boundary.
> 
> This makes the reasoning below wrong.

Argh.  It sure looks like you are right.  I just assumed the MTRRs coming from
Hyper-V were good.  Shame on me. :-(

I've ping'ed the Hyper-V team to see what they say.  But it's hard to see how
they could argue that these MTRRs are correctly formed.  The Intel spec is
unambiguous.

Even if Hyper-V agrees that the MTRRs are wrong, a fix will take time to
propagate.  In the meantime, it seems like the Linux mitigations could be
any of the following:

1) Keep the test for WB in pud_set_huge() and pmd_set_huge()

2) Remove the test, but have "uniform" set to 1 when multiple MTRRs are
    matched but all have the same caching type, which you proposed to
    solve Rick Edgecombe's problem.  This is likely to paper over the
    problem I saw with the Hyper-V MTRRs because the incorrectly matching
    MTRRs would all be WB.

3) In *all* Hyper-V VMs (not just Confidential VMs), disable X86_FEATURE_MTRR
    and use the new override to set the default type to WB.   Hopefully we don't
    have to do this, but I can submit a separate patch if it becomes necessary.

Michael
  
Juergen Gross Feb. 16, 2023, 5:22 a.m. UTC | #7
On 15.02.23 20:38, Michael Kelley (LINUX) wrote:
> From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 15, 2023 5:40 AM
>>
>> On 13.02.23 02:08, Michael Kelley (LINUX) wrote:
>>> From: Juergen Gross <jgross@suse.com> Sent: Wednesday, February 8, 2023 11:22
>> PM
>>>>
>>>> Today pud_set_huge() and pmd_set_huge() test for the MTRR type to be
>>>> WB or INVALID after calling mtrr_type_lookup(). Those tests can be
>>>> dropped, as the only reason to not use a large mapping would be
>>>> uniform being 0. Any MTRR type can be accepted as long as it applies
>>>> to the whole memory range covered by the mapping, as the alternative
>>>> would only be to map the same region with smaller pages instead using
>>>> the same PAT type as for the large mapping.
>>>>
>>>> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>> ---
>>>>    arch/x86/mm/pgtable.c | 6 ++----
>>>>    1 file changed, 2 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
>>>> index e4f499eb0f29..7b9c5443d176 100644
>>>> --- a/arch/x86/mm/pgtable.c
>>>> +++ b/arch/x86/mm/pgtable.c
>>>> @@ -721,8 +721,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t
>> prot)
>>>>    	u8 mtrr, uniform;
>>>>
>>>>    	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
>>>> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
>>>> -	    (mtrr != MTRR_TYPE_WRBACK))
>>>> +	if (!uniform)
>>>>    		return 0;
>>>>
>>>>    	/* Bail out if we are we on a populated non-leaf entry: */
>>>> @@ -748,8 +747,7 @@ int pmd_set_huge(pmd_t *pmd, phys_addr_t addr,
>> pgprot_t prot)
>>>>    	u8 mtrr, uniform;
>>>>
>>>>    	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
>>>> -	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
>>>> -	    (mtrr != MTRR_TYPE_WRBACK)) {
>>>> +	if (!uniform) {
>>>>    		pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a
>> huge-page mapping due to MTRR override.\n",
>>>>    			     __func__, addr, addr + PMD_SIZE);
>>>
>>> I'm seeing this warning trigger in a normal Hyper-V guest (i.e., *not* an
>>> SEV-SNP Confidential VM).  The original filtering here based on
>>> MTRR_TYPE_WRBACK appears to be hiding a bug in mtrr_type_lookup_variable()
>>> where it incorrectly thinks an address range matches two different variable
>>> MTRRs, and hence clears "uniform".
>>>
>>> Here are the variable MTRRs in the normal Hyper-V guest with 32 GiBytes
>>> of memory:
>>>
>>> [    0.043592] MTRR variable ranges enabled:
>>> [    0.048308]   0 base 000000000000 mask FFFF00000000 write-back
>>> [    0.057450]   1 base 000100000000 mask FFF000000000 write-back
>>
>> I've read the SDM chapter for MTRRs again. Doesn't #1 violate the requirements
>> for MTRR settings? The SDM says:
>>
>>     For ranges greater than 4 KBytes, each range must be of length 2^n and its
>>     base address must be aligned on a 2^n boundary, where n is a value equal to
>>     or greater than 12. The base-address alignment value cannot be less than its
>>     length. For example, an 8-KByte range cannot be aligned on a 4-KByte boundary.
>>     It must be aligned on at least an 8-KByte boundary.
>>
>> This makes the reasoning below wrong.
> 
> Argh.  It sure looks like you are right.  I just assumed the MTRRs coming from
> Hyper-V were good.  Shame on me. :-(

I assumed the same, as I didn't see any flaw in your reasoning before. :-)

> I've ping'ed the Hyper-V team to see what they say.  But it's hard to see how
> they could argue that these MTRRs are correctly formed.  The Intel spec is
> unambiguous.
> 
> Even if Hyper-V agrees that the MTRRs are wrong, a fix will take time to
> propagate.  In the meantime, it seems like the Linux mitigations could be
> any of the following:
> 
> 1) Keep the test for WB in pud_set_huge() and pmd_set_huge()
> 
> 2) Remove the test, but have "uniform" set to 1 when multiple MTRRs are
>      matched but all have the same caching type, which you proposed to
>      solve Rick Edgecombe's problem.  This is likely to paper over the
>      problem I saw with the Hyper-V MTRRs because the incorrectly matching
>      MTRRs would all be WB.
> 
> 3) In *all* Hyper-V VMs (not just Confidential VMs), disable X86_FEATURE_MTRR
>      and use the new override to set the default type to WB.   Hopefully we don't
>      have to do this, but I can submit a separate patch if it becomes necessary.

4) Sanitize MTRRs in mtrr_cleanup(), resulting in MTRR#1 in your example to
    be modified to start at 0 (which would not really help to solve the
    multiple match you are seeing, but I'm about to solve that one, too, as
    the multiple MTRR match is allowed in the specs, but not really handled
    correctly in mtrr_type_lookup()).


Juergen
  

Patch

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index e4f499eb0f29..7b9c5443d176 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -721,8 +721,7 @@  int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
 	u8 mtrr, uniform;
 
 	mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE, &uniform);
-	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
-	    (mtrr != MTRR_TYPE_WRBACK))
+	if (!uniform)
 		return 0;
 
 	/* Bail out if we are we on a populated non-leaf entry: */
@@ -748,8 +747,7 @@  int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
 	u8 mtrr, uniform;
 
 	mtrr = mtrr_type_lookup(addr, addr + PMD_SIZE, &uniform);
-	if ((mtrr != MTRR_TYPE_INVALID) && (!uniform) &&
-	    (mtrr != MTRR_TYPE_WRBACK)) {
+	if (!uniform) {
 		pr_warn_once("%s: Cannot satisfy [mem %#010llx-%#010llx] with a huge-page mapping due to MTRR override.\n",
 			     __func__, addr, addr + PMD_SIZE);
 		return 0;