[v2,01/12] x86/ioremap: Fix page aligned size calculation in __ioremap_caller()

Message ID 1668147701-4583-2-git-send-email-mikelley@microsoft.com
State New
Headers
Series Drivers: hv: Add PCI pass-thru support to Hyper-V Confidential VMs |

Commit Message

Michael Kelley (LINUX) Nov. 11, 2022, 6:21 a.m. UTC
  If applying the PHYSICAL_PAGE_MASK to the phys_addr argument causes
upper bits to be masked out, the re-calculation of size to account for
page alignment is incorrect because the same bits are not masked out
in last_addr.

Fix this by masking the page aligned last_addr as well.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
---
 arch/x86/mm/ioremap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Dave Hansen Nov. 12, 2022, 12:11 a.m. UTC | #1
On 11/10/22 22:21, Michael Kelley wrote:
> If applying the PHYSICAL_PAGE_MASK to the phys_addr argument causes
> upper bits to be masked out, the re-calculation of size to account for
> page alignment is incorrect because the same bits are not masked out
> in last_addr.
> 
> Fix this by masking the page aligned last_addr as well.

This makes sense at first glance.

How did you notice this?  What is the impact to users?  Did the bug
actually cause you some trouble or was it by inspection?  Do you have a
sense of how many folks might be impacted?  Any thoughts on how it
lasted for 14+ years?

For the functionality of the mapping, I guess 'size' doesn't really
matter because even a 1-byte 'size' will map a page.  The other fallout
would be from memtype_reserve() reserving too little.  But, that's
unlikely to matter for small mappings because even though:

	ioremap(0x1800, 0x800);

would end up just reserving 0x1000->0x1800, it still wouldn't allow

	ioremap(0x1000, 0x800);

to succeed because *both* of them would end up trying to reserve the
beginning of the page.  Basically, the first caller effectively reserves
the whole page and any second user will fail.

So the other place it would matter would be for mappings that span two
pages, say:

	ioremap(0x1fff, 0x2)

But I guess those aren't very common.  Most large ioremap() callers seem
to already have base and size page-aligned.

Anyway, sorry to make so much of a big deal about a one-liner.  But,
these decade-old bugs really make me wonder how they stuck around for so
long.

I'd be curious if you thought about this too while putting together this
fox.
  
Michael Kelley (LINUX) Nov. 12, 2022, 4:31 a.m. UTC | #2
From: Dave Hansen <dave.hansen@intel.com> Sent: Friday, November 11, 2022 4:12 PM
> 
> On 11/10/22 22:21, Michael Kelley wrote:
> > If applying the PHYSICAL_PAGE_MASK to the phys_addr argument causes
> > upper bits to be masked out, the re-calculation of size to account for
> > page alignment is incorrect because the same bits are not masked out
> > in last_addr.
> >
> > Fix this by masking the page aligned last_addr as well.
> 
> This makes sense at first glance.
> 
> How did you notice this?  What is the impact to users?  Did the bug
> actually cause you some trouble or was it by inspection?  Do you have a
> sense of how many folks might be impacted?  Any thoughts on how it
> lasted for 14+ years?
> 
> For the functionality of the mapping, I guess 'size' doesn't really
> matter because even a 1-byte 'size' will map a page.  The other fallout
> would be from memtype_reserve() reserving too little.  But, that's
> unlikely to matter for small mappings because even though:
> 
> 	ioremap(0x1800, 0x800);
> 
> would end up just reserving 0x1000->0x1800, it still wouldn't allow
> 
> 	ioremap(0x1000, 0x800);
> 
> to succeed because *both* of them would end up trying to reserve the
> beginning of the page.  Basically, the first caller effectively reserves
> the whole page and any second user will fail.
> 
> So the other place it would matter would be for mappings that span two
> pages, say:
> 
> 	ioremap(0x1fff, 0x2)
> 
> But I guess those aren't very common.  Most large ioremap() callers seem
> to already have base and size page-aligned.
> 
> Anyway, sorry to make so much of a big deal about a one-liner.  But,
> these decade-old bugs really make me wonder how they stuck around for so
> long.
> 
> I'd be curious if you thought about this too while putting together this
> fox.

The bug only manifests if the phys_addr input argument exceeds
PHYSICAL_PAGE_MASK, which is the global variable physical_mask, which is
the size of the machine's or VM's physical address space. That's the only case
in which masking with PHYSICAL_PAGE_MASK changes anything.   So I don't
see that your examples fit the situation.  In the case where the masking does
clear some high order bits, the "size" calculation yields a huge number which
then quickly causes an error.

With that understanding, I'd guess that over the last 14 years, the bug has
never manifested, or if it did, it was due to something badly broken in the
caller.  It's not clear why masking with PHYSICAL_PAGE_MASK is there in the
first place, other than as a "safety check" on the phys_addr input argument
that wasn't done quite correctly.

I hit the issue because this patch series does a *transition* in how the AMD
SNP "vTOM" bit is handled.  vTOM is bit 46 in a 47-bit physical address space --
i.e., it's the high order bit.  Current code treats the vTOM bit as part of the
physical address, and current code passes addresses with vTOM set into
__ioremap_caller() and everything works.   But Patch 5 of this patch series
changes the underlying global variable physical_mask to remove bit 46,
similar to what tdx_early_init() does.  At that point, passing __ioremap_caller()
a phys_addr with the vTOM bit set causes the bug and a failure.  With the
fix, Patch 5 in this series causes __ioremap_caller() to mask out the vTOM bit,
which is what I want, at least temporarily.

Later patches in the series change things so that we no longer pass a
phys_addr to __ioremap_caller() that has the vTOM bit set.  After those
later patches, this fix to __ioremap_caller() isn't needed.  But I wanted to
avoid cramming all the vTOM-related changes into a single huge patch.
Having __ioremap_caller() correctly handle a phys_addr that exceeds
physical_mask instead of blowing up let's this patch series sequence things
into reasonable size chunks.  And given that the __ioremap_caller() code is
wrong regardless, fixing it seemed like a reasonable overall solution.

Michael
  
Dave Hansen Nov. 14, 2022, 4:40 p.m. UTC | #3
On 11/10/22 22:21, Michael Kelley wrote:
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -218,7 +218,7 @@ static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
>  	 */
>  	offset = phys_addr & ~PAGE_MASK;
>  	phys_addr &= PHYSICAL_PAGE_MASK;
> -	size = PAGE_ALIGN(last_addr+1) - phys_addr;
> +	size = (PAGE_ALIGN(last_addr+1) & PHYSICAL_PAGE_MASK) - phys_addr;

Michael, thanks for the explanation in your other reply.  First and
foremost, I *totally* missed the reason for this patch.  I was thinking
about issues that could pop up from the _lower_ bits being masked off.

Granted, your changelog _did_ say "upper bits", so shame on me.  But, it
would be great to put some more background in the changelog to make it a
bit harder for silly reviewers to miss such things.

I'd also like to propose something that I think is more straightforward:

        /*
         * Mappings have to be page-aligned
         */
        offset = phys_addr & ~PAGE_MASK;
        phys_addr &= PAGE_MASK;
        size = PAGE_ALIGN(last_addr+1) - phys_addr;

	/*
	 * Mask out any bits not parts of the actual physical
	 * address, like memory encryption bits.
	 */
	phys_addr &= PHYSICAL_PAGE_MASK;

Because, first of all, that "Mappings have to be page-aligned" thing is
(now) doing more than page-aligning things.  Second, the moment you mask
out the metadata bits, the 'size' calculation gets harder.  Doing it in
two phases (page alignment followed by metadata bit masking) breaks up
the two logical operations.
  
Michael Kelley (LINUX) Nov. 14, 2022, 4:53 p.m. UTC | #4
From: Dave Hansen <dave.hansen@intel.com> Sent: Monday, November 14, 2022 8:40 AM
> 
> On 11/10/22 22:21, Michael Kelley wrote:
> > --- a/arch/x86/mm/ioremap.c
> > +++ b/arch/x86/mm/ioremap.c
> > @@ -218,7 +218,7 @@ static void __ioremap_check_mem(resource_size_t addr,
> unsigned long size,
> >  	 */
> >  	offset = phys_addr & ~PAGE_MASK;
> >  	phys_addr &= PHYSICAL_PAGE_MASK;
> > -	size = PAGE_ALIGN(last_addr+1) - phys_addr;
> > +	size = (PAGE_ALIGN(last_addr+1) & PHYSICAL_PAGE_MASK) - phys_addr;
> 
> Michael, thanks for the explanation in your other reply.  First and
> foremost, I *totally* missed the reason for this patch.  I was thinking
> about issues that could pop up from the _lower_ bits being masked off.
> 
> Granted, your changelog _did_ say "upper bits", so shame on me.  But, it
> would be great to put some more background in the changelog to make it a
> bit harder for silly reviewers to miss such things.
> 
> I'd also like to propose something that I think is more straightforward:
> 
>         /*
>          * Mappings have to be page-aligned
>          */
>         offset = phys_addr & ~PAGE_MASK;
>         phys_addr &= PAGE_MASK;
>         size = PAGE_ALIGN(last_addr+1) - phys_addr;
> 
> 	/*
> 	 * Mask out any bits not parts of the actual physical
> 	 * address, like memory encryption bits.
> 	 */
> 	phys_addr &= PHYSICAL_PAGE_MASK;
> 
> Because, first of all, that "Mappings have to be page-aligned" thing is
> (now) doing more than page-aligning things.  Second, the moment you mask
> out the metadata bits, the 'size' calculation gets harder.  Doing it in
> two phases (page alignment followed by metadata bit masking) breaks up
> the two logical operations.
> 

Work for me.  Will do this in v3.

Michael
  
Dave Hansen Nov. 14, 2022, 4:57 p.m. UTC | #5
On 11/14/22 08:53, Michael Kelley (LINUX) wrote:
>> Because, first of all, that "Mappings have to be page-aligned" thing is
>> (now) doing more than page-aligning things.  Second, the moment you mask
>> out the metadata bits, the 'size' calculation gets harder.  Doing it in
>> two phases (page alignment followed by metadata bit masking) breaks up
>> the two logical operations.
>>
> Work for me.  Will do this in v3.

Kirill also made a good point about TDX: it isn't affected by this
because it always passes *real* (no metadata bits set) physical
addresses in here.  Could you double check that you don't want to do the
same for your code?
  
Michael Kelley (LINUX) Nov. 14, 2022, 5:25 p.m. UTC | #6
From: Dave Hansen <dave.hansen@intel.com> Sent: Monday, November 14, 2022 8:58 AM
> 
> On 11/14/22 08:53, Michael Kelley (LINUX) wrote:
> >> Because, first of all, that "Mappings have to be page-aligned" thing is
> >> (now) doing more than page-aligning things.  Second, the moment you mask
> >> out the metadata bits, the 'size' calculation gets harder.  Doing it in
> >> two phases (page alignment followed by metadata bit masking) breaks up
> >> the two logical operations.
> >>
> > Work for me.  Will do this in v3.
> 
> Kirill also made a good point about TDX: it isn't affected by this
> because it always passes *real* (no metadata bits set) physical
> addresses in here.  Could you double check that you don't want to do the
> same for your code?
> 

Yes, we want to do the same for the Hyper-V vTOM code.   And when this
full patch set is applied, we're only passing in *real* physical addresses and
are not depending on __ioremap_caller() doing any masking.

But this patch set is executing a transition from current code, which passes
physical addresses with metadata bits set (i.e., the vTOM bit), to the new
approach, which does not.  There are several places in the current Hyper-V
vTOM code that need changes to make this transition.  These changes are
non-trivial, and I don't want to have to cram them all into one big patch.
By making this fix, the current code continues to work throughout this
patch series while the changes are incrementally made in multiple
individual patches.  But when it's all done, we won't be passing any
physical addresses with the vTOM bit set.

Note that current code works and doesn't hit the bug because the
global variable physical_mask includes the vTOM bit as part of the
physical address.  But Patch 5 of the series removes the vTOM bit
from physical_mask.  At that point, the current __ioremap_caller()
code breaks due to the bug.  By fixing the bug, the current Hyper-V
vTOM code continues to work until all the changes can be completed
(which is Patch 10 of the series).

Perhaps it's convoluted, but basically I'm trying to avoid having to merge
Patches 5 thru 10 into one big patch.  And since the current
__ioremap_caller() is wrong anyway, fixing it made everything smoother.

Michael
  

Patch

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 78c5bc6..0343de4 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -218,7 +218,7 @@  static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
 	 */
 	offset = phys_addr & ~PAGE_MASK;
 	phys_addr &= PHYSICAL_PAGE_MASK;
-	size = PAGE_ALIGN(last_addr+1) - phys_addr;
+	size = (PAGE_ALIGN(last_addr+1) & PHYSICAL_PAGE_MASK) - phys_addr;
 
 	retval = memtype_reserve(phys_addr, (u64)phys_addr + size,
 						pcm, &new_pcm);