[RFC,V1,5/5] x86: CVMs: Ensure that memory conversions happen at 2M alignment

Message ID 20240112055251.36101-6-vannapurve@google.com
State New
Series x86: CVMs: Align memory conversions to 2M granularity

Commit Message

Vishal Annapurve Jan. 12, 2024, 5:52 a.m. UTC
  Return an error on conversion of memory ranges not aligned to 2M.

Signed-off-by: Vishal Annapurve <vannapurve@google.com>
---
 arch/x86/mm/pat/set_memory.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
  

Comments

Dave Hansen Jan. 31, 2024, 4:33 p.m. UTC | #1
On 1/11/24 21:52, Vishal Annapurve wrote:
> @@ -2133,8 +2133,10 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
>  	int ret;
>  
>  	/* Should not be working on unaligned addresses */
> -	if (WARN_ONCE(addr & ~PAGE_MASK, "misaligned address: %#lx\n", addr))
> -		addr &= PAGE_MASK;
> +	if (WARN_ONCE(addr & ~HPAGE_MASK, "misaligned address: %#lx\n", addr)
> +		|| WARN_ONCE((numpages << PAGE_SHIFT) & ~HPAGE_MASK,
> +			"misaligned numpages: %#lx\n", numpages))
> +		return -EINVAL;

This series is talking about swiotlb and DMA, then this applies a
restriction to what I *thought* was a much more generic function:
__set_memory_enc_pgtable().  What prevents this function from getting
used on 4k mappings?
  
Vishal Annapurve Feb. 1, 2024, 3:46 a.m. UTC | #2
On Wed, Jan 31, 2024 at 10:03 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 1/11/24 21:52, Vishal Annapurve wrote:
> > @@ -2133,8 +2133,10 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
> >       int ret;
> >
> >       /* Should not be working on unaligned addresses */
> > -     if (WARN_ONCE(addr & ~PAGE_MASK, "misaligned address: %#lx\n", addr))
> > -             addr &= PAGE_MASK;
> > +     if (WARN_ONCE(addr & ~HPAGE_MASK, "misaligned address: %#lx\n", addr)
> > +             || WARN_ONCE((numpages << PAGE_SHIFT) & ~HPAGE_MASK,
> > +                     "misaligned numpages: %#lx\n", numpages))
> > +             return -EINVAL;
>
> This series is talking about swiotlb and DMA, then this applies a
> restriction to what I *thought* was a much more generic function:
> __set_memory_enc_pgtable().  What prevents this function from getting
> used on 4k mappings?
>
>

The end goal here is to limit the conversion granularity to hugepage
sizes. SWIOTLB allocations are the major source of unaligned
allocations (and so the conversions) that need to be fixed before
achieving this goal.

This change will ensure that conversion fails for unaligned ranges, as
I don't foresee the need for 4K aligned conversions apart from DMA
allocations.
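
For illustration, a caller that follows the 2M-granularity model described above might look roughly like the sketch below; the helper name and allocation strategy are purely illustrative and not something this series adds:

/* Illustrative only: allocate a shared buffer in whole 2M units. */
static void *alloc_shared_2m(size_t size)
{
        size_t alloc_size = ALIGN(size, HPAGE_SIZE);    /* round request up to 2M */
        struct page *page;
        void *vaddr;

        /* Buddy allocations of order >= 9 are naturally 2M-aligned. */
        page = alloc_pages(GFP_KERNEL | __GFP_ZERO, get_order(alloc_size));
        if (!page)
                return NULL;

        vaddr = page_address(page);

        /* The conversion now covers whole 2M units, as required above. */
        if (set_memory_decrypted((unsigned long)vaddr, alloc_size >> PAGE_SHIFT)) {
                /* Page state is uncertain after a failed conversion; real
                 * callers typically leak the pages rather than reuse them. */
                return NULL;
        }

        return vaddr;
}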
  
Jeremi Piotrowski Feb. 1, 2024, 12:02 p.m. UTC | #3
On 01/02/2024 04:46, Vishal Annapurve wrote:
> On Wed, Jan 31, 2024 at 10:03 PM Dave Hansen <dave.hansen@intel.com> wrote:
>>
>> On 1/11/24 21:52, Vishal Annapurve wrote:
>>> @@ -2133,8 +2133,10 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
>>>       int ret;
>>>
>>>       /* Should not be working on unaligned addresses */
>>> -     if (WARN_ONCE(addr & ~PAGE_MASK, "misaligned address: %#lx\n", addr))
>>> -             addr &= PAGE_MASK;
>>> +     if (WARN_ONCE(addr & ~HPAGE_MASK, "misaligned address: %#lx\n", addr)
>>> +             || WARN_ONCE((numpages << PAGE_SHIFT) & ~HPAGE_MASK,
>>> +                     "misaligned numpages: %#lx\n", numpages))
>>> +             return -EINVAL;
>>
>> This series is talking about swiotlb and DMA, then this applies a
>> restriction to what I *thought* was a much more generic function:
>> __set_memory_enc_pgtable().  What prevents this function from getting
>> used on 4k mappings?
>>
>>
> 
> The end goal here is to limit the conversion granularity to hugepage
> sizes. SWIOTLB allocations are the major source of unaligned
> allocations (and so the conversions) that need to be fixed before
> achieving this goal.
> 
> This change will ensure that conversion fails for unaligned ranges, as
> I don't foresee the need for 4K aligned conversions apart from DMA
> allocations.

Hi Vishal,

This assumption is wrong. set_memory_decrypted is called from various
parts of the kernel: kexec, sev-guest, kvmclock, hyperv code. These conversions
are for non-DMA allocations that need to be done at 4KB granularity
because the data structures in question are page sized.

Thanks,
Jeremi
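
To make the concern concrete, the non-DMA callers mentioned here typically follow a pattern like the simplified sketch below (not taken verbatim from any of the listed call sites): a single page-sized structure is allocated and then shared with the host, so the conversion is inherently 4K-granular.

/* Simplified sketch of a page-granularity conversion, in the style of
 * callers such as kvmclock or the sev-guest driver. */
static void *guest_shared_page;

static int guest_shared_page_init(void)
{
        guest_shared_page = (void *)get_zeroed_page(GFP_KERNEL);
        if (!guest_shared_page)
                return -ENOMEM;

        /* Share exactly one 4K page with the host; with this patch applied,
         * this call would now WARN and fail with -EINVAL. */
        return set_memory_decrypted((unsigned long)guest_shared_page, 1);
}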
  
Jeremi Piotrowski Feb. 2, 2024, 8 a.m. UTC | #4
On 02/02/2024 06:08, Vishal Annapurve wrote:
> On Thu, Feb 1, 2024 at 5:32 PM Jeremi Piotrowski
> <jpiotrowski@linux.microsoft.com> wrote:
>>
>> On 01/02/2024 04:46, Vishal Annapurve wrote:
>>> On Wed, Jan 31, 2024 at 10:03 PM Dave Hansen <dave.hansen@intel.com> wrote:
>>>>
>>>> On 1/11/24 21:52, Vishal Annapurve wrote:
>>>>> @@ -2133,8 +2133,10 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
>>>>>       int ret;
>>>>>
>>>>>       /* Should not be working on unaligned addresses */
>>>>> -     if (WARN_ONCE(addr & ~PAGE_MASK, "misaligned address: %#lx\n", addr))
>>>>> -             addr &= PAGE_MASK;
>>>>> +     if (WARN_ONCE(addr & ~HPAGE_MASK, "misaligned address: %#lx\n", addr)
>>>>> +             || WARN_ONCE((numpages << PAGE_SHIFT) & ~HPAGE_MASK,
>>>>> +                     "misaligned numpages: %#lx\n", numpages))
>>>>> +             return -EINVAL;
>>>>
>>>> This series is talking about swiotlb and DMA, then this applies a
>>>> restriction to what I *thought* was a much more generic function:
>>>> __set_memory_enc_pgtable().  What prevents this function from getting
>>>> used on 4k mappings?
>>>>
>>>>
>>>
>>> The end goal here is to limit the conversion granularity to hugepage
>>> sizes. SWIOTLB allocations are the major source of unaligned
>>> allocations (and so the conversions) that need to be fixed before
>>> achieving this goal.
>>>
>>> This change will ensure that conversion fails for unaligned ranges, as
>>> I don't foresee the need for 4K aligned conversions apart from DMA
>>> allocations.
>>
>> Hi Vishal,
>>
>> This assumption is wrong. set_memory_decrypted is called from various
>> parts of the kernel: kexec, sev-guest, kvmclock, hyperv code. These conversions
>> are for non-DMA allocations that need to be done at 4KB granularity
>> because the data structures in question are page sized.
>>
>> Thanks,
>> Jeremi
> 
> Thanks Jeremi for pointing out these usecases.
> 
> My brief analysis for these call sites:
> 1) machine_kexec_64.c, realmode/init.c, kvm/mmu/mmu.c - shared memory
> allocation/conversion happens when host side memory encryption
> (CC_ATTR_HOST_MEM_ENCRYPT) is enabled.
> 2) kernel/kvmclock.c - Shared memory allocation can be made to align
> to 2M even if the memory needed is smaller.
> 3) drivers/virt/coco/sev-guest/sev-guest.c,
> drivers/virt/coco/tdx-guest/tdx-guest.c - Shared memory allocation can
> be made to align to 2M even if the memory needed is smaller.
> 
> I admit I haven't analyzed the hyperv code in the context of these
> changes, but will take a better look to see if the calls for memory
> conversion there fit the category of "Shared memory allocation can be
> made to align to 2M even if the memory needed is smaller".
> 
> Agreed, this patch should be modified to look something like the
> following (subject to more changes at the call sites):

No, this patch is still built on the wrong assumptions. You're trying
to alter a generic function in the guest for the constraints of a very
specific hypervisor + host userspace + memory backend combination.
That's not right.

Is the numpages check supposed to ensure that the guest *only* toggles
visibility in chunks of 2MB? Then you're exposing more memory to the host
than the guest intends.

If you must - focus on getting swiotlb conversions to happen at the desired
granularity but don't try to force every single conversion to be >4K.

Thanks,
Jeremi


> 
> =============
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index e9b448d1b1b7..8c608d6913c4 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -2132,10 +2132,15 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
>         struct cpa_data cpa;
>         int ret;
> 
>         /* Should not be working on unaligned addresses */
>         if (WARN_ONCE(addr & ~PAGE_MASK, "misaligned address: %#lx\n", addr))
>                 addr &= PAGE_MASK;
> 
> +       if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) &&
> +               (WARN_ONCE(addr & ~HPAGE_MASK, "misaligned address: %#lx\n", addr)
> +                       || WARN_ONCE((numpages << PAGE_SHIFT) & ~HPAGE_MASK,
> +                               "misaligned numpages: %#lx\n", numpages)))
> +               return -EINVAL;
> +
>         memset(&cpa, 0, sizeof(cpa));
>         cpa.vaddr = &addr;
>         cpa.numpages = numpages;
  
Dave Hansen Feb. 2, 2024, 4:35 p.m. UTC | #5
On 2/2/24 08:22, Vishal Annapurve wrote:
>> If you must - focus on getting swiotlb conversions to happen at the desired
>> granularity but don't try to force every single conversion to be >4K.
> If any conversion within a guest happens at 4K granularity, then this
> will effectively cause non-hugepage aligned EPT/NPT entries. This
> series is trying to get all private and shared memory regions to be
> hugepage aligned to address the problem statement.

Yeah, but the series is trying to do that by being awfully myopic at
this stage and without being _declared_ to be so myopic.

Take a look at all of the set_memory_decrypted() calls.  How many of
them even operate on the part of the guest address space rooted in the
memfd where splits matter?  They're not doing conversions.  They're just
setting up shared mappings in the page tables of gunk that was never
private in the first place.
  

Patch

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index bda9f129835e..6f7b06a502f4 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2133,8 +2133,10 @@  static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc)
 	int ret;
 
 	/* Should not be working on unaligned addresses */
-	if (WARN_ONCE(addr & ~PAGE_MASK, "misaligned address: %#lx\n", addr))
-		addr &= PAGE_MASK;
+	if (WARN_ONCE(addr & ~HPAGE_MASK, "misaligned address: %#lx\n", addr)
+		|| WARN_ONCE((numpages << PAGE_SHIFT) & ~HPAGE_MASK,
+			"misaligned numpages: %#lx\n", numpages))
+		return -EINVAL;
 
 	memset(&cpa, 0, sizeof(cpa));
 	cpa.vaddr = &addr;