[PATCHv2,2/3] x86/tdx: Fix race between set_memory_encrypted() and load_unaligned_zeropad()

Message ID 20230526120225.31936-3-kirill.shutemov@linux.intel.com
State New
Headers
Series x86/tdx: Fix one more load_unaligned_zeropad() issue |

Commit Message

Kirill A. Shutemov May 26, 2023, 12:02 p.m. UTC
  Touching privately mapped GPA that is not properly converted to private
with MapGPA and accepted leads to unrecoverable exit to VMM.

load_unaligned_zeropad() can touch memory that is not owned by the
caller, but just happened to next after the owned memory.
This load_unaligned_zeropad() behaviour makes it important when kernel
asks VMM to convert a GPA from shared to private or back. Kernel must
never have a page mapped into direct mapping (and aliases) as private
when the GPA is already converted to shared or when GPA is not yet
converted to private.

guest.enc_status_change_prepare() called before adjusting direct mapping
and therefore it is responsible for converting the memory to private.

guest.enc_status_change_finish() called after adjusting direct mapping
and it converts the memory to shared.

It is okay to have a shared mapping of memory that is not converted
properly. handle_mmio() knows how to deal with load_unaligned_zeropad()
stepping on it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Fixes: 7dbde7631629 ("x86/mm/cpa: Add support for TDX shared memory")
Cc: stable@vger.kernel.org
---
 arch/x86/coco/tdx/tdx.c | 56 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 53 insertions(+), 3 deletions(-)
  

Comments

Kuppuswamy Sathyanarayanan May 26, 2023, 10:10 p.m. UTC | #1
On 5/26/23 5:02 AM, Kirill A. Shutemov wrote:
> Touching privately mapped GPA that is not properly converted to private
> with MapGPA and accepted leads to unrecoverable exit to VMM.
> 
> load_unaligned_zeropad() can touch memory that is not owned by the
> caller, but just happened to next after the owned memory.

/s/to/to be ?

> This load_unaligned_zeropad() behaviour makes it important when kernel
> asks VMM to convert a GPA from shared to private or back. Kernel must
> never have a page mapped into direct mapping (and aliases) as private
> when the GPA is already converted to shared or when GPA is not yet
> converted to private.

I am wondering whether this issue exist in the AMD code? 

IMO, you can add some info on the window in set_memory_encrypted()
where this race exists.

> 
> guest.enc_status_change_prepare() called before adjusting direct mapping
> and therefore it is responsible for converting the memory to private.
> 
> guest.enc_status_change_finish() called after adjusting direct mapping
> and it converts the memory to shared.
> 
> It is okay to have a shared mapping of memory that is not converted
> properly. handle_mmio() knows how to deal with load_unaligned_zeropad()
> stepping on it.

> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Fixes: 7dbde7631629 ("x86/mm/cpa: Add support for TDX shared memory")
> Cc: stable@vger.kernel.org
> ---
>  arch/x86/coco/tdx/tdx.c | 56 ++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 53 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> index e146b599260f..59cc13e41aa6 100644
> --- a/arch/x86/coco/tdx/tdx.c
> +++ b/arch/x86/coco/tdx/tdx.c
> @@ -840,6 +840,30 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
>  	return true;
>  }
>  
> +static bool tdx_enc_status_change_prepare(unsigned long vaddr, int numpages,
> +					  bool enc)
> +{
> +	/*
> +	 * Only handle shared->private conversion here.
> +	 * See the comment in tdx_early_init().
> +	 */
> +	if (enc)
> +		return tdx_enc_status_changed(vaddr, numpages, enc);
> +	return true;
> +}
> +
> +static bool tdx_enc_status_change_finish(unsigned long vaddr, int numpages,
> +					 bool enc)
> +{
> +	/*
> +	 * Only handle private->shared conversion here.
> +	 * See the comment in tdx_early_init().
> +	 */
> +	if (!enc)
> +		return tdx_enc_status_changed(vaddr, numpages, enc);
> +	return true;
> +}
> +
>  void __init tdx_early_init(void)
>  {
>  	u64 cc_mask;
> @@ -867,9 +891,35 @@ void __init tdx_early_init(void)
>  	 */
>  	physical_mask &= cc_mask - 1;
>  
> -	x86_platform.guest.enc_cache_flush_required = tdx_cache_flush_required;
> -	x86_platform.guest.enc_tlb_flush_required   = tdx_tlb_flush_required;

I think you don't need to change the order here.

> -	x86_platform.guest.enc_status_change_finish = tdx_enc_status_changed;
> +	/*
> +	 * Touching privately mapped GPA that is not properly converted to
> +	 * private with MapGPA and accepted leads to unrecoverable exit
> +	 * to VMM.
> +	 *
> +	 * load_unaligned_zeropad() can touch memory that is not owned by
> +	 * the caller, but just happened to next after the owned memory.
> +	 * This load_unaligned_zeropad() behaviour makes it important when
> +	 * kernel asks VMM to convert a GPA from shared to private or back.
> +	 * Kernel must never have a page mapped into direct mapping (and
> +	 * aliases) as private when the GPA is already converted to shared or
> +	 * when GPA is not yet converted to private.
> +	 *
> +	 * guest.enc_status_change_prepare() called before adjusting direct
> +	 * mapping and therefore it is responsible for converting the memory
> +	 * to private.
> +	 *
> +	 * guest.enc_status_change_finish() called after adjusting direct
> +	 * mapping and it converts the memory to shared.
> +	 *
> +	 * It is okay to have a shared mapping of memory that is not converted
> +	 * properly. handle_mmio() knows how to deal with load_unaligned_zeropad()
> +	 * stepping on it.
> +	 */
> +	x86_platform.guest.enc_status_change_prepare = tdx_enc_status_change_prepare;
> +	x86_platform.guest.enc_status_change_finish  = tdx_enc_status_change_finish;
> +
> +	x86_platform.guest.enc_cache_flush_required  = tdx_cache_flush_required;
> +	x86_platform.guest.enc_tlb_flush_required    = tdx_tlb_flush_required;
>  
>  	pr_info("Guest detected\n");
>  }
  
Kirill A. Shutemov May 30, 2023, 12:57 a.m. UTC | #2
On Fri, May 26, 2023 at 03:10:56PM -0700, Sathyanarayanan Kuppuswamy wrote:
> 
> 
> On 5/26/23 5:02 AM, Kirill A. Shutemov wrote:
> > Touching privately mapped GPA that is not properly converted to private
> > with MapGPA and accepted leads to unrecoverable exit to VMM.
> > 
> > load_unaligned_zeropad() can touch memory that is not owned by the
> > caller, but just happened to next after the owned memory.
> 
> /s/to/to be ?

Yep, my bad.

> > This load_unaligned_zeropad() behaviour makes it important when kernel
> > asks VMM to convert a GPA from shared to private or back. Kernel must
> > never have a page mapped into direct mapping (and aliases) as private
> > when the GPA is already converted to shared or when GPA is not yet
> > converted to private.
> 
> I am wondering whether this issue exist in the AMD code? 
> 
> IMO, you can add some info on the window in set_memory_encrypted()
> where this race exists.

I don't think AMD affected by load_unaligned_zeropad() the same way as
Intel does. But I'm not sure.

Tom, do you have any comments?

> 
> > 
> > guest.enc_status_change_prepare() called before adjusting direct mapping
> > and therefore it is responsible for converting the memory to private.
> > 
> > guest.enc_status_change_finish() called after adjusting direct mapping
> > and it converts the memory to shared.
> > 
> > It is okay to have a shared mapping of memory that is not converted
> > properly. handle_mmio() knows how to deal with load_unaligned_zeropad()
> > stepping on it.
> 
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Fixes: 7dbde7631629 ("x86/mm/cpa: Add support for TDX shared memory")
> > Cc: stable@vger.kernel.org
> > ---
> >  arch/x86/coco/tdx/tdx.c | 56 ++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 53 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
> > index e146b599260f..59cc13e41aa6 100644
> > --- a/arch/x86/coco/tdx/tdx.c
> > +++ b/arch/x86/coco/tdx/tdx.c
> > @@ -840,6 +840,30 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
> >  	return true;
> >  }
> >  
> > +static bool tdx_enc_status_change_prepare(unsigned long vaddr, int numpages,
> > +					  bool enc)
> > +{
> > +	/*
> > +	 * Only handle shared->private conversion here.
> > +	 * See the comment in tdx_early_init().
> > +	 */
> > +	if (enc)
> > +		return tdx_enc_status_changed(vaddr, numpages, enc);
> > +	return true;
> > +}
> > +
> > +static bool tdx_enc_status_change_finish(unsigned long vaddr, int numpages,
> > +					 bool enc)
> > +{
> > +	/*
> > +	 * Only handle private->shared conversion here.
> > +	 * See the comment in tdx_early_init().
> > +	 */
> > +	if (!enc)
> > +		return tdx_enc_status_changed(vaddr, numpages, enc);
> > +	return true;
> > +}
> > +
> >  void __init tdx_early_init(void)
> >  {
> >  	u64 cc_mask;
> > @@ -867,9 +891,35 @@ void __init tdx_early_init(void)
> >  	 */
> >  	physical_mask &= cc_mask - 1;
> >  
> > -	x86_platform.guest.enc_cache_flush_required = tdx_cache_flush_required;
> > -	x86_platform.guest.enc_tlb_flush_required   = tdx_tlb_flush_required;
> 
> I think you don't need to change the order here.

I wanted to emphasise that the comment is for _prepare/_finish callbacks
and I hoped re-order would help with this.

> > -	x86_platform.guest.enc_status_change_finish = tdx_enc_status_changed;
> > +	/*
> > +	 * Touching privately mapped GPA that is not properly converted to
> > +	 * private with MapGPA and accepted leads to unrecoverable exit
> > +	 * to VMM.
> > +	 *
> > +	 * load_unaligned_zeropad() can touch memory that is not owned by
> > +	 * the caller, but just happened to next after the owned memory.
> > +	 * This load_unaligned_zeropad() behaviour makes it important when
> > +	 * kernel asks VMM to convert a GPA from shared to private or back.
> > +	 * Kernel must never have a page mapped into direct mapping (and
> > +	 * aliases) as private when the GPA is already converted to shared or
> > +	 * when GPA is not yet converted to private.
> > +	 *
> > +	 * guest.enc_status_change_prepare() called before adjusting direct
> > +	 * mapping and therefore it is responsible for converting the memory
> > +	 * to private.
> > +	 *
> > +	 * guest.enc_status_change_finish() called after adjusting direct
> > +	 * mapping and it converts the memory to shared.
> > +	 *
> > +	 * It is okay to have a shared mapping of memory that is not converted
> > +	 * properly. handle_mmio() knows how to deal with load_unaligned_zeropad()
> > +	 * stepping on it.
> > +	 */
> > +	x86_platform.guest.enc_status_change_prepare = tdx_enc_status_change_prepare;
> > +	x86_platform.guest.enc_status_change_finish  = tdx_enc_status_change_finish;
> > +
> > +	x86_platform.guest.enc_cache_flush_required  = tdx_cache_flush_required;
> > +	x86_platform.guest.enc_tlb_flush_required    = tdx_tlb_flush_required;
> >  
> >  	pr_info("Guest detected\n");
> >  }
> 
> -- 
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer
  
Tom Lendacky May 30, 2023, 12:57 p.m. UTC | #3
On 5/29/23 19:57, Kirill A. Shutemov wrote:
> On Fri, May 26, 2023 at 03:10:56PM -0700, Sathyanarayanan Kuppuswamy wrote:
>>
>>
>> On 5/26/23 5:02 AM, Kirill A. Shutemov wrote:
>>> Touching privately mapped GPA that is not properly converted to private
>>> with MapGPA and accepted leads to unrecoverable exit to VMM.
>>>
>>> load_unaligned_zeropad() can touch memory that is not owned by the
>>> caller, but just happened to next after the owned memory.
>>
>> /s/to/to be ?
> 
> Yep, my bad.
> 
>>> This load_unaligned_zeropad() behaviour makes it important when kernel
>>> asks VMM to convert a GPA from shared to private or back. Kernel must
>>> never have a page mapped into direct mapping (and aliases) as private
>>> when the GPA is already converted to shared or when GPA is not yet
>>> converted to private.
>>
>> I am wondering whether this issue exist in the AMD code?
>>
>> IMO, you can add some info on the window in set_memory_encrypted()
>> where this race exists.
> 
> I don't think AMD affected by load_unaligned_zeropad() the same way as
> Intel does. But I'm not sure.
> 
> Tom, do you have any comments?

Right, shouldn't be an issue for SNP.

Thanks,
Tom

> 
>>
>>>
>>> guest.enc_status_change_prepare() called before adjusting direct mapping
>>> and therefore it is responsible for converting the memory to private.
>>>
>>> guest.enc_status_change_finish() called after adjusting direct mapping
>>> and it converts the memory to shared.
>>>
>>> It is okay to have a shared mapping of memory that is not converted
>>> properly. handle_mmio() knows how to deal with load_unaligned_zeropad()
>>> stepping on it.
>>
>>>
>>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>>> Fixes: 7dbde7631629 ("x86/mm/cpa: Add support for TDX shared memory")
>>> Cc: stable@vger.kernel.org
>>> ---
>>>   arch/x86/coco/tdx/tdx.c | 56 ++++++++++++++++++++++++++++++++++++++---
>>>   1 file changed, 53 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
>>> index e146b599260f..59cc13e41aa6 100644
>>> --- a/arch/x86/coco/tdx/tdx.c
>>> +++ b/arch/x86/coco/tdx/tdx.c
>>> @@ -840,6 +840,30 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
>>>   	return true;
>>>   }
>>>   
>>> +static bool tdx_enc_status_change_prepare(unsigned long vaddr, int numpages,
>>> +					  bool enc)
>>> +{
>>> +	/*
>>> +	 * Only handle shared->private conversion here.
>>> +	 * See the comment in tdx_early_init().
>>> +	 */
>>> +	if (enc)
>>> +		return tdx_enc_status_changed(vaddr, numpages, enc);
>>> +	return true;
>>> +}
>>> +
>>> +static bool tdx_enc_status_change_finish(unsigned long vaddr, int numpages,
>>> +					 bool enc)
>>> +{
>>> +	/*
>>> +	 * Only handle private->shared conversion here.
>>> +	 * See the comment in tdx_early_init().
>>> +	 */
>>> +	if (!enc)
>>> +		return tdx_enc_status_changed(vaddr, numpages, enc);
>>> +	return true;
>>> +}
>>> +
>>>   void __init tdx_early_init(void)
>>>   {
>>>   	u64 cc_mask;
>>> @@ -867,9 +891,35 @@ void __init tdx_early_init(void)
>>>   	 */
>>>   	physical_mask &= cc_mask - 1;
>>>   
>>> -	x86_platform.guest.enc_cache_flush_required = tdx_cache_flush_required;
>>> -	x86_platform.guest.enc_tlb_flush_required   = tdx_tlb_flush_required;
>>
>> I think you don't need to change the order here.
> 
> I wanted to emphasise that the comment is for _prepare/_finish callbacks
> and I hoped re-order would help with this.
> 
>>> -	x86_platform.guest.enc_status_change_finish = tdx_enc_status_changed;
>>> +	/*
>>> +	 * Touching privately mapped GPA that is not properly converted to
>>> +	 * private with MapGPA and accepted leads to unrecoverable exit
>>> +	 * to VMM.
>>> +	 *
>>> +	 * load_unaligned_zeropad() can touch memory that is not owned by
>>> +	 * the caller, but just happened to next after the owned memory.
>>> +	 * This load_unaligned_zeropad() behaviour makes it important when
>>> +	 * kernel asks VMM to convert a GPA from shared to private or back.
>>> +	 * Kernel must never have a page mapped into direct mapping (and
>>> +	 * aliases) as private when the GPA is already converted to shared or
>>> +	 * when GPA is not yet converted to private.
>>> +	 *
>>> +	 * guest.enc_status_change_prepare() called before adjusting direct
>>> +	 * mapping and therefore it is responsible for converting the memory
>>> +	 * to private.
>>> +	 *
>>> +	 * guest.enc_status_change_finish() called after adjusting direct
>>> +	 * mapping and it converts the memory to shared.
>>> +	 *
>>> +	 * It is okay to have a shared mapping of memory that is not converted
>>> +	 * properly. handle_mmio() knows how to deal with load_unaligned_zeropad()
>>> +	 * stepping on it.
>>> +	 */
>>> +	x86_platform.guest.enc_status_change_prepare = tdx_enc_status_change_prepare;
>>> +	x86_platform.guest.enc_status_change_finish  = tdx_enc_status_change_finish;
>>> +
>>> +	x86_platform.guest.enc_cache_flush_required  = tdx_cache_flush_required;
>>> +	x86_platform.guest.enc_tlb_flush_required    = tdx_tlb_flush_required;
>>>   
>>>   	pr_info("Guest detected\n");
>>>   }
>>
>> -- 
>> Sathyanarayanan Kuppuswamy
>> Linux Kernel Developer
>
  
Kuppuswamy Sathyanarayanan May 30, 2023, 1:22 p.m. UTC | #4
Hi,

On 5/30/23 5:57 AM, Tom Lendacky wrote:
> On 5/29/23 19:57, Kirill A. Shutemov wrote:
>> On Fri, May 26, 2023 at 03:10:56PM -0700, Sathyanarayanan Kuppuswamy wrote:
>>>
>>>
>>> On 5/26/23 5:02 AM, Kirill A. Shutemov wrote:
>>>> Touching privately mapped GPA that is not properly converted to private
>>>> with MapGPA and accepted leads to unrecoverable exit to VMM.
>>>>
>>>> load_unaligned_zeropad() can touch memory that is not owned by the
>>>> caller, but just happened to next after the owned memory.
>>>
>>> /s/to/to be ?
>>
>> Yep, my bad.
>>
>>>> This load_unaligned_zeropad() behaviour makes it important when kernel
>>>> asks VMM to convert a GPA from shared to private or back. Kernel must
>>>> never have a page mapped into direct mapping (and aliases) as private
>>>> when the GPA is already converted to shared or when GPA is not yet
>>>> converted to private.
>>>
>>> I am wondering whether this issue exist in the AMD code?
>>>
>>> IMO, you can add some info on the window in set_memory_encrypted()
>>> where this race exists.
>>
>> I don't think AMD affected by load_unaligned_zeropad() the same way as
>> Intel does. But I'm not sure.
>>
>> Tom, do you have any comments?
> 
> Right, shouldn't be an issue for SNP.

Thanks for confirming.

> 
> Thanks,
> Tom
> 
>>
>>>
>>>>
>>>> guest.enc_status_change_prepare() called before adjusting direct mapping
>>>> and therefore it is responsible for converting the memory to private.
>>>>
>>>> guest.enc_status_change_finish() called after adjusting direct mapping
>>>> and it converts the memory to shared.
>>>>
>>>> It is okay to have a shared mapping of memory that is not converted
>>>> properly. handle_mmio() knows how to deal with load_unaligned_zeropad()
>>>> stepping on it.
>>>
>>>>

Rest looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

>>>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>>>> Fixes: 7dbde7631629 ("x86/mm/cpa: Add support for TDX shared memory")
>>>> Cc: stable@vger.kernel.org
>>>> ---
>>>>   arch/x86/coco/tdx/tdx.c | 56 ++++++++++++++++++++++++++++++++++++++---
>>>>   1 file changed, 53 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
>>>> index e146b599260f..59cc13e41aa6 100644
>>>> --- a/arch/x86/coco/tdx/tdx.c
>>>> +++ b/arch/x86/coco/tdx/tdx.c
>>>> @@ -840,6 +840,30 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
>>>>       return true;
>>>>   }
>>>>   +static bool tdx_enc_status_change_prepare(unsigned long vaddr, int numpages,
>>>> +                      bool enc)
>>>> +{
>>>> +    /*
>>>> +     * Only handle shared->private conversion here.
>>>> +     * See the comment in tdx_early_init().
>>>> +     */
>>>> +    if (enc)
>>>> +        return tdx_enc_status_changed(vaddr, numpages, enc);
>>>> +    return true;
>>>> +}
>>>> +
>>>> +static bool tdx_enc_status_change_finish(unsigned long vaddr, int numpages,
>>>> +                     bool enc)
>>>> +{
>>>> +    /*
>>>> +     * Only handle private->shared conversion here.
>>>> +     * See the comment in tdx_early_init().
>>>> +     */
>>>> +    if (!enc)
>>>> +        return tdx_enc_status_changed(vaddr, numpages, enc);
>>>> +    return true;
>>>> +}
>>>> +
>>>>   void __init tdx_early_init(void)
>>>>   {
>>>>       u64 cc_mask;
>>>> @@ -867,9 +891,35 @@ void __init tdx_early_init(void)
>>>>        */
>>>>       physical_mask &= cc_mask - 1;
>>>>   -    x86_platform.guest.enc_cache_flush_required = tdx_cache_flush_required;
>>>> -    x86_platform.guest.enc_tlb_flush_required   = tdx_tlb_flush_required;
>>>
>>> I think you don't need to change the order here.
>>
>> I wanted to emphasise that the comment is for _prepare/_finish callbacks
>> and I hoped re-order would help with this.
>>
>>>> -    x86_platform.guest.enc_status_change_finish = tdx_enc_status_changed;
>>>> +    /*
>>>> +     * Touching privately mapped GPA that is not properly converted to
>>>> +     * private with MapGPA and accepted leads to unrecoverable exit
>>>> +     * to VMM.
>>>> +     *
>>>> +     * load_unaligned_zeropad() can touch memory that is not owned by
>>>> +     * the caller, but just happened to next after the owned memory.
>>>> +     * This load_unaligned_zeropad() behaviour makes it important when
>>>> +     * kernel asks VMM to convert a GPA from shared to private or back.
>>>> +     * Kernel must never have a page mapped into direct mapping (and
>>>> +     * aliases) as private when the GPA is already converted to shared or
>>>> +     * when GPA is not yet converted to private.
>>>> +     *
>>>> +     * guest.enc_status_change_prepare() called before adjusting direct
>>>> +     * mapping and therefore it is responsible for converting the memory
>>>> +     * to private.
>>>> +     *
>>>> +     * guest.enc_status_change_finish() called after adjusting direct
>>>> +     * mapping and it converts the memory to shared.
>>>> +     *
>>>> +     * It is okay to have a shared mapping of memory that is not converted
>>>> +     * properly. handle_mmio() knows how to deal with load_unaligned_zeropad()
>>>> +     * stepping on it.
>>>> +     */
>>>> +    x86_platform.guest.enc_status_change_prepare = tdx_enc_status_change_prepare;
>>>> +    x86_platform.guest.enc_status_change_finish  = tdx_enc_status_change_finish;
>>>> +
>>>> +    x86_platform.guest.enc_cache_flush_required  = tdx_cache_flush_required;
>>>> +    x86_platform.guest.enc_tlb_flush_required    = tdx_tlb_flush_required;
>>>>         pr_info("Guest detected\n");
>>>>   }
>>>
>>> -- 
>>> Sathyanarayanan Kuppuswamy
>>> Linux Kernel Developer
>>
  
Michael Kelley (LINUX) May 31, 2023, 8 p.m. UTC | #5
From: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
Sent: Tuesday, May 30, 2023 6:22 AM
> 
> Hi,
> 
> On 5/30/23 5:57 AM, Tom Lendacky wrote:
> > On 5/29/23 19:57, Kirill A. Shutemov wrote:
> >> On Fri, May 26, 2023 at 03:10:56PM -0700, Sathyanarayanan Kuppuswamy wrote:
> >>>
> >>>
> >>> On 5/26/23 5:02 AM, Kirill A. Shutemov wrote:
> >>>> Touching privately mapped GPA that is not properly converted to private
> >>>> with MapGPA and accepted leads to unrecoverable exit to VMM.
> >>>>
> >>>> load_unaligned_zeropad() can touch memory that is not owned by the
> >>>> caller, but just happened to next after the owned memory.
> >>>
> >>> /s/to/to be ?
> >>
> >> Yep, my bad.
> >>
> >>>> This load_unaligned_zeropad() behaviour makes it important when kernel
> >>>> asks VMM to convert a GPA from shared to private or back. Kernel must
> >>>> never have a page mapped into direct mapping (and aliases) as private
> >>>> when the GPA is already converted to shared or when GPA is not yet
> >>>> converted to private.
> >>>
> >>> I am wondering whether this issue exist in the AMD code?
> >>>
> >>> IMO, you can add some info on the window in set_memory_encrypted()
> >>> where this race exists.
> >>
> >> I don't think AMD affected by load_unaligned_zeropad() the same way as
> >> Intel does. But I'm not sure.
> >>
> >> Tom, do you have any comments?
> >
> > Right, shouldn't be an issue for SNP.
> 
> Thanks for confirming.
> 

Tom -- For my education, could you elaborate on why this problem can't
occur in an SEV-SNP guest?  There's still a window where the direct map
PTE and the RMP as maintained by the hypervisor are out-of-sync.  If
load_unaligned_zeropad() does a read using the direct map PTE during
this out-of-sync window, isn't that going to trap to the hypervisor?  How
is the scenario is handled from there to provide the zeros to
load_unaligned_zeropad()?  I need to make sure Hyper-V is doing whatever
is needed. :-)  

Thanks,

Michael
  
Tom Lendacky June 1, 2023, 6:19 p.m. UTC | #6
On 5/31/23 15:00, Michael Kelley (LINUX) wrote:
> From: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
> Sent: Tuesday, May 30, 2023 6:22 AM
>>
>> Hi,
>>
>> On 5/30/23 5:57 AM, Tom Lendacky wrote:
>>> On 5/29/23 19:57, Kirill A. Shutemov wrote:
>>>> On Fri, May 26, 2023 at 03:10:56PM -0700, Sathyanarayanan Kuppuswamy wrote:
>>>>>
>>>>>
>>>>> On 5/26/23 5:02 AM, Kirill A. Shutemov wrote:
>>>>>> Touching privately mapped GPA that is not properly converted to private
>>>>>> with MapGPA and accepted leads to unrecoverable exit to VMM.
>>>>>>
>>>>>> load_unaligned_zeropad() can touch memory that is not owned by the
>>>>>> caller, but just happened to next after the owned memory.
>>>>>
>>>>> /s/to/to be ?
>>>>
>>>> Yep, my bad.
>>>>
>>>>>> This load_unaligned_zeropad() behaviour makes it important when kernel
>>>>>> asks VMM to convert a GPA from shared to private or back. Kernel must
>>>>>> never have a page mapped into direct mapping (and aliases) as private
>>>>>> when the GPA is already converted to shared or when GPA is not yet
>>>>>> converted to private.
>>>>>
>>>>> I am wondering whether this issue exist in the AMD code?
>>>>>
>>>>> IMO, you can add some info on the window in set_memory_encrypted()
>>>>> where this race exists.
>>>>
>>>> I don't think AMD affected by load_unaligned_zeropad() the same way as
>>>> Intel does. But I'm not sure.
>>>>
>>>> Tom, do you have any comments?
>>>
>>> Right, shouldn't be an issue for SNP.
>>
>> Thanks for confirming.
>>
> 
> Tom -- For my education, could you elaborate on why this problem can't
> occur in an SEV-SNP guest?  There's still a window where the direct map
> PTE and the RMP as maintained by the hypervisor are out-of-sync.  If
> load_unaligned_zeropad() does a read using the direct map PTE during
> this out-of-sync window, isn't that going to trap to the hypervisor?  How
> is the scenario is handled from there to provide the zeros to
> load_unaligned_zeropad()?  I need to make sure Hyper-V is doing whatever
> is needed. :-)

Ah, I think I misunderstood this when it was being talked about. The issue 
SNP would have would be between setting the c-bit but before the PVALIDATE 
is issued. Prior to the RMP being updated, referencing the page will 
generate an #NPF and automatically change the RMP over to private (in 
KVM). However, after the guest is resumed, the page will not have been 
validated resulting in a #VC with error code 0x404 being generated, 
causing the guest to terminate itself.

I suppose, when a 0x404 error code is encountered by the #VC handler, it 
could call search_exception_tables() and call ex_handler_zeropad() for the 
EX_TYPE_ZEROPAD type (ex_handler_zeropad is currently static, though).

Thanks,
Tom

> 
> Thanks,
> 
> Michael
> 
>
  
Michael Kelley (LINUX) June 2, 2023, 4:11 p.m. UTC | #7
From: Tom Lendacky <thomas.lendacky@amd.com> Sent: Thursday, June 1, 2023 11:19 AM
> 
> On 5/31/23 15:00, Michael Kelley (LINUX) wrote:
> > From: Sathyanarayanan Kuppuswamy
> <sathyanarayanan.kuppuswamy@linux.intel.com>
> > Sent: Tuesday, May 30, 2023 6:22 AM
> >>
> >> Hi,
> >>
> >> On 5/30/23 5:57 AM, Tom Lendacky wrote:
> >>> On 5/29/23 19:57, Kirill A. Shutemov wrote:
> >>>> On Fri, May 26, 2023 at 03:10:56PM -0700, Sathyanarayanan Kuppuswamy
> wrote:
> >>>>>
> >>>>>
> >>>>> On 5/26/23 5:02 AM, Kirill A. Shutemov wrote:
> >>>>>> Touching privately mapped GPA that is not properly converted to private
> >>>>>> with MapGPA and accepted leads to unrecoverable exit to VMM.
> >>>>>>
> >>>>>> load_unaligned_zeropad() can touch memory that is not owned by the
> >>>>>> caller, but just happened to next after the owned memory.
> >>>>>
> >>>>> /s/to/to be ?
> >>>>
> >>>> Yep, my bad.
> >>>>
> >>>>>> This load_unaligned_zeropad() behaviour makes it important when kernel
> >>>>>> asks VMM to convert a GPA from shared to private or back. Kernel must
> >>>>>> never have a page mapped into direct mapping (and aliases) as private
> >>>>>> when the GPA is already converted to shared or when GPA is not yet
> >>>>>> converted to private.
> >>>>>
> >>>>> I am wondering whether this issue exist in the AMD code?
> >>>>>
> >>>>> IMO, you can add some info on the window in set_memory_encrypted()
> >>>>> where this race exists.
> >>>>
> >>>> I don't think AMD affected by load_unaligned_zeropad() the same way as
> >>>> Intel does. But I'm not sure.
> >>>>
> >>>> Tom, do you have any comments?
> >>>
> >>> Right, shouldn't be an issue for SNP.
> >>
> >> Thanks for confirming.
> >>
> >
> > Tom -- For my education, could you elaborate on why this problem can't
> > occur in an SEV-SNP guest?  There's still a window where the direct map
> > PTE and the RMP as maintained by the hypervisor are out-of-sync.  If
> > load_unaligned_zeropad() does a read using the direct map PTE during
> > this out-of-sync window, isn't that going to trap to the hypervisor?  How
> > is the scenario is handled from there to provide the zeros to
> > load_unaligned_zeropad()?  I need to make sure Hyper-V is doing whatever
> > is needed. :-)
> 
> Ah, I think I misunderstood this when it was being talked about. The issue
> SNP would have would be between setting the c-bit but before the PVALIDATE
> is issued. Prior to the RMP being updated, referencing the page will
> generate an #NPF and automatically change the RMP over to private (in
> KVM). However, after the guest is resumed, the page will not have been
> validated resulting in a #VC with error code 0x404 being generated,
> causing the guest to terminate itself.
> 
> I suppose, when a 0x404 error code is encountered by the #VC handler, it
> could call search_exception_tables() and call ex_handler_zeropad() for the
> EX_TYPE_ZEROPAD type (ex_handler_zeropad is currently static, though).
> 

Tom -- Does the above sequence *depend* on the hypervisor doing anything
to make it work?  I'm not clear on why KVM would automatically change the
page over to private.  If there's a dependency on the hypervisor doing
something, then it seems like we'll need to standardize that "something"
across hypervisors, lest we end up with per-hypervisor code in Linux to handle
this scenario.  And running SEV-SNP with multiple VMPLs probably makes it
even more complicated.

Kirill -- Same question about TDX.  Does making load_unaligned_zeropad()
work in a TDX VM depend on the hypervisor doing anything?  Or is the
behavior seen by the guest dependent only on architected behavior of
the TDX processor?

Looking at this problem from a slightly higher level, and thinking out loud
a bit, load_unaligned_zeropad() functionality is provided only for certain
architectures:  x86/64, arm, arm64, and PowerPC 64 (little endian).  There are
fallbacks for architectures that don't support it.  With two minor tweaks to
Kconfig files, I've built x86 with load_unaligned_zeropad() disabled. Maybe
with today's processors the performance benefits are past their prime,
and running with it disabled in CoCo VMs is the better solution.  Does
anyone have a sense of whether the perf impact would be measureable?

If doing the load_unaligned_zeropad() enable/disable at build time is too
limiting, maybe it could be runtime based on whether page private/shared
state is being enforced.  I haven't looked at the details.

Thoughts?

Michael
  
Tom Lendacky June 2, 2023, 5:05 p.m. UTC | #8
On 6/2/23 11:11, Michael Kelley (LINUX) wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com> Sent: Thursday, June 1, 2023 11:19 AM
>>
>> On 5/31/23 15:00, Michael Kelley (LINUX) wrote:
>>> From: Sathyanarayanan Kuppuswamy
>> <sathyanarayanan.kuppuswamy@linux.intel.com>
>>> Sent: Tuesday, May 30, 2023 6:22 AM
>>>>
>>>> Hi,
>>>>
>>>> On 5/30/23 5:57 AM, Tom Lendacky wrote:
>>>>> On 5/29/23 19:57, Kirill A. Shutemov wrote:
>>>>>> On Fri, May 26, 2023 at 03:10:56PM -0700, Sathyanarayanan Kuppuswamy
>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 5/26/23 5:02 AM, Kirill A. Shutemov wrote:
>>>>>>>> Touching privately mapped GPA that is not properly converted to private
>>>>>>>> with MapGPA and accepted leads to unrecoverable exit to VMM.
>>>>>>>>
>>>>>>>> load_unaligned_zeropad() can touch memory that is not owned by the
>>>>>>>> caller, but just happened to next after the owned memory.
>>>>>>>
>>>>>>> /s/to/to be ?
>>>>>>
>>>>>> Yep, my bad.
>>>>>>
>>>>>>>> This load_unaligned_zeropad() behaviour makes it important when kernel
>>>>>>>> asks VMM to convert a GPA from shared to private or back. Kernel must
>>>>>>>> never have a page mapped into direct mapping (and aliases) as private
>>>>>>>> when the GPA is already converted to shared or when GPA is not yet
>>>>>>>> converted to private.
>>>>>>>
>>>>>>> I am wondering whether this issue exist in the AMD code?
>>>>>>>
>>>>>>> IMO, you can add some info on the window in set_memory_encrypted()
>>>>>>> where this race exists.
>>>>>>
>>>>>> I don't think AMD affected by load_unaligned_zeropad() the same way as
>>>>>> Intel does. But I'm not sure.
>>>>>>
>>>>>> Tom, do you have any comments?
>>>>>
>>>>> Right, shouldn't be an issue for SNP.
>>>>
>>>> Thanks for confirming.
>>>>
>>>
>>> Tom -- For my education, could you elaborate on why this problem can't
>>> occur in an SEV-SNP guest?  There's still a window where the direct map
>>> PTE and the RMP as maintained by the hypervisor are out-of-sync.  If
>>> load_unaligned_zeropad() does a read using the direct map PTE during
>>> this out-of-sync window, isn't that going to trap to the hypervisor?  How
>>> is the scenario is handled from there to provide the zeros to
>>> load_unaligned_zeropad()?  I need to make sure Hyper-V is doing whatever
>>> is needed. :-)
>>
>> Ah, I think I misunderstood this when it was being talked about. The issue
>> SNP would have would be between setting the c-bit but before the PVALIDATE
>> is issued. Prior to the RMP being updated, referencing the page will
>> generate an #NPF and automatically change the RMP over to private (in
>> KVM). However, after the guest is resumed, the page will not have been
>> validated resulting in a #VC with error code 0x404 being generated,
>> causing the guest to terminate itself.
>>
>> I suppose, when a 0x404 error code is encountered by the #VC handler, it
>> could call search_exception_tables() and call ex_handler_zeropad() for the
>> EX_TYPE_ZEROPAD type (ex_handler_zeropad is currently static, though).
>>
> 
> Tom -- Does the above sequence *depend* on the hypervisor doing anything
> to make it work?  I'm not clear on why KVM would automatically change the
> page over to private.  If there's a dependency on the hypervisor doing
> something, then it seems like we'll need to standardize that "something"
> across hypervisors, lest we end up with per-hypervisor code in Linux to handle
> this scenario.  And running SEV-SNP with multiple VMPLs probably makes it
> even more complicated.

No, it doesn't depend on the hypervisor doing anything. If the RMP hasn't 
been updated, then a #VMEXIT(NPF) will be triggered (see APM vol 2, Table 
15-39). The hypervisor is free to do what it wants with that, e.g. just 
resume the guest (and immediately take another #VMEXIT(NPF), possibly). 
Since it is a different thread/vCPU trying to access the memory than is 
changing the state of the memory, eventually the guest kernel thread that 
is changing the state of the memory will issue the page state change to 
the hypervisor and the other thread can continue.

Thanks,
Tom

> 
> Kirill -- Same question about TDX.  Does making load_unaligned_zeropad()
> work in a TDX VM depend on the hypervisor doing anything?  Or is the
> behavior seen by the guest dependent only on architected behavior of
> the TDX processor?
> 
> Looking at this problem from a slightly higher level, and thinking out loud
> a bit, load_unaligned_zeropad() functionality is provided only for certain
> architectures:  x86/64, arm, arm64, and PowerPC 64 (little endian).  There are
> fallbacks for architectures that don't support it.  With two minor tweaks to
> Kconfig files, I've built x86 with load_unaligned_zeropad() disabled. Maybe
> with today's processors the performance benefits are past their prime,
> and running with it disabled in CoCo VMs is the better solution.  Does
> anyone have a sense of whether the perf impact would be measureable?
> 
> If doing the load_unaligned_zeropad() enable/disable at build time is too
> limiting, maybe it could be runtime based on whether page private/shared
> state is being enforced.  I haven't looked at the details.
> 
> Thoughts?
> 
> Michael
  
Dave Hansen June 2, 2023, 5:42 p.m. UTC | #9
On 6/2/23 09:11, Michael Kelley (LINUX) wrote:
> Tom -- Does the above sequence *depend* on the hypervisor doing anything
> to make it work?  I'm not clear on why KVM would automatically change the
> page over to private.  If there's a dependency on the hypervisor doing
> something, then it seems like we'll need to standardize that "something"
> across hypervisors, lest we end up with per-hypervisor code in Linux to handle
> this scenario.  And running SEV-SNP with multiple VMPLs probably makes it
> even more complicated.
> 
> Kirill -- Same question about TDX.  Does making load_unaligned_zeropad()
> work in a TDX VM depend on the hypervisor doing anything?  Or is the
> behavior seen by the guest dependent only on architected behavior of
> the TDX processor?

No, there's no active help from the hypervisor here.

Also, fwiw, the "architected behavior" here is really just the TDX
module policy and _arguably_ the hardware Secure-EPT controlled by the
TDX module.
  
Kirill A. Shutemov June 5, 2023, 12:12 p.m. UTC | #10
On Fri, Jun 02, 2023 at 10:42:33AM -0700, Dave Hansen wrote:
> On 6/2/23 09:11, Michael Kelley (LINUX) wrote:
> > Tom -- Does the above sequence *depend* on the hypervisor doing anything
> > to make it work?  I'm not clear on why KVM would automatically change the
> > page over to private.  If there's a dependency on the hypervisor doing
> > something, then it seems like we'll need to standardize that "something"
> > across hypervisors, lest we end up with per-hypervisor code in Linux to handle
> > this scenario.  And running SEV-SNP with multiple VMPLs probably makes it
> > even more complicated.
> > 
> > Kirill -- Same question about TDX.  Does making load_unaligned_zeropad()
> > work in a TDX VM depend on the hypervisor doing anything?  Or is the
> > behavior seen by the guest dependent only on architected behavior of
> > the TDX processor?
> 
> No, there's no active help from the hypervisor here.
> 
> Also, fwiw, the "architected behavior" here is really just the TDX
> module policy and _arguably_ the hardware Secure-EPT controlled by the
> TDX module.

Right. There's nothing VMM can do to change behaviour here. VMM can remove
private page, but it will lead to VM termination on access to the page,
but VMM controls VM lifecycle anyway.
  
Dave Hansen June 5, 2023, 11:13 p.m. UTC | #11
On 5/26/23 05:02, Kirill A. Shutemov wrote:
> Touching privately mapped GPA that is not properly converted to private
> with MapGPA and accepted leads to unrecoverable exit to VMM.
> 
> load_unaligned_zeropad() can touch memory that is not owned by the
> caller, but just happened to next after the owned memory.
> This load_unaligned_zeropad() behaviour makes it important when kernel
> asks VMM to convert a GPA from shared to private or back. Kernel must
> never have a page mapped into direct mapping (and aliases) as private
> when the GPA is already converted to shared or when GPA is not yet
> converted to private.
> 
> guest.enc_status_change_prepare() called before adjusting direct mapping
> and therefore it is responsible for converting the memory to private.
> 
> guest.enc_status_change_finish() called after adjusting direct mapping
> and it converts the memory to shared.
> 
> It is okay to have a shared mapping of memory that is not converted
> properly. handle_mmio() knows how to deal with load_unaligned_zeropad()
> stepping on it.

Yeah, as other said, the changelog grammar here is a bit funky.  Can you
fix it up and resend, please?
  

Patch

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index e146b599260f..59cc13e41aa6 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -840,6 +840,30 @@  static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc)
 	return true;
 }
 
+static bool tdx_enc_status_change_prepare(unsigned long vaddr, int numpages,
+					  bool enc)
+{
+	/*
+	 * Only handle shared->private conversion here.
+	 * See the comment in tdx_early_init().
+	 */
+	if (enc)
+		return tdx_enc_status_changed(vaddr, numpages, enc);
+	return true;
+}
+
+static bool tdx_enc_status_change_finish(unsigned long vaddr, int numpages,
+					 bool enc)
+{
+	/*
+	 * Only handle private->shared conversion here.
+	 * See the comment in tdx_early_init().
+	 */
+	if (!enc)
+		return tdx_enc_status_changed(vaddr, numpages, enc);
+	return true;
+}
+
 void __init tdx_early_init(void)
 {
 	u64 cc_mask;
@@ -867,9 +891,35 @@  void __init tdx_early_init(void)
 	 */
 	physical_mask &= cc_mask - 1;
 
-	x86_platform.guest.enc_cache_flush_required = tdx_cache_flush_required;
-	x86_platform.guest.enc_tlb_flush_required   = tdx_tlb_flush_required;
-	x86_platform.guest.enc_status_change_finish = tdx_enc_status_changed;
+	/*
+	 * Touching privately mapped GPA that is not properly converted to
+	 * private with MapGPA and accepted leads to unrecoverable exit
+	 * to VMM.
+	 *
+	 * load_unaligned_zeropad() can touch memory that is not owned by
+	 * the caller, but just happened to next after the owned memory.
+	 * This load_unaligned_zeropad() behaviour makes it important when
+	 * kernel asks VMM to convert a GPA from shared to private or back.
+	 * Kernel must never have a page mapped into direct mapping (and
+	 * aliases) as private when the GPA is already converted to shared or
+	 * when GPA is not yet converted to private.
+	 *
+	 * guest.enc_status_change_prepare() called before adjusting direct
+	 * mapping and therefore it is responsible for converting the memory
+	 * to private.
+	 *
+	 * guest.enc_status_change_finish() called after adjusting direct
+	 * mapping and it converts the memory to shared.
+	 *
+	 * It is okay to have a shared mapping of memory that is not converted
+	 * properly. handle_mmio() knows how to deal with load_unaligned_zeropad()
+	 * stepping on it.
+	 */
+	x86_platform.guest.enc_status_change_prepare = tdx_enc_status_change_prepare;
+	x86_platform.guest.enc_status_change_finish  = tdx_enc_status_change_finish;
+
+	x86_platform.guest.enc_cache_flush_required  = tdx_cache_flush_required;
+	x86_platform.guest.enc_tlb_flush_required    = tdx_tlb_flush_required;
 
 	pr_info("Guest detected\n");
 }