[04/10] swiotlb: Use free_decrypted_pages()

Message ID 20231017202505.340906-5-rick.p.edgecombe@intel.com
State New
Headers
Series Handle set_memory_XXcrypted() errors |

Commit Message

Edgecombe, Rick P Oct. 17, 2023, 8:24 p.m. UTC
  On TDX it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to take
care to handle these errors to avoid returning decrypted (shared) memory to
the page allocator, which could lead to functional or security issues.

Swiotlb could free decrypted/shared pages if set_memory_decrypted() fails.
Use the recently added free_decrypted_pages() to avoid this.

In swiotlb_exit(), check for set_memory_encrypted() errors manually,
because the pages are not nessarily going to the page allocator.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: iommu@lists.linux.dev
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 kernel/dma/swiotlb.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)
  

Comments

Christoph Hellwig Oct. 18, 2023, 4:43 a.m. UTC | #1
On Tue, Oct 17, 2023 at 01:24:59PM -0700, Rick Edgecombe wrote:
> On TDX it is possible for the untrusted host to cause
> set_memory_encrypted() or set_memory_decrypted() to fail such that an
> error is returned and the resulting memory is shared. Callers need to take
> care to handle these errors to avoid returning decrypted (shared) memory to
> the page allocator, which could lead to functional or security issues.
> 
> Swiotlb could free decrypted/shared pages if set_memory_decrypted() fails.
> Use the recently added free_decrypted_pages() to avoid this.
> 
> In swiotlb_exit(), check for set_memory_encrypted() errors manually,
> because the pages are not nessarily going to the page allocator.

Whatever recently introduced it didn't make it to my mailbox.  Please
always CC everyone on every patch in a series, everything else is
impossible to review.
  
Edgecombe, Rick P Oct. 18, 2023, 3:55 p.m. UTC | #2
On Wed, 2023-10-18 at 06:43 +0200, Christoph Hellwig wrote:
> On Tue, Oct 17, 2023 at 01:24:59PM -0700, Rick Edgecombe wrote:
> > On TDX it is possible for the untrusted host to cause
> > set_memory_encrypted() or set_memory_decrypted() to fail such that
> > an
> > error is returned and the resulting memory is shared. Callers need
> > to take
> > care to handle these errors to avoid returning decrypted (shared)
> > memory to
> > the page allocator, which could lead to functional or security
> > issues.
> > 
> > Swiotlb could free decrypted/shared pages if set_memory_decrypted()
> > fails.
> > Use the recently added free_decrypted_pages() to avoid this.
> > 
> > In swiotlb_exit(), check for set_memory_encrypted() errors
> > manually,
> > because the pages are not nessarily going to the page allocator.
> 
> Whatever recently introduced it didn't make it to my mailbox.  Please
> always CC everyone on every patch in a series, everything else is
> impossible to review.

Ok. The series touches a bunch of set_memory() callers all over, so I
was trying to manage the CC list to something reasonable. I tried to
give a summary in each commit, but I guess it wasn't in depth enough.
Here is the lore link, if you haven't already found it:
https://lore.kernel.org/lkml/20231017202505.340906-1-rick.p.edgecombe@intel.com/
  
Petr Tesařík Oct. 31, 2023, 10:43 a.m. UTC | #3
On Tue, 17 Oct 2023 13:24:59 -0700
Rick Edgecombe <rick.p.edgecombe@intel.com> wrote:

> On TDX it is possible for the untrusted host to cause
> set_memory_encrypted() or set_memory_decrypted() to fail such that an
> error is returned and the resulting memory is shared. Callers need to take
> care to handle these errors to avoid returning decrypted (shared) memory to
> the page allocator, which could lead to functional or security issues.
> 
> Swiotlb could free decrypted/shared pages if set_memory_decrypted() fails.
> Use the recently added free_decrypted_pages() to avoid this.
> 
> In swiotlb_exit(), check for set_memory_encrypted() errors manually,
> because the pages are not nessarily going to the page allocator.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: iommu@lists.linux.dev
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
>  kernel/dma/swiotlb.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 394494a6b1f3..ad06786c4f98 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -524,6 +524,7 @@ void __init swiotlb_exit(void)
>  	unsigned long tbl_vaddr;
>  	size_t tbl_size, slots_size;
>  	unsigned int area_order;
> +	int ret;
>  
>  	if (swiotlb_force_bounce)
>  		return;
> @@ -536,17 +537,19 @@ void __init swiotlb_exit(void)
>  	tbl_size = PAGE_ALIGN(mem->end - mem->start);
>  	slots_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), mem->nslabs));
>  
> -	set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
> +	ret = set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
>  	if (mem->late_alloc) {
>  		area_order = get_order(array_size(sizeof(*mem->areas),
>  			mem->nareas));
>  		free_pages((unsigned long)mem->areas, area_order);
> -		free_pages(tbl_vaddr, get_order(tbl_size));
> +		if (!ret)
> +			free_pages(tbl_vaddr, get_order(tbl_size));
>  		free_pages((unsigned long)mem->slots, get_order(slots_size));
>  	} else {
>  		memblock_free_late(__pa(mem->areas),
>  			array_size(sizeof(*mem->areas), mem->nareas));
> -		memblock_free_late(mem->start, tbl_size);
> +		if (!ret)
> +			memblock_free_late(mem->start, tbl_size);
>  		memblock_free_late(__pa(mem->slots), slots_size);
>  	}
>  
> @@ -581,7 +584,7 @@ static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes)
>  	return page;
>  
>  error:
> -	__free_pages(page, order);
> +	free_decrypted_pages((unsigned long)vaddr, order);
>  	return NULL;
>  }

I admit I'm not familiar with the encryption/decryption API, but if a
__free_pages() is not sufficient here, then it is quite confusing.
The error label is reached only if set_memory_decrypted() returns
non-zero. My naive expectation is that the memory is *not* decrypted in
that case and does not require special treatment. Is this assumption
wrong?

OTOH I believe there is a bug in the logic. The subsequent
__free_pages() in swiotlb_alloc_tlb() would have to be changed to a
free_decrypted_pages(). However, I'm proposing a different approach to
address the latter issue here:

https://lore.kernel.org/linux-iommu/20231026095123.222-1-petrtesarik@huaweicloud.com/T/

Petr T
  
Edgecombe, Rick P Oct. 31, 2023, 3:54 p.m. UTC | #4
On Tue, 2023-10-31 at 11:43 +0100, Petr Tesařík wrote:
> 
> I admit I'm not familiar with the encryption/decryption API, but if a
> __free_pages() is not sufficient here, then it is quite confusing.
> The error label is reached only if set_memory_decrypted() returns
> non-zero. My naive expectation is that the memory is *not* decrypted
> in
> that case and does not require special treatment. Is this assumption
> wrong?

Yea, the memory can still be decrypted, or partially decrypted. On x86,
all the set_memory() calls can fail part way through the work, and they
don't rollback the changes they had made up to that point. I was
thinking about trying to changes this, but that is the current
behavior.

But in TDX's case set_memory_decrypted() can be fully successful and
just return an error. And there aren't any plans to fix the TDX case
(which has special VMM work that the kernel doesn't have control over).
So instead, the plan is to WARN about it and count on the caller to
handle the error properly:
https://lore.kernel.org/lkml/20231027214744.1742056-1-rick.p.edgecombe@intel.com/

> 
> OTOH I believe there is a bug in the logic. The subsequent
> __free_pages() in swiotlb_alloc_tlb() would have to be changed to a
> free_decrypted_pages(). However, I'm proposing a different approach
> to
> address the latter issue here:
> 
> https://lore.kernel.org/linux-iommu/20231026095123.222-1-petrtesarik@huaweicloud.com/T/

Oh, yes, that makes sense. I was planning to send a patch to just leak
the pages if set_memory_decrypted() fails, after my v2 linked above is
accepted. It could have a different label than the phys_limit check
error path added in your linked patch, so that case would still free
the perfectly fine encrypted pages.
  
Petr Tesařík Oct. 31, 2023, 5:13 p.m. UTC | #5
On Tue, 31 Oct 2023 15:54:52 +0000
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:

> On Tue, 2023-10-31 at 11:43 +0100, Petr Tesařík wrote:
> > 
> > I admit I'm not familiar with the encryption/decryption API, but if a
> > __free_pages() is not sufficient here, then it is quite confusing.
> > The error label is reached only if set_memory_decrypted() returns
> > non-zero. My naive expectation is that the memory is *not* decrypted
> > in
> > that case and does not require special treatment. Is this assumption
> > wrong?  
> 
> Yea, the memory can still be decrypted, or partially decrypted. On x86,
> all the set_memory() calls can fail part way through the work, and they
> don't rollback the changes they had made up to that point.

Thank you for the explanation. So, after set_memory_decrypted() fails,
the pages become Schroedinger-crypted, but since its true state cannot
be observed by the guest kernel, it stays as such forever.

Sweet.

>[...]
> > OTOH I believe there is a bug in the logic. The subsequent
> > __free_pages() in swiotlb_alloc_tlb() would have to be changed to a
> > free_decrypted_pages(). However, I'm proposing a different approach
> > to
> > address the latter issue here:
> > 
> > https://lore.kernel.org/linux-iommu/20231026095123.222-1-petrtesarik@huaweicloud.com/T/  
> 
> Oh, yes, that makes sense. I was planning to send a patch to just leak
> the pages if set_memory_decrypted() fails, after my v2 linked above is
> accepted. It could have a different label than the phys_limit check
> error path added in your linked patch, so that case would still free
> the perfectly fine encrypted pages.

Hm, should I incorporate this knowledge into a v2 of my patch and
address both issues?

Petr T
  
Edgecombe, Rick P Oct. 31, 2023, 5:29 p.m. UTC | #6
On Tue, 2023-10-31 at 18:13 +0100, Petr Tesařík wrote:
> Thank you for the explanation. So, after set_memory_decrypted()
> fails,
> the pages become Schroedinger-crypted, but since its true state
> cannot
> be observed by the guest kernel, it stays as such forever.
> 
> Sweet.
> 
Yes... The untrusted host (the part of the VMM TDX is defending
against) gets to specify the return code of these operations (success
or failure). But the coco(a general term for TDX and similar from other
vendors) threat model doesn't include DOS. So the guest should trust
the return code as far as trying to not crash, but not trust it in
regards to the potential to leak data.

It's a bit to ask of the callers, but the other solution we discussed
was to panic the guest if any weirdness is observed by the VMM, in
which case the callers would never see the error. And of course
panicing the kernel is Bad. So that is how we arrived at this request
of the callers. Appreciate the effort to handle it on that side.


> Hm, should I incorporate this knowledge into a v2 of my patch and
> address both issues?

That sounds good to me! Feel free to CC me if you would like, and I can
scrutinize it for this particular issue.
  
Petr Tesařík Nov. 1, 2023, 6:27 a.m. UTC | #7
Hi,

On Tue, 31 Oct 2023 17:29:25 +0000
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:

> On Tue, 2023-10-31 at 18:13 +0100, Petr Tesařík wrote:
> > Thank you for the explanation. So, after set_memory_decrypted()
> > fails,
> > the pages become Schroedinger-crypted, but since its true state
> > cannot
> > be observed by the guest kernel, it stays as such forever.
> > 
> > Sweet.
> >   
> Yes... The untrusted host (the part of the VMM TDX is defending
> against) gets to specify the return code of these operations (success
> or failure). But the coco(a general term for TDX and similar from other
> vendors) threat model doesn't include DOS. So the guest should trust
> the return code as far as trying to not crash, but not trust it in
> regards to the potential to leak data.
> 
> It's a bit to ask of the callers, but the other solution we discussed
> was to panic the guest if any weirdness is observed by the VMM, in
> which case the callers would never see the error. And of course
> panicing the kernel is Bad. So that is how we arrived at this request
> of the callers. Appreciate the effort to handle it on that side.
> 
> 
> > Hm, should I incorporate this knowledge into a v2 of my patch and
> > address both issues?  
> 
> That sounds good to me! Feel free to CC me if you would like, and I can
> scrutinize it for this particular issue.

I'm sorry I missed that free_decrypted_pages() is added by the very
same series, so I cannot use it just yet. I can open-code it and let
you convert the code to the new function. You may then also want to
convert another open-coded instance further down in swiotlb_free_tlb().

In any case, there is an interdependency between the two patches, so we
should agree in which order to apply them.

Petr T
  
Edgecombe, Rick P Nov. 1, 2023, 2:40 p.m. UTC | #8
On Wed, 2023-11-01 at 07:27 +0100, Petr Tesařík wrote:
> I'm sorry I missed that free_decrypted_pages() is added by the very
> same series, so I cannot use it just yet. I can open-code it and let
> you convert the code to the new function. You may then also want to
> convert another open-coded instance further down in
> swiotlb_free_tlb().
> 
> In any case, there is an interdependency between the two patches, so
> we
> should agree in which order to apply them.

Open coding in the callers is the current plan (see an explanation
after the "---" in the v1 of that patch[0] if you are interested).
There might not be any interdependency between the the warning and
swiotlb changes, but I can double check if you CC me.


[0]
https://lore.kernel.org/lkml/20231024234829.1443125-1-rick.p.edgecombe@intel.com/
  

Patch

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 394494a6b1f3..ad06786c4f98 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -524,6 +524,7 @@  void __init swiotlb_exit(void)
 	unsigned long tbl_vaddr;
 	size_t tbl_size, slots_size;
 	unsigned int area_order;
+	int ret;
 
 	if (swiotlb_force_bounce)
 		return;
@@ -536,17 +537,19 @@  void __init swiotlb_exit(void)
 	tbl_size = PAGE_ALIGN(mem->end - mem->start);
 	slots_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), mem->nslabs));
 
-	set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
+	ret = set_memory_encrypted(tbl_vaddr, tbl_size >> PAGE_SHIFT);
 	if (mem->late_alloc) {
 		area_order = get_order(array_size(sizeof(*mem->areas),
 			mem->nareas));
 		free_pages((unsigned long)mem->areas, area_order);
-		free_pages(tbl_vaddr, get_order(tbl_size));
+		if (!ret)
+			free_pages(tbl_vaddr, get_order(tbl_size));
 		free_pages((unsigned long)mem->slots, get_order(slots_size));
 	} else {
 		memblock_free_late(__pa(mem->areas),
 			array_size(sizeof(*mem->areas), mem->nareas));
-		memblock_free_late(mem->start, tbl_size);
+		if (!ret)
+			memblock_free_late(mem->start, tbl_size);
 		memblock_free_late(__pa(mem->slots), slots_size);
 	}
 
@@ -581,7 +584,7 @@  static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes)
 	return page;
 
 error:
-	__free_pages(page, order);
+	free_decrypted_pages((unsigned long)vaddr, order);
 	return NULL;
 }