[v10,046/108] KVM: Add flags to struct kvm_gfn_range

Message ID 880c1016c29624964baee580985b6a736fc7d656.1667110240.git.isaku.yamahata@intel.com
State New
Headers
Series KVM TDX basic feature support |

Commit Message

Isaku Yamahata Oct. 30, 2022, 6:22 a.m. UTC
  From: Isaku Yamahata <isaku.yamahata@intel.com>

kvm_unmap_gfn_range() needs to know the reason of the callback for TDX.
mmu notifier, set memattr ioctl or restrictedmem notifier.  Based on the
reason, TDX changes the behavior.  For mmu notifier, it's the operation on
shared memory slot to zap shared PTE.  For set memattr, private<->shared
conversion, zap the original PTE.  For restrictedmem, it's a hint that TDX
can ignore.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 include/linux/kvm_host.h | 8 +++++++-
 virt/kvm/kvm_main.c      | 5 ++++-
 2 files changed, 11 insertions(+), 2 deletions(-)
  

Comments

Kai Huang Dec. 14, 2022, 10:51 a.m. UTC | #1
On Sat, 2022-10-29 at 23:22 -0700, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> kvm_unmap_gfn_range() needs to know the reason of the callback for TDX.
> mmu notifier, set memattr ioctl or restrictedmem notifier.  Based on the
> reason, TDX changes the behavior.  For mmu notifier, it's the operation on
> shared memory slot to zap shared PTE.  For set memattr, private<->shared
> conversion, zap the original PTE.  For restrictedmem, it's a hint that TDX
> can ignore.

Could you elaborate why restricted memfd notifier can be ignored? IIUC if
userspace punch a hole, the pages within the hole will be de-allocated.  So why
can such notifier be ignored?

Btw, I think this patch should be just merged to the patch which consumes those
flags (checked it is next patch).  It's easier to review when putting them
together.

> 
> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> ---
>  include/linux/kvm_host.h | 8 +++++++-
>  virt/kvm/kvm_main.c      | 5 ++++-
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 839d98d56632..b658803ea2c7 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -247,12 +247,18 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
>  
>  
>  #if defined(KVM_ARCH_WANT_MMU_NOTIFIER) || defined(CONFIG_HAVE_KVM_RESTRICTED_MEM)
> +#define KVM_GFN_RANGE_FLAGS_RESTRICTED_MEM	BIT(0)
> +#define KVM_GFN_RANGE_FLAGS_SET_MEM_ATTR	BIT(1)
>  struct kvm_gfn_range {
>  	struct kvm_memory_slot *slot;
>  	gfn_t start;
>  	gfn_t end;
> -	pte_t pte;
> +	union {
> +		pte_t pte;
> +		int attr;
> +	};

attribute is u64 in Chao's series.

>  	bool may_block;
> +	unsigned int flags;
>  };
  
Isaku Yamahata Dec. 15, 2022, 10:10 p.m. UTC | #2
On Wed, Dec 14, 2022 at 10:51:31AM +0000,
"Huang, Kai" <kai.huang@intel.com> wrote:

> On Sat, 2022-10-29 at 23:22 -0700, isaku.yamahata@intel.com wrote:
> > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > 
> > kvm_unmap_gfn_range() needs to know the reason of the callback for TDX.
> > mmu notifier, set memattr ioctl or restrictedmem notifier.  Based on the
> > reason, TDX changes the behavior.  For mmu notifier, it's the operation on
> > shared memory slot to zap shared PTE.  For set memattr, private<->shared
> > conversion, zap the original PTE.  For restrictedmem, it's a hint that TDX
> > can ignore.
> 
> Could you elaborate why restricted memfd notifier can be ignored? IIUC if
> userspace punch a hole, the pages within the hole will be de-allocated.  So why
> can such notifier be ignored?

Because set-memory-attribute ioctl is expected to follow the callback from
restrictedmem.  So set memory attributes can do de-allocation. I wanted to avoid
zapping twice.

With v9 UPM, the restrictedmem callback was triggered for both allocation and
punch-hole.
With v10 UPM, the callback is triggered only for punch-hole. With v10 callback
semantics, probably this can be cleaned up slightly.
  
Kai Huang Dec. 15, 2022, 10:41 p.m. UTC | #3
On Thu, 2022-12-15 at 14:10 -0800, Isaku Yamahata wrote:
> On Wed, Dec 14, 2022 at 10:51:31AM +0000,
> "Huang, Kai" <kai.huang@intel.com> wrote:
> 
> > On Sat, 2022-10-29 at 23:22 -0700, isaku.yamahata@intel.com wrote:
> > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > > 
> > > kvm_unmap_gfn_range() needs to know the reason of the callback for TDX.
> > > mmu notifier, set memattr ioctl or restrictedmem notifier.  Based on the
> > > reason, TDX changes the behavior.  For mmu notifier, it's the operation on
> > > shared memory slot to zap shared PTE.  For set memattr, private<->shared
> > > conversion, zap the original PTE.  For restrictedmem, it's a hint that TDX
> > > can ignore.
> > 
> > Could you elaborate why restricted memfd notifier can be ignored? IIUC if
> > userspace punch a hole, the pages within the hole will be de-allocated.  So why
> > can such notifier be ignored?
> 
> Because set-memory-attribute ioctl is expected to follow the callback from
> restrictedmem.  So set memory attributes can do de-allocation. I wanted to avoid
> zapping twice.

Even this is true, the punch hole can be done alone w/o being followed by
set_memory_attribute(), correct?  Your explanation doesn't seem to be
reasonable?

At least, you need to explain the semantics of how to use "punch hole" and
set_memory_attributes() clearly in the changelog.  Otherwise it's hard for
people to review.
  

Patch

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 839d98d56632..b658803ea2c7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -247,12 +247,18 @@  int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 
 
 #if defined(KVM_ARCH_WANT_MMU_NOTIFIER) || defined(CONFIG_HAVE_KVM_RESTRICTED_MEM)
+#define KVM_GFN_RANGE_FLAGS_RESTRICTED_MEM	BIT(0)
+#define KVM_GFN_RANGE_FLAGS_SET_MEM_ATTR	BIT(1)
 struct kvm_gfn_range {
 	struct kvm_memory_slot *slot;
 	gfn_t start;
 	gfn_t end;
-	pte_t pte;
+	union {
+		pte_t pte;
+		int attr;
+	};
 	bool may_block;
+	unsigned int flags;
 };
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range);
 #endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3b05a3396f89..dda2f2ec4faa 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -676,6 +676,7 @@  static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
 			gfn_range.start = hva_to_gfn_memslot(hva_start, slot);
 			gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot);
 			gfn_range.slot = slot;
+			gfn_range.flags = 0;
 
 			if (!locked) {
 				locked = true;
@@ -947,8 +948,9 @@  static void kvm_unmap_mem_range(struct kvm *kvm, gfn_t start, gfn_t end,
 	int i;
 	int r = 0;
 
-	gfn_range.pte = __pte(0);
+	gfn_range.attr = attr;
 	gfn_range.may_block = true;
+	gfn_range.flags = KVM_GFN_RANGE_FLAGS_SET_MEM_ATTR;
 
 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
 		slots = __kvm_memslots(kvm, i);
@@ -1074,6 +1076,7 @@  static void kvm_restrictedmem_invalidate_begin(struct restrictedmem_notifier *no
 	gfn_range.slot = slot;
 	gfn_range.pte = __pte(0);
 	gfn_range.may_block = true;
+	gfn_range.flags = KVM_GFN_RANGE_FLAGS_RESTRICTED_MEM;
 
 	if (kvm_unmap_gfn_range(kvm, &gfn_range))
 		kvm_flush_remote_tlbs(kvm);