Commit Message
Isaku Yamahata
Oct. 30, 2022, 6:22 a.m. UTC
From: Isaku Yamahata <isaku.yamahata@intel.com> kvm_unmap_gfn_range() needs to know the reason of the callback for TDX. mmu notifier, set memattr ioctl or restrictedmem notifier. Based on the reason, TDX changes the behavior. For mmu notifier, it's the operation on shared memory slot to zap shared PTE. For set memattr, private<->shared conversion, zap the original PTE. For restrictedmem, it's a hint that TDX can ignore. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> --- include/linux/kvm_host.h | 8 +++++++- virt/kvm/kvm_main.c | 5 ++++- 2 files changed, 11 insertions(+), 2 deletions(-)
Comments
On Sat, 2022-10-29 at 23:22 -0700, isaku.yamahata@intel.com wrote: > From: Isaku Yamahata <isaku.yamahata@intel.com> > > kvm_unmap_gfn_range() needs to know the reason of the callback for TDX. > mmu notifier, set memattr ioctl or restrictedmem notifier. Based on the > reason, TDX changes the behavior. For mmu notifier, it's the operation on > shared memory slot to zap shared PTE. For set memattr, private<->shared > conversion, zap the original PTE. For restrictedmem, it's a hint that TDX > can ignore. Could you elaborate why restricted memfd notifier can be ignored? IIUC if userspace punch a hole, the pages within the hole will be de-allocated. So why can such notifier be ignored? Btw, I think this patch should be just merged to the patch which consumes those flags (checked it is next patch). It's easier to review when putting them together. > > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> > --- > include/linux/kvm_host.h | 8 +++++++- > virt/kvm/kvm_main.c | 5 ++++- > 2 files changed, 11 insertions(+), 2 deletions(-) > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 839d98d56632..b658803ea2c7 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -247,12 +247,18 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu); > > > #if defined(KVM_ARCH_WANT_MMU_NOTIFIER) || defined(CONFIG_HAVE_KVM_RESTRICTED_MEM) > +#define KVM_GFN_RANGE_FLAGS_RESTRICTED_MEM BIT(0) > +#define KVM_GFN_RANGE_FLAGS_SET_MEM_ATTR BIT(1) > struct kvm_gfn_range { > struct kvm_memory_slot *slot; > gfn_t start; > gfn_t end; > - pte_t pte; > + union { > + pte_t pte; > + int attr; > + }; attribute is u64 in Chao's series. > bool may_block; > + unsigned int flags; > };
On Wed, Dec 14, 2022 at 10:51:31AM +0000, "Huang, Kai" <kai.huang@intel.com> wrote: > On Sat, 2022-10-29 at 23:22 -0700, isaku.yamahata@intel.com wrote: > > From: Isaku Yamahata <isaku.yamahata@intel.com> > > > > kvm_unmap_gfn_range() needs to know the reason of the callback for TDX. > > mmu notifier, set memattr ioctl or restrictedmem notifier. Based on the > > reason, TDX changes the behavior. For mmu notifier, it's the operation on > > shared memory slot to zap shared PTE. For set memattr, private<->shared > > conversion, zap the original PTE. For restrictedmem, it's a hint that TDX > > can ignore. > > Could you elaborate why restricted memfd notifier can be ignored? IIUC if > userspace punch a hole, the pages within the hole will be de-allocated. So why > can such notifier be ignored? Because set-memory-attribute ioctl is expected to follow the callback from restrictedmem. So set memory attributes can do de-allocation. I wanted to avoid zapping twice. With v9 UPM, the restrictedmem callback was triggered for both allocation and punch-hole. With v10 UPM, the callback is triggered only for punch-hole. With v10 callback semantics, probably this can be cleaned up slightly.
On Thu, 2022-12-15 at 14:10 -0800, Isaku Yamahata wrote: > On Wed, Dec 14, 2022 at 10:51:31AM +0000, > "Huang, Kai" <kai.huang@intel.com> wrote: > > > On Sat, 2022-10-29 at 23:22 -0700, isaku.yamahata@intel.com wrote: > > > From: Isaku Yamahata <isaku.yamahata@intel.com> > > > > > > kvm_unmap_gfn_range() needs to know the reason of the callback for TDX. > > > mmu notifier, set memattr ioctl or restrictedmem notifier. Based on the > > > reason, TDX changes the behavior. For mmu notifier, it's the operation on > > > shared memory slot to zap shared PTE. For set memattr, private<->shared > > > conversion, zap the original PTE. For restrictedmem, it's a hint that TDX > > > can ignore. > > > > Could you elaborate why restricted memfd notifier can be ignored? IIUC if > > userspace punch a hole, the pages within the hole will be de-allocated. So why > > can such notifier be ignored? > > Because set-memory-attribute ioctl is expected to follow the callback from > restrictedmem. So set memory attributes can do de-allocation. I wanted to avoid > zapping twice. Even this is true, the punch hole can be done alone w/o being followed by set_memory_attribute(), correct? Your explanation doesn't seem to be reasonable? At least, you need to explain the semantics of how to use "punch hole" and set_memory_attributes() clearly in the changelog. Otherwise it's hard for people to review.
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 839d98d56632..b658803ea2c7 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -247,12 +247,18 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu); #if defined(KVM_ARCH_WANT_MMU_NOTIFIER) || defined(CONFIG_HAVE_KVM_RESTRICTED_MEM) +#define KVM_GFN_RANGE_FLAGS_RESTRICTED_MEM BIT(0) +#define KVM_GFN_RANGE_FLAGS_SET_MEM_ATTR BIT(1) struct kvm_gfn_range { struct kvm_memory_slot *slot; gfn_t start; gfn_t end; - pte_t pte; + union { + pte_t pte; + int attr; + }; bool may_block; + unsigned int flags; }; bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range); #endif diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 3b05a3396f89..dda2f2ec4faa 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -676,6 +676,7 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm, gfn_range.start = hva_to_gfn_memslot(hva_start, slot); gfn_range.end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, slot); gfn_range.slot = slot; + gfn_range.flags = 0; if (!locked) { locked = true; @@ -947,8 +948,9 @@ static void kvm_unmap_mem_range(struct kvm *kvm, gfn_t start, gfn_t end, int i; int r = 0; - gfn_range.pte = __pte(0); + gfn_range.attr = attr; gfn_range.may_block = true; + gfn_range.flags = KVM_GFN_RANGE_FLAGS_SET_MEM_ATTR; for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) { slots = __kvm_memslots(kvm, i); @@ -1074,6 +1076,7 @@ static void kvm_restrictedmem_invalidate_begin(struct restrictedmem_notifier *no gfn_range.slot = slot; gfn_range.pte = __pte(0); gfn_range.may_block = true; + gfn_range.flags = KVM_GFN_RANGE_FLAGS_RESTRICTED_MEM; if (kvm_unmap_gfn_range(kvm, &gfn_range)) kvm_flush_remote_tlbs(kvm);