Message ID | 20221025151344.3784230-5-chao.p.peng@linux.intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp1066092wru; Tue, 25 Oct 2022 08:22:24 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4Zgqf03HOq0N+RHrN6sz1Dd/y+fY8U+gq+AND4QA09YRdBWCPMXgtgzBSi21cdP0WaZdK8 X-Received: by 2002:a17:907:1b22:b0:741:8809:b4e6 with SMTP id mp34-20020a1709071b2200b007418809b4e6mr33352920ejc.84.1666711343818; Tue, 25 Oct 2022 08:22:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666711343; cv=none; d=google.com; s=arc-20160816; b=tNF/SjaeFHdRfOQn1uQ6T97MZ47RiM7KUo73IYUKcF2BCgam4xFBTaCuFrzUvSK5Q2 +PhdmHyoFJvl/WwJ+oVCcwqzXl6Mxc7DKT2fVkaolH5hbw5z+7X/Z2Ix+YzqlG/aP9E8 g+DI13g4W9Ua9PNWAXO1dNoNeVIHY8BF9NJLnuSL2EoGrGyjsZEqbw4t/aD/ng/Wniak s9qWdS11JSuLSzGxPg7uqD0wUeXzQG5HlLK2/fwiWEv1GgAjkaRHd5eGE34l76KNLsLq 6xQr1BxU9i7DH6oti7YadHqw/BwtAiulp+BmbO6Irb+izi6/vyUECgaKbvnb6Ly4sTTW Em1w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ZeCX6+XYJ88tSgL5hc/072IEYNeLVQ1q2js914cezdE=; b=MZame2CzeRBSVkWxngykXQpt2lDkFqCeRgkrws34TuVX4GQXAq0dRImbbm8wh1FIIK SdguEs9hTDiAyfBcVzCPYVwoxAIlEk84ygQQHBktEgxYRpyOrGCo0yLw3iVMUKWqKyL3 7ZZXUr+owPdxd6XiiaIUMm9SDfQ+sluzla4RniGP/WkyxZK4Doz93SpEVtjLin11ASAF J7YOVXz/7e8cd4OLjAZB/dMYjaPiLiIPxeWoz8z4l7e3LYpzStRcr8aCkulwIgn6WO2R V8PYVLC3w/t9Zu1vN8q0P5wVIOKrORLDFebxLG1Kdf8UzXUFfTp3a489cyQ4g68M5CbQ o9/A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="VyLGvL/w"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id y9-20020a056402358900b00458d1c48708si3575866edc.303.2022.10.25.08.21.59; Tue, 25 Oct 2022 08:22:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="VyLGvL/w"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233201AbiJYPTh (ORCPT <rfc822;pwkd43@gmail.com> + 99 others); Tue, 25 Oct 2022 11:19:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41744 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233086AbiJYPTL (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 25 Oct 2022 11:19:11 -0400 Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AF9081918BB; Tue, 25 Oct 2022 08:19:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1666711150; x=1698247150; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6EFuXYDl0hDb29a8QD5wsl2s9EmZqV7kVnmmwCD6iGo=; b=VyLGvL/wP8/mBpFhPp6ykzEEm8SsapOp+ks30wrzmCruDBiVSmpcHuKT wulS5CD1fPE2CtvLJlLxnLOfdrGQP25pkPg+WVvVWvwBdD3p3qpT9pUnp qOAg9ixu6WgDqwBVHo1E2z+aRXuN16A+S+H+pWnwyWzsB4bpRbP07rYYU 7SBt2dZeVWCoKst8641CgaqTTeYlczKxWeG+bPeWRvqKxqRsF8Js0aM8I x5dU6LyOzeP3sn/9qod2leUwe+CMtydc8j9YXV7RJrU1Wipbm/qMkACE5 JsCgvuMi5jpxOG3Ep44PJd26m8+1dyi8xYt5hV82/N6yr+f99WvIONhzD w==; X-IronPort-AV: E=McAfee;i="6500,9779,10510"; a="334307384" X-IronPort-AV: E=Sophos;i="5.95,212,1661842800"; d="scan'208";a="334307384" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 Oct 2022 08:19:10 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10510"; a="736865621" X-IronPort-AV: E=Sophos;i="5.95,212,1661842800"; d="scan'208";a="736865621" Received: from chaop.bj.intel.com ([10.240.193.75]) by fmsmga002.fm.intel.com with ESMTP; 25 Oct 2022 08:18:59 -0700 From: Chao Peng <chao.p.peng@linux.intel.com> To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, qemu-devel@nongnu.org Cc: Paolo Bonzini <pbonzini@redhat.com>, Jonathan Corbet <corbet@lwn.net>, Sean Christopherson <seanjc@google.com>, Vitaly Kuznetsov <vkuznets@redhat.com>, Wanpeng Li <wanpengli@tencent.com>, Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>, Hugh Dickins <hughd@google.com>, Jeff Layton <jlayton@kernel.org>, "J . Bruce Fields" <bfields@fieldses.org>, Andrew Morton <akpm@linux-foundation.org>, Shuah Khan <shuah@kernel.org>, Mike Rapoport <rppt@kernel.org>, Steven Price <steven.price@arm.com>, "Maciej S . Szmigiero" <mail@maciej.szmigiero.name>, Vlastimil Babka <vbabka@suse.cz>, Vishal Annapurve <vannapurve@google.com>, Yu Zhang <yu.c.zhang@linux.intel.com>, Chao Peng <chao.p.peng@linux.intel.com>, "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>, luto@kernel.org, jun.nakajima@intel.com, dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com, aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com, Quentin Perret <qperret@google.com>, tabba@google.com, Michael Roth <michael.roth@amd.com>, mhocko@suse.com, Muchun Song <songmuchun@bytedance.com>, wei.w.wang@intel.com Subject: [PATCH v9 4/8] KVM: Use gfn instead of hva for mmu_notifier_retry Date: Tue, 25 Oct 2022 23:13:40 +0800 Message-Id: <20221025151344.3784230-5-chao.p.peng@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221025151344.3784230-1-chao.p.peng@linux.intel.com> References: <20221025151344.3784230-1-chao.p.peng@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.5 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747673513916970101?= X-GMAIL-MSGID: =?utf-8?q?1747673513916970101?= |
Series |
KVM: mm: fd-based approach for supporting KVM
|
|
Commit Message
Chao Peng
Oct. 25, 2022, 3:13 p.m. UTC
Currently in mmu_notifier validate path, hva range is recorded and then
checked against in the mmu_notifier_retry_hva() of the page fault path.
However, for the to be introduced private memory, a page fault may not
have a hva associated, checking gfn(gpa) makes more sense.
For existing non private memory case, gfn is expected to continue to
work. The only downside is when aliasing multiple gfns to a single hva,
the current algorithm of checking multiple ranges could result in a much
larger range being rejected. Such aliasing should be uncommon, so the
impact is expected small.
It also fixes a bug in kvm_zap_gfn_range() which has already been using
gfn when calling kvm_mmu_invalidate_begin/end() while these functions
accept hva in current code.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
arch/x86/kvm/mmu/mmu.c | 2 +-
include/linux/kvm_host.h | 18 +++++++---------
virt/kvm/kvm_main.c | 45 ++++++++++++++++++++++++++--------------
3 files changed, 39 insertions(+), 26 deletions(-)
Comments
Hi, On Tue, Oct 25, 2022 at 4:19 PM Chao Peng <chao.p.peng@linux.intel.com> wrote: > > Currently in mmu_notifier validate path, hva range is recorded and then > checked against in the mmu_notifier_retry_hva() of the page fault path. > However, for the to be introduced private memory, a page fault may not > have a hva associated, checking gfn(gpa) makes more sense. > > For existing non private memory case, gfn is expected to continue to > work. The only downside is when aliasing multiple gfns to a single hva, > the current algorithm of checking multiple ranges could result in a much > larger range being rejected. Such aliasing should be uncommon, so the > impact is expected small. > > It also fixes a bug in kvm_zap_gfn_range() which has already been using nit: Now it's kvm_unmap_gfn_range(). > gfn when calling kvm_mmu_invalidate_begin/end() while these functions > accept hva in current code. > > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> > --- Based on reading this code and my limited knowledge of the x86 MMU code: Reviewed-by: Fuad Tabba <tabba@google.com> Cheers, /fuad > arch/x86/kvm/mmu/mmu.c | 2 +- > include/linux/kvm_host.h | 18 +++++++--------- > virt/kvm/kvm_main.c | 45 ++++++++++++++++++++++++++-------------- > 3 files changed, 39 insertions(+), 26 deletions(-) > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index 6f81539061d6..33b1aec44fb8 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -4217,7 +4217,7 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu, > return true; > > return fault->slot && > - mmu_invalidate_retry_hva(vcpu->kvm, mmu_seq, fault->hva); > + mmu_invalidate_retry_gfn(vcpu->kvm, mmu_seq, fault->gfn); > } > > static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 739a7562a1f3..79e5cbc35fcf 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -775,8 +775,8 @@ struct kvm { > struct mmu_notifier mmu_notifier; > unsigned long mmu_invalidate_seq; > long mmu_invalidate_in_progress; > - unsigned long mmu_invalidate_range_start; > - unsigned long mmu_invalidate_range_end; > + gfn_t mmu_invalidate_range_start; > + gfn_t mmu_invalidate_range_end; > #endif > struct list_head devices; > u64 manual_dirty_log_protect; > @@ -1365,10 +1365,8 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc); > void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); > #endif > > -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, > - unsigned long end); > -void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start, > - unsigned long end); > +void kvm_mmu_invalidate_begin(struct kvm *kvm, gfn_t start, gfn_t end); > +void kvm_mmu_invalidate_end(struct kvm *kvm, gfn_t start, gfn_t end); > > long kvm_arch_dev_ioctl(struct file *filp, > unsigned int ioctl, unsigned long arg); > @@ -1937,9 +1935,9 @@ static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq) > return 0; > } > > -static inline int mmu_invalidate_retry_hva(struct kvm *kvm, > +static inline int mmu_invalidate_retry_gfn(struct kvm *kvm, > unsigned long mmu_seq, > - unsigned long hva) > + gfn_t gfn) > { > lockdep_assert_held(&kvm->mmu_lock); > /* > @@ -1949,8 +1947,8 @@ static inline int mmu_invalidate_retry_hva(struct kvm *kvm, > * positives, due to shortcuts when handing concurrent invalidations. > */ > if (unlikely(kvm->mmu_invalidate_in_progress) && > - hva >= kvm->mmu_invalidate_range_start && > - hva < kvm->mmu_invalidate_range_end) > + gfn >= kvm->mmu_invalidate_range_start && > + gfn < kvm->mmu_invalidate_range_end) > return 1; > if (kvm->mmu_invalidate_seq != mmu_seq) > return 1; > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 8dace78a0278..09c9cdeb773c 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -540,8 +540,7 @@ static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn, > > typedef bool (*hva_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range); > > -typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start, > - unsigned long end); > +typedef void (*on_lock_fn_t)(struct kvm *kvm, gfn_t start, gfn_t end); > > typedef void (*on_unlock_fn_t)(struct kvm *kvm); > > @@ -628,7 +627,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm, > locked = true; > KVM_MMU_LOCK(kvm); > if (!IS_KVM_NULL_FN(range->on_lock)) > - range->on_lock(kvm, range->start, range->end); > + range->on_lock(kvm, gfn_range.start, > + gfn_range.end); > if (IS_KVM_NULL_FN(range->handler)) > break; > } > @@ -715,15 +715,9 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn, > kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn); > } > > -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, > - unsigned long end) > +static inline void update_invalidate_range(struct kvm *kvm, gfn_t start, > + gfn_t end) > { > - /* > - * The count increase must become visible at unlock time as no > - * spte can be established without taking the mmu_lock and > - * count is also read inside the mmu_lock critical section. > - */ > - kvm->mmu_invalidate_in_progress++; > if (likely(kvm->mmu_invalidate_in_progress == 1)) { > kvm->mmu_invalidate_range_start = start; > kvm->mmu_invalidate_range_end = end; > @@ -744,6 +738,28 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, > } > } > > +static void mark_invalidate_in_progress(struct kvm *kvm, gfn_t start, gfn_t end) > +{ > + /* > + * The count increase must become visible at unlock time as no > + * spte can be established without taking the mmu_lock and > + * count is also read inside the mmu_lock critical section. > + */ > + kvm->mmu_invalidate_in_progress++; > +} > + > +static bool kvm_mmu_handle_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) > +{ > + update_invalidate_range(kvm, range->start, range->end); > + return kvm_unmap_gfn_range(kvm, range); > +} > + > +void kvm_mmu_invalidate_begin(struct kvm *kvm, gfn_t start, gfn_t end) > +{ > + mark_invalidate_in_progress(kvm, start, end); > + update_invalidate_range(kvm, start, end); > +} > + > static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, > const struct mmu_notifier_range *range) > { > @@ -752,8 +768,8 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, > .start = range->start, > .end = range->end, > .pte = __pte(0), > - .handler = kvm_unmap_gfn_range, > - .on_lock = kvm_mmu_invalidate_begin, > + .handler = kvm_mmu_handle_gfn_range, > + .on_lock = mark_invalidate_in_progress, > .on_unlock = kvm_arch_guest_memory_reclaimed, > .flush_on_ret = true, > .may_block = mmu_notifier_range_blockable(range), > @@ -791,8 +807,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, > return 0; > } > > -void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start, > - unsigned long end) > +void kvm_mmu_invalidate_end(struct kvm *kvm, gfn_t start, gfn_t end) > { > /* > * This sequence increase will notify the kvm page fault that > -- > 2.25.1 >
On Thu, Oct 27, 2022 at 11:29:14AM +0100, Fuad Tabba wrote: > Hi, > > On Tue, Oct 25, 2022 at 4:19 PM Chao Peng <chao.p.peng@linux.intel.com> wrote: > > > > Currently in mmu_notifier validate path, hva range is recorded and then > > checked against in the mmu_notifier_retry_hva() of the page fault path. > > However, for the to be introduced private memory, a page fault may not > > have a hva associated, checking gfn(gpa) makes more sense. > > > > For existing non private memory case, gfn is expected to continue to > > work. The only downside is when aliasing multiple gfns to a single hva, > > the current algorithm of checking multiple ranges could result in a much > > larger range being rejected. Such aliasing should be uncommon, so the > > impact is expected small. > > > > It also fixes a bug in kvm_zap_gfn_range() which has already been using > > nit: Now it's kvm_unmap_gfn_range(). Forgot to mention: the bug is still with kvm_zap_gfn_range(). It calls kvm_mmu_invalidate_begin/end with a gfn range but before this series kvm_mmu_invalidate_begin/end actually accept a hva range. Note it's unrelated to whether we use kvm_zap_gfn_range() or kvm_unmap_gfn_range() in the following patch (patch 05). Thanks, Chao > > > gfn when calling kvm_mmu_invalidate_begin/end() while these functions > > accept hva in current code. > > > > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> > > --- > > Based on reading this code and my limited knowledge of the x86 MMU code: > Reviewed-by: Fuad Tabba <tabba@google.com> > > Cheers, > /fuad > > > > arch/x86/kvm/mmu/mmu.c | 2 +- > > include/linux/kvm_host.h | 18 +++++++--------- > > virt/kvm/kvm_main.c | 45 ++++++++++++++++++++++++++-------------- > > 3 files changed, 39 insertions(+), 26 deletions(-) > > > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > index 6f81539061d6..33b1aec44fb8 100644 > > --- a/arch/x86/kvm/mmu/mmu.c > > +++ b/arch/x86/kvm/mmu/mmu.c > > @@ -4217,7 +4217,7 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu, > > return true; > > > > return fault->slot && > > - mmu_invalidate_retry_hva(vcpu->kvm, mmu_seq, fault->hva); > > + mmu_invalidate_retry_gfn(vcpu->kvm, mmu_seq, fault->gfn); > > } > > > > static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > index 739a7562a1f3..79e5cbc35fcf 100644 > > --- a/include/linux/kvm_host.h > > +++ b/include/linux/kvm_host.h > > @@ -775,8 +775,8 @@ struct kvm { > > struct mmu_notifier mmu_notifier; > > unsigned long mmu_invalidate_seq; > > long mmu_invalidate_in_progress; > > - unsigned long mmu_invalidate_range_start; > > - unsigned long mmu_invalidate_range_end; > > + gfn_t mmu_invalidate_range_start; > > + gfn_t mmu_invalidate_range_end; > > #endif > > struct list_head devices; > > u64 manual_dirty_log_protect; > > @@ -1365,10 +1365,8 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc); > > void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); > > #endif > > > > -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, > > - unsigned long end); > > -void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start, > > - unsigned long end); > > +void kvm_mmu_invalidate_begin(struct kvm *kvm, gfn_t start, gfn_t end); > > +void kvm_mmu_invalidate_end(struct kvm *kvm, gfn_t start, gfn_t end); > > > > long kvm_arch_dev_ioctl(struct file *filp, > > unsigned int ioctl, unsigned long arg); > > @@ -1937,9 +1935,9 @@ static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq) > > return 0; > > } > > > > -static inline int mmu_invalidate_retry_hva(struct kvm *kvm, > > +static inline int mmu_invalidate_retry_gfn(struct kvm *kvm, > > unsigned long mmu_seq, > > - unsigned long hva) > > + gfn_t gfn) > > { > > lockdep_assert_held(&kvm->mmu_lock); > > /* > > @@ -1949,8 +1947,8 @@ static inline int mmu_invalidate_retry_hva(struct kvm *kvm, > > * positives, due to shortcuts when handing concurrent invalidations. > > */ > > if (unlikely(kvm->mmu_invalidate_in_progress) && > > - hva >= kvm->mmu_invalidate_range_start && > > - hva < kvm->mmu_invalidate_range_end) > > + gfn >= kvm->mmu_invalidate_range_start && > > + gfn < kvm->mmu_invalidate_range_end) > > return 1; > > if (kvm->mmu_invalidate_seq != mmu_seq) > > return 1; > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > > index 8dace78a0278..09c9cdeb773c 100644 > > --- a/virt/kvm/kvm_main.c > > +++ b/virt/kvm/kvm_main.c > > @@ -540,8 +540,7 @@ static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn, > > > > typedef bool (*hva_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range); > > > > -typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start, > > - unsigned long end); > > +typedef void (*on_lock_fn_t)(struct kvm *kvm, gfn_t start, gfn_t end); > > > > typedef void (*on_unlock_fn_t)(struct kvm *kvm); > > > > @@ -628,7 +627,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm, > > locked = true; > > KVM_MMU_LOCK(kvm); > > if (!IS_KVM_NULL_FN(range->on_lock)) > > - range->on_lock(kvm, range->start, range->end); > > + range->on_lock(kvm, gfn_range.start, > > + gfn_range.end); > > if (IS_KVM_NULL_FN(range->handler)) > > break; > > } > > @@ -715,15 +715,9 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn, > > kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn); > > } > > > > -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, > > - unsigned long end) > > +static inline void update_invalidate_range(struct kvm *kvm, gfn_t start, > > + gfn_t end) > > { > > - /* > > - * The count increase must become visible at unlock time as no > > - * spte can be established without taking the mmu_lock and > > - * count is also read inside the mmu_lock critical section. > > - */ > > - kvm->mmu_invalidate_in_progress++; > > if (likely(kvm->mmu_invalidate_in_progress == 1)) { > > kvm->mmu_invalidate_range_start = start; > > kvm->mmu_invalidate_range_end = end; > > @@ -744,6 +738,28 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, > > } > > } > > > > +static void mark_invalidate_in_progress(struct kvm *kvm, gfn_t start, gfn_t end) > > +{ > > + /* > > + * The count increase must become visible at unlock time as no > > + * spte can be established without taking the mmu_lock and > > + * count is also read inside the mmu_lock critical section. > > + */ > > + kvm->mmu_invalidate_in_progress++; > > +} > > + > > +static bool kvm_mmu_handle_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) > > +{ > > + update_invalidate_range(kvm, range->start, range->end); > > + return kvm_unmap_gfn_range(kvm, range); > > +} > > + > > +void kvm_mmu_invalidate_begin(struct kvm *kvm, gfn_t start, gfn_t end) > > +{ > > + mark_invalidate_in_progress(kvm, start, end); > > + update_invalidate_range(kvm, start, end); > > +} > > + > > static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, > > const struct mmu_notifier_range *range) > > { > > @@ -752,8 +768,8 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, > > .start = range->start, > > .end = range->end, > > .pte = __pte(0), > > - .handler = kvm_unmap_gfn_range, > > - .on_lock = kvm_mmu_invalidate_begin, > > + .handler = kvm_mmu_handle_gfn_range, > > + .on_lock = mark_invalidate_in_progress, > > .on_unlock = kvm_arch_guest_memory_reclaimed, > > .flush_on_ret = true, > > .may_block = mmu_notifier_range_blockable(range), > > @@ -791,8 +807,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, > > return 0; > > } > > > > -void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start, > > - unsigned long end) > > +void kvm_mmu_invalidate_end(struct kvm *kvm, gfn_t start, gfn_t end) > > { > > /* > > * This sequence increase will notify the kvm page fault that > > -- > > 2.25.1 > >
On Fri, Nov 04, 2022, Chao Peng wrote: > On Thu, Oct 27, 2022 at 11:29:14AM +0100, Fuad Tabba wrote: > > Hi, > > > > On Tue, Oct 25, 2022 at 4:19 PM Chao Peng <chao.p.peng@linux.intel.com> wrote: > > > > > > Currently in mmu_notifier validate path, hva range is recorded and then > > > checked against in the mmu_notifier_retry_hva() of the page fault path. > > > However, for the to be introduced private memory, a page fault may not > > > have a hva associated, checking gfn(gpa) makes more sense. > > > > > > For existing non private memory case, gfn is expected to continue to > > > work. The only downside is when aliasing multiple gfns to a single hva, > > > the current algorithm of checking multiple ranges could result in a much > > > larger range being rejected. Such aliasing should be uncommon, so the > > > impact is expected small. > > > > > > It also fixes a bug in kvm_zap_gfn_range() which has already been using > > > > nit: Now it's kvm_unmap_gfn_range(). > > Forgot to mention: the bug is still with kvm_zap_gfn_range(). It calls > kvm_mmu_invalidate_begin/end with a gfn range but before this series > kvm_mmu_invalidate_begin/end actually accept a hva range. Note it's > unrelated to whether we use kvm_zap_gfn_range() or kvm_unmap_gfn_range() > in the following patch (patch 05). Grr, in the future, if you find an existing bug, please send a patch. At the very least, report the bug. The APICv case that this was added for could very well be broken because of this, and the resulting failures would be an absolute nightmare to debug. Compile tested only... -- From: Sean Christopherson <seanjc@google.com> Date: Fri, 4 Nov 2022 22:20:33 +0000 Subject: [PATCH] KVM: x86/mmu: Block all page faults during kvm_zap_gfn_range() When zapping a GFN range, pass 0 => ALL_ONES for the to-be-invalidated range to effectively block all page faults while the zap is in-progress. The invalidation helpers take a host virtual address, whereas zapping a GFN obviously provides a guest physical address and with the wrong unit of measurement (frame vs. byte). Alternatively, KVM could walk all memslots to get the associated HVAs, but thanks to SMM, that would require multiple lookups. And practically speaking, kvm_zap_gfn_range() usage is quite rare and not a hot path, e.g. MTRR and CR0.CD are almost guaranteed to be done only on vCPU0 during boot, and APICv inhibits are similarly infrequent operations. Fixes: edb298c663fc ("KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range") Cc: stable@vger.kernel.org Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> --- arch/x86/kvm/mmu/mmu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 6f81539061d6..1ccb769f62af 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -6056,7 +6056,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) write_lock(&kvm->mmu_lock); - kvm_mmu_invalidate_begin(kvm, gfn_start, gfn_end); + kvm_mmu_invalidate_begin(kvm, 0, -1ul); flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end); @@ -6070,7 +6070,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end - gfn_start); - kvm_mmu_invalidate_end(kvm, gfn_start, gfn_end); + kvm_mmu_invalidate_end(kvm, 0, -1ul); write_unlock(&kvm->mmu_lock); } base-commit: c12879206e47730ff5ab255bbf625b28ade4028f --
On Fri, Nov 04, 2022 at 10:29:48PM +0000, Sean Christopherson wrote: > On Fri, Nov 04, 2022, Chao Peng wrote: > > On Thu, Oct 27, 2022 at 11:29:14AM +0100, Fuad Tabba wrote: > > > Hi, > > > > > > On Tue, Oct 25, 2022 at 4:19 PM Chao Peng <chao.p.peng@linux.intel.com> wrote: > > > > > > > > Currently in mmu_notifier validate path, hva range is recorded and then > > > > checked against in the mmu_notifier_retry_hva() of the page fault path. > > > > However, for the to be introduced private memory, a page fault may not > > > > have a hva associated, checking gfn(gpa) makes more sense. > > > > > > > > For existing non private memory case, gfn is expected to continue to > > > > work. The only downside is when aliasing multiple gfns to a single hva, > > > > the current algorithm of checking multiple ranges could result in a much > > > > larger range being rejected. Such aliasing should be uncommon, so the > > > > impact is expected small. > > > > > > > > It also fixes a bug in kvm_zap_gfn_range() which has already been using > > > > > > nit: Now it's kvm_unmap_gfn_range(). > > > > Forgot to mention: the bug is still with kvm_zap_gfn_range(). It calls > > kvm_mmu_invalidate_begin/end with a gfn range but before this series > > kvm_mmu_invalidate_begin/end actually accept a hva range. Note it's > > unrelated to whether we use kvm_zap_gfn_range() or kvm_unmap_gfn_range() > > in the following patch (patch 05). > > Grr, in the future, if you find an existing bug, please send a patch. At the > very least, report the bug. Agreed, this can be sent out separately from this series. > The APICv case that this was added for could very > well be broken because of this, and the resulting failures would be an absolute > nightmare to debug. Given the apicv_inhibit should be rare, the change looks good to me. Just to be clear, your will send out this fix, right? Chao > > Compile tested only... > > -- > From: Sean Christopherson <seanjc@google.com> > Date: Fri, 4 Nov 2022 22:20:33 +0000 > Subject: [PATCH] KVM: x86/mmu: Block all page faults during > kvm_zap_gfn_range() > > When zapping a GFN range, pass 0 => ALL_ONES for the to-be-invalidated > range to effectively block all page faults while the zap is in-progress. > The invalidation helpers take a host virtual address, whereas zapping a > GFN obviously provides a guest physical address and with the wrong unit > of measurement (frame vs. byte). > > Alternatively, KVM could walk all memslots to get the associated HVAs, > but thanks to SMM, that would require multiple lookups. And practically > speaking, kvm_zap_gfn_range() usage is quite rare and not a hot path, > e.g. MTRR and CR0.CD are almost guaranteed to be done only on vCPU0 > during boot, and APICv inhibits are similarly infrequent operations. > > Fixes: edb298c663fc ("KVM: x86/mmu: bump mmu notifier count in kvm_zap_gfn_range") > Cc: stable@vger.kernel.org > Cc: Maxim Levitsky <mlevitsk@redhat.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > arch/x86/kvm/mmu/mmu.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index 6f81539061d6..1ccb769f62af 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -6056,7 +6056,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) > > write_lock(&kvm->mmu_lock); > > - kvm_mmu_invalidate_begin(kvm, gfn_start, gfn_end); > + kvm_mmu_invalidate_begin(kvm, 0, -1ul); > > flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end); > > @@ -6070,7 +6070,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) > kvm_flush_remote_tlbs_with_address(kvm, gfn_start, > gfn_end - gfn_start); > > - kvm_mmu_invalidate_end(kvm, gfn_start, gfn_end); > + kvm_mmu_invalidate_end(kvm, 0, -1ul); > > write_unlock(&kvm->mmu_lock); > } > > base-commit: c12879206e47730ff5ab255bbf625b28ade4028f > --
On Tue, Nov 08, 2022, Chao Peng wrote: > On Fri, Nov 04, 2022 at 10:29:48PM +0000, Sean Christopherson wrote: > > The APICv case that this was added for could very well be broken because of > > this, and the resulting failures would be an absolute nightmare to debug. > > Given the apicv_inhibit should be rare, the change looks good to me. > Just to be clear, your will send out this fix, right? Ya, I'll post an official patch.
On Tue, Oct 25, 2022, Chao Peng wrote: > @@ -715,15 +715,9 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn, > kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn); > } > > -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, > - unsigned long end) > +static inline Don't tag static functions with "inline" unless they're in headers, in which case the inline is effectively required. In pretty much every scenario, the compiler can do a better job of optimizing inline vs. non-inline, i.e. odds are very good the compiler would inline this helper anyways, and if not, there would likely be a good reason not to inline it. It'll be a moot point in this case (more below), but this would also reduce the line length and avoid the wrap. > void update_invalidate_range(struct kvm *kvm, gfn_t start, > + gfn_t end) I appreciate the effort to make this easier to read, but making such a big divergence from the kernel's preferred formatting is often counter-productive, e.g. I blinked a few times when first reading this code. Again, moot point this time (still below ;-) ), but for future reference, better options are to either let the line poke out or simply wrap early to get the bundling of parameters that you want, e.g. static inline void update_invalidate_range(struct kvm *kvm, gfn_t start, gfn_t end) or static inline void update_invalidate_range(struct kvm *kvm, gfn_t start, gfn_t end) > { > - /* > - * The count increase must become visible at unlock time as no > - * spte can be established without taking the mmu_lock and > - * count is also read inside the mmu_lock critical section. > - */ > - kvm->mmu_invalidate_in_progress++; > if (likely(kvm->mmu_invalidate_in_progress == 1)) { > kvm->mmu_invalidate_range_start = start; > kvm->mmu_invalidate_range_end = end; > @@ -744,6 +738,28 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, > } > } > > +static void mark_invalidate_in_progress(struct kvm *kvm, gfn_t start, gfn_t end) Splitting the helpers this way yields a weird API overall, e.g. it's possible (common, actually) to have an "end" without a "begin". Taking the range in the "end" is also dangerous/misleading/imbalanced, because _if_ there are multiple ranges in a batch, each range would need to be unwound independently, e.g. the invocation of the "end" helper in kvm_mmu_notifier_invalidate_range_end() is flat out wrong, it just doesn't cause problems because KVM doesn't (currently) try to unwind regions (and probably never will, but that's beside the point). Rather than shunt what is effectively the "begin" into a separate helper, provide three separate APIs, e.g. begin, range_add, end. That way, begin+end don't take a range and thus are symmetrical, always paired, and can't screw up unwinding since they don't have a range to unwind. It'll require three calls in every case, but that's not the end of the world since none of these flows are super hot paths. > +{ > + /* > + * The count increase must become visible at unlock time as no > + * spte can be established without taking the mmu_lock and > + * count is also read inside the mmu_lock critical section. > + */ > + kvm->mmu_invalidate_in_progress++; This should invalidate (ha!) mmu_invalidate_range_{start,end}, and then WARN in mmu_invalidate_retry() if the range isn't valid. And the "add" helper should WARN if mmu_invalidate_in_progress == 0. > +} > + > +static bool kvm_mmu_handle_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) "handle" is waaaay too generic. Just match kvm_unmap_gfn_range() and call it kvm_mmu_unmap_gfn_range(). This is a local function so it's unlikely to collide with arch code, now or in the future. > +{ > + update_invalidate_range(kvm, range->start, range->end); > + return kvm_unmap_gfn_range(kvm, range); > +} Overall, this? Compile tested only... --- arch/x86/kvm/mmu/mmu.c | 8 +++++--- include/linux/kvm_host.h | 33 +++++++++++++++++++++------------ virt/kvm/kvm_main.c | 30 +++++++++++++++++++++--------- 3 files changed, 47 insertions(+), 24 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 93c389eaf471..d4b373e3e524 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4259,7 +4259,7 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu, return true; return fault->slot && - mmu_invalidate_retry_hva(vcpu->kvm, mmu_seq, fault->hva); + mmu_invalidate_retry_gfn(vcpu->kvm, mmu_seq, fault->gfn); } static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) @@ -6098,7 +6098,9 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) write_lock(&kvm->mmu_lock); - kvm_mmu_invalidate_begin(kvm, gfn_start, gfn_end); + kvm_mmu_invalidate_begin(kvm); + + kvm_mmu_invalidate_range_add(kvm, gfn_start, gfn_end); flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end); @@ -6112,7 +6114,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end - gfn_start); - kvm_mmu_invalidate_end(kvm, gfn_start, gfn_end); + kvm_mmu_invalidate_end(kvm); write_unlock(&kvm->mmu_lock); } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index e6e66c5e56f2..29aa6d6827cc 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -778,8 +778,8 @@ struct kvm { struct mmu_notifier mmu_notifier; unsigned long mmu_invalidate_seq; long mmu_invalidate_in_progress; - unsigned long mmu_invalidate_range_start; - unsigned long mmu_invalidate_range_end; + gfn_t mmu_invalidate_range_start; + gfn_t mmu_invalidate_range_end; #endif struct list_head devices; u64 manual_dirty_log_protect; @@ -1378,10 +1378,9 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc); void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); #endif -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, - unsigned long end); -void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start, - unsigned long end); +void kvm_mmu_invalidate_begin(struct kvm *kvm); +void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end); +void kvm_mmu_invalidate_end(struct kvm *kvm); long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg); @@ -1952,9 +1951,9 @@ static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq) return 0; } -static inline int mmu_invalidate_retry_hva(struct kvm *kvm, +static inline int mmu_invalidate_retry_gfn(struct kvm *kvm, unsigned long mmu_seq, - unsigned long hva) + gfn_t gfn) { lockdep_assert_held(&kvm->mmu_lock); /* @@ -1963,10 +1962,20 @@ static inline int mmu_invalidate_retry_hva(struct kvm *kvm, * that might be being invalidated. Note that it may include some false * positives, due to shortcuts when handing concurrent invalidations. */ - if (unlikely(kvm->mmu_invalidate_in_progress) && - hva >= kvm->mmu_invalidate_range_start && - hva < kvm->mmu_invalidate_range_end) - return 1; + if (unlikely(kvm->mmu_invalidate_in_progress)) { + /* + * Dropping mmu_lock after bumping mmu_invalidate_in_progress + * but before updating the range is a KVM bug. + */ + if (WARN_ON_ONCE(kvm->mmu_invalidate_range_start == INVALID_GPA || + kvm->mmu_invalidate_range_end == INVALID_GPA)) + return 1; + + if (gfn >= kvm->mmu_invalidate_range_start && + gfn < kvm->mmu_invalidate_range_end) + return 1; + } + if (kvm->mmu_invalidate_seq != mmu_seq) return 1; return 0; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 43bbe4fde078..e9e03b979f77 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -540,9 +540,7 @@ static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn, typedef bool (*hva_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range); -typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start, - unsigned long end); - +typedef void (*on_lock_fn_t)(struct kvm *kvm); typedef void (*on_unlock_fn_t)(struct kvm *kvm); struct kvm_hva_range { @@ -628,7 +626,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm, locked = true; KVM_MMU_LOCK(kvm); if (!IS_KVM_NULL_FN(range->on_lock)) - range->on_lock(kvm, range->start, range->end); + range->on_lock(kvm); + if (IS_KVM_NULL_FN(range->handler)) break; } @@ -715,8 +714,7 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn, kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn); } -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, - unsigned long end) +void kvm_mmu_invalidate_begin(struct kvm *kvm) { /* * The count increase must become visible at unlock time as no @@ -724,6 +722,15 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, * count is also read inside the mmu_lock critical section. */ kvm->mmu_invalidate_in_progress++; + + kvm->mmu_invalidate_range_start = INVALID_GPA; + kvm->mmu_invalidate_range_end = INVALID_GPA; +} + +void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end) +{ + WARN_ON_ONCE(!kvm->mmu_invalidate_in_progress); + if (likely(kvm->mmu_invalidate_in_progress == 1)) { kvm->mmu_invalidate_range_start = start; kvm->mmu_invalidate_range_end = end; @@ -744,6 +751,12 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, } } +static bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) +{ + kvm_mmu_invalidate_range_add(kvm, range->start, range->end); + return kvm_unmap_gfn_range(kvm, range); +} + static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, const struct mmu_notifier_range *range) { @@ -752,7 +765,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, .start = range->start, .end = range->end, .pte = __pte(0), - .handler = kvm_unmap_gfn_range, + .handler = kvm_mmu_unmap_gfn_range, .on_lock = kvm_mmu_invalidate_begin, .on_unlock = kvm_arch_guest_memory_reclaimed, .flush_on_ret = true, @@ -791,8 +804,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, return 0; } -void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start, - unsigned long end) +void kvm_mmu_invalidate_end(struct kvm *kvm) { /* * This sequence increase will notify the kvm page fault that base-commit: d663b8a285986072428a6a145e5994bc275df994 --
On Thu, Nov 10, 2022 at 08:06:33PM +0000, Sean Christopherson wrote: > On Tue, Oct 25, 2022, Chao Peng wrote: > > @@ -715,15 +715,9 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn, > > kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn); > > } > > > > -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, > > - unsigned long end) > > +static inline > > Don't tag static functions with "inline" unless they're in headers, in which case > the inline is effectively required. In pretty much every scenario, the compiler > can do a better job of optimizing inline vs. non-inline, i.e. odds are very good > the compiler would inline this helper anyways, and if not, there would likely be > a good reason not to inline it. Yep, I know the rationale behind, I made a mistake. > > It'll be a moot point in this case (more below), but this would also reduce the > line length and avoid the wrap. > > > void update_invalidate_range(struct kvm *kvm, gfn_t start, > > + gfn_t end) > > I appreciate the effort to make this easier to read, but making such a big divergence > from the kernel's preferred formatting is often counter-productive, e.g. I blinked a > few times when first reading this code. > > Again, moot point this time (still below ;-) ), but for future reference, better > options are to either let the line poke out or simply wrap early to get the > bundling of parameters that you want, e.g. > > static inline void update_invalidate_range(struct kvm *kvm, gfn_t start, gfn_t end) > > or > > static inline void update_invalidate_range(struct kvm *kvm, > gfn_t start, gfn_t end) Fully agreed. > > > { > > - /* > > - * The count increase must become visible at unlock time as no > > - * spte can be established without taking the mmu_lock and > > - * count is also read inside the mmu_lock critical section. > > - */ > > - kvm->mmu_invalidate_in_progress++; > > if (likely(kvm->mmu_invalidate_in_progress == 1)) { > > kvm->mmu_invalidate_range_start = start; > > kvm->mmu_invalidate_range_end = end; > > @@ -744,6 +738,28 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, > > } > > } > > > > +static void mark_invalidate_in_progress(struct kvm *kvm, gfn_t start, gfn_t end) > > Splitting the helpers this way yields a weird API overall, e.g. it's possible > (common, actually) to have an "end" without a "begin". > > Taking the range in the "end" is also dangerous/misleading/imbalanced, because _if_ > there are multiple ranges in a batch, each range would need to be unwound > independently, e.g. the invocation of the "end" helper in > kvm_mmu_notifier_invalidate_range_end() is flat out wrong, it just doesn't cause > problems because KVM doesn't (currently) try to unwind regions (and probably never > will, but that's beside the point). I actually also don't feel good with existing code (taking range in the "start" and "end") but didn't go further to find a better solution. > > Rather than shunt what is effectively the "begin" into a separate helper, provide > three separate APIs, e.g. begin, range_add, end. That way, begin+end don't take a > range and thus are symmetrical, always paired, and can't screw up unwinding since > they don't have a range to unwind. This looks much better to me. > > It'll require three calls in every case, but that's not the end of the world since > none of these flows are super hot paths. > > > +{ > > + /* > > + * The count increase must become visible at unlock time as no > > + * spte can be established without taking the mmu_lock and > > + * count is also read inside the mmu_lock critical section. > > + */ > > + kvm->mmu_invalidate_in_progress++; > > This should invalidate (ha!) mmu_invalidate_range_{start,end}, and then WARN in > mmu_invalidate_retry() if the range isn't valid. And the "add" helper should > WARN if mmu_invalidate_in_progress == 0. > > > +} > > + > > +static bool kvm_mmu_handle_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) > > "handle" is waaaay too generic. Just match kvm_unmap_gfn_range() and call it > kvm_mmu_unmap_gfn_range(). This is a local function so it's unlikely to collide > with arch code, now or in the future. Agreed. > > > +{ > > + update_invalidate_range(kvm, range->start, range->end); > > + return kvm_unmap_gfn_range(kvm, range); > > +} > > Overall, this? Compile tested only... Thanks! Chao > > --- > arch/x86/kvm/mmu/mmu.c | 8 +++++--- > include/linux/kvm_host.h | 33 +++++++++++++++++++++------------ > virt/kvm/kvm_main.c | 30 +++++++++++++++++++++--------- > 3 files changed, 47 insertions(+), 24 deletions(-) > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > index 93c389eaf471..d4b373e3e524 100644 > --- a/arch/x86/kvm/mmu/mmu.c > +++ b/arch/x86/kvm/mmu/mmu.c > @@ -4259,7 +4259,7 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu, > return true; > > return fault->slot && > - mmu_invalidate_retry_hva(vcpu->kvm, mmu_seq, fault->hva); > + mmu_invalidate_retry_gfn(vcpu->kvm, mmu_seq, fault->gfn); > } > > static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) > @@ -6098,7 +6098,9 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) > > write_lock(&kvm->mmu_lock); > > - kvm_mmu_invalidate_begin(kvm, gfn_start, gfn_end); > + kvm_mmu_invalidate_begin(kvm); > + > + kvm_mmu_invalidate_range_add(kvm, gfn_start, gfn_end); > > flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end); > > @@ -6112,7 +6114,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) > kvm_flush_remote_tlbs_with_address(kvm, gfn_start, > gfn_end - gfn_start); > > - kvm_mmu_invalidate_end(kvm, gfn_start, gfn_end); > + kvm_mmu_invalidate_end(kvm); > > write_unlock(&kvm->mmu_lock); > } > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index e6e66c5e56f2..29aa6d6827cc 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -778,8 +778,8 @@ struct kvm { > struct mmu_notifier mmu_notifier; > unsigned long mmu_invalidate_seq; > long mmu_invalidate_in_progress; > - unsigned long mmu_invalidate_range_start; > - unsigned long mmu_invalidate_range_end; > + gfn_t mmu_invalidate_range_start; > + gfn_t mmu_invalidate_range_end; > #endif > struct list_head devices; > u64 manual_dirty_log_protect; > @@ -1378,10 +1378,9 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc); > void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); > #endif > > -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, > - unsigned long end); > -void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start, > - unsigned long end); > +void kvm_mmu_invalidate_begin(struct kvm *kvm); > +void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end); > +void kvm_mmu_invalidate_end(struct kvm *kvm); > > long kvm_arch_dev_ioctl(struct file *filp, > unsigned int ioctl, unsigned long arg); > @@ -1952,9 +1951,9 @@ static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq) > return 0; > } > > -static inline int mmu_invalidate_retry_hva(struct kvm *kvm, > +static inline int mmu_invalidate_retry_gfn(struct kvm *kvm, > unsigned long mmu_seq, > - unsigned long hva) > + gfn_t gfn) > { > lockdep_assert_held(&kvm->mmu_lock); > /* > @@ -1963,10 +1962,20 @@ static inline int mmu_invalidate_retry_hva(struct kvm *kvm, > * that might be being invalidated. Note that it may include some false > * positives, due to shortcuts when handing concurrent invalidations. > */ > - if (unlikely(kvm->mmu_invalidate_in_progress) && > - hva >= kvm->mmu_invalidate_range_start && > - hva < kvm->mmu_invalidate_range_end) > - return 1; > + if (unlikely(kvm->mmu_invalidate_in_progress)) { > + /* > + * Dropping mmu_lock after bumping mmu_invalidate_in_progress > + * but before updating the range is a KVM bug. > + */ > + if (WARN_ON_ONCE(kvm->mmu_invalidate_range_start == INVALID_GPA || > + kvm->mmu_invalidate_range_end == INVALID_GPA)) > + return 1; > + > + if (gfn >= kvm->mmu_invalidate_range_start && > + gfn < kvm->mmu_invalidate_range_end) > + return 1; > + } > + > if (kvm->mmu_invalidate_seq != mmu_seq) > return 1; > return 0; > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 43bbe4fde078..e9e03b979f77 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -540,9 +540,7 @@ static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn, > > typedef bool (*hva_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range); > > -typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start, > - unsigned long end); > - > +typedef void (*on_lock_fn_t)(struct kvm *kvm); > typedef void (*on_unlock_fn_t)(struct kvm *kvm); > > struct kvm_hva_range { > @@ -628,7 +626,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm, > locked = true; > KVM_MMU_LOCK(kvm); > if (!IS_KVM_NULL_FN(range->on_lock)) > - range->on_lock(kvm, range->start, range->end); > + range->on_lock(kvm); > + > if (IS_KVM_NULL_FN(range->handler)) > break; > } > @@ -715,8 +714,7 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn, > kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn); > } > > -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, > - unsigned long end) > +void kvm_mmu_invalidate_begin(struct kvm *kvm) > { > /* > * The count increase must become visible at unlock time as no > @@ -724,6 +722,15 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, > * count is also read inside the mmu_lock critical section. > */ > kvm->mmu_invalidate_in_progress++; > + > + kvm->mmu_invalidate_range_start = INVALID_GPA; > + kvm->mmu_invalidate_range_end = INVALID_GPA; > +} > + > +void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end) > +{ > + WARN_ON_ONCE(!kvm->mmu_invalidate_in_progress); > + > if (likely(kvm->mmu_invalidate_in_progress == 1)) { > kvm->mmu_invalidate_range_start = start; > kvm->mmu_invalidate_range_end = end; > @@ -744,6 +751,12 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, > } > } > > +static bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) > +{ > + kvm_mmu_invalidate_range_add(kvm, range->start, range->end); > + return kvm_unmap_gfn_range(kvm, range); > +} > + > static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, > const struct mmu_notifier_range *range) > { > @@ -752,7 +765,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, > .start = range->start, > .end = range->end, > .pte = __pte(0), > - .handler = kvm_unmap_gfn_range, > + .handler = kvm_mmu_unmap_gfn_range, > .on_lock = kvm_mmu_invalidate_begin, > .on_unlock = kvm_arch_guest_memory_reclaimed, > .flush_on_ret = true, > @@ -791,8 +804,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, > return 0; > } > > -void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start, > - unsigned long end) > +void kvm_mmu_invalidate_end(struct kvm *kvm) > { > /* > * This sequence increase will notify the kvm page fault that > > base-commit: d663b8a285986072428a6a145e5994bc275df994 > -- >
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 6f81539061d6..33b1aec44fb8 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4217,7 +4217,7 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu, return true; return fault->slot && - mmu_invalidate_retry_hva(vcpu->kvm, mmu_seq, fault->hva); + mmu_invalidate_retry_gfn(vcpu->kvm, mmu_seq, fault->gfn); } static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 739a7562a1f3..79e5cbc35fcf 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -775,8 +775,8 @@ struct kvm { struct mmu_notifier mmu_notifier; unsigned long mmu_invalidate_seq; long mmu_invalidate_in_progress; - unsigned long mmu_invalidate_range_start; - unsigned long mmu_invalidate_range_end; + gfn_t mmu_invalidate_range_start; + gfn_t mmu_invalidate_range_end; #endif struct list_head devices; u64 manual_dirty_log_protect; @@ -1365,10 +1365,8 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc); void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); #endif -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, - unsigned long end); -void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start, - unsigned long end); +void kvm_mmu_invalidate_begin(struct kvm *kvm, gfn_t start, gfn_t end); +void kvm_mmu_invalidate_end(struct kvm *kvm, gfn_t start, gfn_t end); long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg); @@ -1937,9 +1935,9 @@ static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq) return 0; } -static inline int mmu_invalidate_retry_hva(struct kvm *kvm, +static inline int mmu_invalidate_retry_gfn(struct kvm *kvm, unsigned long mmu_seq, - unsigned long hva) + gfn_t gfn) { lockdep_assert_held(&kvm->mmu_lock); /* @@ -1949,8 +1947,8 @@ static inline int mmu_invalidate_retry_hva(struct kvm *kvm, * positives, due to shortcuts when handing concurrent invalidations. */ if (unlikely(kvm->mmu_invalidate_in_progress) && - hva >= kvm->mmu_invalidate_range_start && - hva < kvm->mmu_invalidate_range_end) + gfn >= kvm->mmu_invalidate_range_start && + gfn < kvm->mmu_invalidate_range_end) return 1; if (kvm->mmu_invalidate_seq != mmu_seq) return 1; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 8dace78a0278..09c9cdeb773c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -540,8 +540,7 @@ static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn, typedef bool (*hva_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range); -typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start, - unsigned long end); +typedef void (*on_lock_fn_t)(struct kvm *kvm, gfn_t start, gfn_t end); typedef void (*on_unlock_fn_t)(struct kvm *kvm); @@ -628,7 +627,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm, locked = true; KVM_MMU_LOCK(kvm); if (!IS_KVM_NULL_FN(range->on_lock)) - range->on_lock(kvm, range->start, range->end); + range->on_lock(kvm, gfn_range.start, + gfn_range.end); if (IS_KVM_NULL_FN(range->handler)) break; } @@ -715,15 +715,9 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn, kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn); } -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, - unsigned long end) +static inline void update_invalidate_range(struct kvm *kvm, gfn_t start, + gfn_t end) { - /* - * The count increase must become visible at unlock time as no - * spte can be established without taking the mmu_lock and - * count is also read inside the mmu_lock critical section. - */ - kvm->mmu_invalidate_in_progress++; if (likely(kvm->mmu_invalidate_in_progress == 1)) { kvm->mmu_invalidate_range_start = start; kvm->mmu_invalidate_range_end = end; @@ -744,6 +738,28 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, } } +static void mark_invalidate_in_progress(struct kvm *kvm, gfn_t start, gfn_t end) +{ + /* + * The count increase must become visible at unlock time as no + * spte can be established without taking the mmu_lock and + * count is also read inside the mmu_lock critical section. + */ + kvm->mmu_invalidate_in_progress++; +} + +static bool kvm_mmu_handle_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) +{ + update_invalidate_range(kvm, range->start, range->end); + return kvm_unmap_gfn_range(kvm, range); +} + +void kvm_mmu_invalidate_begin(struct kvm *kvm, gfn_t start, gfn_t end) +{ + mark_invalidate_in_progress(kvm, start, end); + update_invalidate_range(kvm, start, end); +} + static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, const struct mmu_notifier_range *range) { @@ -752,8 +768,8 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, .start = range->start, .end = range->end, .pte = __pte(0), - .handler = kvm_unmap_gfn_range, - .on_lock = kvm_mmu_invalidate_begin, + .handler = kvm_mmu_handle_gfn_range, + .on_lock = mark_invalidate_in_progress, .on_unlock = kvm_arch_guest_memory_reclaimed, .flush_on_ret = true, .may_block = mmu_notifier_range_blockable(range), @@ -791,8 +807,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, return 0; } -void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start, - unsigned long end) +void kvm_mmu_invalidate_end(struct kvm *kvm, gfn_t start, gfn_t end) { /* * This sequence increase will notify the kvm page fault that