From patchwork Tue Jul 18 23:44:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sean Christopherson X-Patchwork-Id: 122325 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:c923:0:b0:3e4:2afc:c1 with SMTP id j3csp2098670vqt; Tue, 18 Jul 2023 17:20:20 -0700 (PDT) X-Google-Smtp-Source: APBJJlFzbCKczS4syW8OH9XgwFOJ/6mimknbwntHtmqnb36BhU0icd1JHzPe/usQCv2yMy90PUzr X-Received: by 2002:a17:906:21a:b0:982:b224:2a5d with SMTP id 26-20020a170906021a00b00982b2242a5dmr1004063ejd.37.1689726020355; Tue, 18 Jul 2023 17:20:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689726020; cv=none; d=google.com; s=arc-20160816; b=qilHQzRMS2B0ZcU/UNBBMKT6gLGsr58cp5KAKhBqI02WejVs95PfuInqAal5PhtxZK Pb6cwya5vXxGZukrtUAMBzsjmU0TI1sMH+Z5S5CLyOzEvGDI7oHU/UR+WxNoSZGW515c Pl+NDzRaMQHEzQbJHmI5xjxzWGopTYazf4oxYGtNwGMu9trzxh21QCW15s8OAo89yjcQ NfUzpIRdIbohIjXdGTOSuudAXJ4En2YdV1u6Vdlg0hnvDksgHmUXegHgzIJy9cVhCK0q KLO4toSZFapqFbLliXVuOir8Vc0qzFVpJHdfwyoo3S5yI0v4RNb0WrXrFtFBrRo0fPUQ svwQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:reply-to:dkim-signature; bh=13TvpQfWmhj98VTWRgGiYJYYPcBR6KCMHuSUQlwO+hU=; fh=D6d5aNryYQiYyjenfSFWOT3594jRd4xJmyqVXmIxeQo=; b=BB6z3ryp+Cepi886wkYPLu/tj2VpZnkL1pEH57hhPP6GVeGHti3v/AA6zukeMyP0ik 23wTE/n9K1ddd0WfM/SC3YyWI+Nf/o8DlR3ue7GKVa+Uoqj0waVlTWL38oJDrNZRUIBF 51rBZqbNvdMTuFhKInvM5bh6DrbkD7/nUKYvP6FS1D/VP+P2bh0z25pIhGyjT/J2KmKq YSGyBvDDsW6sftMphB+ObQgKoU+teKeammq6iTPPnmaFtQruCCtdM49LB8bxPIjDANgO nETP0Q5RzWQ0ASFZVkFA/qCWcJ3qS9dBo1PPKEFlEhOzhbsJmEoyYaqTGIfyEgYb3Lwk fZtQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=K73Vbg7k; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ck18-20020a170906c45200b0098e2aa0bb92si1942901ejb.137.2023.07.18.17.19.56; Tue, 18 Jul 2023 17:20:20 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=K73Vbg7k; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230341AbjGRXtQ (ORCPT + 99 others); Tue, 18 Jul 2023 19:49:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48728 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230118AbjGRXsz (ORCPT ); Tue, 18 Jul 2023 19:48:55 -0400 Received: from mail-pl1-x649.google.com (mail-pl1-x649.google.com [IPv6:2607:f8b0:4864:20::649]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 943F010A for ; Tue, 18 Jul 2023 16:48:37 -0700 (PDT) Received: by mail-pl1-x649.google.com with SMTP id d9443c01a7336-1b888bdacbcso33053115ad.2 for ; Tue, 18 Jul 2023 16:48:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1689724117; x=1692316117; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=13TvpQfWmhj98VTWRgGiYJYYPcBR6KCMHuSUQlwO+hU=; b=K73Vbg7kksC/z8D2Q5HaDQ7UWFNhH20EcZQeVd3Lf2J0boclBZkYp6kZLyJv4l7d/v 56zL/m0nSTB8tFW2XTPItzxIs14nSlrppiGSTorI7aichxttdeRbEqNHnqUSw3Aw+2kn Fji8j9h2Uv0Ij0eYpo+vRulpevlfNeeaN++p4cr5gXiklm6fhe6e4jDIizxqxGBZJh16 032nb0dR2D1cAOXQ4dh7+UhQLasc5in709vPZ9EttwJPaHiHomwPODSWeSP1mdLy4wFs ++94jsLDULOKu9MnMNrAsqU8w6PnH6mDknBjPD335n+eYk0/ErbJSraNW2wkmXKS+1vk RBZA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689724117; x=1692316117; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=13TvpQfWmhj98VTWRgGiYJYYPcBR6KCMHuSUQlwO+hU=; b=IKU2ZQ9WvI88K+Flqbmm4l/5/LVzTtJDRE9e0BTbRIfxiuBtK9VkyDkoEPKond0CqT cXZOMfT1q4RzACsBcxiFrR/iNPz0FzhZh53TZwc7dcC4ZpNoNdh7rqewLeZieGMBYeCM vgMVUZMPvfAKh806fmBQgMOkf0UrVuDo2oTlLQwAZAJZhQWwuzksuvhB1/HurMhJz1hc AvKMu6WTStpV5ry24kvA7i4IMn/rHy95jrgFDDOlnS+YwGlfKWOvINWcts5NIhw9vrP1 Sm36FJeTHtQZ4P3D0CnH7q1ZPd7pyS2UfhMABBHZlzUHoHEGCyMGpXTWLdKafWrHjwit oTcA== X-Gm-Message-State: ABy/qLYZo5siCBHkDXxFhle2k/ZvVJ4unN9Mo//qrUWa/itOrBvlNBLl 7eDQsz7XEwh7irvrE0iWeX21vQdPIj8= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a17:903:2301:b0:1bb:1ffd:5cc8 with SMTP id d1-20020a170903230100b001bb1ffd5cc8mr7895plh.11.1689724116857; Tue, 18 Jul 2023 16:48:36 -0700 (PDT) Reply-To: Sean Christopherson Date: Tue, 18 Jul 2023 16:44:46 -0700 In-Reply-To: <20230718234512.1690985-1-seanjc@google.com> Mime-Version: 1.0 References: <20230718234512.1690985-1-seanjc@google.com> X-Mailer: git-send-email 2.41.0.255.g8b1d071c50-goog Message-ID: <20230718234512.1690985-4-seanjc@google.com> Subject: [RFC PATCH v11 03/29] KVM: Use gfn instead of hva for mmu_notifier_retry From: Sean Christopherson To: Paolo Bonzini , Marc Zyngier , Oliver Upton , Huacai Chen , Michael Ellerman , Anup Patel , Paul Walmsley , Palmer Dabbelt , Albert Ou , Sean Christopherson , "Matthew Wilcox (Oracle)" , Andrew Morton , Paul Moore , James Morris , "Serge E. Hallyn" Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org, Chao Peng , Fuad Tabba , Jarkko Sakkinen , Yu Zhang , Vishal Annapurve , Ackerley Tng , Maciej Szmigiero , Vlastimil Babka , David Hildenbrand , Quentin Perret , Michael Roth , Wang , Liam Merwick , Isaku Yamahata , "Kirill A . Shutemov" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1771806151386950759 X-GMAIL-MSGID: 1771806151386950759 From: Chao Peng Currently in mmu_notifier invalidate path, hva range is recorded and then checked against by mmu_notifier_retry_hva() in the page fault handling path. However, for the to be introduced private memory, a page fault may not have a hva associated, checking gfn(gpa) makes more sense. For existing hva based shared memory, gfn is expected to also work. The only downside is when aliasing multiple gfns to a single hva, the current algorithm of checking multiple ranges could result in a much larger range being rejected. Such aliasing should be uncommon, so the impact is expected small. Suggested-by: Sean Christopherson Signed-off-by: Chao Peng Reviewed-by: Fuad Tabba Tested-by: Fuad Tabba [sean: convert vmx_set_apic_access_page_addr() to gfn-based API] Signed-off-by: Sean Christopherson Reviewed-by: Paolo Bonzini --- arch/x86/kvm/mmu/mmu.c | 10 ++++++---- arch/x86/kvm/vmx/vmx.c | 11 +++++------ include/linux/kvm_host.h | 33 +++++++++++++++++++++------------ virt/kvm/kvm_main.c | 40 +++++++++++++++++++++++++++++++--------- 4 files changed, 63 insertions(+), 31 deletions(-) diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index d72f2b20f430..b034727c4cf9 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3087,7 +3087,7 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) * * There are several ways to safely use this helper: * - * - Check mmu_invalidate_retry_hva() after grabbing the mapping level, before + * - Check mmu_invalidate_retry_gfn() after grabbing the mapping level, before * consuming it. In this case, mmu_lock doesn't need to be held during the * lookup, but it does need to be held while checking the MMU notifier. * @@ -4400,7 +4400,7 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu, return true; return fault->slot && - mmu_invalidate_retry_hva(vcpu->kvm, fault->mmu_seq, fault->hva); + mmu_invalidate_retry_gfn(vcpu->kvm, fault->mmu_seq, fault->gfn); } static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) @@ -6301,7 +6301,9 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) write_lock(&kvm->mmu_lock); - kvm_mmu_invalidate_begin(kvm, 0, -1ul); + kvm_mmu_invalidate_begin(kvm); + + kvm_mmu_invalidate_range_add(kvm, gfn_start, gfn_end); flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end); @@ -6314,7 +6316,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end) if (flush) kvm_flush_remote_tlbs_range(kvm, gfn_start, gfn_end - gfn_start); - kvm_mmu_invalidate_end(kvm, 0, -1ul); + kvm_mmu_invalidate_end(kvm); write_unlock(&kvm->mmu_lock); } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 0ecf4be2c6af..946380b53cf5 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6729,10 +6729,10 @@ static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu) return; /* - * Grab the memslot so that the hva lookup for the mmu_notifier retry - * is guaranteed to use the same memslot as the pfn lookup, i.e. rely - * on the pfn lookup's validation of the memslot to ensure a valid hva - * is used for the retry check. + * Explicitly grab the memslot using KVM's internal slot ID to ensure + * KVM doesn't unintentionally grab a userspace memslot. It _should_ + * be impossible for userspace to create a memslot for the APIC when + * APICv is enabled, but paranoia won't hurt in this case. */ slot = id_to_memslot(slots, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT); if (!slot || slot->flags & KVM_MEMSLOT_INVALID) @@ -6757,8 +6757,7 @@ static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu) return; read_lock(&vcpu->kvm->mmu_lock); - if (mmu_invalidate_retry_hva(kvm, mmu_seq, - gfn_to_hva_memslot(slot, gfn))) { + if (mmu_invalidate_retry_gfn(kvm, mmu_seq, gfn)) { kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu); read_unlock(&vcpu->kvm->mmu_lock); goto out; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index b901571ab61e..90a0be261a5c 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -788,8 +788,8 @@ struct kvm { struct mmu_notifier mmu_notifier; unsigned long mmu_invalidate_seq; long mmu_invalidate_in_progress; - unsigned long mmu_invalidate_range_start; - unsigned long mmu_invalidate_range_end; + gfn_t mmu_invalidate_range_start; + gfn_t mmu_invalidate_range_end; #endif struct list_head devices; u64 manual_dirty_log_protect; @@ -1371,10 +1371,9 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc); void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); #endif -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, - unsigned long end); -void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start, - unsigned long end); +void kvm_mmu_invalidate_begin(struct kvm *kvm); +void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end); +void kvm_mmu_invalidate_end(struct kvm *kvm); long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg); @@ -1940,9 +1939,9 @@ static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq) return 0; } -static inline int mmu_invalidate_retry_hva(struct kvm *kvm, +static inline int mmu_invalidate_retry_gfn(struct kvm *kvm, unsigned long mmu_seq, - unsigned long hva) + gfn_t gfn) { lockdep_assert_held(&kvm->mmu_lock); /* @@ -1951,10 +1950,20 @@ static inline int mmu_invalidate_retry_hva(struct kvm *kvm, * that might be being invalidated. Note that it may include some false * positives, due to shortcuts when handing concurrent invalidations. */ - if (unlikely(kvm->mmu_invalidate_in_progress) && - hva >= kvm->mmu_invalidate_range_start && - hva < kvm->mmu_invalidate_range_end) - return 1; + if (unlikely(kvm->mmu_invalidate_in_progress)) { + /* + * Dropping mmu_lock after bumping mmu_invalidate_in_progress + * but before updating the range is a KVM bug. + */ + if (WARN_ON_ONCE(kvm->mmu_invalidate_range_start == INVALID_GPA || + kvm->mmu_invalidate_range_end == INVALID_GPA)) + return 1; + + if (gfn >= kvm->mmu_invalidate_range_start && + gfn < kvm->mmu_invalidate_range_end) + return 1; + } + if (kvm->mmu_invalidate_seq != mmu_seq) return 1; return 0; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 50aea855eeae..8101b11a13ba 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -518,9 +518,7 @@ static inline struct kvm *mmu_notifier_to_kvm(struct mmu_notifier *mn) typedef bool (*gfn_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range); -typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start, - unsigned long end); - +typedef void (*on_lock_fn_t)(struct kvm *kvm); typedef void (*on_unlock_fn_t)(struct kvm *kvm); struct kvm_mmu_notifier_range { @@ -617,7 +615,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm, locked = true; KVM_MMU_LOCK(kvm); if (!IS_KVM_NULL_FN(range->on_lock)) - range->on_lock(kvm, range->start, range->end); + range->on_lock(kvm); + if (IS_KVM_NULL_FN(range->handler)) break; } @@ -721,15 +720,26 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn, kvm_handle_hva_range(mn, address, address + 1, pte, kvm_change_spte_gfn); } -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, - unsigned long end) +void kvm_mmu_invalidate_begin(struct kvm *kvm) { + lockdep_assert_held_write(&kvm->mmu_lock); /* * The count increase must become visible at unlock time as no * spte can be established without taking the mmu_lock and * count is also read inside the mmu_lock critical section. */ kvm->mmu_invalidate_in_progress++; + + if (likely(kvm->mmu_invalidate_in_progress == 1)) + kvm->mmu_invalidate_range_start = INVALID_GPA; +} + +void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end) +{ + lockdep_assert_held_write(&kvm->mmu_lock); + + WARN_ON_ONCE(!kvm->mmu_invalidate_in_progress); + if (likely(kvm->mmu_invalidate_in_progress == 1)) { kvm->mmu_invalidate_range_start = start; kvm->mmu_invalidate_range_end = end; @@ -750,6 +760,12 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, } } +static bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range) +{ + kvm_mmu_invalidate_range_add(kvm, range->start, range->end); + return kvm_unmap_gfn_range(kvm, range); +} + static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, const struct mmu_notifier_range *range) { @@ -757,7 +773,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, const struct kvm_mmu_notifier_range hva_range = { .start = range->start, .end = range->end, - .handler = kvm_unmap_gfn_range, + .handler = kvm_mmu_unmap_gfn_range, .on_lock = kvm_mmu_invalidate_begin, .on_unlock = kvm_arch_guest_memory_reclaimed, .flush_on_ret = true, @@ -796,8 +812,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, return 0; } -void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start, - unsigned long end) +void kvm_mmu_invalidate_end(struct kvm *kvm) { /* * This sequence increase will notify the kvm page fault that @@ -812,6 +827,13 @@ void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start, * in conjunction with the smp_rmb in mmu_invalidate_retry(). */ kvm->mmu_invalidate_in_progress--; + + /* + * Assert that at least one range must be added between start() and + * end(). Not adding a range isn't fatal, but it is a KVM bug. + */ + WARN_ON_ONCE(kvm->mmu_invalidate_in_progress && + kvm->mmu_invalidate_range_start == INVALID_GPA); } static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,