From patchwork Fri Jan 27 04:44:58 2023
X-Patchwork-Submitter: David Stevens
X-Patchwork-Id: 49063
From: David Stevens
To: Sean Christopherson, David Woodhouse
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Stevens
Subject: [PATCH 1/3] KVM: Support sharing gpc locks
Date: Fri, 27 Jan 2023 13:44:58 +0900
Message-Id: <20230127044500.680329-2-stevensd@google.com>
In-Reply-To: <20230127044500.680329-1-stevensd@google.com>
References: <20230127044500.680329-1-stevensd@google.com>
X-Spam-Status: No, score=-2.1
required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1756149770986657663?= X-GMAIL-MSGID: =?utf-8?q?1756149770986657663?= From: David Stevens Support initializing a gfn_to_pfn_cache with an external lock instead of its embedded lock. This allows groups of gpcs that are accessed together to share a lock, which can greatly simplify locking. Signed-off-by: David Stevens --- arch/x86/kvm/x86.c | 8 +++--- arch/x86/kvm/xen.c | 58 +++++++++++++++++++-------------------- include/linux/kvm_host.h | 12 ++++++++ include/linux/kvm_types.h | 3 +- virt/kvm/pfncache.c | 37 +++++++++++++++---------- 5 files changed, 70 insertions(+), 48 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 508074e47bc0..ec0de9bc2eae 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3047,14 +3047,14 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v, struct pvclock_vcpu_time_info *guest_hv_clock; unsigned long flags; - read_lock_irqsave(&gpc->lock, flags); + read_lock_irqsave(gpc->lock, flags); while (!kvm_gpc_check(gpc, offset + sizeof(*guest_hv_clock))) { - read_unlock_irqrestore(&gpc->lock, flags); + read_unlock_irqrestore(gpc->lock, flags); if (kvm_gpc_refresh(gpc, offset + sizeof(*guest_hv_clock))) return; - read_lock_irqsave(&gpc->lock, flags); + read_lock_irqsave(gpc->lock, flags); } guest_hv_clock = (void *)(gpc->khva + offset); @@ -3083,7 +3083,7 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v, guest_hv_clock->version = ++vcpu->hv_clock.version; mark_page_dirty_in_slot(v->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT); - read_unlock_irqrestore(&gpc->lock, flags); + read_unlock_irqrestore(gpc->lock, flags); trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock); } diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c index 2681e2007e39..fa8ab23271d3 100644 --- a/arch/x86/kvm/xen.c +++ b/arch/x86/kvm/xen.c @@ -59,12 +59,12 @@ static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn) wall_nsec = ktime_get_real_ns() - get_kvmclock_ns(kvm); /* It could be invalid again already, so we need to check */ - read_lock_irq(&gpc->lock); + read_lock_irq(gpc->lock); if (gpc->valid) break; - read_unlock_irq(&gpc->lock); + read_unlock_irq(gpc->lock); } while (1); /* Paranoia checks on the 32-bit struct layout */ @@ -101,7 +101,7 @@ static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn) smp_wmb(); wc->version = wc_version + 1; - read_unlock_irq(&gpc->lock); + read_unlock_irq(gpc->lock); kvm_make_all_cpus_request(kvm, KVM_REQ_MASTERCLOCK_UPDATE); @@ -274,15 +274,15 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic) */ if (atomic) { local_irq_save(flags); - if (!read_trylock(&gpc1->lock)) { + if (!read_trylock(gpc1->lock)) { local_irq_restore(flags); return; } } else { - read_lock_irqsave(&gpc1->lock, flags); + read_lock_irqsave(gpc1->lock, flags); } while (!kvm_gpc_check(gpc1, user_len1)) { - read_unlock_irqrestore(&gpc1->lock, flags); + read_unlock_irqrestore(gpc1->lock, flags); /* When invoked from kvm_sched_out() we cannot sleep */ if (atomic) @@ -291,7 +291,7 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic) if (kvm_gpc_refresh(gpc1, user_len1)) return; - 
read_lock_irqsave(&gpc1->lock, flags); + read_lock_irqsave(gpc1->lock, flags); } if (likely(!user_len2)) { @@ -316,19 +316,19 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic) * takes them more than one at a time. Set a subclass on the * gpc1 lock to make lockdep shut up about it. */ - lock_set_subclass(&gpc1->lock.dep_map, 1, _THIS_IP_); + lock_set_subclass(gpc1->lock.dep_map, 1, _THIS_IP_); if (atomic) { - if (!read_trylock(&gpc2->lock)) { - read_unlock_irqrestore(&gpc1->lock, flags); + if (!read_trylock(gpc2->lock)) { + read_unlock_irqrestore(gpc1->lock, flags); return; } } else { - read_lock(&gpc2->lock); + read_lock(gpc2->lock); } if (!kvm_gpc_check(gpc2, user_len2)) { - read_unlock(&gpc2->lock); - read_unlock_irqrestore(&gpc1->lock, flags); + read_unlock(gpc2->lock); + read_unlock_irqrestore(gpc1->lock, flags); /* When invoked from kvm_sched_out() we cannot sleep */ if (atomic) @@ -428,9 +428,9 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic) } if (user_len2) - read_unlock(&gpc2->lock); + read_unlock(gpc2->lock); - read_unlock_irqrestore(&gpc1->lock, flags); + read_unlock_irqrestore(gpc1->lock, flags); mark_page_dirty_in_slot(v->kvm, gpc1->memslot, gpc1->gpa >> PAGE_SHIFT); if (user_len2) @@ -505,14 +505,14 @@ void kvm_xen_inject_pending_events(struct kvm_vcpu *v) * does anyway. Page it in and retry the instruction. We're just a * little more honest about it. */ - read_lock_irqsave(&gpc->lock, flags); + read_lock_irqsave(gpc->lock, flags); while (!kvm_gpc_check(gpc, sizeof(struct vcpu_info))) { - read_unlock_irqrestore(&gpc->lock, flags); + read_unlock_irqrestore(gpc->lock, flags); if (kvm_gpc_refresh(gpc, sizeof(struct vcpu_info))) return; - read_lock_irqsave(&gpc->lock, flags); + read_lock_irqsave(gpc->lock, flags); } /* Now gpc->khva is a valid kernel address for the vcpu_info */ @@ -540,7 +540,7 @@ void kvm_xen_inject_pending_events(struct kvm_vcpu *v) : "0" (evtchn_pending_sel32)); WRITE_ONCE(vi->evtchn_upcall_pending, 1); } - read_unlock_irqrestore(&gpc->lock, flags); + read_unlock_irqrestore(gpc->lock, flags); /* For the per-vCPU lapic vector, deliver it as MSI. 
*/ if (v->arch.xen.upcall_vector) @@ -568,9 +568,9 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v) BUILD_BUG_ON(sizeof(rc) != sizeof_field(struct compat_vcpu_info, evtchn_upcall_pending)); - read_lock_irqsave(&gpc->lock, flags); + read_lock_irqsave(gpc->lock, flags); while (!kvm_gpc_check(gpc, sizeof(struct vcpu_info))) { - read_unlock_irqrestore(&gpc->lock, flags); + read_unlock_irqrestore(gpc->lock, flags); /* * This function gets called from kvm_vcpu_block() after setting the @@ -590,11 +590,11 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v) */ return 0; } - read_lock_irqsave(&gpc->lock, flags); + read_lock_irqsave(gpc->lock, flags); } rc = ((struct vcpu_info *)gpc->khva)->evtchn_upcall_pending; - read_unlock_irqrestore(&gpc->lock, flags); + read_unlock_irqrestore(gpc->lock, flags); return rc; } @@ -1172,7 +1172,7 @@ static bool wait_pending_event(struct kvm_vcpu *vcpu, int nr_ports, int idx, i; idx = srcu_read_lock(&kvm->srcu); - read_lock_irqsave(&gpc->lock, flags); + read_lock_irqsave(gpc->lock, flags); if (!kvm_gpc_check(gpc, PAGE_SIZE)) goto out_rcu; @@ -1193,7 +1193,7 @@ static bool wait_pending_event(struct kvm_vcpu *vcpu, int nr_ports, } out_rcu: - read_unlock_irqrestore(&gpc->lock, flags); + read_unlock_irqrestore(gpc->lock, flags); srcu_read_unlock(&kvm->srcu, idx); return ret; @@ -1576,7 +1576,7 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm) idx = srcu_read_lock(&kvm->srcu); - read_lock_irqsave(&gpc->lock, flags); + read_lock_irqsave(gpc->lock, flags); if (!kvm_gpc_check(gpc, PAGE_SIZE)) goto out_rcu; @@ -1607,10 +1607,10 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm) } else { rc = 1; /* Delivered to the bitmap in shared_info. */ /* Now switch to the vCPU's vcpu_info to set the index and pending_sel */ - read_unlock_irqrestore(&gpc->lock, flags); + read_unlock_irqrestore(gpc->lock, flags); gpc = &vcpu->arch.xen.vcpu_info_cache; - read_lock_irqsave(&gpc->lock, flags); + read_lock_irqsave(gpc->lock, flags); if (!kvm_gpc_check(gpc, sizeof(struct vcpu_info))) { /* * Could not access the vcpu_info. Set the bit in-kernel @@ -1644,7 +1644,7 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm) } out_rcu: - read_unlock_irqrestore(&gpc->lock, flags); + read_unlock_irqrestore(gpc->lock, flags); srcu_read_unlock(&kvm->srcu, idx); if (kick_vcpu) { diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 109b18e2789c..7d1f9c6561e3 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1279,6 +1279,18 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn); void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm, struct kvm_vcpu *vcpu, enum pfn_cache_usage usage); +/** + * kvm_gpc_init_with_lock - initialize gfn_to_pfn_cache with an external lock. + * + * @lock: an initialized rwlock + * + * See kvm_gpc_init. Allows multiple gfn_to_pfn_cache structs to share the + * same lock. + */ +void kvm_gpc_init_with_lock(struct gfn_to_pfn_cache *gpc, struct kvm *kvm, + struct kvm_vcpu *vcpu, enum pfn_cache_usage usage, + rwlock_t *lock); + /** * kvm_gpc_activate - prepare a cached kernel mapping and HPA for a given guest * physical address. 
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h index 76de36e56cdf..b6432c8cc19c 100644 --- a/include/linux/kvm_types.h +++ b/include/linux/kvm_types.h @@ -70,7 +70,8 @@ struct gfn_to_pfn_cache { struct kvm *kvm; struct kvm_vcpu *vcpu; struct list_head list; - rwlock_t lock; + rwlock_t *lock; + rwlock_t _lock; struct mutex refresh_lock; void *khva; kvm_pfn_t pfn; diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c index 2d6aba677830..2c6a2edaca9f 100644 --- a/virt/kvm/pfncache.c +++ b/virt/kvm/pfncache.c @@ -31,7 +31,7 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start, spin_lock(&kvm->gpc_lock); list_for_each_entry(gpc, &kvm->gpc_list, list) { - write_lock_irq(&gpc->lock); + write_lock_irq(gpc->lock); /* Only a single page so no need to care about length */ if (gpc->valid && !is_error_noslot_pfn(gpc->pfn) && @@ -50,7 +50,7 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start, __set_bit(gpc->vcpu->vcpu_idx, vcpu_bitmap); } } - write_unlock_irq(&gpc->lock); + write_unlock_irq(gpc->lock); } spin_unlock(&kvm->gpc_lock); @@ -147,7 +147,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc) lockdep_assert_held(&gpc->refresh_lock); - lockdep_assert_held_write(&gpc->lock); + lockdep_assert_held_write(gpc->lock); /* * Invalidate the cache prior to dropping gpc->lock, the gpa=>uhva @@ -160,7 +160,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc) mmu_seq = gpc->kvm->mmu_invalidate_seq; smp_rmb(); - write_unlock_irq(&gpc->lock); + write_unlock_irq(gpc->lock); /* * If the previous iteration "failed" due to an mmu_notifier @@ -208,7 +208,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc) } } - write_lock_irq(&gpc->lock); + write_lock_irq(gpc->lock); /* * Other tasks must wait for _this_ refresh to complete before @@ -231,7 +231,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc) return 0; out_error: - write_lock_irq(&gpc->lock); + write_lock_irq(gpc->lock); return -EFAULT; } @@ -261,7 +261,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, */ mutex_lock(&gpc->refresh_lock); - write_lock_irq(&gpc->lock); + write_lock_irq(gpc->lock); if (!gpc->active) { ret = -EINVAL; @@ -321,7 +321,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unmap_old = (old_pfn != gpc->pfn); out_unlock: - write_unlock_irq(&gpc->lock); + write_unlock_irq(gpc->lock); mutex_unlock(&gpc->refresh_lock); @@ -339,20 +339,29 @@ EXPORT_SYMBOL_GPL(kvm_gpc_refresh); void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm, struct kvm_vcpu *vcpu, enum pfn_cache_usage usage) +{ + rwlock_init(&gpc->_lock); + kvm_gpc_init_with_lock(gpc, kvm, vcpu, usage, &gpc->_lock); +} +EXPORT_SYMBOL_GPL(kvm_gpc_init); + +void kvm_gpc_init_with_lock(struct gfn_to_pfn_cache *gpc, struct kvm *kvm, + struct kvm_vcpu *vcpu, enum pfn_cache_usage usage, + rwlock_t *lock) { WARN_ON_ONCE(!usage || (usage & KVM_GUEST_AND_HOST_USE_PFN) != usage); WARN_ON_ONCE((usage & KVM_GUEST_USES_PFN) && !vcpu); - rwlock_init(&gpc->lock); mutex_init(&gpc->refresh_lock); gpc->kvm = kvm; gpc->vcpu = vcpu; + gpc->lock = lock; gpc->usage = usage; gpc->pfn = KVM_PFN_ERR_FAULT; gpc->uhva = KVM_HVA_ERR_BAD; } -EXPORT_SYMBOL_GPL(kvm_gpc_init); +EXPORT_SYMBOL_GPL(kvm_gpc_init_with_lock); int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len) { @@ -371,9 +380,9 @@ int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len) * refresh must not 
establish a mapping until the cache is * reachable by mmu_notifier events. */ - write_lock_irq(&gpc->lock); + write_lock_irq(gpc->lock); gpc->active = true; - write_unlock_irq(&gpc->lock); + write_unlock_irq(gpc->lock); } return __kvm_gpc_refresh(gpc, gpa, len); } @@ -391,7 +400,7 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc) * must stall mmu_notifier events until all users go away, i.e. * until gpc->lock is dropped and refresh is guaranteed to fail. */ - write_lock_irq(&gpc->lock); + write_lock_irq(gpc->lock); gpc->active = false; gpc->valid = false; @@ -406,7 +415,7 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc) old_pfn = gpc->pfn; gpc->pfn = KVM_PFN_ERR_FAULT; - write_unlock_irq(&gpc->lock); + write_unlock_irq(gpc->lock); spin_lock(&kvm->gpc_lock); list_del(&gpc->list);
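As a reference for how the new API is meant to be used, here is a minimal sketch of two caches sharing one lock. The kvm_gpc_* calls and KVM_GUEST_AND_HOST_USE_PFN come from this patch; the wrapper function, its parameters, and the PAGE_SIZE lengths are illustrative assumptions, not code from the series.

/*
 * Illustrative sketch (not part of the patch): two caches that are
 * always accessed together share gpc1's embedded lock, so a single
 * read-side critical section covers both mappings.
 */
static void example_shared_gpc(struct kvm *kvm, struct kvm_vcpu *vcpu,
			       struct gfn_to_pfn_cache *gpc1,
			       struct gfn_to_pfn_cache *gpc2,
			       gpa_t gpa1, gpa_t gpa2)
{
	unsigned long flags;

	/* gpc1 uses its embedded lock; gpc2 reuses that same lock. */
	kvm_gpc_init(gpc1, kvm, vcpu, KVM_GUEST_AND_HOST_USE_PFN);
	kvm_gpc_init_with_lock(gpc2, kvm, vcpu, KVM_GUEST_AND_HOST_USE_PFN,
			       gpc1->lock);

	if (kvm_gpc_activate(gpc1, gpa1, PAGE_SIZE) ||
	    kvm_gpc_activate(gpc2, gpa2, PAGE_SIZE))
		return;

	/* One read lock now protects both cached mappings. */
	read_lock_irqsave(gpc1->lock, flags);
	if (kvm_gpc_check(gpc1, PAGE_SIZE) && kvm_gpc_check(gpc2, PAGE_SIZE)) {
		/* ... access gpc1->khva and gpc2->khva together ... */
	}
	read_unlock_irqrestore(gpc1->lock, flags);
}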
From patchwork Fri Jan 27 04:44:59 2023
X-Patchwork-Submitter: David Stevens
X-Patchwork-Id: 49064
From: David Stevens
To: Sean Christopherson, David Woodhouse
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Stevens
Subject: [PATCH 2/3] KVM: use gfn=>pfn cache in nested_get_vmcs12_pages
Date: Fri, 27 Jan 2023 13:44:59 +0900
Message-Id: <20230127044500.680329-3-stevensd@google.com>
In-Reply-To: <20230127044500.680329-1-stevensd@google.com>
References: <20230127044500.680329-1-stevensd@google.com>
X-Spam-Status: No, score=-2.1
required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1756150377391877244?= X-GMAIL-MSGID: =?utf-8?q?1756150377391877244?= From: David Stevens Use gfn_to_pfn_cache to access guest pages needed by nested_get_vmcs12_pages. This replaces kvm_vcpu_map, which doesn't properly handle updates to the HVA->GFN mapping. The MSR bitmap is only accessed in nested_vmx_prepare_msr_bitmap, so it could potentially be accessed directly through the HVA. However, using a persistent gpc should be more efficient, and maintenance of the gpc can be easily done alongside the other gpcs. Signed-off-by: David Stevens --- arch/x86/kvm/vmx/nested.c | 206 ++++++++++++++++++++++++++++++-------- arch/x86/kvm/vmx/vmx.c | 38 ++++++- arch/x86/kvm/vmx/vmx.h | 11 +- 3 files changed, 204 insertions(+), 51 deletions(-) diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 557b9c468734..cb41113caa8a 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -324,9 +324,10 @@ static void free_nested(struct kvm_vcpu *vcpu) * page's backing page (yeah, confusing) shouldn't actually be accessed, * and if it is written, the contents are irrelevant. */ - kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, false); - kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true); - kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true); + kvm_gpc_deactivate(&vmx->nested.apic_access_gpc); + kvm_gpc_deactivate(&vmx->nested.virtual_apic_gpc); + kvm_gpc_deactivate(&vmx->nested.pi_desc_gpc); + kvm_gpc_deactivate(&vmx->nested.msr_bitmap_gpc); vmx->nested.pi_desc = NULL; kvm_mmu_free_roots(vcpu->kvm, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL); @@ -558,19 +559,22 @@ static inline void nested_vmx_set_intercept_for_msr(struct vcpu_vmx *vmx, msr_bitmap_l0, msr); } +static bool nested_vmcs12_gpc_check(struct gfn_to_pfn_cache *gpc, + gpa_t gpa, unsigned long len, bool *try_refresh); + /* * Merge L0's and L1's MSR bitmap, return false to indicate that * we do not use the hardware. */ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu, - struct vmcs12 *vmcs12) + struct vmcs12 *vmcs12, + bool *try_refresh) { struct vcpu_vmx *vmx = to_vmx(vcpu); int msr; unsigned long *msr_bitmap_l1; unsigned long *msr_bitmap_l0 = vmx->nested.vmcs02.msr_bitmap; struct hv_enlightened_vmcs *evmcs = vmx->nested.hv_evmcs; - struct kvm_host_map *map = &vmx->nested.msr_bitmap_map; /* Nothing to do if the MSR bitmap is not in use. 
*/ if (!cpu_has_vmx_msr_bitmap() || @@ -590,10 +594,11 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu, evmcs->hv_clean_fields & HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP) return true; - if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->msr_bitmap), map)) + if (!nested_vmcs12_gpc_check(&vmx->nested.msr_bitmap_gpc, + vmcs12->msr_bitmap, PAGE_SIZE, try_refresh)) return false; - msr_bitmap_l1 = (unsigned long *)map->hva; + msr_bitmap_l1 = vmx->nested.msr_bitmap_gpc.khva; /* * To keep the control flow simple, pay eight 8-byte writes (sixteen @@ -654,8 +659,6 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu, nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0, MSR_IA32_PRED_CMD, MSR_TYPE_W); - kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false); - vmx->nested.force_msr_bitmap_recalc = false; return true; @@ -3184,11 +3187,59 @@ static bool nested_get_evmcs_page(struct kvm_vcpu *vcpu) return true; } -static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu) +static bool nested_vmcs12_gpc_check(struct gfn_to_pfn_cache *gpc, + gpa_t gpa, unsigned long len, bool *try_refresh) +{ + bool check; + + if (gpc->gpa != gpa || !gpc->active) + return false; + check = kvm_gpc_check(gpc, len); + if (!check) + *try_refresh = true; + return check; +} + +static void nested_vmcs12_gpc_refresh(struct gfn_to_pfn_cache *gpc, + gpa_t gpa, unsigned long len) +{ + if (gpc->gpa != gpa || !gpc->active) { + kvm_gpc_deactivate(gpc); + + if (kvm_gpc_activate(gpc, gpa, len)) + kvm_gpc_deactivate(gpc); + } else { + if (kvm_gpc_refresh(gpc, len)) + kvm_gpc_deactivate(gpc); + } +} + +static void nested_get_vmcs12_pages_refresh(struct kvm_vcpu *vcpu) +{ + struct vmcs12 *vmcs12 = get_vmcs12(vcpu); + struct vcpu_vmx *vmx = to_vmx(vcpu); + + if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) + nested_vmcs12_gpc_refresh(&vmx->nested.apic_access_gpc, + vmcs12->apic_access_addr, PAGE_SIZE); + + if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) + nested_vmcs12_gpc_refresh(&vmx->nested.virtual_apic_gpc, + vmcs12->virtual_apic_page_addr, PAGE_SIZE); + + if (nested_cpu_has_posted_intr(vmcs12)) + nested_vmcs12_gpc_refresh(&vmx->nested.pi_desc_gpc, + vmcs12->posted_intr_desc_addr, sizeof(struct pi_desc)); + + if (cpu_has_vmx_msr_bitmap() && nested_cpu_has(vmcs12, CPU_BASED_USE_MSR_BITMAPS)) + nested_vmcs12_gpc_refresh(&vmx->nested.msr_bitmap_gpc, + vmcs12->msr_bitmap, PAGE_SIZE); +} + +static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu, bool *try_refresh) { struct vmcs12 *vmcs12 = get_vmcs12(vcpu); struct vcpu_vmx *vmx = to_vmx(vcpu); - struct kvm_host_map *map; if (!vcpu->arch.pdptrs_from_userspace && !nested_cpu_has_ept(vmcs12) && is_pae_paging(vcpu)) { @@ -3197,16 +3248,19 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu) * the guest CR3 might be restored prior to setting the nested * state which can lead to a load of wrong PDPTRs. 
*/ - if (CC(!load_pdptrs(vcpu, vcpu->arch.cr3))) + if (CC(!load_pdptrs(vcpu, vcpu->arch.cr3))) { + *try_refresh = false; return false; + } } - if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) { - map = &vmx->nested.apic_access_page_map; - - if (!kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->apic_access_addr), map)) { - vmcs_write64(APIC_ACCESS_ADDR, pfn_to_hpa(map->pfn)); + if (nested_vmcs12_gpc_check(&vmx->nested.apic_access_gpc, + vmcs12->apic_access_addr, PAGE_SIZE, try_refresh)) { + vmcs_write64(APIC_ACCESS_ADDR, + pfn_to_hpa(vmx->nested.apic_access_gpc.pfn)); + } else if (*try_refresh) { + return false; } else { pr_debug_ratelimited("%s: no backing for APIC-access address in vmcs12\n", __func__); @@ -3219,10 +3273,13 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu) } if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) { - map = &vmx->nested.virtual_apic_map; - - if (!kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->virtual_apic_page_addr), map)) { - vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, pfn_to_hpa(map->pfn)); + if (nested_vmcs12_gpc_check(&vmx->nested.virtual_apic_gpc, + vmcs12->virtual_apic_page_addr, PAGE_SIZE, + try_refresh)) { + vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, + pfn_to_hpa(vmx->nested.virtual_apic_gpc.pfn)); + } else if (*try_refresh) { + return false; } else if (nested_cpu_has(vmcs12, CPU_BASED_CR8_LOAD_EXITING) && nested_cpu_has(vmcs12, CPU_BASED_CR8_STORE_EXITING) && !nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) { @@ -3245,14 +3302,16 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu) } if (nested_cpu_has_posted_intr(vmcs12)) { - map = &vmx->nested.pi_desc_map; - - if (!kvm_vcpu_map(vcpu, gpa_to_gfn(vmcs12->posted_intr_desc_addr), map)) { + if (nested_vmcs12_gpc_check(&vmx->nested.pi_desc_gpc, + vmcs12->posted_intr_desc_addr, + sizeof(struct pi_desc), try_refresh)) { vmx->nested.pi_desc = - (struct pi_desc *)(((void *)map->hva) + - offset_in_page(vmcs12->posted_intr_desc_addr)); + (struct pi_desc *)vmx->nested.pi_desc_gpc.khva; vmcs_write64(POSTED_INTR_DESC_ADDR, - pfn_to_hpa(map->pfn) + offset_in_page(vmcs12->posted_intr_desc_addr)); + pfn_to_hpa(vmx->nested.pi_desc_gpc.pfn) + + offset_in_page(vmx->nested.pi_desc_gpc.gpa)); + } else if (*try_refresh) { + return false; } else { /* * Defer the KVM_INTERNAL_EXIT until KVM tries to @@ -3264,16 +3323,22 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu) pin_controls_clearbit(vmx, PIN_BASED_POSTED_INTR); } } - if (nested_vmx_prepare_msr_bitmap(vcpu, vmcs12)) + if (nested_vmx_prepare_msr_bitmap(vcpu, vmcs12, try_refresh)) { exec_controls_setbit(vmx, CPU_BASED_USE_MSR_BITMAPS); - else + } else { + if (*try_refresh) + return false; exec_controls_clearbit(vmx, CPU_BASED_USE_MSR_BITMAPS); + } return true; } static bool vmx_get_nested_state_pages(struct kvm_vcpu *vcpu) { + bool success, try_refresh; + int idx; + /* * Note: nested_get_evmcs_page() also updates 'vp_assist_page' copy * in 'struct kvm_vcpu_hv' in case eVMCS is in use, this is mandatory @@ -3291,8 +3356,24 @@ static bool vmx_get_nested_state_pages(struct kvm_vcpu *vcpu) return false; } - if (is_guest_mode(vcpu) && !nested_get_vmcs12_pages(vcpu)) - return false; + if (!is_guest_mode(vcpu)) + return true; + + try_refresh = true; +retry: + idx = srcu_read_lock(&vcpu->kvm->srcu); + success = nested_get_vmcs12_pages(vcpu, &try_refresh); + srcu_read_unlock(&vcpu->kvm->srcu, idx); + + if (!success) { + if (try_refresh) { + nested_get_vmcs12_pages_refresh(vcpu); + try_refresh = false; + goto retry; + } else { + return false; 
+ } + } return true; } @@ -3389,6 +3470,8 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, .failed_vmentry = 1, }; u32 failed_index; + bool success, try_refresh; + unsigned long flags; trace_kvm_nested_vmenter(kvm_rip_read(vcpu), vmx->nested.current_vmptr, @@ -3441,13 +3524,26 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, prepare_vmcs02_early(vmx, &vmx->vmcs01, vmcs12); if (from_vmentry) { - if (unlikely(!nested_get_vmcs12_pages(vcpu))) { - vmx_switch_vmcs(vcpu, &vmx->vmcs01); - return NVMX_VMENTRY_KVM_INTERNAL_ERROR; + try_refresh = true; +retry: + read_lock_irqsave(vmx->nested.apic_access_gpc.lock, flags); + success = nested_get_vmcs12_pages(vcpu, &try_refresh); + + if (unlikely(!success)) { + read_unlock_irqrestore(vmx->nested.apic_access_gpc.lock, flags); + if (try_refresh) { + nested_get_vmcs12_pages_refresh(vcpu); + try_refresh = false; + goto retry; + } else { + vmx_switch_vmcs(vcpu, &vmx->vmcs01); + return NVMX_VMENTRY_KVM_INTERNAL_ERROR; + } } if (nested_vmx_check_vmentry_hw(vcpu)) { vmx_switch_vmcs(vcpu, &vmx->vmcs01); + read_unlock_irqrestore(vmx->nested.apic_access_gpc.lock, flags); return NVMX_VMENTRY_VMFAIL; } @@ -3455,12 +3551,16 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, &entry_failure_code)) { exit_reason.basic = EXIT_REASON_INVALID_STATE; vmcs12->exit_qualification = entry_failure_code; + read_unlock_irqrestore(vmx->nested.apic_access_gpc.lock, flags); goto vmentry_fail_vmexit; } } enter_guest_mode(vcpu); + if (from_vmentry) + read_unlock_irqrestore(vmx->nested.apic_access_gpc.lock, flags); + if (prepare_vmcs02(vcpu, vmcs12, from_vmentry, &entry_failure_code)) { exit_reason.basic = EXIT_REASON_INVALID_STATE; vmcs12->exit_qualification = entry_failure_code; @@ -3810,9 +3910,10 @@ void nested_mark_vmcs12_pages_dirty(struct kvm_vcpu *vcpu) static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); - int max_irr; + int max_irr, idx; void *vapic_page; u16 status; + bool success; if (!vmx->nested.pi_pending) return 0; @@ -3827,7 +3928,17 @@ static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu) max_irr = find_last_bit((unsigned long *)vmx->nested.pi_desc->pir, 256); if (max_irr != 256) { - vapic_page = vmx->nested.virtual_apic_map.hva; +retry: + idx = srcu_read_lock(&vcpu->kvm->srcu); + success = kvm_gpc_check(&vmx->nested.virtual_apic_gpc, PAGE_SIZE); + srcu_read_unlock(&vcpu->kvm->srcu, idx); + + if (!success) { + if (kvm_gpc_refresh(&vmx->nested.virtual_apic_gpc, PAGE_SIZE)) + goto mmio_needed; + goto retry; + } + vapic_page = vmx->nested.virtual_apic_gpc.khva; if (!vapic_page) goto mmio_needed; @@ -4827,12 +4938,6 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason, vmx_update_cpu_dirty_logging(vcpu); } - /* Unpin physical memory we referred to in vmcs02 */ - kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, false); - kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true); - kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true); - vmx->nested.pi_desc = NULL; - if (vmx->nested.reload_vmcs01_apic_access_page) { vmx->nested.reload_vmcs01_apic_access_page = false; kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu); @@ -5246,6 +5351,12 @@ static inline void nested_release_vmcs12(struct kvm_vcpu *vcpu) kvm_mmu_free_roots(vcpu->kvm, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL); vmx->nested.current_vmptr = INVALID_GPA; + + kvm_gpc_deactivate(&vmx->nested.apic_access_gpc); + 
kvm_gpc_deactivate(&vmx->nested.virtual_apic_gpc); + kvm_gpc_deactivate(&vmx->nested.pi_desc_gpc); + kvm_gpc_deactivate(&vmx->nested.msr_bitmap_gpc); + vmx->nested.pi_desc = NULL; } /* Emulate the VMXOFF instruction */ @@ -5620,6 +5731,17 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu) VMXERR_VMPTRLD_INCORRECT_VMCS_REVISION_ID); } + kvm_gpc_activate(&vmx->nested.apic_access_gpc, + vmx->nested.cached_vmcs12->apic_access_addr, PAGE_SIZE); + kvm_gpc_activate(&vmx->nested.virtual_apic_gpc, + vmx->nested.cached_vmcs12->virtual_apic_page_addr, + PAGE_SIZE); + kvm_gpc_activate(&vmx->nested.pi_desc_gpc, + vmx->nested.cached_vmcs12->posted_intr_desc_addr, + sizeof(struct pi_desc)); + kvm_gpc_activate(&vmx->nested.msr_bitmap_gpc, + vmx->nested.cached_vmcs12->msr_bitmap, PAGE_SIZE); + set_current_vmptr(vmx, vmptr); } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index c788aa382611..1bb8252d40aa 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -4097,16 +4097,27 @@ static bool vmx_guest_apic_has_interrupt(struct kvm_vcpu *vcpu) struct vcpu_vmx *vmx = to_vmx(vcpu); void *vapic_page; u32 vppr; - int rvi; + int rvi, idx; + bool success; if (WARN_ON_ONCE(!is_guest_mode(vcpu)) || !nested_cpu_has_vid(get_vmcs12(vcpu)) || - WARN_ON_ONCE(!vmx->nested.virtual_apic_map.gfn)) + WARN_ON_ONCE(!vmx->nested.virtual_apic_gpc.gpa)) return false; rvi = vmx_get_rvi(); +retry: + idx = srcu_read_lock(&vcpu->kvm->srcu); + success = kvm_gpc_check(&vmx->nested.virtual_apic_gpc, PAGE_SIZE); + srcu_read_unlock(&vcpu->kvm->srcu, idx); - vapic_page = vmx->nested.virtual_apic_map.hva; + if (!success) { + if (kvm_gpc_refresh(&vmx->nested.virtual_apic_gpc, PAGE_SIZE)) + return false; + goto retry; + } + + vapic_page = vmx->nested.virtual_apic_gpc.khva; vppr = *((u32 *)(vapic_page + APIC_PROCPRI)); return ((rvi & 0xf0) > (vppr & 0xf0)); @@ -4804,6 +4815,27 @@ static void init_vmcs(struct vcpu_vmx *vmx) } vmx_setup_uret_msrs(vmx); + + if (nested) { + memset(&vmx->nested.apic_access_gpc, 0, sizeof(vmx->nested.apic_access_gpc)); + kvm_gpc_init(&vmx->nested.apic_access_gpc, kvm, &vmx->vcpu, + KVM_GUEST_USES_PFN); + + memset(&vmx->nested.virtual_apic_gpc, 0, sizeof(vmx->nested.virtual_apic_gpc)); + kvm_gpc_init_with_lock(&vmx->nested.virtual_apic_gpc, kvm, &vmx->vcpu, + KVM_GUEST_AND_HOST_USE_PFN, + vmx->nested.apic_access_gpc.lock); + + memset(&vmx->nested.pi_desc_gpc, 0, sizeof(vmx->nested.pi_desc_gpc)); + kvm_gpc_init_with_lock(&vmx->nested.pi_desc_gpc, kvm, &vmx->vcpu, + KVM_GUEST_AND_HOST_USE_PFN, + vmx->nested.apic_access_gpc.lock); + + memset(&vmx->nested.msr_bitmap_gpc, 0, sizeof(vmx->nested.msr_bitmap_gpc)); + kvm_gpc_init_with_lock(&vmx->nested.msr_bitmap_gpc, kvm, &vmx->vcpu, + KVM_HOST_USES_PFN, + vmx->nested.apic_access_gpc.lock); + } } static void __vmx_vcpu_reset(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index a3da84f4ea45..e067730a0222 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -207,13 +207,12 @@ struct nested_vmx { /* * Guest pages referred to in the vmcs02 with host-physical - * pointers, so we must keep them pinned while L2 runs. + * pointers. 
*/ - struct kvm_host_map apic_access_page_map; - struct kvm_host_map virtual_apic_map; - struct kvm_host_map pi_desc_map; - - struct kvm_host_map msr_bitmap_map; + struct gfn_to_pfn_cache apic_access_gpc; + struct gfn_to_pfn_cache virtual_apic_gpc; + struct gfn_to_pfn_cache pi_desc_gpc; + struct gfn_to_pfn_cache msr_bitmap_gpc; struct pi_desc *pi_desc; bool pi_pending;
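The helpers above follow the usual gpc check/refresh dance: check under the read lock, and if the cache is stale, drop the lock, refresh, and retry. A schematic sketch of that basic pattern is below; note that the vmcs12 helpers in this patch instead plumb a *try_refresh flag so the refresh only happens after every shared read lock has been dropped. The example_gpc_map() wrapper itself is an illustrative assumption, not code from the series.

/*
 * Schematic sketch of the check/refresh pattern: on success the gpc
 * read lock is held and the caller must drop it when done with khva.
 */
static void *example_gpc_map(struct gfn_to_pfn_cache *gpc, unsigned long len,
			     unsigned long *flags_out)
{
	unsigned long flags;

	for (;;) {
		read_lock_irqsave(gpc->lock, flags);
		if (kvm_gpc_check(gpc, len)) {
			*flags_out = flags;
			return gpc->khva;	/* caller unlocks when done */
		}

		/* Stale: refresh with the read lock dropped, then retry. */
		read_unlock_irqrestore(gpc->lock, flags);
		if (kvm_gpc_refresh(gpc, len))
			return NULL;		/* refresh failed */
	}
}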
From patchwork Fri Jan 27 04:45:00 2023
X-Patchwork-Submitter: David Stevens
X-Patchwork-Id: 49065
From: David Stevens
To: Sean Christopherson, David Woodhouse
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Stevens
Subject: [PATCH 3/3] KVM: use gfn=>pfn cache for evmcs
Date: Fri, 27 Jan 2023 13:45:00 +0900
Message-Id: <20230127044500.680329-4-stevensd@google.com>
In-Reply-To: <20230127044500.680329-1-stevensd@google.com>
References: <20230127044500.680329-1-stevensd@google.com>
X-Spam-Status: No, score=-2.1 required=5.0
tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1756150759084317780?= X-GMAIL-MSGID: =?utf-8?q?1756150759084317780?= From: David Stevens Use gfn_to_pfn_cache to access evmcs. This replaces kvm_vcpu_map, which doesn't properly handle updates to the HVA->GFN mapping. This change introduces a number of new failure cases, since refreshing a gpc can fail. Since the evmcs is sometimes accessed alongside vmcs12 pages, the evmcs gpc is initialized to share the vmcs12 pages' gpc lock for simplicity. This is coarser locking than necessary, but taking the lock outside of the vcpu thread should be rare, so the impact should be minimal. Signed-off-by: David Stevens --- arch/x86/kvm/vmx/hyperv.c | 41 ++++++++++- arch/x86/kvm/vmx/hyperv.h | 2 + arch/x86/kvm/vmx/nested.c | 151 +++++++++++++++++++++++++++----------- arch/x86/kvm/vmx/vmx.c | 10 +++ arch/x86/kvm/vmx/vmx.h | 3 +- 5 files changed, 158 insertions(+), 49 deletions(-) diff --git a/arch/x86/kvm/vmx/hyperv.c b/arch/x86/kvm/vmx/hyperv.c index 22daca752797..1b140ef1d4db 100644 --- a/arch/x86/kvm/vmx/hyperv.c +++ b/arch/x86/kvm/vmx/hyperv.c @@ -554,12 +554,21 @@ bool nested_evmcs_l2_tlb_flush_enabled(struct kvm_vcpu *vcpu) { struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu); struct vcpu_vmx *vmx = to_vmx(vcpu); - struct hv_enlightened_vmcs *evmcs = vmx->nested.hv_evmcs; + struct hv_enlightened_vmcs *evmcs; + unsigned long flags; + bool nested_flush_hypercall; - if (!hv_vcpu || !evmcs) + if (!hv_vcpu || !evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)) return false; - if (!evmcs->hv_enlightenments_control.nested_flush_hypercall) + evmcs = nested_evmcs_lock_and_acquire(vcpu, &flags); + if (!evmcs) + return false; + + nested_flush_hypercall = evmcs->hv_enlightenments_control.nested_flush_hypercall; + read_unlock_irqrestore(vmx->nested.hv_evmcs_gpc.lock, flags); + + if (!nested_flush_hypercall) return false; return hv_vcpu->vp_assist_page.nested_control.features.directhypercall; @@ -569,3 +578,29 @@ void vmx_hv_inject_synthetic_vmexit_post_tlb_flush(struct kvm_vcpu *vcpu) { nested_vmx_vmexit(vcpu, HV_VMX_SYNTHETIC_EXIT_REASON_TRAP_AFTER_FLUSH, 0, 0); } + +struct hv_enlightened_vmcs *nested_evmcs_lock_and_acquire(struct kvm_vcpu *vcpu, + unsigned long *flags_out) +{ + unsigned long flags; + struct vcpu_vmx *vmx = to_vmx(vcpu); + +retry: + read_lock_irqsave(vmx->nested.hv_evmcs_gpc.lock, flags); + if (!kvm_gpc_check(&vmx->nested.hv_evmcs_gpc, sizeof(struct hv_enlightened_vmcs))) { + read_unlock_irqrestore(vmx->nested.hv_evmcs_gpc.lock, flags); + if (!vmx->nested.hv_evmcs_gpc.active) + return NULL; + + if (kvm_gpc_refresh(&vmx->nested.hv_evmcs_gpc, + sizeof(struct hv_enlightened_vmcs))) { + kvm_gpc_deactivate(&vmx->nested.hv_evmcs_gpc); + return NULL; + } + + goto retry; + } + + *flags_out = flags; + return vmx->nested.hv_evmcs_gpc.khva; +} diff --git a/arch/x86/kvm/vmx/hyperv.h b/arch/x86/kvm/vmx/hyperv.h index ab08a9b9ab7d..43a9488f9a38 100644 --- a/arch/x86/kvm/vmx/hyperv.h +++ b/arch/x86/kvm/vmx/hyperv.h @@ -306,5 +306,7 @@ void nested_evmcs_filter_control_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 * int nested_evmcs_check_controls(struct vmcs12 *vmcs12); bool nested_evmcs_l2_tlb_flush_enabled(struct 
kvm_vcpu *vcpu); void vmx_hv_inject_synthetic_vmexit_post_tlb_flush(struct kvm_vcpu *vcpu); +struct hv_enlightened_vmcs *nested_evmcs_lock_and_acquire(struct kvm_vcpu *vcpu, + unsigned long *flags_out); #endif /* __KVM_X86_VMX_HYPERV_H */ diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index cb41113caa8a..b8fff71583c9 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -229,10 +229,8 @@ static inline void nested_release_evmcs(struct kvm_vcpu *vcpu) struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu); struct vcpu_vmx *vmx = to_vmx(vcpu); - if (evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)) { - kvm_vcpu_unmap(vcpu, &vmx->nested.hv_evmcs_map, true); - vmx->nested.hv_evmcs = NULL; - } + if (evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)) + kvm_gpc_deactivate(&vmx->nested.hv_evmcs_gpc); vmx->nested.hv_evmcs_vmptr = EVMPTR_INVALID; @@ -574,7 +572,7 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu, int msr; unsigned long *msr_bitmap_l1; unsigned long *msr_bitmap_l0 = vmx->nested.vmcs02.msr_bitmap; - struct hv_enlightened_vmcs *evmcs = vmx->nested.hv_evmcs; + struct hv_enlightened_vmcs *evmcs; /* Nothing to do if the MSR bitmap is not in use. */ if (!cpu_has_vmx_msr_bitmap() || @@ -589,10 +587,18 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu, * - Nested hypervisor (L1) has enabled 'Enlightened MSR Bitmap' feature * and tells KVM (L0) there were no changes in MSR bitmap for L2. */ - if (!vmx->nested.force_msr_bitmap_recalc && evmcs && - evmcs->hv_enlightenments_control.msr_bitmap && - evmcs->hv_clean_fields & HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP) - return true; + if (!vmx->nested.force_msr_bitmap_recalc && vmx->nested.hv_evmcs_gpc.active) { + if (!kvm_gpc_check(&vmx->nested.hv_evmcs_gpc, + sizeof(struct hv_enlightened_vmcs))) { + *try_refresh = true; + return false; + } + + evmcs = vmx->nested.hv_evmcs_gpc.khva; + if (evmcs->hv_enlightenments_control.msr_bitmap && + evmcs->hv_clean_fields & HV_VMX_ENLIGHTENED_CLEAN_FIELD_MSR_BITMAP) + return true; + } if (!nested_vmcs12_gpc_check(&vmx->nested.msr_bitmap_gpc, vmcs12->msr_bitmap, PAGE_SIZE, try_refresh)) @@ -1573,11 +1579,18 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx) vmcs_load(vmx->loaded_vmcs->vmcs); } -static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields) +static bool copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, bool full_copy) { struct vmcs12 *vmcs12 = vmx->nested.cached_vmcs12; - struct hv_enlightened_vmcs *evmcs = vmx->nested.hv_evmcs; + struct hv_enlightened_vmcs *evmcs; struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(&vmx->vcpu); + unsigned long flags; + u32 hv_clean_fields; + + evmcs = nested_evmcs_lock_and_acquire(&vmx->vcpu, &flags); + if (!evmcs) + return false; + hv_clean_fields = full_copy ? 
0 : evmcs->hv_clean_fields; /* HV_VMX_ENLIGHTENED_CLEAN_FIELD_NONE */ vmcs12->tpr_threshold = evmcs->tpr_threshold; @@ -1814,13 +1827,25 @@ static void copy_enlightened_to_vmcs12(struct vcpu_vmx *vmx, u32 hv_clean_fields * vmcs12->exit_io_instruction_eip = evmcs->exit_io_instruction_eip; */ - return; + read_unlock_irqrestore(vmx->nested.hv_evmcs_gpc.lock, flags); + return true; } static void copy_vmcs12_to_enlightened(struct vcpu_vmx *vmx) { struct vmcs12 *vmcs12 = vmx->nested.cached_vmcs12; - struct hv_enlightened_vmcs *evmcs = vmx->nested.hv_evmcs; + struct hv_enlightened_vmcs *evmcs; + unsigned long flags; + + evmcs = nested_evmcs_lock_and_acquire(&vmx->vcpu, &flags); + if (WARN_ON_ONCE(!evmcs)) { + /* + * We can't sync, so the state is now invalid. This isn't an immediate + * problem, but further accesses will be errors. Failing to acquire the + * evmcs gpc deactivates it, so any subsequent attempts will also fail. + */ + return; + } /* * Should not be changed by KVM: @@ -1988,6 +2013,8 @@ static void copy_vmcs12_to_enlightened(struct vcpu_vmx *vmx) evmcs->guest_bndcfgs = vmcs12->guest_bndcfgs; + read_unlock_irqrestore(vmx->nested.hv_evmcs_gpc.lock, flags); + return; } @@ -2001,6 +2028,8 @@ static enum nested_evmptrld_status nested_vmx_handle_enlightened_vmptrld( struct vcpu_vmx *vmx = to_vmx(vcpu); bool evmcs_gpa_changed = false; u64 evmcs_gpa; + struct hv_enlightened_vmcs *hv_evmcs; + unsigned long flags; if (likely(!guest_cpuid_has_evmcs(vcpu))) return EVMPTRLD_DISABLED; @@ -2016,11 +2045,14 @@ static enum nested_evmptrld_status nested_vmx_handle_enlightened_vmptrld( nested_release_evmcs(vcpu); - if (kvm_vcpu_map(vcpu, gpa_to_gfn(evmcs_gpa), - &vmx->nested.hv_evmcs_map)) + if (kvm_gpc_activate(&vmx->nested.hv_evmcs_gpc, evmcs_gpa, PAGE_SIZE)) { + kvm_gpc_deactivate(&vmx->nested.hv_evmcs_gpc); return EVMPTRLD_ERROR; + } - vmx->nested.hv_evmcs = vmx->nested.hv_evmcs_map.hva; + hv_evmcs = nested_evmcs_lock_and_acquire(&vmx->vcpu, &flags); + if (!hv_evmcs) + return EVMPTRLD_ERROR; /* * Currently, KVM only supports eVMCS version 1 @@ -2044,9 +2076,10 @@ static enum nested_evmptrld_status nested_vmx_handle_enlightened_vmptrld( * eVMCS version or VMCS12 revision_id as valid values for first * u32 field of eVMCS. */ - if ((vmx->nested.hv_evmcs->revision_id != KVM_EVMCS_VERSION) && - (vmx->nested.hv_evmcs->revision_id != VMCS12_REVISION)) { + if (hv_evmcs->revision_id != KVM_EVMCS_VERSION && + hv_evmcs->revision_id != VMCS12_REVISION) { nested_release_evmcs(vcpu); + read_unlock_irqrestore(vmx->nested.hv_evmcs_gpc.lock, flags); return EVMPTRLD_VMFAIL; } @@ -2072,8 +2105,15 @@ static enum nested_evmptrld_status nested_vmx_handle_enlightened_vmptrld( * between different L2 guests as KVM keeps a single VMCS12 per L1. 
*/ if (from_launch || evmcs_gpa_changed) { - vmx->nested.hv_evmcs->hv_clean_fields &= - ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL; + if (!evmcs_gpa_changed) { + hv_evmcs = nested_evmcs_lock_and_acquire(&vmx->vcpu, &flags); + if (!hv_evmcs) + return EVMPTRLD_ERROR; + } + + hv_evmcs->hv_clean_fields &= ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL; + + read_unlock_irqrestore(vmx->nested.hv_evmcs_gpc.lock, flags); vmx->nested.force_msr_bitmap_recalc = true; } @@ -2399,9 +2439,10 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0 } } -static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12) +static void prepare_vmcs02_rare(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, + struct hv_enlightened_vmcs *hv_evmcs) { - struct hv_enlightened_vmcs *hv_evmcs = vmx->nested.hv_evmcs; + struct vcpu_vmx *vmx = to_vmx(vcpu); if (!hv_evmcs || !(hv_evmcs->hv_clean_fields & HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP2)) { @@ -2534,13 +2575,17 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, { struct vcpu_vmx *vmx = to_vmx(vcpu); bool load_guest_pdptrs_vmcs12 = false; + struct hv_enlightened_vmcs *hv_evmcs = NULL; + + if (evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)) + hv_evmcs = vmx->nested.hv_evmcs_gpc.khva; if (vmx->nested.dirty_vmcs12 || evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)) { - prepare_vmcs02_rare(vmx, vmcs12); + prepare_vmcs02_rare(vcpu, vmcs12, hv_evmcs); vmx->nested.dirty_vmcs12 = false; load_guest_pdptrs_vmcs12 = !evmptr_is_valid(vmx->nested.hv_evmcs_vmptr) || - !(vmx->nested.hv_evmcs->hv_clean_fields & + !(hv_evmcs->hv_clean_fields & HV_VMX_ENLIGHTENED_CLEAN_FIELD_GUEST_GRP1); } @@ -2663,8 +2708,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, * here. */ if (evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)) - vmx->nested.hv_evmcs->hv_clean_fields |= - HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL; + hv_evmcs->hv_clean_fields |= HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL; return 0; } @@ -3214,7 +3258,7 @@ static void nested_vmcs12_gpc_refresh(struct gfn_to_pfn_cache *gpc, } } -static void nested_get_vmcs12_pages_refresh(struct kvm_vcpu *vcpu) +static bool nested_get_vmcs12_pages_refresh(struct kvm_vcpu *vcpu) { struct vmcs12 *vmcs12 = get_vmcs12(vcpu); struct vcpu_vmx *vmx = to_vmx(vcpu); @@ -3231,9 +3275,24 @@ static void nested_get_vmcs12_pages_refresh(struct kvm_vcpu *vcpu) nested_vmcs12_gpc_refresh(&vmx->nested.pi_desc_gpc, vmcs12->posted_intr_desc_addr, sizeof(struct pi_desc)); - if (cpu_has_vmx_msr_bitmap() && nested_cpu_has(vmcs12, CPU_BASED_USE_MSR_BITMAPS)) + if (cpu_has_vmx_msr_bitmap() && nested_cpu_has(vmcs12, CPU_BASED_USE_MSR_BITMAPS)) { + if (vmx->nested.hv_evmcs_gpc.active) { + if (kvm_gpc_refresh(&vmx->nested.hv_evmcs_gpc, PAGE_SIZE)) { + kvm_gpc_deactivate(&vmx->nested.hv_evmcs_gpc); + pr_debug_ratelimited("%s: no backing for evmcs\n", __func__); + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; + vcpu->run->internal.suberror = + KVM_INTERNAL_ERROR_EMULATION; + vcpu->run->internal.ndata = 0; + return false; + } + } + nested_vmcs12_gpc_refresh(&vmx->nested.msr_bitmap_gpc, vmcs12->msr_bitmap, PAGE_SIZE); + } + + return true; } static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu, bool *try_refresh) @@ -3366,13 +3425,11 @@ static bool vmx_get_nested_state_pages(struct kvm_vcpu *vcpu) srcu_read_unlock(&vcpu->kvm->srcu, idx); if (!success) { - if (try_refresh) { - nested_get_vmcs12_pages_refresh(vcpu); + if (try_refresh && nested_get_vmcs12_pages_refresh(vcpu)) { try_refresh = false; goto retry; - } else { - return 
false; } + return false; } return true; @@ -3531,14 +3588,12 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, if (unlikely(!success)) { read_unlock_irqrestore(vmx->nested.apic_access_gpc.lock, flags); - if (try_refresh) { - nested_get_vmcs12_pages_refresh(vcpu); + if (try_refresh && nested_get_vmcs12_pages_refresh(vcpu)) { try_refresh = false; goto retry; - } else { - vmx_switch_vmcs(vcpu, &vmx->vmcs01); - return NVMX_VMENTRY_KVM_INTERNAL_ERROR; } + vmx_switch_vmcs(vcpu, &vmx->vmcs01); + return NVMX_VMENTRY_KVM_INTERNAL_ERROR; } if (nested_vmx_check_vmentry_hw(vcpu)) { @@ -3680,7 +3735,8 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch) return nested_vmx_failInvalid(vcpu); if (evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)) { - copy_enlightened_to_vmcs12(vmx, vmx->nested.hv_evmcs->hv_clean_fields); + if (!copy_enlightened_to_vmcs12(vmx, false)) + return nested_vmx_fail(vcpu, VMXERR_VMPTRLD_INVALID_ADDRESS); /* Enlightened VMCS doesn't have launch state */ vmcs12->launch_state = !launch; } else if (enable_shadow_vmcs) { @@ -5421,7 +5477,7 @@ static int handle_vmclear(struct kvm_vcpu *vcpu) vmptr + offsetof(struct vmcs12, launch_state), &zero, sizeof(zero)); - } else if (vmx->nested.hv_evmcs && vmptr == vmx->nested.hv_evmcs_vmptr) { + } else if (vmx->nested.hv_evmcs_gpc.active && vmptr == vmx->nested.hv_evmcs_vmptr) { nested_release_evmcs(vcpu); } @@ -5448,8 +5504,9 @@ static int handle_vmread(struct kvm_vcpu *vcpu) unsigned long exit_qualification = vmx_get_exit_qual(vcpu); u32 instr_info = vmcs_read32(VMX_INSTRUCTION_INFO); struct vcpu_vmx *vmx = to_vmx(vcpu); + struct hv_enlightened_vmcs *evmcs; struct x86_exception e; - unsigned long field; + unsigned long field, flags; u64 value; gva_t gva = 0; short offset; @@ -5498,8 +5555,13 @@ static int handle_vmread(struct kvm_vcpu *vcpu) if (offset < 0) return nested_vmx_fail(vcpu, VMXERR_UNSUPPORTED_VMCS_COMPONENT); + evmcs = nested_evmcs_lock_and_acquire(&vmx->vcpu, &flags); + if (!evmcs) + return nested_vmx_fail(vcpu, VMXERR_VMPTRLD_INVALID_ADDRESS); + /* Read the field, zero-extended to a u64 value */ - value = evmcs_read_any(vmx->nested.hv_evmcs, field, offset); + value = evmcs_read_any(evmcs, field, offset); + read_unlock_irqrestore(vmx->nested.hv_evmcs_gpc.lock, flags); } /* @@ -6604,7 +6666,7 @@ static int vmx_get_nested_state(struct kvm_vcpu *vcpu, } else { copy_vmcs02_to_vmcs12_rare(vcpu, get_vmcs12(vcpu)); if (!vmx->nested.need_vmcs12_to_shadow_sync) { - if (evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)) + if (evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)) { /* * L1 hypervisor is not obliged to keep eVMCS * clean fields data always up-to-date while @@ -6612,8 +6674,9 @@ static int vmx_get_nested_state(struct kvm_vcpu *vcpu, * supposed to be actual upon vmentry so we need * to ignore it here and do full copy. */ - copy_enlightened_to_vmcs12(vmx, 0); - else if (enable_shadow_vmcs) + if (!copy_enlightened_to_vmcs12(vmx, true)) + return -EFAULT; + } else if (enable_shadow_vmcs) copy_shadow_to_vmcs12(vmx); } } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 1bb8252d40aa..1c13fc1b7b5e 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -4835,6 +4835,16 @@ static void init_vmcs(struct vcpu_vmx *vmx) kvm_gpc_init_with_lock(&vmx->nested.msr_bitmap_gpc, kvm, &vmx->vcpu, KVM_HOST_USES_PFN, vmx->nested.apic_access_gpc.lock); + + memset(&vmx->nested.hv_evmcs_gpc, 0, sizeof(vmx->nested.hv_evmcs_gpc)); + /* + * Share the same lock for simpler locking. 
Taking the lock + * outside of the vcpu thread should be rare, so the cost of + * the coarser locking should be minimal + */ + kvm_gpc_init_with_lock(&vmx->nested.hv_evmcs_gpc, kvm, &vmx->vcpu, + KVM_GUEST_AND_HOST_USE_PFN, + vmx->nested.apic_access_gpc.lock); } } diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index e067730a0222..71e52daf60af 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -252,9 +252,8 @@ struct nested_vmx { bool guest_mode; } smm; + struct gfn_to_pfn_cache hv_evmcs_gpc; gpa_t hv_evmcs_vmptr; - struct kvm_host_map hv_evmcs_map; - struct hv_enlightened_vmcs *hv_evmcs; }; struct vcpu_vmx {
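For completeness, callers of the nested_evmcs_lock_and_acquire() helper added by this patch are expected to follow roughly the pattern below. This is a schematic sketch based on the call sites above; the wrapper function name, the -EFAULT return value, and the elided field accesses are illustrative assumptions, not code from the series.

/*
 * Schematic sketch of an evmcs access through the gpc: the helper
 * returns with the shared gpc read lock held on success.
 */
static int example_evmcs_access(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	struct hv_enlightened_vmcs *evmcs;
	unsigned long flags;

	evmcs = nested_evmcs_lock_and_acquire(vcpu, &flags);
	if (!evmcs)
		return -EFAULT;	/* cache inactive or refresh failed */

	/* gpc read lock held: evmcs is safe to dereference here. */
	/* ... read or write evmcs fields ... */

	read_unlock_irqrestore(vmx->nested.hv_evmcs_gpc.lock, flags);
	return 0;
}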