From patchwork Wed Oct 19 16:56:11 2022
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 5755
Reply-To: Sean Christopherson
Date: Wed, 19 Oct 2022 16:56:11 +0000
In-Reply-To: <20221019165618.927057-1-seanjc@google.com>
Mime-Version: 1.0
References: <20221019165618.927057-1-seanjc@google.com>
X-Mailer: git-send-email 2.38.0.413.g74048e4d9e-goog
Message-ID: <20221019165618.927057-2-seanjc@google.com>
Subject: [PATCH v6 1/8] KVM: x86/mmu: Tag disallowed NX huge pages even if they're not tracked
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Mingwei Zhang,
    David Matlack, Yan Zhao, Ben Gardon
X-Mailing-List: linux-kernel@vger.kernel.org

Tag shadow pages that cannot be replaced with an NX huge page regardless
of whether or not zapping the page would allow KVM to immediately create
a huge page, e.g. because something else prevents creating a huge page.
I.e. track pages that are disallowed from being NX huge pages regardless
of whether or not the page could have been huge at the time of fault.

KVM currently tracks pages that were disallowed from being huge due to
the NX workaround if and only if the page could otherwise be huge.  But
that fails to handle the scenario where whatever restriction prevented
KVM from installing a huge page goes away, e.g. if dirty logging is
disabled, the host mapping level changes, etc...

Failure to tag shadow pages appropriately could theoretically lead to
false negatives, e.g. if a fetch fault requests a small page and thus
isn't tracked, and a read/write fault later requests a huge page, KVM
will not reject the huge page as it should.

To avoid yet another flag, initialize the list_head and use list_empty()
to determine whether or not a page is on the list of NX huge pages that
should be recovered.

Note, the TDP MMU accounting is still flawed as fixing the TDP MMU is
more involved due to mmu_lock being held for read.  This will be
addressed in a future commit.

Fixes: 5bcaf3e1715f ("KVM: x86/mmu: Account NX huge page disallowed iff huge page was requested")
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c          | 32 ++++++++++++++++++++++++--------
 arch/x86/kvm/mmu/mmu_internal.h | 10 +++++++++-
 arch/x86/kvm/mmu/paging_tmpl.h  |  6 +++---
 arch/x86/kvm/mmu/tdp_mmu.c      |  4 +++-
 4 files changed, 39 insertions(+), 13 deletions(-)
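[Editor's aside] The "avoid yet another flag" trick described above is a
standard kernel pattern: a list_head that has been initialized but never
linked points at itself, so list_empty() doubles as an "is this page on
the recovery list?" test.  Below is a minimal, self-contained userspace
sketch of that pattern, not KVM code: the list helpers mirror the
semantics of <linux/list.h>, while struct shadow_page, account(),
unaccount(), and recovery_list are hypothetical stand-ins for
kvm_mmu_page, account_huge_nx_page(), unaccount_huge_nx_page(), and the
lpage_disallowed_mmu_pages list.

	#include <stdbool.h>
	#include <stdio.h>

	/* Userspace re-implementation of the circular list from <linux/list.h>. */
	struct list_head {
		struct list_head *next, *prev;
	};

	static void INIT_LIST_HEAD(struct list_head *h)
	{
		h->next = h->prev = h;	/* self-referential == "not on a list" */
	}

	static bool list_empty(const struct list_head *h)
	{
		return h->next == h;
	}

	static void list_add_tail(struct list_head *n, struct list_head *h)
	{
		n->prev = h->prev;
		n->next = h;
		h->prev->next = n;
		h->prev = n;
	}

	static void list_del_init(struct list_head *n)
	{
		n->prev->next = n->next;
		n->next->prev = n->prev;
		INIT_LIST_HEAD(n);	/* node reads as "empty" again */
	}

	/* Hypothetical stand-in for kvm_mmu_page. */
	struct shadow_page {
		bool lpage_disallowed;
		struct list_head link;	/* plays the role of lpage_disallowed_link */
	};

	static struct list_head recovery_list;	/* "to be zapped for NX recovery" */

	static void account(struct shadow_page *sp, bool nx_huge_page_possible)
	{
		sp->lpage_disallowed = true;	/* always tagged, as in the patch */
		if (!nx_huge_page_possible || !list_empty(&sp->link))
			return;		/* not recoverable, or already queued */
		list_add_tail(&sp->link, &recovery_list);
	}

	static void unaccount(struct shadow_page *sp)
	{
		sp->lpage_disallowed = false;
		if (list_empty(&sp->link))	/* tagged but never queued */
			return;
		list_del_init(&sp->link);
	}

	int main(void)
	{
		struct shadow_page sp = { 0 };

		INIT_LIST_HEAD(&recovery_list);
		INIT_LIST_HEAD(&sp.link);	/* mirrors the new INIT_LIST_HEAD() calls */

		account(&sp, false);		/* huge page impossible: tag only */
		printf("queued: %d\n", !list_empty(&sp.link));	/* 0 */
		account(&sp, true);		/* later fault: restriction went away */
		printf("queued: %d\n", !list_empty(&sp.link));	/* 1 */
		unaccount(&sp);
		printf("queued: %d\n", !list_empty(&sp.link));	/* 0 */
		return 0;
	}

Walking through main(): the first account() tags the page but leaves it
off the list, which is exactly the case the pre-patch code mis-handled;
the second call queues it; and unaccount() works whether or not the page
was ever queued.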
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6f81539061d6..f1e089dfdd22 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -802,15 +802,25 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 		kvm_flush_remote_tlbs_with_address(kvm, gfn, 1);
 }
 
-void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+			  bool nx_huge_page_possible)
 {
-	if (sp->lpage_disallowed)
+	sp->lpage_disallowed = true;
+
+	/*
+	 * If it's possible to replace the shadow page with an NX huge page,
+	 * i.e. if the shadow page is the only thing currently preventing KVM
+	 * from using a huge page, add the shadow page to the list of "to be
+	 * zapped for NX recovery" pages.  Note, the shadow page can already be
+	 * on the list if KVM is reusing an existing shadow page, i.e. if KVM
+	 * links a shadow page at multiple points.
+	 */
+	if (!nx_huge_page_possible || !list_empty(&sp->lpage_disallowed_link))
 		return;
 
 	++kvm->stat.nx_lpage_splits;
 	list_add_tail(&sp->lpage_disallowed_link,
 		      &kvm->arch.lpage_disallowed_mmu_pages);
-	sp->lpage_disallowed = true;
 }
 
 static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
@@ -832,9 +842,13 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 
 void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	--kvm->stat.nx_lpage_splits;
 	sp->lpage_disallowed = false;
-	list_del(&sp->lpage_disallowed_link);
+
+	if (list_empty(&sp->lpage_disallowed_link))
+		return;
+
+	--kvm->stat.nx_lpage_splits;
+	list_del_init(&sp->lpage_disallowed_link);
 }
 
 static struct kvm_memory_slot *
@@ -2129,6 +2143,8 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm,
 
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 
+	INIT_LIST_HEAD(&sp->lpage_disallowed_link);
+
 	/*
 	 * active_mmu_pages must be a FIFO list, as kvm_zap_obsolete_pages()
 	 * depends on valid pages being added to the head of the list.  See
@@ -3126,9 +3142,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 			continue;
 
 		link_shadow_page(vcpu, it.sptep, sp);
-		if (fault->is_tdp && fault->huge_page_disallowed &&
-		    fault->req_level >= it.level)
-			account_huge_nx_page(vcpu->kvm, sp);
+		if (fault->is_tdp && fault->huge_page_disallowed)
+			account_huge_nx_page(vcpu->kvm, sp,
+					     fault->req_level >= it.level);
 	}
 
 	if (WARN_ON_ONCE(it.level != fault->goal_level))
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 582def531d4d..cca1ad75d096 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -100,6 +100,13 @@ struct kvm_mmu_page {
 		};
 	};
 
+	/*
+	 * Tracks shadow pages that, if zapped, would allow KVM to create an NX
+	 * huge page.  A shadow page will have lpage_disallowed set but not be
+	 * on the list if a huge page is disallowed for other reasons, e.g.
+	 * because KVM is shadowing a PTE at the same gfn, the memslot isn't
+	 * properly aligned, etc...
+	 */
 	struct list_head lpage_disallowed_link;
 #ifdef CONFIG_X86_32
 	/*
@@ -315,7 +322,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 
 void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 
-void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp);
+void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp,
+			  bool nx_huge_page_possible);
 void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp);
 
 #endif /* __KVM_X86_MMU_INTERNAL_H */
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 5ab5f94dcb6f..8fd0c4e1e575 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -713,9 +713,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			continue;
 
 		link_shadow_page(vcpu, it.sptep, sp);
-		if (fault->huge_page_disallowed &&
-		    fault->req_level >= it.level)
-			account_huge_nx_page(vcpu->kvm, sp);
+		if (fault->huge_page_disallowed)
+			account_huge_nx_page(vcpu->kvm, sp,
+					     fault->req_level >= it.level);
 	}
 
 	if (WARN_ON_ONCE(it.level != fault->goal_level))
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 672f0432d777..80a4a1a09131 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -284,6 +284,8 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu)
 static void tdp_mmu_init_sp(struct kvm_mmu_page *sp, tdp_ptep_t sptep,
 			    gfn_t gfn, union kvm_mmu_page_role role)
 {
+	INIT_LIST_HEAD(&sp->lpage_disallowed_link);
+
 	set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
 
 	sp->role = role;
@@ -1141,7 +1143,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
 		spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 		list_add(&sp->link, &kvm->arch.tdp_mmu_pages);
 		if (account_nx)
-			account_huge_nx_page(kvm, sp);
+			account_huge_nx_page(kvm, sp, true);
 		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 
 		tdp_account_mmu_page(kvm, sp);
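
[Editor's aside] One subtlety in the unaccount_huge_nx_page() hunk is the
switch from list_del() to list_del_init(), which is what keeps the
list_empty()-as-flag scheme honest.  Below is a paraphrased,
self-contained comparison of the two helpers; the real ones live in
<linux/list.h>, and the poison constants (from <linux/poison.h>) are
abridged here.

	struct list_head {
		struct list_head *next, *prev;
	};

	/* Abridged stand-ins for the kernel's LIST_POISON values. */
	#define LIST_POISON1 ((struct list_head *)0x100)
	#define LIST_POISON2 ((struct list_head *)0x122)

	/*
	 * list_del() unlinks the entry and poisons its pointers: a later
	 * list_empty(&entry) gives a meaningless answer, and a second
	 * deletion would dereference the poison.
	 */
	static inline void list_del(struct list_head *entry)
	{
		entry->prev->next = entry->next;
		entry->next->prev = entry->prev;
		entry->next = LIST_POISON1;
		entry->prev = LIST_POISON2;
	}

	/*
	 * list_del_init() leaves the entry self-referential, the same state
	 * INIT_LIST_HEAD() produces, so list_empty() stays a reliable "is
	 * this page queued for NX recovery?" test across unaccount/account
	 * cycles, e.g. when KVM reuses an existing shadow page.
	 */
	static inline void list_del_init(struct list_head *entry)
	{
		entry->prev->next = entry->next;
		entry->next->prev = entry->prev;
		entry->next = entry->prev = entry;
	}

This is also why unaccount_huge_nx_page() can now safely be reached for a
page that was tagged but never queued: the list_empty() check
short-circuits before any list manipulation happens.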