From patchwork Thu Feb 2 18:28:15 2023
From: Sean Christopherson
Subject: [PATCH v2 1/3] KVM: x86/mmu: Use EMULTYPE flag to track write #PFs to shadow pages
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Huang Hang, Lai Jiangshan
Date: Thu, 2 Feb 2023 18:28:15 +0000
Message-ID: <20230202182817.407394-2-seanjc@google.com>
In-Reply-To: <20230202182817.407394-1-seanjc@google.com>
References: <20230202182817.407394-1-seanjc@google.com>
Use a new EMULTYPE flag, EMULTYPE_WRITE_PF_TO_SP, to track page faults
on self-changing writes to shadowed page tables instead of propagating
that information to the emulator via a semi-persistent vCPU flag.  Using
a flag in "struct kvm_vcpu_arch" is confusing, especially as
implemented, as it's not at all obvious that clearing the flag only when
emulation actually occurs is correct.

E.g. if KVM sets the flag and then retries the fault without ever
getting to the emulator, the flag will be left set for future calls
into the emulator.  But because the flag is consumed if and only if both
EMULTYPE_PF and EMULTYPE_ALLOW_RETRY_PF are set, and because
EMULTYPE_ALLOW_RETRY_PF is deliberately not set for direct MMUs,
emulated MMIO, or while L2 is active, KVM avoids false positives on a
stale flag since FNAME(page_fault) is guaranteed to be run and refresh
the flag before it's ultimately consumed by the tail end of
reexecute_instruction().
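To make the new contract concrete, here is a standalone userspace
sketch (simplified types; do_page_fault() below is a hypothetical
stand-in for kvm_mmu_do_page_fault(), not kernel code): the
self-referential write is recorded in the on-stack fault and converted
into an EMULTYPE flag for the caller, so nothing lingers in the vCPU
across faults.

#include <stdbool.h>
#include <stdio.h>

#define EMULTYPE_PF             (1 << 6)
#define EMULTYPE_WRITE_PF_TO_SP (1 << 8)

/* Stand-in for struct kvm_page_fault: lives only for the duration of
 * one fault, so a stale value can never leak into a later emulation. */
struct page_fault {
	bool write_fault_to_shadow_pgtable;
};

/* Stand-in for kvm_mmu_do_page_fault(): record the self-referential
 * write in the on-stack fault, then convert it into an EMULTYPE flag
 * for callers that pass a non-NULL emulation_type. */
static int do_page_fault(struct page_fault *fault, int *emulation_type,
			 bool self_referential_write)
{
	fault->write_fault_to_shadow_pgtable = self_referential_write;

	if (fault->write_fault_to_shadow_pgtable && emulation_type)
		*emulation_type |= EMULTYPE_WRITE_PF_TO_SP;
	return 0;
}

int main(void)
{
	struct page_fault fault = { 0 };
	int emulation_type = EMULTYPE_PF;

	do_page_fault(&fault, &emulation_type, true);

	/* reexecute_instruction() can now test the emulation type directly. */
	printf("exit to userspace if emulation fails: %s\n",
	       (emulation_type & EMULTYPE_WRITE_PF_TO_SP) ? "yes" : "no");
	return 0;
}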
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h | 37 ++++++++++++++++++---------------
 arch/x86/kvm/mmu/mmu.c          |  5 +++--
 arch/x86/kvm/mmu/mmu_internal.h | 12 ++++++++++-
 arch/x86/kvm/mmu/paging_tmpl.h  |  4 +---
 arch/x86/kvm/x86.c              | 15 ++-----------
 5 files changed, 37 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4d2bc08794e4..a0fa6333edbe 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -942,23 +942,6 @@ struct kvm_vcpu_arch {

 	u64 msr_kvm_poll_control;

-	/*
-	 * Indicates the guest is trying to write a gfn that contains one or
-	 * more of the PTEs used to translate the write itself, i.e. the access
-	 * is changing its own translation in the guest page tables.  KVM exits
-	 * to userspace if emulation of the faulting instruction fails and this
-	 * flag is set, as KVM cannot make forward progress.
-	 *
-	 * If emulation fails for a write to guest page tables, KVM unprotects
-	 * (zaps) the shadow page for the target gfn and resumes the guest to
-	 * retry the non-emulatable instruction (on hardware).  Unprotecting the
-	 * gfn doesn't allow forward progress for a self-changing access because
-	 * doing so also zaps the translation for the gfn, i.e. retrying the
-	 * instruction will hit a !PRESENT fault, which results in a new shadow
-	 * page and sends KVM back to square one.
-	 */
-	bool write_fault_to_shadow_pgtable;
-
 	/* set at EPT violation at this point */
 	unsigned long exit_qualification;

@@ -1891,6 +1874,25 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
  * EMULTYPE_COMPLETE_USER_EXIT - Set when the emulator should update interruptibility
  *			     state and inject single-step #DBs after skipping
  *			     an instruction (after completing userspace I/O).
+ *
+ * EMULTYPE_WRITE_PF_TO_SP - Set when emulating an intercepted page fault that
+ *			     is attempting to write a gfn that contains one or
+ *			     more of the PTEs used to translate the write itself,
+ *			     and the owning page table is being shadowed by KVM.
+ *			     If emulation of the faulting instruction fails and
+ *			     this flag is set, KVM will exit to userspace instead
+ *			     of retrying emulation as KVM cannot make forward
+ *			     progress.
+ *
+ *			     If emulation fails for a write to guest page tables,
+ *			     KVM unprotects (zaps) the shadow page for the target
+ *			     gfn and resumes the guest to retry the non-emulatable
+ *			     instruction (on hardware).  Unprotecting the gfn
+ *			     doesn't allow forward progress for a self-changing
+ *			     access because doing so also zaps the translation for
+ *			     the gfn, i.e. retrying the instruction will hit a
+ *			     !PRESENT fault, which results in a new shadow page
+ *			     and sends KVM back to square one.
  */
 #define EMULTYPE_NO_DECODE	    (1 << 0)
 #define EMULTYPE_TRAP_UD	    (1 << 1)
@@ -1900,6 +1902,7 @@ u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
 #define EMULTYPE_VMWARE_GP	    (1 << 5)
 #define EMULTYPE_PF		    (1 << 6)
 #define EMULTYPE_COMPLETE_USER_EXIT (1 << 7)
+#define EMULTYPE_WRITE_PF_TO_SP	    (1 << 8)

 int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
 int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c91ee2927dd7..bf38575a1957 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4203,7 +4203,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	      work->arch.cr3 != vcpu->arch.mmu->get_guest_pgd(vcpu))
 		return;

-	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
+	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true, NULL);
 }

 static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
@@ -5664,7 +5664,8 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err

 	if (r == RET_PF_INVALID) {
 		r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
-					  lower_32_bits(error_code), false);
+					  lower_32_bits(error_code), false,
+					  &emulation_type);
 		if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
 			return -EIO;
 	}
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index cc58631e2336..2cbb155c686c 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -240,6 +240,13 @@ struct kvm_page_fault {
 	kvm_pfn_t pfn;
 	hva_t hva;
 	bool map_writable;
+
+	/*
+	 * Indicates the guest is trying to write a gfn that contains one or
+	 * more of the PTEs used to translate the write itself, i.e. the access
+	 * is changing its own translation in the guest page tables.
+	 */
+	bool write_fault_to_shadow_pgtable;
 };

 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
@@ -273,7 +280,7 @@ enum {
 };

 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-					u32 err, bool prefetch)
+					u32 err, bool prefetch, int *emulation_type)
 {
 	struct kvm_page_fault fault = {
 		.addr = cr2_or_gpa,
@@ -312,6 +319,9 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	else
 		r = vcpu->arch.mmu->page_fault(vcpu, &fault);

+	if (fault.write_fault_to_shadow_pgtable && emulation_type)
+		*emulation_type |= EMULTYPE_WRITE_PF_TO_SP;
+
 	/*
 	 * Similar to above, prefetch faults aren't truly spurious, and the
 	 * async #PF path doesn't do emulation.  Do count faults that are fixed
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 57f0b75c80f9..5d2958299b4f 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -825,10 +825,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (r)
 		return r;

-	vcpu->arch.write_fault_to_shadow_pgtable = false;
-
 	is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
-	      &walker, fault->user, &vcpu->arch.write_fault_to_shadow_pgtable);
+	      &walker, fault->user, &fault->write_fault_to_shadow_pgtable);

 	if (is_self_change_mapping)
 		fault->max_level = PG_LEVEL_4K;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 508074e47bc0..de2a0d1c9c21 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8427,7 +8427,6 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
 }

 static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-				  bool write_fault_to_shadow_pgtable,
 				  int emulation_type)
 {
 	gpa_t gpa = cr2_or_gpa;
@@ -8498,7 +8497,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	 * be fixed by unprotecting shadow page and it should
 	 * be reported to userspace.
 	 */
-	return !write_fault_to_shadow_pgtable;
+	return !(emulation_type & EMULTYPE_WRITE_PF_TO_SP);
 }

 static bool retry_instruction(struct x86_emulate_ctxt *ctxt,
@@ -8746,20 +8745,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	int r;
 	struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
 	bool writeback = true;
-	bool write_fault_to_spt;

 	if (unlikely(!kvm_can_emulate_insn(vcpu, emulation_type, insn, insn_len)))
 		return 1;

 	vcpu->arch.l1tf_flush_l1d = true;

-	/*
-	 * Clear write_fault_to_shadow_pgtable here to ensure it is
-	 * never reused.
-	 */
-	write_fault_to_spt = vcpu->arch.write_fault_to_shadow_pgtable;
-	vcpu->arch.write_fault_to_shadow_pgtable = false;
-
 	if (!(emulation_type & EMULTYPE_NO_DECODE)) {
 		kvm_clear_exception_queue(vcpu);

@@ -8780,7 +8771,6 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 			return 1;
 		}
 		if (reexecute_instruction(vcpu, cr2_or_gpa,
-					  write_fault_to_spt,
 					  emulation_type))
 			return 1;

@@ -8859,8 +8849,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		return 1;

 	if (r == EMULATION_FAILED) {
-		if (reexecute_instruction(vcpu, cr2_or_gpa, write_fault_to_spt,
-					  emulation_type))
+		if (reexecute_instruction(vcpu, cr2_or_gpa, emulation_type))
 			return 1;

 		return handle_emulation_failure(vcpu, emulation_type);
From patchwork Thu Feb 2 18:28:16 2023
From: Sean Christopherson
Subject: [PATCH v2 2/3] KVM: x86/mmu: Detect write #PF to shadow pages during FNAME(fetch) walk
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Huang Hang, Lai Jiangshan
Date: Thu, 2 Feb 2023 18:28:16 +0000
Message-ID: <20230202182817.407394-3-seanjc@google.com>
In-Reply-To: <20230202182817.407394-1-seanjc@google.com>
References: <20230202182817.407394-1-seanjc@google.com>
From: Lai Jiangshan

Move the detection of write #PF to shadow pages, i.e. a fault on a
write to a page table that is being shadowed by KVM and that is used to
translate the write itself, from FNAME(is_self_change_mapping) to
FNAME(fetch).  There is no need to detect the self-referential write
before kvm_faultin_pfn() as KVM does not consume
EMULTYPE_WRITE_PF_TO_SP for accesses that resolve to "error or no-slot"
pfns, i.e. KVM doesn't allow retrying MMIO accesses or writes to
read-only memslots.

Detecting the EMULTYPE_WRITE_PF_TO_SP scenario in FNAME(fetch) will
allow dropping FNAME(is_self_change_mapping) entirely, as the hugepage
interaction can be deferred to kvm_mmu_hugepage_adjust().
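For illustration, a standalone sketch of the per-level check that
FNAME(fetch) now performs (simplified types and a hypothetical
two-level walk; not kernel code):

#include <stdbool.h>
#include <stdio.h>

typedef unsigned long long gfn_t;

struct page_fault {
	gfn_t gfn;	/* gfn targeted by the faulting write */
	bool write;
	bool write_fault_to_shadow_pgtable;
};

/* One check per level of the shadow-page walk, mirroring the new test
 * in FNAME(fetch): if the write lands on the gfn backing the page
 * table at this level, the access modifies its own translation. */
static void check_table_gfn(struct page_fault *fault, gfn_t table_gfn)
{
	if (fault->write && table_gfn == fault->gfn)
		fault->write_fault_to_shadow_pgtable = true;
}

int main(void)
{
	/* Hypothetical walk: the last-level table happens to live in the
	 * gfn being written, so the write is self-referential. */
	gfn_t table_gfns[] = { 0xbeef, 0x1234 };
	struct page_fault fault = { .gfn = 0x1234, .write = true };

	for (int i = 0; i < 2; i++)
		check_table_gfn(&fault, table_gfns[i]);

	printf("self-referential write: %d\n",
	       fault.write_fault_to_shadow_pgtable);
	return 0;
}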
Cc: Huang Hang
Signed-off-by: Lai Jiangshan
Link: https://lore.kernel.org/r/20221213125538.81209-1-jiangshanlai@gmail.com
[sean: split to separate patch, write changelog]
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/paging_tmpl.h | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 5d2958299b4f..f57d9074fb9b 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -685,6 +685,9 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,

 		if (sp != ERR_PTR(-EEXIST))
 			link_shadow_page(vcpu, it.sptep, sp);
+
+		if (fault->write && table_gfn == fault->gfn)
+			fault->write_fault_to_shadow_pgtable = true;
 	}

 	kvm_mmu_hugepage_adjust(vcpu, fault);
@@ -741,17 +744,13 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
  * created when kvm establishes shadow page table that stop kvm using large
  * page size. Do it early can avoid unnecessary #PF and emulation.
  *
- * @write_fault_to_shadow_pgtable will return true if the fault gfn is
- * currently used as its page table.
- *
  * Note: the PDPT page table is not checked for PAE-32 bit guest. It is ok
  * since the PDPT is always shadowed, that means, we can not use large page
  * size to map the gfn which is used as PDPT.
  */
 static bool
 FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
-			      struct guest_walker *walker, bool user_fault,
-			      bool *write_fault_to_shadow_pgtable)
+			      struct guest_walker *walker, bool user_fault)
 {
 	int level;
 	gfn_t mask = ~(KVM_PAGES_PER_HPAGE(walker->level) - 1);
@@ -765,7 +764,6 @@ FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
 		gfn_t gfn = walker->gfn ^ walker->table_gfn[level - 1];

 		self_changed |= !(gfn & mask);
-		*write_fault_to_shadow_pgtable |= !gfn;
 	}

 	return self_changed;
@@ -826,7 +824,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 		return r;

 	is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
-	      &walker, fault->user, &fault->write_fault_to_shadow_pgtable);
+	      &walker, fault->user);

 	if (is_self_change_mapping)
 		fault->max_level = PG_LEVEL_4K;
From patchwork Thu Feb 2 18:28:17 2023
From: Sean Christopherson
Subject: [PATCH v2 3/3] KVM: x86/mmu: Remove FNAME(is_self_change_mapping)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Huang Hang, Lai Jiangshan
Date: Thu, 2 Feb 2023 18:28:17 +0000
Message-ID: <20230202182817.407394-4-seanjc@google.com>
In-Reply-To: <20230202182817.407394-1-seanjc@google.com>
References: <20230202182817.407394-1-seanjc@google.com>
From: Lai Jiangshan

Drop FNAME(is_self_change_mapping) and instead rely on
kvm_mmu_hugepage_adjust() to adjust the hugepage accordingly.  Prior to
commit 4cd071d13c5c ("KVM: x86/mmu: Move calls to thp_adjust() down a
level"), the hugepage adjustment was done before allocating new shadow
pages, i.e. failed to restrict the hugepage sizes if a new shadow page
resulted in account_shadowed() changing the disallowed hugepage
tracking.

Removing FNAME(is_self_change_mapping) fixes a bug reported by Huang
Hang where KVM unnecessarily forces a 4KiB page.
FNAME(is_self_change_mapping) has a defect in that it blindly disables
_all_ hugepage mappings rather than trying to reduce the size of the
hugepage.  If the guest is writing via a 1GiB page and the 1GiB range
is self-referential but the 2MiB range is not, then KVM can and should
create a 2MiB mapping.

Add a comment above the call to kvm_mmu_hugepage_adjust() to call out
the new dependency on adjusting the hugepage size after walking
indirect PTEs.
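For illustration, a standalone sketch of the intended behavior
(hypothetical predicate and values; this models only the spirit of
kvm_mmu_hugepage_adjust(), not its implementation).  In the scenario
from the changelog, reducing level by level yields 2MiB rather than
falling all the way back to 4KiB:

#include <stdbool.h>
#include <stdio.h>

enum { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3 };

/* Hypothetical predicate: does the level-sized region around @gfn
 * overlap a gfn whose page table is shadowed?  Here only the 1GiB
 * region does. */
static bool hugepage_disallowed(unsigned long long gfn, int level)
{
	return level == PG_LEVEL_1G;
}

/* Walk down from the guest's mapping level and stop at the largest
 * level that is allowed, rather than blindly forcing PG_LEVEL_4K. */
static int hugepage_adjust(unsigned long long gfn, int max_level)
{
	int level;

	for (level = max_level; level > PG_LEVEL_4K; level--) {
		if (!hugepage_disallowed(gfn, level))
			break;
	}
	return level;
}

int main(void)
{
	/* Guest uses a 1GiB page; only the 1GiB region is
	 * self-referential, so a 2MiB mapping is still possible. */
	printf("final mapping level: %d\n",
	       hugepage_adjust(0x1234, PG_LEVEL_1G));
	return 0;
}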
Reported-by: Huang Hang
Signed-off-by: Lai Jiangshan
Link: https://lore.kernel.org/r/20221213125538.81209-1-jiangshanlai@gmail.com
[sean: rework changelog after separating out the emulator change]
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/paging_tmpl.h | 51 +++++-----------
 1 file changed, 7 insertions(+), 44 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index f57d9074fb9b..a056f2773dd9 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -690,6 +690,12 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			fault->write_fault_to_shadow_pgtable = true;
 	}

+	/*
+	 * Adjust the hugepage size _after_ resolving indirect shadow pages.
+	 * KVM doesn't support mapping hugepages into the guest for gfns that
+	 * are being shadowed by KVM, i.e. allocating a new shadow page may
+	 * affect the allowed hugepage size.
+	 */
 	kvm_mmu_hugepage_adjust(vcpu, fault);

 	trace_kvm_mmu_spte_requested(fault);
@@ -734,41 +740,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 	return RET_PF_RETRY;
 }

-/*
- * To see whether the mapped gfn can write its page table in the current
- * mapping.
- *
- * It is the helper function of FNAME(page_fault). When guest uses large page
- * size to map the writable gfn which is used as current page table, we should
- * force kvm to use small page size to map it because new shadow page will be
- * created when kvm establishes shadow page table that stop kvm using large
- * page size. Do it early can avoid unnecessary #PF and emulation.
- *
- * Note: the PDPT page table is not checked for PAE-32 bit guest. It is ok
- * since the PDPT is always shadowed, that means, we can not use large page
- * size to map the gfn which is used as PDPT.
- */
-static bool
-FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu,
-			      struct guest_walker *walker, bool user_fault)
-{
-	int level;
-	gfn_t mask = ~(KVM_PAGES_PER_HPAGE(walker->level) - 1);
-	bool self_changed = false;
-
-	if (!(walker->pte_access & ACC_WRITE_MASK ||
-	      (!is_cr0_wp(vcpu->arch.mmu) && !user_fault)))
-		return false;
-
-	for (level = walker->level; level <= walker->max_level; level++) {
-		gfn_t gfn = walker->gfn ^ walker->table_gfn[level - 1];
-
-		self_changed |= !(gfn & mask);
-	}
-
-	return self_changed;
-}
-
 /*
  * Page fault handler.  There are several causes for a page fault:
  *   - there is no shadow pte for the guest pte
@@ -787,7 +758,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 {
 	struct guest_walker walker;
 	int r;
-	bool is_self_change_mapping;

 	pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
 	WARN_ON_ONCE(fault->is_tdp);
@@ -812,6 +782,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	}

 	fault->gfn = walker.gfn;
+	fault->max_level = walker.level;
 	fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);

 	if (page_fault_handle_page_track(vcpu, fault)) {
@@ -823,14 +794,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (r)
 		return r;

-	is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
-	      &walker, fault->user);
-
-	if (is_self_change_mapping)
-		fault->max_level = PG_LEVEL_4K;
-	else
-		fault->max_level = walker.level;
-
 	r = kvm_faultin_pfn(vcpu, fault, walker.pte_access);
 	if (r != RET_PF_CONTINUE)
 		return r;