From patchwork Thu Sep 14 01:55:16 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sean Christopherson
X-Patchwork-Id: 139477
Reply-To: Sean Christopherson
Date: Wed, 13 Sep 2023 18:55:16 -0700
In-Reply-To: <20230914015531.1419405-1-seanjc@google.com>
References: <20230914015531.1419405-1-seanjc@google.com>
X-Mailer: git-send-email 2.42.0.283.g2d96d420d3-goog
Message-ID: <20230914015531.1419405-19-seanjc@google.com>
Subject: [RFC PATCH v12 18/33] KVM: x86/mmu: Handle page fault for private memory
From: Sean Christopherson
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen,
 Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
 Sean Christopherson, "Matthew Wilcox (Oracle)", Andrew Morton,
 Paul Moore, James Morris, "Serge E. Hallyn"
Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.linux.dev, linux-mips@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
 linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-security-module@vger.kernel.org,
 linux-kernel@vger.kernel.org, Chao Peng, Fuad Tabba, Jarkko Sakkinen,
 Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka,
 Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand,
 Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata,
 "Kirill A. Shutemov"
X-Mailing-List: linux-kernel@vger.kernel.org

From: Chao Peng

A KVM_MEM_PRIVATE memslot can include both fd-based private memory and
hva-based shared memory. Architecture code (e.g. TDX code) can tell
whether the ongoing fault is private or not. This patch adds an
'is_private' field to kvm_page_fault to indicate this, and architecture
code is expected to set it.

To handle page faults in such a memslot, the handling logic differs
depending on whether the fault is private or shared: KVM checks whether
'is_private' matches the host's view of the page (maintained in
mem_attr_array).

- For a successful match, the private pfn is obtained with
  kvm_gmem_get_pfn() and the shared pfn is obtained with the existing
  get_user_pages().

- For a failed match, KVM causes a KVM_EXIT_MEMORY_FAULT exit to
  userspace. Userspace can then convert the memory between private and
  shared in the host's view and retry the fault, as sketched below.
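For illustration, a minimal userspace sketch of that convert-and-retry
flow follows. It is not part of this patch: it assumes the
KVM_SET_MEMORY_ATTRIBUTES ioctl and the kvm_run.memory_fault layout
introduced earlier in this series, and the handler name is hypothetical;
a real VMM would layer policy and error handling on top.

  /*
   * Illustrative only: on a KVM_EXIT_MEMORY_FAULT, flip the faulting
   * range's PRIVATE attribute to match the access the guest attempted,
   * then return to the run loop so KVM_RUN retries the fault.
   */
  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  static int handle_memory_fault_exit(int vm_fd, struct kvm_run *run)
  {
          struct kvm_memory_attributes attrs = {
                  .address    = run->memory_fault.gpa,
                  .size       = run->memory_fault.size,
                  /* Guest faulted privately => mark private, else shared. */
                  .attributes = (run->memory_fault.flags &
                                 KVM_MEMORY_EXIT_FLAG_PRIVATE) ?
                                KVM_MEMORY_ATTRIBUTE_PRIVATE : 0,
          };

          /* Convert the range in the host's view; caller re-enters the guest. */
          return ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
  }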
Co-developed-by: Yu Zhang
Signed-off-by: Yu Zhang
Signed-off-by: Chao Peng
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c          | 94 +++++++++++++++++++++++++++++++--
 arch/x86/kvm/mmu/mmu_internal.h |  1 +
 2 files changed, 90 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a079f36a8bf5..9b48d8d0300b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3147,9 +3147,9 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return level;
 }
 
-int kvm_mmu_max_mapping_level(struct kvm *kvm,
-			      const struct kvm_memory_slot *slot, gfn_t gfn,
-			      int max_level)
+static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
+				       const struct kvm_memory_slot *slot,
+				       gfn_t gfn, int max_level, bool is_private)
 {
 	struct kvm_lpage_info *linfo;
 	int host_level;
@@ -3161,6 +3161,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			break;
 	}
 
+	if (is_private)
+		return max_level;
+
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
@@ -3168,6 +3171,16 @@
 	return min(host_level, max_level);
 }
 
+int kvm_mmu_max_mapping_level(struct kvm *kvm,
+			      const struct kvm_memory_slot *slot, gfn_t gfn,
+			      int max_level)
+{
+	bool is_private = kvm_slot_can_be_private(slot) &&
+			  kvm_mem_is_private(kvm, gfn);
+
+	return __kvm_mmu_max_mapping_level(kvm, slot, gfn, max_level, is_private);
+}
+
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_memory_slot *slot = fault->slot;
@@ -3188,8 +3201,9 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * Enforce the iTLB multihit workaround after capturing the requested
 	 * level, which will be used to do precise, accurate accounting.
 	 */
-	fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
-						     fault->gfn, fault->max_level);
+	fault->req_level = __kvm_mmu_max_mapping_level(vcpu->kvm, slot,
+						       fault->gfn, fault->max_level,
+						       fault->is_private);
 	if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
 		return;
 
@@ -4261,6 +4275,55 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true, NULL);
 }
 
+static inline u8 kvm_max_level_for_order(int order)
+{
+	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
+
+	KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
+			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
+			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
+		return PG_LEVEL_1G;
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+		return PG_LEVEL_2M;
+
+	return PG_LEVEL_4K;
+}
+
+static void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
+					      struct kvm_page_fault *fault)
+{
+	kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
+				      PAGE_SIZE, fault->write, fault->exec,
+				      fault->is_private);
+}
+
+static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
+				   struct kvm_page_fault *fault)
+{
+	int max_order, r;
+
+	if (!kvm_slot_can_be_private(fault->slot)) {
+		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
+		return -EFAULT;
+	}
+
+	r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
+			     &max_order);
+	if (r) {
+		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
+		return r;
+	}
+
+	fault->max_level = min(kvm_max_level_for_order(max_order),
+			       fault->max_level);
+	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
+
+	return RET_PF_CONTINUE;
+}
+
 static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_memory_slot *slot = fault->slot;
@@ -4293,6 +4356,14 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 		return RET_PF_EMULATE;
 	}
 
+	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
+		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
+		return -EFAULT;
+	}
+
+	if (fault->is_private)
+		return kvm_faultin_pfn_private(vcpu, fault);
+
 	async = false;
 	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, &async,
 					  fault->write, &fault->map_writable,
@@ -7184,6 +7255,19 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
 }
 
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
+					struct kvm_gfn_range *range)
+{
+	/*
+	 * KVM x86 currently only supports KVM_MEMORY_ATTRIBUTE_PRIVATE, skip
+	 * the slot if the slot will never consume the PRIVATE attribute.
+	 */
+	if (!kvm_slot_can_be_private(range->slot))
+		return false;
+
+	return kvm_mmu_unmap_gfn_range(kvm, range);
+}
+
 static bool hugepage_test_mixed(struct kvm_memory_slot *slot, gfn_t gfn,
 				int level)
 {
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index b102014e2c60..4efbf43b4b18 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -202,6 +202,7 @@ struct kvm_page_fault {
 
 	/* Derived from mmu and global state.  */
 	const bool is_tdp;
+	const bool is_private;
 	const bool nx_huge_page_workaround_enabled;
 
 	/*