From patchwork Fri Apr 7 20:19:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sagi Shahar X-Patchwork-Id: 81029 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp514617vqo; Fri, 7 Apr 2023 13:33:51 -0700 (PDT) X-Google-Smtp-Source: AKy350Yr3QP1Y4jVnwPNWm/HARYtJtqDuIA1zA69nPzQGT9JIqUEYzyRaXF/Nhn+6mKICsmO4BPV X-Received: by 2002:a17:907:78d6:b0:947:335f:5a0d with SMTP id kv22-20020a17090778d600b00947335f5a0dmr612298ejc.62.1680899630915; Fri, 07 Apr 2023 13:33:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680899630; cv=none; d=google.com; s=arc-20160816; b=ltMKkV+CvnK6NMkcip4aL6P/GdBgWJcWK6S9dm0XAkxJ/oGK7u/Ag8t5gesPhjzfPG sW2/1DDjtfD46FHiXjQ6YtbBHTyd88pYZKY9lYnrcPorg7FVhcAbU27t/gybl6nUvrkA zwKEKWhtVzdPzr0a1wXqn/kqh4ggC5uHHae1Q60b1XjiJMYi6ATm8FJrqw4mxR77xq27 EZ7GZ/CU1OpGmBoe9pKBgWTlPXn1M4eHmjU5NNk947Wo0gzuqSocDr01+ARFiPNmNqF0 IsGEo5xrDHKIGC97UgvifNjm3U+Tn/a+uEQ8sI9FvtKIhn896BT+4VMvvXIFEKYKc8Pv uMUg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=88YN7x3qppPMVMra5pRzHHPVPBEL4gMuQXheETIYTNU=; b=v8sCpRMz9P/Up7is1XxHFQ+BHMzX6LXtHXSqmWvB4tgWX4Iwi1vc/t/E5H9VisrlDn z4osB9H12AUIWgsxhB0w7BUxsZEDWRfXy1lM2bi7RjemYis4RqrPw/VpzFlI6qd/51yV 1VWY49vS5wnDsR0EsY6MIg6iaYLc5iue//JbnrP5srO5QTk49n2KWD9UOet68vHXLMaH z3ygWK7covBQQlwaNnsHoRIC4f247RRnlGRynP2OMl/j4TJlVdXb9XJwWWskpIyyhmS5 X3fk0/Nad9g0yaHIaxQFssnCxelqFH2yU8JMOJWw9fxeRhfjYL36TWC4HmBm9HhfvpI0 ATFw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=OMW9kzjN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id e24-20020a170906249800b00948021bad39si3846186ejb.646.2023.04.07.13.33.27; Fri, 07 Apr 2023 13:33:50 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=OMW9kzjN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230013AbjDGUUn (ORCPT + 99 others); Fri, 7 Apr 2023 16:20:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46070 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231283AbjDGUU3 (ORCPT ); Fri, 7 Apr 2023 16:20:29 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BF449CA1E for ; Fri, 7 Apr 2023 13:20:11 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id 204-20020a250fd5000000b00b6d6655dc35so42735805ybp.6 for ; Fri, 07 Apr 2023 13:20:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; t=1680898810; x=1683490810; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=88YN7x3qppPMVMra5pRzHHPVPBEL4gMuQXheETIYTNU=; b=OMW9kzjNG7JkuumKUv1gxM5ClRnzMIgI7kcKO2DxeN5LnLC/y/rEnRoSpywnKQs6hD xlQD/OsuGnJ5npkky5Zeo0idUTx/ZeUjWrN65bVSD+9YE94mwbexeVuduMYRcJXkdWR9 c5e4/E0ZMYZkZuNx3ir9D1lXfiqPJjj8W9x34KqqTmzX6auXI6JlOD5uUwQyobkTZHkX GCYUe1C5vg1gq2woLG6c4aqn7ii1vyB75r4qVRPb3gbnB8AF+iJW/hONV03AuDt//Ykl +JKBwB+Km4olaMCfJvrbGcPiR60/pmR36bZ7vdgt09knNdseMPjEeCZ5XJB2aV2wWMRD M2Dg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680898810; x=1683490810; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=88YN7x3qppPMVMra5pRzHHPVPBEL4gMuQXheETIYTNU=; b=PqwT6TPWyaZCnjM5nJua/CMiBucQExiiWd3RbQaE60QBXvgbUz+HHZVcxzT0aSyZb0 1B+bMUfI4STuCQJqm8cGTdsxKGFuqx+l8AR0Y+8Gym3ZRM49C3Ejb/N2u0mf6oUurhyE 6YtOH5QSwF56wZM5SMBrD7IEh2h3UOTGTT2m2aCGPretjuWMEcLmFrlsqLOtXICfRbOn /8zhPMw0rMmG+Ao6VngTJRIQjAryn7zf5YAB/VHvwydKxMCiNIJ4qOh6O4w2b7JtNPr/ rzxWGF6O28N8rQaY8l68WK0qevvI0hBcmtlrusQIIStQpyuZ9H+SwsLEnXfMM3cjgzCp rVuw== X-Gm-Message-State: AAQBX9dzWiyNfaYe9/0FyfYAq+re0xnkNxgHb1C0HJbcZEpY4xmdFpgN LiirI6vGAnBUFN5hq/t7l4WeQYu/5A== X-Received: from sagi.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:241b]) (user=sagis job=sendgmr) by 2002:a25:d958:0:b0:b75:3fd4:1b31 with SMTP id q85-20020a25d958000000b00b753fd41b31mr2479838ybg.1.1680898810376; Fri, 07 Apr 2023 13:20:10 -0700 (PDT) Date: Fri, 7 Apr 2023 20:19:21 +0000 In-Reply-To: <20230407201921.2703758-1-sagis@google.com> Mime-Version: 1.0 References: <20230407201921.2703758-1-sagis@google.com> X-Mailer: git-send-email 2.40.0.577.gac1e443424-goog Message-ID: <20230407201921.2703758-6-sagis@google.com> Subject: [RFC PATCH 5/5] KVM: TDX: Add core logic for TDX intra-host migration From: Sagi Shahar To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org Cc: Sean Christopherson , Paolo Bonzini , Isaku Yamahata , Erdem Aktas , David Matlack , Kai Huang , Zhi Wang , Chao Peng , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , Sagi Shahar X-Spam-Status: No, score=-7.7 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1762551011471763384?= X-GMAIL-MSGID: =?utf-8?q?1762551011471763384?= Adds the core logic for transferring state between source and destination TDs during intra-host migration. Signed-off-by: Sagi Shahar --- arch/x86/kvm/vmx/tdx.c | 191 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 190 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 0999a6d827c99..05b164a91437b 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -2834,9 +2834,198 @@ static __always_inline bool tdx_guest(struct kvm *kvm) return tdx_kvm->finalized; } +#define for_each_memslot_pair(memslots_1, memslots_2, memslot_iter_1, \ + memslot_iter_2) \ + for (memslot_iter_1 = rb_first(&memslots_1->gfn_tree), \ + memslot_iter_2 = rb_first(&memslots_2->gfn_tree); \ + memslot_iter_1 && memslot_iter_2; \ + memslot_iter_1 = rb_next(memslot_iter_1), \ + memslot_iter_2 = rb_next(memslot_iter_2)) + static int tdx_migrate_from(struct kvm *dst, struct kvm *src) { - return -EINVAL; + struct rb_node *src_memslot_iter, *dst_memslot_iter; + struct vcpu_tdx *dst_tdx_vcpu, *src_tdx_vcpu; + struct kvm_memslots *src_slots, *dst_slots; + struct kvm_vcpu *dst_vcpu, *src_vcpu; + struct kvm_tdx *src_tdx, *dst_tdx; + unsigned long i, j; + int ret; + + src_tdx = to_kvm_tdx(src); + dst_tdx = to_kvm_tdx(dst); + + src_slots = __kvm_memslots(src, 0); + dst_slots = __kvm_memslots(dst, 0); + + ret = -EINVAL; + + if (!src_tdx->finalized) { + pr_warn("Cannot migrate from a non finalized VM\n"); + goto abort; + } + + // Traverse both memslots in gfn order and compare them + for_each_memslot_pair(src_slots, dst_slots, src_memslot_iter, dst_memslot_iter) { + struct kvm_memory_slot *src_slot, *dst_slot; + + src_slot = + container_of(src_memslot_iter, struct kvm_memory_slot, + gfn_node[src_slots->node_idx]); + dst_slot = + container_of(src_memslot_iter, struct kvm_memory_slot, + gfn_node[dst_slots->node_idx]); + + if (src_slot->base_gfn != dst_slot->base_gfn || + src_slot->npages != dst_slot->npages) { + pr_warn("Cannot migrate between VMs with different memory slots configurations\n"); + goto abort; + } + + if (src_slot->flags != dst_slot->flags) { + pr_warn("Cannot migrate between VMs with different memory slots configurations\n"); + goto abort; + } + + if (src_slot->flags & KVM_MEM_PRIVATE) { + if (src_slot->restrictedmem.file->f_inode->i_ino != + dst_slot->restrictedmem.file->f_inode->i_ino) { + pr_warn("Private memslots points to different restricted files\n"); + goto abort; + } + + if (src_slot->restrictedmem.index != dst_slot->restrictedmem.index) { + pr_warn("Private memslots points to the restricted file at different offsets\n"); + goto abort; + } + } + } + + if (src_memslot_iter || dst_memslot_iter) { + pr_warn("Cannot migrate between VMs with different memory slots configurations\n"); + goto abort; + } + + dst_tdx->hkid = src_tdx->hkid; + dst_tdx->tdr_pa = src_tdx->tdr_pa; + + dst_tdx->tdcs_pa = kcalloc(tdx_info.nr_tdcs_pages, sizeof(*dst_tdx->tdcs_pa), + GFP_KERNEL_ACCOUNT | __GFP_ZERO); + if (!dst_tdx->tdcs_pa) { + ret = -ENOMEM; + goto late_abort; + } + memcpy(dst_tdx->tdcs_pa, src_tdx->tdcs_pa, + tdx_info.nr_tdcs_pages * sizeof(*dst_tdx->tdcs_pa)); + + dst_tdx->tsc_offset = src_tdx->tsc_offset; + dst_tdx->attributes = src_tdx->attributes; + dst_tdx->xfam = src_tdx->xfam; + dst_tdx->kvm.arch.gfn_shared_mask = src_tdx->kvm.arch.gfn_shared_mask; + + kvm_for_each_vcpu(i, src_vcpu, src) + tdx_flush_vp_on_cpu(src_vcpu); + + /* Copy per-vCPU state */ + kvm_for_each_vcpu(i, src_vcpu, src) { + src_tdx_vcpu = to_tdx(src_vcpu); + dst_vcpu = kvm_get_vcpu(dst, i); + dst_tdx_vcpu = to_tdx(dst_vcpu); + + vcpu_load(dst_vcpu); + + memcpy(dst_vcpu->arch.regs, src_vcpu->arch.regs, + NR_VCPU_REGS * sizeof(src_vcpu->arch.regs[0])); + dst_vcpu->arch.regs_avail = src_vcpu->arch.regs_avail; + dst_vcpu->arch.regs_dirty = src_vcpu->arch.regs_dirty; + + dst_vcpu->arch.tsc_offset = dst_tdx->tsc_offset; + + dst_tdx_vcpu->interrupt_disabled_hlt = src_tdx_vcpu->interrupt_disabled_hlt; + dst_tdx_vcpu->buggy_hlt_workaround = src_tdx_vcpu->buggy_hlt_workaround; + + dst_tdx_vcpu->tdvpr_pa = src_tdx_vcpu->tdvpr_pa; + dst_tdx_vcpu->tdvpx_pa = kcalloc(tdx_info.nr_tdvpx_pages, + sizeof(*dst_tdx_vcpu->tdvpx_pa), + GFP_KERNEL_ACCOUNT); + if (!dst_tdx_vcpu->tdvpx_pa) { + ret = -ENOMEM; + vcpu_put(dst_vcpu); + goto late_abort; + } + memcpy(dst_tdx_vcpu->tdvpx_pa, src_tdx_vcpu->tdvpx_pa, + tdx_info.nr_tdvpx_pages * sizeof(*dst_tdx_vcpu->tdvpx_pa)); + + td_vmcs_write64(dst_tdx_vcpu, POSTED_INTR_DESC_ADDR, __pa(&dst_tdx_vcpu->pi_desc)); + + /* Copy private EPT tables */ + if (kvm_mmu_move_private_pages_from(dst_vcpu, src_vcpu)) { + ret = -EINVAL; + vcpu_put(dst_vcpu); + goto late_abort; + } + + for (j = 0; j < tdx_info.nr_tdvpx_pages; j++) + src_tdx_vcpu->tdvpx_pa[j] = 0; + + src_tdx_vcpu->tdvpr_pa = 0; + + vcpu_put(dst_vcpu); + } + + for_each_memslot_pair(src_slots, dst_slots, src_memslot_iter, + dst_memslot_iter) { + struct kvm_memory_slot *src_slot, *dst_slot; + + src_slot = container_of(src_memslot_iter, + struct kvm_memory_slot, + gfn_node[src_slots->node_idx]); + dst_slot = container_of(src_memslot_iter, + struct kvm_memory_slot, + gfn_node[dst_slots->node_idx]); + + for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) { + unsigned long ugfn; + int level = i + 1; + + /* + * If the gfn and userspace address are not aligned wrt each other, then + * large page support should already be disabled at this level. + */ + ugfn = dst_slot->userspace_addr >> PAGE_SHIFT; + if ((dst_slot->base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE(level) - 1)) + continue; + + dst_slot->arch.lpage_info[i - 1] = + src_slot->arch.lpage_info[i - 1]; + src_slot->arch.lpage_info[i - 1] = NULL; + } + } + + dst->mem_attr_array.xa_head = src->mem_attr_array.xa_head; + src->mem_attr_array.xa_head = NULL; + + dst_tdx->finalized = true; + + /* Clear source VM to avoid freeing the hkid and pages on VM put */ + src_tdx->hkid = -1; + src_tdx->tdr_pa = 0; + for (i = 0; i < tdx_info.nr_tdcs_pages; i++) + src_tdx->tdcs_pa[i] = 0; + + return 0; + +late_abort: + /* If we aborted after the state transfer already started, the src VM + * is no longer valid. + */ + kvm_vm_dead(src); + +abort: + dst_tdx->hkid = -1; + dst_tdx->tdr_pa = 0; + + return ret; } int tdx_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)