From patchwork Wed Nov 8 11:18:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Nicolas Saenz Julienne X-Patchwork-Id: 162981 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:aa0b:0:b0:403:3b70:6f57 with SMTP id k11csp845183vqo; Wed, 8 Nov 2023 03:26:43 -0800 (PST) X-Google-Smtp-Source: AGHT+IE4P81E/JutCsH/B1ywYXzvyRPVY0ue6UBS5BVSs2sXu0ZlpznetEUM/ZcpmUpr2f0yytjk X-Received: by 2002:a05:6808:14d3:b0:3b2:dd7d:fc5e with SMTP id f19-20020a05680814d300b003b2dd7dfc5emr1793700oiw.10.1699442803445; Wed, 08 Nov 2023 03:26:43 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1699442803; cv=none; d=google.com; s=arc-20160816; b=s3DLLNrbuXO8p896y4FMCD/9fF9Dyvf/nJarxvkpu8EH8ifLGAOVPzKJH1Ce+Le6GS s5gTx9dFvwE70L06z2ZhtKcER4HdHXGL7skBPmnLvL3OviWP5fLQ6wzvwveaktqOERSo ++oK31ejCp387n5CKuAVfdG+aE7DGpjBR32LRw8oKeKAAiC33rQlmtQrg+VtoeqoA9Ml Lx6WiFYYsEXbTqncjxDGVu4eElEB401Gva+8cqCUIQ+UXXhyDPFCjLVP3W7o1N3KjWth bs4496kyj+39MzKIOK5AFMh/S9wstqEbcEjjuyPgsZ8lI/1TadWEw4IxV9j8rv9DKSpb WmGw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=pKFAR2whklJOfx0aE2YbJs1djQGs6K3PfD1UPNSSZz8=; fh=Qdq7NqGm5JR9LpctBpXjoRI38Lb2mCk6xy26GEDp1Bg=; b=djAhvz40eCZl04bwJFiZPIUTFAjI7gWBqCpHc1NRzh/zmu623AFIvrxvx03O5ZC7pF YZnPp1uqXWywsht9QLY+eX1nnhjudHzEsgvpE5JWORO2H8YHj0dVjPsfUpVcubn87OG7 MHWIaN5fcHDqeNna1oVUnzSCUS26t69D5fdHsv0fm6AEPj9zwGbl8YAEIi+YtvCWVWF4 lQrhPyOTp384vHg9P1Z1S7BgRHAeBIVCvBheln0qwN7j5GcIjyMg6POIxN8Hdi4PIDyp vha7nXqYHQAVML/K/lXIK1sQ9UqYwp9SOCZF9xP/Eake/p3bfwE/K52C3T9gqKEzWXtN 3DVg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@amazon.com header.s=amazon201209 header.b=lpv9ixHt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.com Received: from groat.vger.email (groat.vger.email. [2620:137:e000::3:5]) by mx.google.com with ESMTPS id fj13-20020a056a003a0d00b006b2562dbd24si13297451pfb.220.2023.11.08.03.26.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 08 Nov 2023 03:26:43 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) client-ip=2620:137:e000::3:5; Authentication-Results: mx.google.com; dkim=pass header.i=@amazon.com header.s=amazon201209 header.b=lpv9ixHt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id 202A18246337; Wed, 8 Nov 2023 03:26:38 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344524AbjKHL0F (ORCPT + 32 others); Wed, 8 Nov 2023 06:26:05 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51974 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344684AbjKHLZb (ORCPT ); Wed, 8 Nov 2023 06:25:31 -0500 Received: from smtp-fw-9102.amazon.com (smtp-fw-9102.amazon.com [207.171.184.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CE0251FE6; Wed, 8 Nov 2023 03:25:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1699442729; x=1730978729; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=pKFAR2whklJOfx0aE2YbJs1djQGs6K3PfD1UPNSSZz8=; b=lpv9ixHtv3vsCQrayxp3rZEZ1aMCCZtv2Ex0GTnkf7QGEq+jVR0RVBHD Gt5PPp2pBc9vsTotBUFYExN9lLZt7EbH78gjtCU8pSc4kLzPE+s5b7wHC +yAhecJ/HG2TTnxJrYXeP0ba+SYcm9xiM7Fv0/64ut0+/NK36E43X2T4F Q=; X-IronPort-AV: E=Sophos;i="6.03,286,1694736000"; d="scan'208";a="375132593" Received: from pdx4-co-svc-p1-lb2-vlan3.amazon.com (HELO email-inbound-relay-pdx-2b-m6i4x-f253a3a3.us-west-2.amazon.com) ([10.25.36.214]) by smtp-border-fw-9102.sea19.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Nov 2023 11:25:28 +0000 Received: from smtpout.prod.us-west-2.prod.farcaster.email.amazon.dev (pdx2-ws-svc-p26-lb5-vlan2.pdx.amazon.com [10.39.38.66]) by email-inbound-relay-pdx-2b-m6i4x-f253a3a3.us-west-2.amazon.com (Postfix) with ESMTPS id BAE6980718; Wed, 8 Nov 2023 11:25:27 +0000 (UTC) Received: from EX19MTAEUB002.ant.amazon.com [10.0.43.254:32548] by smtpin.naws.eu-west-1.prod.farcaster.email.amazon.dev [10.0.45.210:2525] with esmtp (Farcaster) id 5bc3b812-def8-49f3-9115-267bad567819; Wed, 8 Nov 2023 11:25:26 +0000 (UTC) X-Farcaster-Flow-ID: 5bc3b812-def8-49f3-9115-267bad567819 Received: from EX19D004EUC001.ant.amazon.com (10.252.51.190) by EX19MTAEUB002.ant.amazon.com (10.252.51.59) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.39; Wed, 8 Nov 2023 11:25:26 +0000 Received: from dev-dsk-nsaenz-1b-189b39ae.eu-west-1.amazon.com (10.13.235.138) by EX19D004EUC001.ant.amazon.com (10.252.51.190) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.39; Wed, 8 Nov 2023 11:25:21 +0000 From: Nicolas Saenz Julienne To: CC: , , , , , , , , , , , , , , , Nicolas Saenz Julienne Subject: [RFC 33/33] Documentation: KVM: Introduce "Emulating Hyper-V VSM with KVM" Date: Wed, 8 Nov 2023 11:18:06 +0000 Message-ID: <20231108111806.92604-34-nsaenz@amazon.com> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20231108111806.92604-1-nsaenz@amazon.com> References: <20231108111806.92604-1-nsaenz@amazon.com> MIME-Version: 1.0 X-Originating-IP: [10.13.235.138] X-ClientProxiedBy: EX19D036UWC002.ant.amazon.com (10.13.139.242) To EX19D004EUC001.ant.amazon.com (10.252.51.190) Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Wed, 08 Nov 2023 03:26:38 -0800 (PST) X-Spam-Status: No, score=-0.9 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, UNPARSEABLE_RELAY autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781994937215657580 X-GMAIL-MSGID: 1781994937215657580 Introduce "Emulating Hyper-V VSM with KVM", which describes the KVM APIs made available to a VMM that wants to emulate Hyper-V's VSM. Signed-off-by: Nicolas Saenz Julienne --- .../virt/kvm/x86/emulating-hyperv-vsm.rst | 136 ++++++++++++++++++ 1 file changed, 136 insertions(+) create mode 100644 Documentation/virt/kvm/x86/emulating-hyperv-vsm.rst diff --git a/Documentation/virt/kvm/x86/emulating-hyperv-vsm.rst b/Documentation/virt/kvm/x86/emulating-hyperv-vsm.rst new file mode 100644 index 000000000000..8f76bf09c530 --- /dev/null +++ b/Documentation/virt/kvm/x86/emulating-hyperv-vsm.rst @@ -0,0 +1,136 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================== +Emulating Hyper-V VSM with KVM +============================== + +Hyper-V's Virtual Secure Mode (VSM) is a virtualisation security feature +that leverages the hypervisor to create secure execution environments +within a guest. VSM is documented as part of Microsoft's Hypervisor Top +Level Functional Specification[1]. + +Emulating Hyper-V's Virtual Secure Mode with KVM requires coordination +between KVM and the VMM. Most of the VSM state and configuration is left +to be handled by user-space, but some has made its way into KVM. This +document describes the mechanisms through which a VMM can implement VSM +support. + +Virtual Trust Levels +-------------------- + +The main concept VSM introduces are Virtual Trust Levels or VTLs. Each +VTL is a CPU mode, with its own private CPU architectural state, +interrupt subsystem (limited to a local APIC), and memory access +permissions. VTLs are hierarchical, where VTL0 corresponds to normal +guest execution and VTL > 0 to privileged execution contexts. In +practice, when virtualising Windows on top of KVM, we only see VTL0 and +VTL1. Although the spec allows going all the way to VTL15. VTLs are +orthogonal to ring levels, so each VTL is capable of runnig its own +operating system and user-space[2]. + + ┌──────────────────────────────┐ ┌──────────────────────────────┐ + │ Normal Mode (VTL0) │ │ Secure Mode (VTL1) │ + │ ┌──────────────────────────┐ │ │ ┌──────────────────────────┐ │ + │ │ User-mode Processes │ │ │ │Secure User-mode Processes│ │ + │ └──────────────────────────┘ │ │ └──────────────────────────┘ │ + │ ┌──────────────────────────┐ │ │ ┌──────────────────────────┐ │ + │ │ Kernel │ │ │ │ Secure Kernel │ │ + │ └──────────────────────────┘ │ │ └──────────────────────────┘ │ + └──────────────────────────────┘ └──────────────────────────────┘ + ┌───────────────────────────────────────────────────────────────┐ + │ Hypervisor/KVM │ + └───────────────────────────────────────────────────────────────┘ + ┌───────────────────────────────────────────────────────────────┐ + │ Hardware │ + └───────────────────────────────────────────────────────────────┘ + +VTLs break the core assumption that a vCPU has a single architectural +state, lAPIC state, SynIC state, etc. As such, each VTL is modeled as a +distinct KVM vCPU, with the restriction that only one is allowed to run +at any moment in time. Having multiple KVM vCPUs tracking a single guest +CPU complicates vCPU numbering. VMs that enable VSM are expected to use +CAP_APIC_ID_GROUPS to segregate vCPUs (and their lAPICs) into different +groups. For example, a 4 CPU VSM VM will setup the APIC ID groups feature +so only the first two bits of the APIC ID are exposed to the guest. The +remaining bits represent the vCPU's VTL. The 'sibling' vCPU to VTL0's +vCPU2 at VTL3 will have an APIC ID of 0xE. Using this approach a VMM and +KVM are capable of querying a vCPU's VTL, or finding the vCPU associated +to a specific VTL. + +KVM's lAPIC implementation is aware of groups, and takes note of the +source vCPU's group when delivering IPIs. As such, it shouldn't be +possible to target a different VTL through the APIC. Interrupts are +delivered to the vCPU's lAPIC subsystem regardless of the VTL's runstate, +this also includes timers. Ultimately, any interrupt incoming from an +outside source (IOAPIC/MSIs) is routed to VTL0. + +Moving Between VTLs +------------------- + +All VSM configuration and VTL handling hypercalls are passed through to +user-space. Notably the two primitives that allow switching between VTLs. +All shared state synchronization and KVM vCPU scheduling is left to the +VMM to manage. For example, upon receiving a VTL call, the VMM stops the +vCPU that issued the hypercall, and schedules the vCPU corresponding to +the next privileged VTL. When that privileged vCPU is done executing, it +issues a VTL return hypercall, so the opposite operation happens. All +this is transparent to KVM, which limits itself to running vCPUs. + +An interrupt directed at a privileged VTL always has precedence over the +execution of lower VTLs. To honor this, the VMM can monitor events +targeted at privileged vCPUs with poll(), and should trigger an +asynchronous VTL switch whenever events become available. Additionally, +the target VTL's vCPU VP assist overlay page is used to notify the target +VTL with the reason for the switch. The VMM can keep track of the VP +assist page by installing an MSR filter for HV_X64_MSR_VP_ASSIST_PAGE. + +Hyper-V VP registers +-------------------- + +VP register hypercalls are passed through to user-space. All requests can +be fulfilled either by using already existing KVM state ioctls, or are +related to VSM's configuration, which is already handled by the VMM. Note +that HV_REGISTER_VSM_CODE_PAGE_OFFSETS is the only VSM specific VP +register the kernel controls, as such it is made available through the +KVM_HV_GET_VSM_STATE ioctl. + +Per-VTL Memory Protections +-------------------------- + +A privileged VTL can change the memory access restrictions of lower VTLs. +It does so to hide secrets from them, or to control what they are allowed +to execute. The list of memory protections allowed is[3]: + - No access + - Read-only, no execute + - Read-only, execute + - Read/write, no execute + - Read/write, execute + +VTL memory protection hypercalls are passed through to user-space, but +KVM provides an interface that allows changing memory protections on a +per-VTL basis. This is made possible by the KVM VTL device. VMMs can +create one per VTL and it exposes a ioctl, KVM_SET_MEMORY_ATTRIBUTES, +that controls the memory protections applied to that VTL. The KVM TDP MMU +is VTL aware and page faults are resolved taking into account the +corresponding VTL device's memory attributes. + +When a memory access violates VTL memory protections, KVM issues a secure +memory intercept, which is passed as a SynIC message into the next +privileged VTL. This happens transparently for the VMM. Additionally, KVM +exits with a user-space memory fault. This allows the VMM to stop the +vCPU while the secure intercept is handled by the privileged VTL. In the +good case, the instruction that triggered the fault is emulated and +control is returned to the lower VTL, in the bad case, Windows crashes +gracefully. + +Hyper-V's TLFS also states that DMA should follow VTL0's memory access +restrictions. This is out of scope for this document, as IOMMU mappings +are not handled by KVM. + +[1] https://raw.githubusercontent.com/Microsoft/Virtualization-Documentation/master/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v6.0b.pdf + +[2] Conceptually this design is similar to arm's TrustZone: The +hypervisor plays the role of EL3. Windows (VTL0) runs in Non-Secure +(EL0/EL1) and the secure kernel (VTL1) in Secure World (EL1s/EL0s). + +[3] TLFS 15.9.3