Message ID | 20230419221716.3603068-1-atishp@rivosinc.com |
---|---|
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp698378vqo; Wed, 19 Apr 2023 15:18:52 -0700 (PDT) X-Google-Smtp-Source: AKy350YpodMrhc/sdZd/ZJ6ZsCplLlKaPg6bNSD1q83O3ce2M1hRXNEmXc+TysBCzcWyijUyMPKZ X-Received: by 2002:a05:6a20:5498:b0:da:8ee4:5b9 with SMTP id i24-20020a056a20549800b000da8ee405b9mr175321pzk.7.1681942732433; Wed, 19 Apr 2023 15:18:52 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1681942732; cv=none; d=google.com; s=arc-20160816; b=btXTGgC9dApHUdXBBWbRoZAQ+eRiJ/r3PDBN0tzMklOEocpOqrfG+h7nboWuAEwQWC yw3TRxF3OMXRPNhh4PTzoL/Nz4WnPOxLIb8bdf0HdO/MLIPvdJEhXcpLZyG2yootUJHX B9mynpm7MWKzymhXEGiZm+4dYSzDaKKESjUUJb2xZLoQNoy2VEJ9AlQH496SYq7HnXz0 R8DiQ0NW+jw7/cdd0Gd89TU07DdJB2we9bHm8VUuZS52tpimmT/vAIEr4U3Ff+16vF9O CjrZJPBEHVu53Eg7Az2FMZoTXjoItM9h2Y4L3L05XGQ9HWRoCbrsmnK5j9GZ73XCI2L9 Ss9A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=bHSOgTgczHoSpVKp/2cbvFdlvzihl0SXPdnpVyAQEXE=; b=MOf+Wv/XBzVhTMo/0V/uJcu1kyeERZDxZI/9dnLmKM++upF7jx3Qti0/e8A9lh6+n+ OTgep80+UCziKKoOaVYvbXBiBHXYTbWXk/RNb/sMWfvPPkPXFHfV6XHlYAqQMy0+nsIv RMHOGGMDscKCvOovnAm7JQzpVwnPVingMXOM6Hby3m9uzbPPdRFiFj+4aSaZmAmOlJ5Q uVyeOY5VfKh7SJfkhogfeqe3o76W3vxTxhm2caVTnFEvXzFmwJMHKzJosViWJPar0lHm N5uYt14zLiJ/npy0Z24hIVP/NhCWBnfLPeteLM8PJZHt6mSSADAVTf2SgNYxzGBryz5H opCQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@rivosinc-com.20221208.gappssmtp.com header.s=20221208 header.b=BEqP6G9E; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id i5-20020a6561a5000000b0051398cda533si18123272pgv.388.2023.04.19.15.18.39; Wed, 19 Apr 2023 15:18:52 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@rivosinc-com.20221208.gappssmtp.com header.s=20221208 header.b=BEqP6G9E; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230410AbjDSWRf (ORCPT <rfc822;peter110.wang@gmail.com> + 99 others); Wed, 19 Apr 2023 18:17:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41096 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229812AbjDSWRd (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 19 Apr 2023 18:17:33 -0400 Received: from mail-pl1-x633.google.com (mail-pl1-x633.google.com [IPv6:2607:f8b0:4864:20::633]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8E0021721 for <linux-kernel@vger.kernel.org>; Wed, 19 Apr 2023 15:17:30 -0700 (PDT) Received: by mail-pl1-x633.google.com with SMTP id d9443c01a7336-1a5197f00e9so4654195ad.1 for <linux-kernel@vger.kernel.org>; Wed, 19 Apr 2023 15:17:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20221208.gappssmtp.com; s=20221208; t=1681942650; x=1684534650; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=bHSOgTgczHoSpVKp/2cbvFdlvzihl0SXPdnpVyAQEXE=; b=BEqP6G9EMWgC3XGd8MFyO1/zN07IG/ZOcdvKNw2uvUe6FY9MwYMD9OtT/e6Hpsh+d1 UTBw0SQO/dfPji/vJHTQDRu1iDUarJ1tS8TRJuTOO+KYKTKVWfmVRS/Xp1euAnU845Gl HHL+bdzN6lJB4kFW+YVJlX+ztx23Vcd8AuVprRPD1kNdS6LLAlflOX5ATt9IweQywq5l Kd6OhXAFXqN5HuxtdNo8NV2Lp/SReOMX0ij2xUdlRnElKqTtVrWyvcnLURzPo/12F526 EtapUBzLLKw514EMqqXGsXvC8TxYgg37pxXtw48fvr8B9i2gaBLQlmF4jAsOItZdgUPw hT2g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681942650; x=1684534650; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=bHSOgTgczHoSpVKp/2cbvFdlvzihl0SXPdnpVyAQEXE=; b=AH+2a5s8NmCJpZMlWTw7aGsSYGygkqyEgWKBJhRCVfZtOYKNzC/zmfeFSPUqJN353I /H+2P7aD+gShKEZj71rulBoRMLI8xK1gl0UgbVcG2A0NfrMRK9BxQc2q5Y/l6t2OxuW7 VzP7t0xCOHp73AE6yNzgMyLxqtWZ+lQCa7cbIduAZyK5VB6cO9d5nsaTlp0Nk1sSAjd9 AnisEyLQOz/BlgS8qfteQ6UkaPCaJCDnOM8SCzVz7ql31cJAjB+38sFkLPtJ1+NX/uUv Xup6RiVU9CaiXymX6/snShFS0ZooEs+8YbUCyOHHg8JmJLpbxX/w3Fa5+1lDeLU5gzpr 56EA== X-Gm-Message-State: AAQBX9er1JbF2JWeXazka2KH0xruKZLx31xcQBER14kKITazwUDcfde9 J5U9Gqt9j+A8xogh3WrMgkGQeOJNmas20xCM/54= X-Received: by 2002:a17:902:bcc3:b0:1a6:7b92:15ec with SMTP id o3-20020a170902bcc300b001a67b9215ecmr6654089pls.41.1681942649644; Wed, 19 Apr 2023 15:17:29 -0700 (PDT) Received: from atishp.ba.rivosinc.com ([66.220.2.162]) by smtp.gmail.com with ESMTPSA id jn11-20020a170903050b00b00196807b5189sm11619190plb.292.2023.04.19.15.17.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 Apr 2023 15:17:29 -0700 (PDT) From: Atish Patra <atishp@rivosinc.com> To: linux-kernel@vger.kernel.org Cc: Atish Patra <atishp@rivosinc.com>, Alexandre Ghiti <alex@ghiti.fr>, Andrew Jones <ajones@ventanamicro.com>, Andrew Morton <akpm@linux-foundation.org>, Anup Patel <anup@brainfault.org>, Atish Patra <atishp@atishpatra.org>, =?utf-8?b?QmrDtnJuIFTDtnBlbA==?= <bjorn@rivosinc.com>, Suzuki K Poulose <suzuki.poulose@arm.com>, Will Deacon <will@kernel.org>, Marc Zyngier <maz@kernel.org>, Sean Christopherson <seanjc@google.com>, linux-coco@lists.linux.dev, Dylan Reid <dylan@rivosinc.com>, abrestic@rivosinc.com, Samuel Ortiz <sameo@rivosinc.com>, Christoph Hellwig <hch@infradead.org>, Conor Dooley <conor.dooley@microchip.com>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Guo Ren <guoren@kernel.org>, Heiko Stuebner <heiko@sntech.de>, Jiri Slaby <jirislaby@kernel.org>, kvm-riscv@lists.infradead.org, kvm@vger.kernel.org, linux-mm@kvack.org, linux-riscv@lists.infradead.org, Mayuresh Chitale <mchitale@ventanamicro.com>, Palmer Dabbelt <palmer@dabbelt.com>, Paolo Bonzini <pbonzini@redhat.com>, Paul Walmsley <paul.walmsley@sifive.com>, Rajnesh Kanwal <rkanwal@rivosinc.com>, Uladzislau Rezki <urezki@gmail.com> Subject: [RFC 00/48] RISC-V CoVE support Date: Wed, 19 Apr 2023 15:16:28 -0700 Message-Id: <20230419221716.3603068-1-atishp@rivosinc.com> X-Mailer: git-send-email 2.25.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1763644782518044940?= X-GMAIL-MSGID: =?utf-8?q?1763644782518044940?= |
Series |
RISC-V CoVE support
|
|
Message
Atish Patra
April 19, 2023, 10:16 p.m. UTC
This patch series adds the RISC-V Confidential VM Extension (CoVE) support to Linux kernel. The RISC-V CoVE specification introduces non-ISA, SBI APIs. These APIs enable a confidential environment in which a guest VM's data can be isolated from the host while the host retains control of guest VM management and platform resources(memory, CPU, I/O). This is a very early WIP work. We want to share this with the community to get any feedback on overall architecture and direction. Any other feedback is welcome too. The detailed CoVE architecture document can be found here [0]. It used to be called AP-TEE and renamed to CoVE recently to avoid overloading term of TEE in general. The specification is in the draft stages and is subjected to change based on the feedback from the community. The CoVE specification introduces 3 new SBI extensions. COVH - CoVE Host side interface COVG - CoVE Guest side interface COVI - CoVE Secure Interrupt management extension Some key acronyms introduced: TSM - TEE Security Manager TVM - TEE VM (aka Confidential VM) CoVE Architecture: ==================== The CoVE APIs are designed to be implementation and architecture agnostic, allowing for different deployment models while retaining common host and guest kernel code. Two examples are shown in Figure 1 and Figure 2. As shown in both figures, the architecture introduces a new software component called the "TEE Security Manager" (TSM) that runs in HS mode. The TSM has minimal hw attested footprint on TCB as it is a passive component that doesn't support scheduling or timer interrupts. Both example deployment models provide memory isolation between the host and the TEE VM (TVM). Non secure world | Secure world | | | Non | | Virtualized | Virtualized | Virtualized Virtualized | Env | Env | Env Env | +----------+ | +----------+ | +----------+ +----------+ | -------------- | | | | | | | | | | | | Host Apps| | | Apps | | | Apps | | Apps | | VU-Mode | (VMM) | | | | | | | | | | +----------+ | +----------+ | +----------+ +----------+ | -------------- | | +----------+ | +----------+ +----------+ | | | | | | | | | | | | | | | | | TVM | | TVM | | | | | Guest | | | Guest | | Guest | | VS-Mode Syscalls | +----------+ | +----------+ +----------+ | | | | | | | SBI | SBI(COVG + COVI) | | | | | | +--------------------------+ | +---------------------------+ -------------- | Host (Linux) | | | TSM (Salus) | +--------------------------+ | +---------------------------+ | | | HS-Mode SBI (COVH + COVI) | SBI (COVH + COVI) | | | +-----------------------------------------------------------+ -------------- | Firmware(OpenSBI) + TSM Driver | M-Mode +-----------------------------------------------------------+ -------------- +----------------------------------------------------------------------------- | Hardware (RISC-V CPU + RoT + IOMMU) +---------------------------------------------------------------------------- Figure 1: Host in HS model The deployment model shown in Figure 1 runs the host in HS mode where it is peer to the TSM which also runs in HS mode. It requires another component known as TSM Driver running in higher privilege mode than host/TSM. It is responsible for switching the context between the host and the TSM. TSM driver also manages the platform specific hardware solution via confidential domain bit as described in the specification[0] to provide the required memory isolation. Non secure world | Secure world | Virtualized Env | Virtualized Virtualized | Env Env | +-------------------------+ | +----------+ +----------+ | ------------ | | | | | | | | | | | | Host Apps| | | Apps | | | Apps | | Apps | | VU-Mode +----------+ | +----------+ | +----------+ +----------+ | ------------ | | | | | Syscalls SBI | | | | | | | | | +--------------------------+ | +-----------+ +-----------+ | | Host (Linux) | | | TVM Guest| | TVM Guest| | VS-Mode +--------------------------+ | +-----------+ +-----------+ | | | | | | SBI (COVH + COVI) | SBI SBI | | | (COVG + COVI) (COVG + COVI)| | | | | | +-----------------------------------------------------------+ -------------- | TSM(Salus) | HS-Mode +-----------------------------------------------------------+ -------------- | SBI | +---------------------------------------------------------+ -------------- | Firmware(OpenSBI) | M-Mode +---------------------------------------------------------+ -------------- +----------------------------------------------------------------------------- | Hardware (RISC-V CPU + RoT + IOMMU) +---------------------------------------------------------------------------- Figure 2: Host in VS model The deployment model shown in Figure 2 simplifies the context switch and memory isolation by running the host in VS mode as a guest of TSM. Thus, the memory isolation is achieved by gstage mapping by the TSM. We don't need any additional hardware confidential domain bit to provide memory isolation. The downside of this model the host has to run the non-confidential VMs in nested environment which may have lower performance (yet to be measured). The current implementation Salus(TSM) doesn't support full nested virtualization yet. The platform must have a RoT to provide attestation in either model. This patch series implements the APIs defined by CoVE. The current TSM implementation allows the host to run TVMs as shown in figure 2. We are working on deployment model 1 in parallel. We do not expect any significant changes in either host/guest side ABI due to that. Shared memory between the host & TSM: ===================================== To accelerate the H-mode CSR/GPR access, CoVE also reuses the Nested Acceleration (NACL) SBI extension[1]. NACL defines a per physical cpu shared memory area that is allocated at the boot. It allows the host running in VS mode to access H-mode CSR/GPR easily without trapping into the TSM. The CoVE specification clearly defines the exact state of the shared memory with r/w permissions at every call. Secure Interrupt management: =========================== The CoVE specification relies on the MSI based interrupt scheme defined in Advanced Interrupt Architecture specification[2]. The COVI SBI extension adds functions to bind a guest interrupt file to a TVMs. After that, only TCB components (TSM, TVM, TSM driver) can modify that. The host can inject an interrupt via TSM only. The TVMs are also in complete control of which interrupts it can receive. By default, all interrupts are denied. In this proof-of-concept implementation, all the interrupts are allowed by the guest at boot time to keep it simple. Device I/O: =========== In order to support paravirt I/O devices, SWIOTLB bounce buffer must be used by the guest. As the host can not access confidential memory, this buffer memory must be shared with the host via share/unshare functions defined in COVG SBI extension. RISC-V implementation achieves this generalizing mem_encrypt_init() similar to TDX/SEV/CCA. That's why, the CoVE Guest is only allowed to use virtio devices with VIRTIO_F_ACCESS_PLATFORM and VIRTIO_F_VERSION_1 as they force virtio drivers to use the DMA API. MMIO emulation: ====================== TVM can register regions of address space as MMIO regions to be emulated by the host. TSM provides explicit SBI functions i.e. SBI_EXT_COVG_[ADD/REMOVE]_MMIO_REGION to request/remove MMIO regions. Any reads or writes to those MMIO region after SBI_EXT_COVG_ADD_MMIO_REGION call are forwarded to the host for emulation. This series allows any ioremapped memory to be emulated as MMIO region with above APIs via arch hookups inspired from pKVM work. We are aware that this model doesn't address all the threat vectors. We have also implemented the device filtering/authorization approach adopted by TDX[4]. However, those patches are not part of this series as the base TDX patches are still under active development. RISC-V CoVE will also adapt the revamped device filtering work once it is accepted by the Linux community in the future. The direct assignment of devices are a work in progress and will be added in the future[4]. VMM support: ============ This series is only tested with kvmtool support. Other VMM support (qemu-kvm, crossvm/rust-vmm) will be added later. Test cases: =========== We are working on kvm selftest for CoVE. We will post them as soon as they are ready. We haven't started any work on kvm unit-tests as RISC-V doesn't have basic infrastructure to support that. Once the kvm uni-test infrastructure is in place, we will add support for CoVE as well. Open design questions: ====================== 1. The current implementation has two separate configs for guest(CONFIG_RISCV_COVE_GUEST) and the host (RISCV_COVE_HOST). The default defconfig will enable both so that same unified image works as both host & guest. Most likely distro prefer this way to minimize the maintenance burden but some may want a minimal CoVE guest image that has only hardened drivers. In addition to that, Android runs a microdroid instance in the confidential guests. A separate config will help in those case. Please let us know if there is any concern with two configs. 2. Lazy gstage page allocation vs upfront allocation with page pool. Currently, all gstage mappings happen at runtime during the fault. This is expensive as we need to convert that page to confidential memory as well. A page pool framework may be a better choice which can hold all the confidential pages which can be pre-allocated upfront. A generic page pool infrastructure may benefit other CC solutions ? 3. In order to allow both confidential VM and non-confidential VM, the series uses regular branching instead of static branches for CoVE VM specific cases through out KVM. That may cause a few more branch penalties while running regular VMs. The alternate option is to use function pointers for any function that needs to take a different path. As per my understanding, that would be worse than branches. Patch organization: =================== This series depends on quite a few RISC-V patches that are not upstream yet. Here are the dependencies. 1. RISC-V IPI improvement series 2. RISC-V AIA support series. 3. RISC-V NACL support series In this series, PATCH [0-5] are generic improvement and cleanup patches which can be merged independently. PATCH [6-26, 34-37] adds host side for CoVE. PATCH [27-33] adds the interrupt related changes. PATCH [34-49] Adds the guest side changes for CoVE. The TSM project is written in rust and can be found here: https://github.com/rivosinc/salus Running the stack ==================== To run/test the stack, you would need the following components : 1) Qemu 2) Common Host & Guest Kernel 3) kvmtool 4) Host RootFS with KVMTOOL and Guest Kernel 5) Salus The detailed steps are available at[6] The Linux kernel patches are also available at [7] and the kvmtool patches are available at [8]. TODOs ======= As this is a very early work, the todo list is quite long :). Here are some of them (not in any specific order) 1. Support fd based private memory interface proposed in https://lkml.org/lkml/2022/1/18/395 2. Align with updated guest runtime device filtering approach. 3. IOMMU integration 4. Dedicated device assignment via TDSIP & SPDM[4] 5. Support huge pages 6. Page pool allocator to avoid convert/reclaim at every fault 7. Other VMM support (qemu-kvm, crossvm) 8. Complete the PoC for the deployment model 1 where host runs in HS mode 9. Attestation integration 10. Harden the interrupt allowed list 11. kvm self-tests support for CoVE 11. kvm unit-tests support for CoVE 12. Guest hardening 13. Port pKVM on RISC-V using CoVE 14. Any other ? Links ============ [0] CoVE architecture Specification. https://github.com/riscv-non-isa/riscv-ap-tee/blob/main/specification/riscv-aptee-spec.pdf [1] https://lists.riscv.org/g/sig-hypervisors/message/260 [2] https://github.com/riscv/riscv-aia/releases/download/1.0-RC2/riscv-interrupts-1.0-RC2.pdf [3] https://github.com/rivosinc/linux/tree/cove_integration_device_filtering1 [4] https://github.com/intel/tdx/commits/guest-filter-upstream [5] https://lists.riscv.org/g/tech-ap-tee/message/83 [6] https://github.com/rivosinc/cove/wiki/CoVE-KVM-RISCV64-on-QEMU [7] https://github.com/rivosinc/linux/commits/cove-integration [8] https://github.com/rivosinc/kvmtool/tree/cove-integration-03072023 Atish Patra (33): RISC-V: KVM: Improve KVM error reporting to the user space RISC-V: KVM: Invoke aia_update with preempt disabled/irq enabled RISC-V: KVM: Add a helper function to get pgd size RISC-V: Add COVH SBI extensions definitions RISC-V: KVM: Implement COVH SBI extension RISC-V: KVM: Add a barebone CoVE implementation RISC-V: KVM: Add UABI to support static memory region attestation RISC-V: KVM: Add CoVE related nacl helpers RISC-V: KVM: Implement static memory region measurement RISC-V: KVM: Use the new VM IOCTL for measuring pages RISC-V: KVM: Exit to the user space for trap redirection RISC-V: KVM: Return early for gstage modifications RISC-V: KVM: Skip dirty logging updates for TVM RISC-V: KVM: Add a helper function to trigger fence ops RISC-V: KVM: Skip most VCPU requests for TVMs RISC-V : KVM: Skip vmid/hgatp management for TVMs RISC-V: KVM: Skip TLB management for TVMs RISC-V: KVM: Register memory regions as confidential for TVMs RISC-V: KVM: Add gstage mapping for TVMs RISC-V: KVM: Handle SBI call forward from the TSM RISC-V: KVM: Implement vcpu load/put functions for CoVE guests RISC-V: KVM: Wireup TVM world switch RISC-V: KVM: Skip HVIP update for TVMs RISC-V: KVM: Implement COVI SBI extension RISC-V: KVM: Add interrupt management functions for TVM RISC-V: KVM: Skip AIA CSR updates for TVMs RISC-V: KVM: Perform limited operations in hardware enable/disable RISC-V: KVM: Indicate no support user space emulated IRQCHIP RISC-V: KVM: Add AIA support for TVMs RISC-V: KVM: Hookup TVM VCPU init/destroy RISC-V: KVM: Initialize CoVE RISC-V: KVM: Add TVM init/destroy calls drivers/hvc: sbi: Disable HVC console for TVMs Rajnesh Kanwal (15): mm/vmalloc: Introduce arch hooks to notify ioremap/unmap changes RISC-V: KVM: Update timer functionality for TVMs. RISC-V: Add COVI extension definitions RISC-V: KVM: Read/write gprs from/to shmem in case of TVM VCPU. RISC-V: Add COVG SBI extension definitions RISC-V: Add CoVE guest config and helper functions RISC-V: Implement COVG SBI extension RISC-V: COVE: Add COVH invalidate, validate, promote, demote and remove APIs. RISC-V: KVM: Add host side support to handle COVG SBI calls. RISC-V: Allow host to inject any ext interrupt id to a CoVE guest. RISC-V: Add base memory encryption functions. RISC-V: Add cc_platform_has() for RISC-V for CoVE RISC-V: ioremap: Implement for arch specific ioremap hooks riscv/virtio: Have CoVE guests enforce restricted virtio memory access. RISC-V: Add shared bounce buffer to support DBCN for CoVE Guest. arch/riscv/Kbuild | 2 + arch/riscv/Kconfig | 27 + arch/riscv/cove/Makefile | 2 + arch/riscv/cove/core.c | 40 + arch/riscv/cove/cove_guest_sbi.c | 109 +++ arch/riscv/include/asm/cove.h | 27 + arch/riscv/include/asm/covg_sbi.h | 38 + arch/riscv/include/asm/csr.h | 2 + arch/riscv/include/asm/kvm_cove.h | 206 +++++ arch/riscv/include/asm/kvm_cove_sbi.h | 101 +++ arch/riscv/include/asm/kvm_host.h | 10 +- arch/riscv/include/asm/kvm_vcpu_sbi.h | 3 + arch/riscv/include/asm/mem_encrypt.h | 26 + arch/riscv/include/asm/sbi.h | 107 +++ arch/riscv/include/uapi/asm/kvm.h | 17 + arch/riscv/kernel/irq.c | 12 + arch/riscv/kernel/setup.c | 2 + arch/riscv/kvm/Makefile | 1 + arch/riscv/kvm/aia.c | 101 ++- arch/riscv/kvm/aia_device.c | 41 +- arch/riscv/kvm/aia_imsic.c | 127 ++- arch/riscv/kvm/cove.c | 1005 +++++++++++++++++++++++ arch/riscv/kvm/cove_sbi.c | 490 +++++++++++ arch/riscv/kvm/main.c | 30 +- arch/riscv/kvm/mmu.c | 45 +- arch/riscv/kvm/tlb.c | 11 +- arch/riscv/kvm/vcpu.c | 69 +- arch/riscv/kvm/vcpu_exit.c | 34 +- arch/riscv/kvm/vcpu_insn.c | 115 ++- arch/riscv/kvm/vcpu_sbi.c | 16 + arch/riscv/kvm/vcpu_sbi_covg.c | 232 ++++++ arch/riscv/kvm/vcpu_timer.c | 26 +- arch/riscv/kvm/vm.c | 34 +- arch/riscv/kvm/vmid.c | 17 +- arch/riscv/mm/Makefile | 3 + arch/riscv/mm/init.c | 17 +- arch/riscv/mm/ioremap.c | 45 + arch/riscv/mm/mem_encrypt.c | 61 ++ drivers/tty/hvc/hvc_riscv_sbi.c | 5 + drivers/tty/serial/earlycon-riscv-sbi.c | 51 +- include/uapi/linux/kvm.h | 8 + mm/vmalloc.c | 16 + 42 files changed, 3222 insertions(+), 109 deletions(-) create mode 100644 arch/riscv/cove/Makefile create mode 100644 arch/riscv/cove/core.c create mode 100644 arch/riscv/cove/cove_guest_sbi.c create mode 100644 arch/riscv/include/asm/cove.h create mode 100644 arch/riscv/include/asm/covg_sbi.h create mode 100644 arch/riscv/include/asm/kvm_cove.h create mode 100644 arch/riscv/include/asm/kvm_cove_sbi.h create mode 100644 arch/riscv/include/asm/mem_encrypt.h create mode 100644 arch/riscv/kvm/cove.c create mode 100644 arch/riscv/kvm/cove_sbi.c create mode 100644 arch/riscv/kvm/vcpu_sbi_covg.c create mode 100644 arch/riscv/mm/ioremap.c create mode 100644 arch/riscv/mm/mem_encrypt.c -- 2.25.1
Comments
On Thu, Apr 20, 2023 at 3:47 AM Atish Patra <atishp@rivosinc.com> wrote: > > This patch series adds the RISC-V Confidential VM Extension (CoVE) support to > Linux kernel. The RISC-V CoVE specification introduces non-ISA, SBI APIs. These > APIs enable a confidential environment in which a guest VM's data can be isolated > from the host while the host retains control of guest VM management and platform > resources(memory, CPU, I/O). > > This is a very early WIP work. We want to share this with the community to get any > feedback on overall architecture and direction. Any other feedback is welcome too. > > The detailed CoVE architecture document can be found here [0]. It used to be > called AP-TEE and renamed to CoVE recently to avoid overloading term of TEE in > general. The specification is in the draft stages and is subjected to change based > on the feedback from the community. > > The CoVE specification introduces 3 new SBI extensions. > COVH - CoVE Host side interface > COVG - CoVE Guest side interface > COVI - CoVE Secure Interrupt management extension > > Some key acronyms introduced: > > TSM - TEE Security Manager > TVM - TEE VM (aka Confidential VM) > > CoVE Architecture: > ==================== > The CoVE APIs are designed to be implementation and architecture agnostic, > allowing for different deployment models while retaining common host and guest > kernel code. Two examples are shown in Figure 1 and Figure 2. > As shown in both figures, the architecture introduces a new software component > called the "TEE Security Manager" (TSM) that runs in HS mode. The TSM has minimal > hw attested footprint on TCB as it is a passive component that doesn't support > scheduling or timer interrupts. Both example deployment models provide memory > isolation between the host and the TEE VM (TVM). > > > Non secure world | Secure world | > | | > Non | | > Virtualized | Virtualized | Virtualized Virtualized | > Env | Env | Env Env | > +----------+ | +----------+ | +----------+ +----------+ | -------------- > | | | | | | | | | | | > | Host Apps| | | Apps | | | Apps | | Apps | | VU-Mode > | (VMM) | | | | | | | | | | > +----------+ | +----------+ | +----------+ +----------+ | -------------- > | | +----------+ | +----------+ +----------+ | > | | | | | | | | | | > | | | | | | TVM | | TVM | | > | | | Guest | | | Guest | | Guest | | VS-Mode > Syscalls | +----------+ | +----------+ +----------+ | > | | | | | > | SBI | SBI(COVG + COVI) | > | | | | | > +--------------------------+ | +---------------------------+ -------------- > | Host (Linux) | | | TSM (Salus) | > +--------------------------+ | +---------------------------+ > | | | HS-Mode > SBI (COVH + COVI) | SBI (COVH + COVI) > | | | > +-----------------------------------------------------------+ -------------- > | Firmware(OpenSBI) + TSM Driver | M-Mode > +-----------------------------------------------------------+ -------------- > +----------------------------------------------------------------------------- > | Hardware (RISC-V CPU + RoT + IOMMU) > +---------------------------------------------------------------------------- > Figure 1: Host in HS model > > > The deployment model shown in Figure 1 runs the host in HS mode where it is peer > to the TSM which also runs in HS mode. It requires another component known as TSM > Driver running in higher privilege mode than host/TSM. It is responsible for switching > the context between the host and the TSM. TSM driver also manages the platform > specific hardware solution via confidential domain bit as described in the specification[0] > to provide the required memory isolation. > > > Non secure world | Secure world > | > Virtualized Env | Virtualized Virtualized | > Env Env | > +-------------------------+ | +----------+ +----------+ | ------------ > | | | | | | | | | | | > | Host Apps| | | Apps | | | Apps | | Apps | | VU-Mode > +----------+ | +----------+ | +----------+ +----------+ | ------------ > | | | | | > Syscalls SBI | | | | > | | | | | > +--------------------------+ | +-----------+ +-----------+ | > | Host (Linux) | | | TVM Guest| | TVM Guest| | VS-Mode > +--------------------------+ | +-----------+ +-----------+ | > | | | | | > SBI (COVH + COVI) | SBI SBI | > | | (COVG + COVI) (COVG + COVI)| > | | | | | > +-----------------------------------------------------------+ -------------- > | TSM(Salus) | HS-Mode > +-----------------------------------------------------------+ -------------- > | > SBI > | > +---------------------------------------------------------+ -------------- > | Firmware(OpenSBI) | M-Mode > +---------------------------------------------------------+ -------------- > +----------------------------------------------------------------------------- > | Hardware (RISC-V CPU + RoT + IOMMU) > +---------------------------------------------------------------------------- > Figure 2: Host in VS model > > > The deployment model shown in Figure 2 simplifies the context switch and memory isolation > by running the host in VS mode as a guest of TSM. Thus, the memory isolation is > achieved by gstage mapping by the TSM. We don't need any additional hardware confidential > domain bit to provide memory isolation. The downside of this model the host has to run the > non-confidential VMs in nested environment which may have lower performance (yet to be measured). > The current implementation Salus(TSM) doesn't support full nested virtualization yet. > > The platform must have a RoT to provide attestation in either model. > This patch series implements the APIs defined by CoVE. The current TSM implementation > allows the host to run TVMs as shown in figure 2. We are working on deployment > model 1 in parallel. We do not expect any significant changes in either host/guest side > ABI due to that. > > Shared memory between the host & TSM: > ===================================== > To accelerate the H-mode CSR/GPR access, CoVE also reuses the Nested Acceleration (NACL) > SBI extension[1]. NACL defines a per physical cpu shared memory area that is allocated > at the boot. It allows the host running in VS mode to access H-mode CSR/GPR easily > without trapping into the TSM. The CoVE specification clearly defines the exact > state of the shared memory with r/w permissions at every call. > > Secure Interrupt management: > =========================== > The CoVE specification relies on the MSI based interrupt scheme defined in Advanced Interrupt > Architecture specification[2]. The COVI SBI extension adds functions to bind > a guest interrupt file to a TVMs. After that, only TCB components (TSM, TVM, TSM driver) > can modify that. The host can inject an interrupt via TSM only. > The TVMs are also in complete control of which interrupts it can receive. By default, > all interrupts are denied. In this proof-of-concept implementation, all the interrupts > are allowed by the guest at boot time to keep it simple. > > Device I/O: > =========== > In order to support paravirt I/O devices, SWIOTLB bounce buffer must be used by the > guest. As the host can not access confidential memory, this buffer memory > must be shared with the host via share/unshare functions defined in COVG SBI extension. > RISC-V implementation achieves this generalizing mem_encrypt_init() similar to TDX/SEV/CCA. > That's why, the CoVE Guest is only allowed to use virtio devices with VIRTIO_F_ACCESS_PLATFORM > and VIRTIO_F_VERSION_1 as they force virtio drivers to use the DMA API. > > MMIO emulation: > ====================== > TVM can register regions of address space as MMIO regions to be emulated by > the host. TSM provides explicit SBI functions i.e. SBI_EXT_COVG_[ADD/REMOVE]_MMIO_REGION > to request/remove MMIO regions. Any reads or writes to those MMIO region after > SBI_EXT_COVG_ADD_MMIO_REGION call are forwarded to the host for emulation. > > This series allows any ioremapped memory to be emulated as MMIO region with > above APIs via arch hookups inspired from pKVM work. We are aware that this model > doesn't address all the threat vectors. We have also implemented the device > filtering/authorization approach adopted by TDX[4]. However, those patches are not > part of this series as the base TDX patches are still under active development. > RISC-V CoVE will also adapt the revamped device filtering work once it is accepted > by the Linux community in the future. > > The direct assignment of devices are a work in progress and will be added in the future[4]. > > VMM support: > ============ > This series is only tested with kvmtool support. Other VMM support (qemu-kvm, crossvm/rust-vmm) > will be added later. > > Test cases: > =========== > We are working on kvm selftest for CoVE. We will post them as soon as they are ready. > We haven't started any work on kvm unit-tests as RISC-V doesn't have basic infrastructure > to support that. Once the kvm uni-test infrastructure is in place, we will add > support for CoVE as well. > > Open design questions: > ====================== > > 1. The current implementation has two separate configs for guest(CONFIG_RISCV_COVE_GUEST) > and the host (RISCV_COVE_HOST). The default defconfig will enable both so that > same unified image works as both host & guest. Most likely distro prefer this way > to minimize the maintenance burden but some may want a minimal CoVE guest image > that has only hardened drivers. In addition to that, Android runs a microdroid instance > in the confidential guests. A separate config will help in those case. Please let us > know if there is any concern with two configs. > > 2. Lazy gstage page allocation vs upfront allocation with page pool. > Currently, all gstage mappings happen at runtime during the fault. This is expensive > as we need to convert that page to confidential memory as well. A page pool framework > may be a better choice which can hold all the confidential pages which can be > pre-allocated upfront. A generic page pool infrastructure may benefit other CC solutions ? > > 3. In order to allow both confidential VM and non-confidential VM, the series > uses regular branching instead of static branches for CoVE VM specific cases through > out KVM. That may cause a few more branch penalties while running regular VMs. > The alternate option is to use function pointers for any function that needs to > take a different path. As per my understanding, that would be worse than branches. > > Patch organization: > =================== > This series depends on quite a few RISC-V patches that are not upstream yet. > Here are the dependencies. > > 1. RISC-V IPI improvement series > 2. RISC-V AIA support series. > 3. RISC-V NACL support series > > In this series, PATCH [0-5] are generic improvement and cleanup patches which > can be merged independently. > > PATCH [6-26, 34-37] adds host side for CoVE. > PATCH [27-33] adds the interrupt related changes. > PATCH [34-49] Adds the guest side changes for CoVE. > > The TSM project is written in rust and can be found here: > https://github.com/rivosinc/salus > > Running the stack > ==================== > > To run/test the stack, you would need the following components : > > 1) Qemu > 2) Common Host & Guest Kernel > 3) kvmtool > 4) Host RootFS with KVMTOOL and Guest Kernel > 5) Salus > > The detailed steps are available at[6] > > The Linux kernel patches are also available at [7] and the kvmtool patches > are available at [8]. > > TODOs > ======= > As this is a very early work, the todo list is quite long :). > Here are some of them (not in any specific order) > > 1. Support fd based private memory interface proposed in > https://lkml.org/lkml/2022/1/18/395 > 2. Align with updated guest runtime device filtering approach. > 3. IOMMU integration > 4. Dedicated device assignment via TDSIP & SPDM[4] > 5. Support huge pages > 6. Page pool allocator to avoid convert/reclaim at every fault > 7. Other VMM support (qemu-kvm, crossvm) > 8. Complete the PoC for the deployment model 1 where host runs in HS mode > 9. Attestation integration > 10. Harden the interrupt allowed list > 11. kvm self-tests support for CoVE > 11. kvm unit-tests support for CoVE > 12. Guest hardening > 13. Port pKVM on RISC-V using CoVE > 14. Any other ? > > Links > ============ > [0] CoVE architecture Specification. > https://github.com/riscv-non-isa/riscv-ap-tee/blob/main/specification/riscv-aptee-spec.pdf I just noticed that this link is broken due to a recent PR merge. Here is the updated link https://github.com/riscv-non-isa/riscv-ap-tee/blob/main/specification/riscv-cove.pdf Sorry for the noise. > [1] https://lists.riscv.org/g/sig-hypervisors/message/260 > [2] https://github.com/riscv/riscv-aia/releases/download/1.0-RC2/riscv-interrupts-1.0-RC2.pdf > [3] https://github.com/rivosinc/linux/tree/cove_integration_device_filtering1 > [4] https://github.com/intel/tdx/commits/guest-filter-upstream > [5] https://lists.riscv.org/g/tech-ap-tee/message/83 > [6] https://github.com/rivosinc/cove/wiki/CoVE-KVM-RISCV64-on-QEMU > [7] https://github.com/rivosinc/linux/commits/cove-integration > [8] https://github.com/rivosinc/kvmtool/tree/cove-integration-03072023 > > Atish Patra (33): > RISC-V: KVM: Improve KVM error reporting to the user space > RISC-V: KVM: Invoke aia_update with preempt disabled/irq enabled > RISC-V: KVM: Add a helper function to get pgd size > RISC-V: Add COVH SBI extensions definitions > RISC-V: KVM: Implement COVH SBI extension > RISC-V: KVM: Add a barebone CoVE implementation > RISC-V: KVM: Add UABI to support static memory region attestation > RISC-V: KVM: Add CoVE related nacl helpers > RISC-V: KVM: Implement static memory region measurement > RISC-V: KVM: Use the new VM IOCTL for measuring pages > RISC-V: KVM: Exit to the user space for trap redirection > RISC-V: KVM: Return early for gstage modifications > RISC-V: KVM: Skip dirty logging updates for TVM > RISC-V: KVM: Add a helper function to trigger fence ops > RISC-V: KVM: Skip most VCPU requests for TVMs > RISC-V : KVM: Skip vmid/hgatp management for TVMs > RISC-V: KVM: Skip TLB management for TVMs > RISC-V: KVM: Register memory regions as confidential for TVMs > RISC-V: KVM: Add gstage mapping for TVMs > RISC-V: KVM: Handle SBI call forward from the TSM > RISC-V: KVM: Implement vcpu load/put functions for CoVE guests > RISC-V: KVM: Wireup TVM world switch > RISC-V: KVM: Skip HVIP update for TVMs > RISC-V: KVM: Implement COVI SBI extension > RISC-V: KVM: Add interrupt management functions for TVM > RISC-V: KVM: Skip AIA CSR updates for TVMs > RISC-V: KVM: Perform limited operations in hardware enable/disable > RISC-V: KVM: Indicate no support user space emulated IRQCHIP > RISC-V: KVM: Add AIA support for TVMs > RISC-V: KVM: Hookup TVM VCPU init/destroy > RISC-V: KVM: Initialize CoVE > RISC-V: KVM: Add TVM init/destroy calls > drivers/hvc: sbi: Disable HVC console for TVMs > > Rajnesh Kanwal (15): > mm/vmalloc: Introduce arch hooks to notify ioremap/unmap changes > RISC-V: KVM: Update timer functionality for TVMs. > RISC-V: Add COVI extension definitions > RISC-V: KVM: Read/write gprs from/to shmem in case of TVM VCPU. > RISC-V: Add COVG SBI extension definitions > RISC-V: Add CoVE guest config and helper functions > RISC-V: Implement COVG SBI extension > RISC-V: COVE: Add COVH invalidate, validate, promote, demote and > remove APIs. > RISC-V: KVM: Add host side support to handle COVG SBI calls. > RISC-V: Allow host to inject any ext interrupt id to a CoVE guest. > RISC-V: Add base memory encryption functions. > RISC-V: Add cc_platform_has() for RISC-V for CoVE > RISC-V: ioremap: Implement for arch specific ioremap hooks > riscv/virtio: Have CoVE guests enforce restricted virtio memory > access. > RISC-V: Add shared bounce buffer to support DBCN for CoVE Guest. > > arch/riscv/Kbuild | 2 + > arch/riscv/Kconfig | 27 + > arch/riscv/cove/Makefile | 2 + > arch/riscv/cove/core.c | 40 + > arch/riscv/cove/cove_guest_sbi.c | 109 +++ > arch/riscv/include/asm/cove.h | 27 + > arch/riscv/include/asm/covg_sbi.h | 38 + > arch/riscv/include/asm/csr.h | 2 + > arch/riscv/include/asm/kvm_cove.h | 206 +++++ > arch/riscv/include/asm/kvm_cove_sbi.h | 101 +++ > arch/riscv/include/asm/kvm_host.h | 10 +- > arch/riscv/include/asm/kvm_vcpu_sbi.h | 3 + > arch/riscv/include/asm/mem_encrypt.h | 26 + > arch/riscv/include/asm/sbi.h | 107 +++ > arch/riscv/include/uapi/asm/kvm.h | 17 + > arch/riscv/kernel/irq.c | 12 + > arch/riscv/kernel/setup.c | 2 + > arch/riscv/kvm/Makefile | 1 + > arch/riscv/kvm/aia.c | 101 ++- > arch/riscv/kvm/aia_device.c | 41 +- > arch/riscv/kvm/aia_imsic.c | 127 ++- > arch/riscv/kvm/cove.c | 1005 +++++++++++++++++++++++ > arch/riscv/kvm/cove_sbi.c | 490 +++++++++++ > arch/riscv/kvm/main.c | 30 +- > arch/riscv/kvm/mmu.c | 45 +- > arch/riscv/kvm/tlb.c | 11 +- > arch/riscv/kvm/vcpu.c | 69 +- > arch/riscv/kvm/vcpu_exit.c | 34 +- > arch/riscv/kvm/vcpu_insn.c | 115 ++- > arch/riscv/kvm/vcpu_sbi.c | 16 + > arch/riscv/kvm/vcpu_sbi_covg.c | 232 ++++++ > arch/riscv/kvm/vcpu_timer.c | 26 +- > arch/riscv/kvm/vm.c | 34 +- > arch/riscv/kvm/vmid.c | 17 +- > arch/riscv/mm/Makefile | 3 + > arch/riscv/mm/init.c | 17 +- > arch/riscv/mm/ioremap.c | 45 + > arch/riscv/mm/mem_encrypt.c | 61 ++ > drivers/tty/hvc/hvc_riscv_sbi.c | 5 + > drivers/tty/serial/earlycon-riscv-sbi.c | 51 +- > include/uapi/linux/kvm.h | 8 + > mm/vmalloc.c | 16 + > 42 files changed, 3222 insertions(+), 109 deletions(-) > create mode 100644 arch/riscv/cove/Makefile > create mode 100644 arch/riscv/cove/core.c > create mode 100644 arch/riscv/cove/cove_guest_sbi.c > create mode 100644 arch/riscv/include/asm/cove.h > create mode 100644 arch/riscv/include/asm/covg_sbi.h > create mode 100644 arch/riscv/include/asm/kvm_cove.h > create mode 100644 arch/riscv/include/asm/kvm_cove_sbi.h > create mode 100644 arch/riscv/include/asm/mem_encrypt.h > create mode 100644 arch/riscv/kvm/cove.c > create mode 100644 arch/riscv/kvm/cove_sbi.c > create mode 100644 arch/riscv/kvm/vcpu_sbi_covg.c > create mode 100644 arch/riscv/mm/ioremap.c > create mode 100644 arch/riscv/mm/mem_encrypt.c > > -- > 2.25.1 >
On Wed, Apr 19, 2023, Atish Patra wrote: > 2. Lazy gstage page allocation vs upfront allocation with page pool. > Currently, all gstage mappings happen at runtime during the fault. This is expensive > as we need to convert that page to confidential memory as well. A page pool framework > may be a better choice which can hold all the confidential pages which can be > pre-allocated upfront. A generic page pool infrastructure may benefit other CC solutions ? I'm sorry, what? Do y'all really not pay any attention to what is happening outside of the RISC-V world? We, where "we" is KVM x86 and ARM, with folks contributing from 5+ companines, have been working on this problem for going on three *years*. And that's just from the first public posting[1], there have been discussions about how to approach this for even longer. There have been multiple related presentations at KVM Forum, something like 4 or 5 just at KVM Forum 2022 alone. Patch 1 says "This patch is based on pkvm patches", so clearly you are at least aware that there is other work going on in this space. At a very quick glance, this series is suffers from all of the same flaws that SNP, TDX, and pKVM have encountered. E.g. assuming guest memory is backed by struct page memory, relying on pinning to solve all problems (hint, it doesn't), and so on and so forth. And to make things worse, this series is riddled with bugs. E.g. patch 19 alone manages to squeeze in multiple fatal bugs in five new lines of code: deadlock due to not releasing mmap_lock on failure, failure to correcty handle MOVE, failure to handle DELETE at all, failure to honor (or reject) READONLY, and probably several others. diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c index 4b0f09e..63889d9 100644 --- a/arch/riscv/kvm/mmu.c +++ b/arch/riscv/kvm/mmu.c @@ -499,6 +499,11 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, mmap_read_lock(current->mm); + if (is_cove_vm(kvm)) { + ret = kvm_riscv_cove_vm_add_memreg(kvm, base_gpa, size); + if (ret) + return ret; + } /* * A memory region could potentially cover multiple VMAs, and * any holes between them, so iterate over all of them to find I get that this is an RFC, but for a series of this size, operating in an area that is under heavy development by multiple other architectures, to have a diffstat that shows _zero_ changes to common KVM is simply unacceptable. Please, go look at restrictedmem[2] and work on building CoVE support on top of that. If the current proposal doesn't fit CoVE's needs, then we need to know _before_ all of that code gets merged. [1] https://lore.kernel.org/linux-mm/20200522125214.31348-1-kirill.shutemov@linux.intel.com [2] https://lkml.kernel.org/r/20221202061347.1070246-1-chao.p.peng%40linux.intel.com > arch/riscv/Kbuild | 2 + > arch/riscv/Kconfig | 27 + > arch/riscv/cove/Makefile | 2 + > arch/riscv/cove/core.c | 40 + > arch/riscv/cove/cove_guest_sbi.c | 109 +++ > arch/riscv/include/asm/cove.h | 27 + > arch/riscv/include/asm/covg_sbi.h | 38 + > arch/riscv/include/asm/csr.h | 2 + > arch/riscv/include/asm/kvm_cove.h | 206 +++++ > arch/riscv/include/asm/kvm_cove_sbi.h | 101 +++ > arch/riscv/include/asm/kvm_host.h | 10 +- > arch/riscv/include/asm/kvm_vcpu_sbi.h | 3 + > arch/riscv/include/asm/mem_encrypt.h | 26 + > arch/riscv/include/asm/sbi.h | 107 +++ > arch/riscv/include/uapi/asm/kvm.h | 17 + > arch/riscv/kernel/irq.c | 12 + > arch/riscv/kernel/setup.c | 2 + > arch/riscv/kvm/Makefile | 1 + > arch/riscv/kvm/aia.c | 101 ++- > arch/riscv/kvm/aia_device.c | 41 +- > arch/riscv/kvm/aia_imsic.c | 127 ++- > arch/riscv/kvm/cove.c | 1005 +++++++++++++++++++++++ > arch/riscv/kvm/cove_sbi.c | 490 +++++++++++ > arch/riscv/kvm/main.c | 30 +- > arch/riscv/kvm/mmu.c | 45 +- > arch/riscv/kvm/tlb.c | 11 +- > arch/riscv/kvm/vcpu.c | 69 +- > arch/riscv/kvm/vcpu_exit.c | 34 +- > arch/riscv/kvm/vcpu_insn.c | 115 ++- > arch/riscv/kvm/vcpu_sbi.c | 16 + > arch/riscv/kvm/vcpu_sbi_covg.c | 232 ++++++ > arch/riscv/kvm/vcpu_timer.c | 26 +- > arch/riscv/kvm/vm.c | 34 +- > arch/riscv/kvm/vmid.c | 17 +- > arch/riscv/mm/Makefile | 3 + > arch/riscv/mm/init.c | 17 +- > arch/riscv/mm/ioremap.c | 45 + > arch/riscv/mm/mem_encrypt.c | 61 ++ > drivers/tty/hvc/hvc_riscv_sbi.c | 5 + > drivers/tty/serial/earlycon-riscv-sbi.c | 51 +- > include/uapi/linux/kvm.h | 8 + > mm/vmalloc.c | 16 + > 42 files changed, 3222 insertions(+), 109 deletions(-) > create mode 100644 arch/riscv/cove/Makefile > create mode 100644 arch/riscv/cove/core.c > create mode 100644 arch/riscv/cove/cove_guest_sbi.c > create mode 100644 arch/riscv/include/asm/cove.h > create mode 100644 arch/riscv/include/asm/covg_sbi.h > create mode 100644 arch/riscv/include/asm/kvm_cove.h > create mode 100644 arch/riscv/include/asm/kvm_cove_sbi.h > create mode 100644 arch/riscv/include/asm/mem_encrypt.h > create mode 100644 arch/riscv/kvm/cove.c > create mode 100644 arch/riscv/kvm/cove_sbi.c > create mode 100644 arch/riscv/kvm/vcpu_sbi_covg.c > create mode 100644 arch/riscv/mm/ioremap.c > create mode 100644 arch/riscv/mm/mem_encrypt.c > > -- > 2.25.1 >
On Thu, Apr 20, 2023 at 10:00 PM Sean Christopherson <seanjc@google.com> wrote: > > On Wed, Apr 19, 2023, Atish Patra wrote: > > 2. Lazy gstage page allocation vs upfront allocation with page pool. > > Currently, all gstage mappings happen at runtime during the fault. This is expensive > > as we need to convert that page to confidential memory as well. A page pool framework > > may be a better choice which can hold all the confidential pages which can be > > pre-allocated upfront. A generic page pool infrastructure may benefit other CC solutions ? > > I'm sorry, what? Do y'all really not pay any attention to what is happening > outside of the RISC-V world? > > We, where "we" is KVM x86 and ARM, with folks contributing from 5+ companines, > have been working on this problem for going on three *years*. And that's just > from the first public posting[1], there have been discussions about how to approach > this for even longer. There have been multiple related presentations at KVM Forum, > something like 4 or 5 just at KVM Forum 2022 alone. > Yes. We are following the restrictedmem effort and was reviewing the v10 this week. I did mention about that in the 1st item in the TODO list. We are planning to use the restrictedmen feature once it is closer to upstream (which seems to be the case looking at v10). Another reason is that this initial series is based on kvmtool only. We are working on qemu-kvm right now but have some RISC-V specific dependencies(interrupt controller stuff) which are not there yet. As the restrictedmem patches are already available in qemu-kvm too, our plan was to support CoVE in qemu-kvm first and work on restrictedmem after that. This item was just based on this RFC implementation which uses a lazy gstage page allocation. The idea was to check if there is any interest at all in this approach. I should have mentioned about restrictedmem plan in this section as well. Sorry for the confusion. Thanks for your suggestion. It seems we should just directly move to restrictedmem asap. > Patch 1 says "This patch is based on pkvm patches", so clearly you are at least > aware that there is other work going on in this space. > Yes. We have been following pkvm, tdx & CCA patches. The MMIO section has more details on TDX/pkvm related aspects. > At a very quick glance, this series is suffers from all of the same flaws that SNP, > TDX, and pKVM have encountered. E.g. assuming guest memory is backed by struct page > memory, relying on pinning to solve all problems (hint, it doesn't), and so on and > so forth. > > And to make things worse, this series is riddled with bugs. E.g. patch 19 alone > manages to squeeze in multiple fatal bugs in five new lines of code: deadlock due > to not releasing mmap_lock on failure, failure to correcty handle MOVE, failure to That's an oversight. Apologies for that. Thanks for pointing it out. > handle DELETE at all, failure to honor (or reject) READONLY, and probably several > others. > It should be rejected for READONLY as our APIs don't have any permission flags yet. I think we should add that to enable CoVE APIs to support as well ? Same goes for DELETE ops as we don't have an API to delete any confidential memory region yet. I was not very sure about the use case for MOVE though (migration possibly ?) kvm_riscv_cove_vm_add_memreg should have been invoked only for CREATE & reject others for now. I will revise the patch accordingly and leave a TODO comment for the future about API updates. > diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c > index 4b0f09e..63889d9 100644 > --- a/arch/riscv/kvm/mmu.c > +++ b/arch/riscv/kvm/mmu.c > @@ -499,6 +499,11 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, > > mmap_read_lock(current->mm); > > + if (is_cove_vm(kvm)) { > + ret = kvm_riscv_cove_vm_add_memreg(kvm, base_gpa, size); > + if (ret) > + return ret; > + } > /* > * A memory region could potentially cover multiple VMAs, and > * any holes between them, so iterate over all of them to find > > I get that this is an RFC, but for a series of this size, operating in an area that > is under heavy development by multiple other architectures, to have a diffstat that > shows _zero_ changes to common KVM is simply unacceptable. > Thanks for the valuable feedback. This is pretty much pre-RFC as the spec is very much in draft state. We want to share with the larger linux community to gather feedback sooner than later so that we can incorporate that feedback into the spec if any. > Please, go look at restrictedmem[2] and work on building CoVE support on top of > that. If the current proposal doesn't fit CoVE's needs, then we need to know _before_ > all of that code gets merged. > Absolutely. That has always been the plan. > [1] https://lore.kernel.org/linux-mm/20200522125214.31348-1-kirill.shutemov@linux.intel.com > [2] https://lkml.kernel.org/r/20221202061347.1070246-1-chao.p.peng%40linux.intel.com > > > arch/riscv/Kbuild | 2 + > > arch/riscv/Kconfig | 27 + > > arch/riscv/cove/Makefile | 2 + > > arch/riscv/cove/core.c | 40 + > > arch/riscv/cove/cove_guest_sbi.c | 109 +++ > > arch/riscv/include/asm/cove.h | 27 + > > arch/riscv/include/asm/covg_sbi.h | 38 + > > arch/riscv/include/asm/csr.h | 2 + > > arch/riscv/include/asm/kvm_cove.h | 206 +++++ > > arch/riscv/include/asm/kvm_cove_sbi.h | 101 +++ > > arch/riscv/include/asm/kvm_host.h | 10 +- > > arch/riscv/include/asm/kvm_vcpu_sbi.h | 3 + > > arch/riscv/include/asm/mem_encrypt.h | 26 + > > arch/riscv/include/asm/sbi.h | 107 +++ > > arch/riscv/include/uapi/asm/kvm.h | 17 + > > arch/riscv/kernel/irq.c | 12 + > > arch/riscv/kernel/setup.c | 2 + > > arch/riscv/kvm/Makefile | 1 + > > arch/riscv/kvm/aia.c | 101 ++- > > arch/riscv/kvm/aia_device.c | 41 +- > > arch/riscv/kvm/aia_imsic.c | 127 ++- > > arch/riscv/kvm/cove.c | 1005 +++++++++++++++++++++++ > > arch/riscv/kvm/cove_sbi.c | 490 +++++++++++ > > arch/riscv/kvm/main.c | 30 +- > > arch/riscv/kvm/mmu.c | 45 +- > > arch/riscv/kvm/tlb.c | 11 +- > > arch/riscv/kvm/vcpu.c | 69 +- > > arch/riscv/kvm/vcpu_exit.c | 34 +- > > arch/riscv/kvm/vcpu_insn.c | 115 ++- > > arch/riscv/kvm/vcpu_sbi.c | 16 + > > arch/riscv/kvm/vcpu_sbi_covg.c | 232 ++++++ > > arch/riscv/kvm/vcpu_timer.c | 26 +- > > arch/riscv/kvm/vm.c | 34 +- > > arch/riscv/kvm/vmid.c | 17 +- > > arch/riscv/mm/Makefile | 3 + > > arch/riscv/mm/init.c | 17 +- > > arch/riscv/mm/ioremap.c | 45 + > > arch/riscv/mm/mem_encrypt.c | 61 ++ > > drivers/tty/hvc/hvc_riscv_sbi.c | 5 + > > drivers/tty/serial/earlycon-riscv-sbi.c | 51 +- > > include/uapi/linux/kvm.h | 8 + > > mm/vmalloc.c | 16 + > > 42 files changed, 3222 insertions(+), 109 deletions(-) > > create mode 100644 arch/riscv/cove/Makefile > > create mode 100644 arch/riscv/cove/core.c > > create mode 100644 arch/riscv/cove/cove_guest_sbi.c > > create mode 100644 arch/riscv/include/asm/cove.h > > create mode 100644 arch/riscv/include/asm/covg_sbi.h > > create mode 100644 arch/riscv/include/asm/kvm_cove.h > > create mode 100644 arch/riscv/include/asm/kvm_cove_sbi.h > > create mode 100644 arch/riscv/include/asm/mem_encrypt.h > > create mode 100644 arch/riscv/kvm/cove.c > > create mode 100644 arch/riscv/kvm/cove_sbi.c > > create mode 100644 arch/riscv/kvm/vcpu_sbi_covg.c > > create mode 100644 arch/riscv/mm/ioremap.c > > create mode 100644 arch/riscv/mm/mem_encrypt.c > > > > -- > > 2.25.1 > >
On Fri, Apr 21, 2023, Atish Kumar Patra wrote: > On Thu, Apr 20, 2023 at 10:00 PM Sean Christopherson <seanjc@google.com> wrote: > > > > On Wed, Apr 19, 2023, Atish Patra wrote: > > > 2. Lazy gstage page allocation vs upfront allocation with page pool. > > > Currently, all gstage mappings happen at runtime during the fault. This is expensive > > > as we need to convert that page to confidential memory as well. A page pool framework > > > may be a better choice which can hold all the confidential pages which can be > > > pre-allocated upfront. A generic page pool infrastructure may benefit other CC solutions ? > > > > I'm sorry, what? Do y'all really not pay any attention to what is happening > > outside of the RISC-V world? > > > > We, where "we" is KVM x86 and ARM, with folks contributing from 5+ companines, > > have been working on this problem for going on three *years*. And that's just > > from the first public posting[1], there have been discussions about how to approach > > this for even longer. There have been multiple related presentations at KVM Forum, > > something like 4 or 5 just at KVM Forum 2022 alone. > > > > I did mention about that in the 1st item in the TODO list. My apologies, I completely missed the todo list. > Thanks for your suggestion. It seems we should just directly move to > restrictedmem asap. Yes please, for the sake of everyone involved. It will likely save you from running into the same pitfalls that x86 and ARM already encountered, and the more eyeballs and use cases on whatever restrictemem ends up being called, the better. Thanks!
On Thu, Apr 20, 2023 at 09:30:29AM -0700, Sean Christopherson wrote: > Please, go look at restrictedmem[2] and work on building CoVE support on top of > that. If the current proposal doesn't fit CoVE's needs, then we need to know _before_ > all of that code gets merged. I agree it's preferable to know beforehand to avoid potential maintainability quagmires bringing additional architectures onboard, and that it probably makes sense here to get that early input. But as a general statement, it's not necessarily a *requirement*. I worry that if we commit to such a policy that by the time restrictedmem gets close to merge, yet another architecture/use-case will come along that delays things further for architectures that already have hardware in the field. Not saying that's the case here, but just in general I think it's worth keeping the option open on iterating on a partial solution vs. trying to address everything on the first shot, depending on how the timing works out. Thanks, Mike > > [1] https://lore.kernel.org/linux-mm/20200522125214.31348-1-kirill.shutemov@linux.intel.com > [2] https://lkml.kernel.org/r/20221202061347.1070246-1-chao.p.peng%40linux.intel.com > > > arch/riscv/Kbuild | 2 + > > arch/riscv/Kconfig | 27 + > > arch/riscv/cove/Makefile | 2 + > > arch/riscv/cove/core.c | 40 + > > arch/riscv/cove/cove_guest_sbi.c | 109 +++ > > arch/riscv/include/asm/cove.h | 27 + > > arch/riscv/include/asm/covg_sbi.h | 38 + > > arch/riscv/include/asm/csr.h | 2 + > > arch/riscv/include/asm/kvm_cove.h | 206 +++++ > > arch/riscv/include/asm/kvm_cove_sbi.h | 101 +++ > > arch/riscv/include/asm/kvm_host.h | 10 +- > > arch/riscv/include/asm/kvm_vcpu_sbi.h | 3 + > > arch/riscv/include/asm/mem_encrypt.h | 26 + > > arch/riscv/include/asm/sbi.h | 107 +++ > > arch/riscv/include/uapi/asm/kvm.h | 17 + > > arch/riscv/kernel/irq.c | 12 + > > arch/riscv/kernel/setup.c | 2 + > > arch/riscv/kvm/Makefile | 1 + > > arch/riscv/kvm/aia.c | 101 ++- > > arch/riscv/kvm/aia_device.c | 41 +- > > arch/riscv/kvm/aia_imsic.c | 127 ++- > > arch/riscv/kvm/cove.c | 1005 +++++++++++++++++++++++ > > arch/riscv/kvm/cove_sbi.c | 490 +++++++++++ > > arch/riscv/kvm/main.c | 30 +- > > arch/riscv/kvm/mmu.c | 45 +- > > arch/riscv/kvm/tlb.c | 11 +- > > arch/riscv/kvm/vcpu.c | 69 +- > > arch/riscv/kvm/vcpu_exit.c | 34 +- > > arch/riscv/kvm/vcpu_insn.c | 115 ++- > > arch/riscv/kvm/vcpu_sbi.c | 16 + > > arch/riscv/kvm/vcpu_sbi_covg.c | 232 ++++++ > > arch/riscv/kvm/vcpu_timer.c | 26 +- > > arch/riscv/kvm/vm.c | 34 +- > > arch/riscv/kvm/vmid.c | 17 +- > > arch/riscv/mm/Makefile | 3 + > > arch/riscv/mm/init.c | 17 +- > > arch/riscv/mm/ioremap.c | 45 + > > arch/riscv/mm/mem_encrypt.c | 61 ++ > > drivers/tty/hvc/hvc_riscv_sbi.c | 5 + > > drivers/tty/serial/earlycon-riscv-sbi.c | 51 +- > > include/uapi/linux/kvm.h | 8 + > > mm/vmalloc.c | 16 + > > 42 files changed, 3222 insertions(+), 109 deletions(-) > > create mode 100644 arch/riscv/cove/Makefile > > create mode 100644 arch/riscv/cove/core.c > > create mode 100644 arch/riscv/cove/cove_guest_sbi.c > > create mode 100644 arch/riscv/include/asm/cove.h > > create mode 100644 arch/riscv/include/asm/covg_sbi.h > > create mode 100644 arch/riscv/include/asm/kvm_cove.h > > create mode 100644 arch/riscv/include/asm/kvm_cove_sbi.h > > create mode 100644 arch/riscv/include/asm/mem_encrypt.h > > create mode 100644 arch/riscv/kvm/cove.c > > create mode 100644 arch/riscv/kvm/cove_sbi.c > > create mode 100644 arch/riscv/kvm/vcpu_sbi_covg.c > > create mode 100644 arch/riscv/mm/ioremap.c > > create mode 100644 arch/riscv/mm/mem_encrypt.c > > > > -- > > 2.25.1 > > >
On 2023-04-19 at 15:16 -07, Atish Patra <atishp@rivosinc.com> wrote... > This patch series adds the RISC-V Confidential VM Extension (CoVE) support to > Linux kernel. The RISC-V CoVE specification introduces non-ISA, SBI APIs. These > APIs enable a confidential environment in which a guest VM's data can be isolated > from the host while the host retains control of guest VM management and platform > resources(memory, CPU, I/O). > > This is a very early WIP work. We want to share this with the community to get any > feedback on overall architecture and direction. Any other feedback is welcome too. > > The detailed CoVE architecture document can be found here [0]. It used to be > called AP-TEE and renamed to CoVE recently to avoid overloading term of TEE in > general. The specification is in the draft stages and is subjected to change based > on the feedback from the community. > > The CoVE specification introduces 3 new SBI extensions. > COVH - CoVE Host side interface > COVG - CoVE Guest side interface > COVI - CoVE Secure Interrupt management extension > > Some key acronyms introduced: > > TSM - TEE Security Manager > TVM - TEE VM (aka Confidential VM) > > CoVE Architecture: > ==================== > The CoVE APIs are designed to be implementation and architecture agnostic, > allowing for different deployment models while retaining common host and guest > kernel code. Two examples are shown in Figure 1 and Figure 2. > As shown in both figures, the architecture introduces a new software component > called the "TEE Security Manager" (TSM) that runs in HS mode. The TSM has minimal > hw attested footprint on TCB as it is a passive component that doesn't support > scheduling or timer interrupts. Both example deployment models provide memory > isolation between the host and the TEE VM (TVM). > > > Non secure world | Secure world | > | | > Non | | > Virtualized | Virtualized | Virtualized Virtualized | > Env | Env | Env Env | > +----------+ | +----------+ | +----------+ +----------+ | -------------- > | | | | | | | | | | | > | Host Apps| | | Apps | | | Apps | | Apps | | VU-Mode > | (VMM) | | | | | | | | | | > +----------+ | +----------+ | +----------+ +----------+ | -------------- > | | +----------+ | +----------+ +----------+ | > | | | | | | | | | | > | | | | | | TVM | | TVM | | > | | | Guest | | | Guest | | Guest | | VS-Mode > Syscalls | +----------+ | +----------+ +----------+ | > | | | | | > | SBI | SBI(COVG + COVI) | > | | | | | > +--------------------------+ | +---------------------------+ -------------- > | Host (Linux) | | | TSM (Salus) | > +--------------------------+ | +---------------------------+ > | | | HS-Mode > SBI (COVH + COVI) | SBI (COVH + COVI) > | | | > +-----------------------------------------------------------+ -------------- > | Firmware(OpenSBI) + TSM Driver | M-Mode > +-----------------------------------------------------------+ -------------- > +----------------------------------------------------------------------------- > | Hardware (RISC-V CPU + RoT + IOMMU) > +---------------------------------------------------------------------------- > Figure 1: Host in HS model > > > The deployment model shown in Figure 1 runs the host in HS mode where it is peer > to the TSM which also runs in HS mode. It requires another component known as TSM > Driver running in higher privilege mode than host/TSM. It is responsible for switching > the context between the host and the TSM. TSM driver also manages the platform > specific hardware solution via confidential domain bit as described in the specification[0] > to provide the required memory isolation. > > > Non secure world | Secure world > | > Virtualized Env | Virtualized Virtualized | > Env Env | > +-------------------------+ | +----------+ +----------+ | ------------ > | | | | | | | | | | | > | Host Apps| | | Apps | | | Apps | | Apps | | VU-Mode > +----------+ | +----------+ | +----------+ +----------+ | ------------ > | | | | | > Syscalls SBI | | | | > | | | | | > +--------------------------+ | +-----------+ +-----------+ | > | Host (Linux) | | | TVM Guest| | TVM Guest| | VS-Mode > +--------------------------+ | +-----------+ +-----------+ | > | | | | | > SBI (COVH + COVI) | SBI SBI | > | | (COVG + COVI) (COVG + COVI)| > | | | | | > +-----------------------------------------------------------+ -------------- > | TSM(Salus) | HS-Mode > +-----------------------------------------------------------+ -------------- > | > SBI > | > +---------------------------------------------------------+ -------------- > | Firmware(OpenSBI) | M-Mode > +---------------------------------------------------------+ -------------- > +----------------------------------------------------------------------------- > | Hardware (RISC-V CPU + RoT + IOMMU) > +---------------------------------------------------------------------------- > Figure 2: Host in VS model > > > The deployment model shown in Figure 2 simplifies the context switch and memory isolation > by running the host in VS mode as a guest of TSM. Thus, the memory isolation is > achieved by gstage mapping by the TSM. We don't need any additional hardware confidential > domain bit to provide memory isolation. The downside of this model the host has to run the > non-confidential VMs in nested environment which may have lower performance (yet to be measured). > The current implementation Salus(TSM) doesn't support full nested virtualization yet. > > The platform must have a RoT to provide attestation in either model. > This patch series implements the APIs defined by CoVE. The current TSM implementation > allows the host to run TVMs as shown in figure 2. We are working on deployment > model 1 in parallel. We do not expect any significant changes in either host/guest side > ABI due to that. > > Shared memory between the host & TSM: > ===================================== > To accelerate the H-mode CSR/GPR access, CoVE also reuses the Nested Acceleration (NACL) > SBI extension[1]. NACL defines a per physical cpu shared memory area that is allocated > at the boot. It allows the host running in VS mode to access H-mode CSR/GPR easily > without trapping into the TSM. The CoVE specification clearly defines the exact > state of the shared memory with r/w permissions at every call. > > Secure Interrupt management: > =========================== > The CoVE specification relies on the MSI based interrupt scheme defined in Advanced Interrupt > Architecture specification[2]. The COVI SBI extension adds functions to bind > a guest interrupt file to a TVMs. After that, only TCB components (TSM, TVM, TSM driver) > can modify that. The host can inject an interrupt via TSM only. > The TVMs are also in complete control of which interrupts it can receive. By default, > all interrupts are denied. In this proof-of-concept implementation, all the interrupts > are allowed by the guest at boot time to keep it simple. > > Device I/O: > =========== > In order to support paravirt I/O devices, SWIOTLB bounce buffer must be used by the > guest. As the host can not access confidential memory, this buffer memory > must be shared with the host via share/unshare functions defined in COVG SBI extension. > RISC-V implementation achieves this generalizing mem_encrypt_init() similar to TDX/SEV/CCA. > That's why, the CoVE Guest is only allowed to use virtio devices with VIRTIO_F_ACCESS_PLATFORM > and VIRTIO_F_VERSION_1 as they force virtio drivers to use the DMA API. > > MMIO emulation: > ====================== > TVM can register regions of address space as MMIO regions to be emulated by > the host. TSM provides explicit SBI functions i.e. SBI_EXT_COVG_[ADD/REMOVE]_MMIO_REGION > to request/remove MMIO regions. Any reads or writes to those MMIO region after > SBI_EXT_COVG_ADD_MMIO_REGION call are forwarded to the host for emulation. > > This series allows any ioremapped memory to be emulated as MMIO region with > above APIs via arch hookups inspired from pKVM work. We are aware that this model > doesn't address all the threat vectors. We have also implemented the device > filtering/authorization approach adopted by TDX[4]. However, those patches are not > part of this series as the base TDX patches are still under active development. > RISC-V CoVE will also adapt the revamped device filtering work once it is accepted > by the Linux community in the future. > > The direct assignment of devices are a work in progress and will be added in the future[4]. > > VMM support: > ============ > This series is only tested with kvmtool support. Other VMM support (qemu-kvm, crossvm/rust-vmm) > will be added later. > > Test cases: > =========== > We are working on kvm selftest for CoVE. We will post them as soon as they are ready. > We haven't started any work on kvm unit-tests as RISC-V doesn't have basic infrastructure > to support that. Once the kvm uni-test infrastructure is in place, we will add > support for CoVE as well. > > Open design questions: > ====================== > > 1. The current implementation has two separate configs for guest(CONFIG_RISCV_COVE_GUEST) > and the host (RISCV_COVE_HOST). The default defconfig will enable both so that > same unified image works as both host & guest. Most likely distro prefer this way > to minimize the maintenance burden but some may want a minimal CoVE guest image > that has only hardened drivers. In addition to that, Android runs a microdroid instance > in the confidential guests. A separate config will help in those case. Please let us > know if there is any concern with two configs. > > 2. Lazy gstage page allocation vs upfront allocation with page pool. > Currently, all gstage mappings happen at runtime during the fault. This is expensive > as we need to convert that page to confidential memory as well. A page pool framework > may be a better choice which can hold all the confidential pages which can be > pre-allocated upfront. A generic page pool infrastructure may benefit other CC solutions ? > > 3. In order to allow both confidential VM and non-confidential VM, the series > uses regular branching instead of static branches for CoVE VM specific cases through > out KVM. That may cause a few more branch penalties while running regular VMs. > The alternate option is to use function pointers for any function that needs to > take a different path. As per my understanding, that would be worse than branches. > > Patch organization: > =================== > This series depends on quite a few RISC-V patches that are not upstream yet. > Here are the dependencies. > > 1. RISC-V IPI improvement series > 2. RISC-V AIA support series. > 3. RISC-V NACL support series > > In this series, PATCH [0-5] are generic improvement and cleanup patches which > can be merged independently. > > PATCH [6-26, 34-37] adds host side for CoVE. > PATCH [27-33] adds the interrupt related changes. > PATCH [34-49] Adds the guest side changes for CoVE. > > The TSM project is written in rust and can be found here: > https://github.com/rivosinc/salus > > Running the stack > ==================== > > To run/test the stack, you would need the following components : > > 1) Qemu > 2) Common Host & Guest Kernel > 3) kvmtool > 4) Host RootFS with KVMTOOL and Guest Kernel > 5) Salus > > The detailed steps are available at[6] > > The Linux kernel patches are also available at [7] and the kvmtool patches > are available at [8]. > > TODOs > ======= > As this is a very early work, the todo list is quite long :). > Here are some of them (not in any specific order) > > 1. Support fd based private memory interface proposed in > https://lkml.org/lkml/2022/1/18/395 > 2. Align with updated guest runtime device filtering approach. > 3. IOMMU integration > 4. Dedicated device assignment via TDSIP & SPDM[4] > 5. Support huge pages > 6. Page pool allocator to avoid convert/reclaim at every fault > 7. Other VMM support (qemu-kvm, crossvm) > 8. Complete the PoC for the deployment model 1 where host runs in HS mode > 9. Attestation integration > 10. Harden the interrupt allowed list > 11. kvm self-tests support for CoVE > 11. kvm unit-tests support for CoVE > 12. Guest hardening > 13. Port pKVM on RISC-V using CoVE > 14. Any other ? > > Links > ============ > [0] CoVE architecture Specification. > https://github.com/riscv-non-isa/riscv-ap-tee/blob/main/specification/riscv-aptee-spec.pdf URL does not work for me. I found this: https://github.com/riscv-non-isa/riscv-ap-tee/blob/main/specification/riscv-cove.pdf > [1] https://lists.riscv.org/g/sig-hypervisors/message/260 > [2] https://github.com/riscv/riscv-aia/releases/download/1.0-RC2/riscv-interrupts-1.0-RC2.pdf > [3] https://github.com/rivosinc/linux/tree/cove_integration_device_filtering1 > [4] https://github.com/intel/tdx/commits/guest-filter-upstream > [5] https://lists.riscv.org/g/tech-ap-tee/message/83 > [6] https://github.com/rivosinc/cove/wiki/CoVE-KVM-RISCV64-on-QEMU > [7] https://github.com/rivosinc/linux/commits/cove-integration > [8] https://github.com/rivosinc/kvmtool/tree/cove-integration-03072023 > > Atish Patra (33): > RISC-V: KVM: Improve KVM error reporting to the user space > RISC-V: KVM: Invoke aia_update with preempt disabled/irq enabled > RISC-V: KVM: Add a helper function to get pgd size > RISC-V: Add COVH SBI extensions definitions > RISC-V: KVM: Implement COVH SBI extension > RISC-V: KVM: Add a barebone CoVE implementation > RISC-V: KVM: Add UABI to support static memory region attestation > RISC-V: KVM: Add CoVE related nacl helpers > RISC-V: KVM: Implement static memory region measurement > RISC-V: KVM: Use the new VM IOCTL for measuring pages > RISC-V: KVM: Exit to the user space for trap redirection > RISC-V: KVM: Return early for gstage modifications > RISC-V: KVM: Skip dirty logging updates for TVM > RISC-V: KVM: Add a helper function to trigger fence ops > RISC-V: KVM: Skip most VCPU requests for TVMs > RISC-V : KVM: Skip vmid/hgatp management for TVMs > RISC-V: KVM: Skip TLB management for TVMs > RISC-V: KVM: Register memory regions as confidential for TVMs > RISC-V: KVM: Add gstage mapping for TVMs > RISC-V: KVM: Handle SBI call forward from the TSM > RISC-V: KVM: Implement vcpu load/put functions for CoVE guests > RISC-V: KVM: Wireup TVM world switch > RISC-V: KVM: Skip HVIP update for TVMs > RISC-V: KVM: Implement COVI SBI extension > RISC-V: KVM: Add interrupt management functions for TVM > RISC-V: KVM: Skip AIA CSR updates for TVMs > RISC-V: KVM: Perform limited operations in hardware enable/disable > RISC-V: KVM: Indicate no support user space emulated IRQCHIP > RISC-V: KVM: Add AIA support for TVMs > RISC-V: KVM: Hookup TVM VCPU init/destroy > RISC-V: KVM: Initialize CoVE > RISC-V: KVM: Add TVM init/destroy calls > drivers/hvc: sbi: Disable HVC console for TVMs > > Rajnesh Kanwal (15): > mm/vmalloc: Introduce arch hooks to notify ioremap/unmap changes > RISC-V: KVM: Update timer functionality for TVMs. > RISC-V: Add COVI extension definitions > RISC-V: KVM: Read/write gprs from/to shmem in case of TVM VCPU. > RISC-V: Add COVG SBI extension definitions > RISC-V: Add CoVE guest config and helper functions > RISC-V: Implement COVG SBI extension > RISC-V: COVE: Add COVH invalidate, validate, promote, demote and > remove APIs. > RISC-V: KVM: Add host side support to handle COVG SBI calls. > RISC-V: Allow host to inject any ext interrupt id to a CoVE guest. > RISC-V: Add base memory encryption functions. > RISC-V: Add cc_platform_has() for RISC-V for CoVE > RISC-V: ioremap: Implement for arch specific ioremap hooks > riscv/virtio: Have CoVE guests enforce restricted virtio memory > access. > RISC-V: Add shared bounce buffer to support DBCN for CoVE Guest. > > arch/riscv/Kbuild | 2 + > arch/riscv/Kconfig | 27 + > arch/riscv/cove/Makefile | 2 + > arch/riscv/cove/core.c | 40 + > arch/riscv/cove/cove_guest_sbi.c | 109 +++ > arch/riscv/include/asm/cove.h | 27 + > arch/riscv/include/asm/covg_sbi.h | 38 + > arch/riscv/include/asm/csr.h | 2 + > arch/riscv/include/asm/kvm_cove.h | 206 +++++ > arch/riscv/include/asm/kvm_cove_sbi.h | 101 +++ > arch/riscv/include/asm/kvm_host.h | 10 +- > arch/riscv/include/asm/kvm_vcpu_sbi.h | 3 + > arch/riscv/include/asm/mem_encrypt.h | 26 + > arch/riscv/include/asm/sbi.h | 107 +++ > arch/riscv/include/uapi/asm/kvm.h | 17 + > arch/riscv/kernel/irq.c | 12 + > arch/riscv/kernel/setup.c | 2 + > arch/riscv/kvm/Makefile | 1 + > arch/riscv/kvm/aia.c | 101 ++- > arch/riscv/kvm/aia_device.c | 41 +- > arch/riscv/kvm/aia_imsic.c | 127 ++- > arch/riscv/kvm/cove.c | 1005 +++++++++++++++++++++++ > arch/riscv/kvm/cove_sbi.c | 490 +++++++++++ > arch/riscv/kvm/main.c | 30 +- > arch/riscv/kvm/mmu.c | 45 +- > arch/riscv/kvm/tlb.c | 11 +- > arch/riscv/kvm/vcpu.c | 69 +- > arch/riscv/kvm/vcpu_exit.c | 34 +- > arch/riscv/kvm/vcpu_insn.c | 115 ++- > arch/riscv/kvm/vcpu_sbi.c | 16 + > arch/riscv/kvm/vcpu_sbi_covg.c | 232 ++++++ > arch/riscv/kvm/vcpu_timer.c | 26 +- > arch/riscv/kvm/vm.c | 34 +- > arch/riscv/kvm/vmid.c | 17 +- > arch/riscv/mm/Makefile | 3 + > arch/riscv/mm/init.c | 17 +- > arch/riscv/mm/ioremap.c | 45 + > arch/riscv/mm/mem_encrypt.c | 61 ++ > drivers/tty/hvc/hvc_riscv_sbi.c | 5 + > drivers/tty/serial/earlycon-riscv-sbi.c | 51 +- > include/uapi/linux/kvm.h | 8 + > mm/vmalloc.c | 16 + > 42 files changed, 3222 insertions(+), 109 deletions(-) > create mode 100644 arch/riscv/cove/Makefile > create mode 100644 arch/riscv/cove/core.c > create mode 100644 arch/riscv/cove/cove_guest_sbi.c > create mode 100644 arch/riscv/include/asm/cove.h > create mode 100644 arch/riscv/include/asm/covg_sbi.h > create mode 100644 arch/riscv/include/asm/kvm_cove.h > create mode 100644 arch/riscv/include/asm/kvm_cove_sbi.h > create mode 100644 arch/riscv/include/asm/mem_encrypt.h > create mode 100644 arch/riscv/kvm/cove.c > create mode 100644 arch/riscv/kvm/cove_sbi.c > create mode 100644 arch/riscv/kvm/vcpu_sbi_covg.c > create mode 100644 arch/riscv/mm/ioremap.c > create mode 100644 arch/riscv/mm/mem_encrypt.c -- Cheers, Christophe de Dinechin (https://c3d.github.io) Freedom Covenant (https://github.com/c3d/freedom-covenant) Theory of Incomplete Measurements (https://c3d.github.io/TIM)