[v2] docs: security: Confidential computing intro and threat model for x86 virtualization
Message ID | 20230612164727.3935657-1-carlos.bilbao@amd.com |
---|---|
State | New |
Headers |
From: Carlos Bilbao <carlos.bilbao@amd.com>
To: <corbet@lwn.net>
CC: <linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <ardb@kernel.org>, <kraxel@redhat.com>, <dovmurik@linux.ibm.com>, <elena.reshetova@intel.com>, <dave.hansen@linux.intel.com>, <Dhaval.Giani@amd.com>, <michael.day@amd.com>, <pavankumar.paluri@amd.com>, <David.Kaplan@amd.com>, <Reshma.Lal@amd.com>, <Jeremy.Powell@amd.com>, <sathyanarayanan.kuppuswamy@linux.intel.com>, <alexander.shishkin@linux.intel.com>, <thomas.lendacky@amd.com>, <tglx@linutronix.de>, <dgilbert@redhat.com>, <gregkh@linuxfoundation.org>, <dinechin@redhat.com>, <linux-coco@lists.linux.dev>, <berrange@redhat.com>, <mst@redhat.com>, <tytso@mit.edu>, <jikos@kernel.org>, <joro@8bytes.org>, <leon@kernel.org>, <richard.weinberger@gmail.com>, <lukas@wunner.de>, <jejb@linux.ibm.com>, <cdupontd@redhat.com>, <jasowang@redhat.com>, <sameo@rivosinc.com>, <bp@alien8.de>, <seanjc@google.com>, <security@kernel.org>, Carlos Bilbao <carlos.bilbao@amd.com>, Larry Dewey <larry.dewey@amd.com>, David Kaplan <david.kaplan@amd.com>
Subject: [PATCH v2] docs: security: Confidential computing intro and threat model for x86 virtualization
Date: Mon, 12 Jun 2023 11:47:27 -0500
Message-ID: <20230612164727.3935657-1-carlos.bilbao@amd.com>
X-Mailer: git-send-email 2.34.1
List-ID: <linux-kernel.vger.kernel.org> |
Series |
[v2] docs: security: Confidential computing intro and threat model for x86 virtualization
|
|
Commit Message
Carlos Bilbao
June 12, 2023, 4:47 p.m. UTC
Kernel developers working on confidential computing for virtualized
environments in x86 operate under a set of assumptions regarding the Linux
kernel threat model that differs from the traditional view. Historically,
the Linux threat model acknowledges attackers residing in userspace, as
well as a limited set of external attackers that are able to interact with
the kernel through networking or limited HW-specific exposed interfaces
(e.g. USB, thunderbolt). The goal of this document is to explain additional
attack vectors that arise in the virtualized confidential computing space
and discuss the proposed protection mechanisms for the Linux kernel.

Reviewed-by: Larry Dewey <larry.dewey@amd.com>
Reviewed-by: David Kaplan <david.kaplan@amd.com>
Co-developed-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Carlos Bilbao <carlos.bilbao@amd.com>
---

V1 can be found in:
https://lore.kernel.org/lkml/20230327141816.2648615-1-carlos.bilbao@amd.com/
Changes since v1:

- Apply feedback from first version of the patch
- Clarify that the document applies only to a particular angle of
  confidential computing, namely confidential computing for virtualized
  environments. Also, state that the document is specific to x86 and
  that the main goal is to discuss the emerging threats.
- Change commit message and file name accordingly
- Replace AMD's link to AMD SEV SNP white paper
- Minor tweaking and clarifications

---
 Documentation/security/index.rst              |   1 +
 .../security/x86-confidential-computing.rst   | 298 ++++++++++++++++++
 MAINTAINERS                                   |   6 +
 3 files changed, 305 insertions(+)
 create mode 100644 Documentation/security/x86-confidential-computing.rst
Comments
Hi--

On 6/12/23 09:47, Carlos Bilbao wrote:
> Kernel developers working on confidential computing for virtualized
> environments in x86 operate under a set of assumptions regarding the Linux
> kernel threat model that differs from the traditional view. Historically,
> the Linux threat model acknowledges attackers residing in userspace, as
> well as a limited set of external attackers that are able to interact with
> the kernel through networking or limited HW-specific exposed interfaces
> (e.g. USB, thunderbolt). The goal of this document is to explain additional
> attack vectors that arise in the virtualized confidential computing space
> and discuss the proposed protection mechanisms for the Linux kernel.
>
> Reviewed-by: Larry Dewey <larry.dewey@amd.com>
> Reviewed-by: David Kaplan <david.kaplan@amd.com>
> Co-developed-by: Elena Reshetova <elena.reshetova@intel.com>
> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> Signed-off-by: Carlos Bilbao <carlos.bilbao@amd.com>
> ---
>
> V1 can be found in:
> https://lore.kernel.org/lkml/20230327141816.2648615-1-carlos.bilbao@amd.com/
> Changes since v1:
>
> - Apply feedback from first version of the patch
> - Clarify that the document applies only to a particular angle of
>   confidential computing, namely confidential computing for virtualized
>   environments. Also, state that the document is specific to x86 and
>   that the main goal is to discuss the emerging threats.
> - Change commit message and file name accordingly
> - Replace AMD's link to AMD SEV SNP white paper
> - Minor tweaking and clarifications
>
> ---
>  Documentation/security/index.rst              |   1 +
>  .../security/x86-confidential-computing.rst   | 298 ++++++++++++++++++
>  MAINTAINERS                                   |   6 +
>  3 files changed, 305 insertions(+)
>  create mode 100644 Documentation/security/x86-confidential-computing.rst
>
> diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
> index 6ed8d2fa6f9e..bda919aecb37 100644
> --- a/Documentation/security/index.rst
> +++ b/Documentation/security/index.rst
> @@ -6,6 +6,7 @@ Security Documentation
>     :maxdepth: 1
>
>     credentials
> +   x86-confidential-computing

Does the new entry align with the others?

>     IMA-templates
>     keys/index
>     lsm
> diff --git a/Documentation/security/x86-confidential-computing.rst b/Documentation/security/x86-confidential-computing.rst
> new file mode 100644
> index 000000000000..5c52b8888089
> --- /dev/null
> +++ b/Documentation/security/x86-confidential-computing.rst
> @@ -0,0 +1,298 @@
> +======================================================
> +Confidential Computing in Linux for x86 virtualization
> +======================================================
> +
> +.. contents:: :local:
> +
> +By: Elena Reshetova <elena.reshetova@intel.com> and Carlos Bilbao <carlos.bilbao@amd.com>
> +
> +Motivation
> +==========
> +
> +Kernel developers working on confidential computing for virtualized
> +environments in x86 operate under a set of assumptions regarding the Linux
> +kernel threat model that differ from the traditional view. Historically,
> +the Linux threat model acknowledges attackers residing in userspace, as
> +well as a limited set of external attackers that are able to interact with
> +the kernel through various networking or limited HW-specific exposed
> +interfaces (USB, thunderbolt). The goal of this document is to explain
> +additional attack vectors that arise in the confidential computing space
> +and discuss the proposed protection mechanisms for the Linux kernel.
> +
> +Overview and terminology
> +========================
> +
> +Confidential Computing (CoCo) is a broad term covering a wide range of
> +security technologies that aim to protect the confidentiality and integrity
> +of data in use (vs. data at rest or data in transit). At its core, CoCo
> +solutions provide a Trusted Execution Environment (TEE), where secure data
> +processing can be performed and, as a result, they are typically further
> +classified into different subtypes depending on the SW that is intended
> +to be run in TEE. This document focuses on a subclass of CoCo technologies
> +that are targeting virtualized environments and allow running Virtual
> +Machines (VM) inside TEE. From now on in this document will be referring
> +to this subclass of CoCo as 'Confidential Computing (CoCo) for the
> +virtualized environments (VE)'.
> +
> +CoCo, in the virtualization context, refers to a set of HW and/or SW
> +technologies that allow for stronger security guarantees for the SW running
> +inside a CoCo VM. Namely, confidential computing allows its users to
> +confirm the trustworthiness of all SW pieces to include in its reduced
> +Trusted Computing Base (TCB) given its ability to attest the state of these
> +trusted components.
> +
> +While the concrete implementation details differ between technologies, all
> +available mechanisms aim to provide increased confidentiality and
> +integrity for the VM's guest memory and execution state (vCPU registers),
> +more tightly controlled guest interrupt injection, as well as some
> +additional mechanisms to control guest-host page mapping. More details on
> +the x86-specific solutions can be found in
> +:doc:`Intel Trust Domain Extensions (TDX) </arch/x86/tdx>` and

<Documentation/arch/x86/tdx>
or does it work without the leading subdir?

> +`AMD Memory Encryption <https://www.amd.com/system/files/techdocs/sev-snp-strengthening-vm-isolation-with-integrity-protection-and-more.pdf>`_.
> +
> +The basic CoCo guest layout includes the host, guest, the interfaces that
> +communicate guest and host, a platform capable of supporting CoCo VMs, and
> +a trusted intermediary between the guest VM and the underlying platform
> +that acts as a security manager. The host-side virtual machine monitor
> +(VMM) typically consists of a subset of traditional VMM features and
> +is still in charge of the guest lifecycle, i.e. create or destroy a CoCo
> +VM, manage its access to system resources, etc. However, since it
> +typically stays out of CoCo VM TCB, its access is limited to preserve the

to preserving the ?

> +security objectives.
> +
> +In the following diagram, the "<--->" lines represent bi-directional
> +communication channels or interfaces between the CoCo security manager and
> +the rest of the components (data flow for guest, host, hardware) ::
> +
> +    +-------------------+      +-----------------------+
> +    | CoCo guest VM     |<---->|                       |
> +    +-------------------+      |                       |
> +      | Interfaces |           | CoCo security manager |
> +    +-------------------+      |                       |
> +    | Host VMM          |<---->|                       |
> +    +-------------------+      |                       |
> +                               |                       |
> +    +--------------------+     |                       |
> +    | CoCo platform      |<--->|                       |
> +    +--------------------+     +-----------------------+
> +
> +The specific details of the CoCo security manager vastly diverge between
> +technologies. For example, in some cases, it will be implemented in HW
> +while in others it may be pure SW. In some cases, such as for the
> +`Protected kernel-based virtual machine (pKVM) <https://github.com/intel-staging/pKVM-IA>`,
> +the CoCo security manager is a small, isolated and highly privileged
> +(compared to the rest of SW running on the host) part of a traditional
> +VMM.
> +
> +Existing Linux kernel threat model
> +==================================
> +
> +The overall components of the current Linux kernel threat model are::
> +
> +     +-----------------------+      +-------------------+
> +     |                       |<---->| Userspace         |
> +     |                       |      +-------------------+
> +     |   External attack     |         | Interfaces |
> +     |       vectors         |      +-------------------+
> +     |                       |<---->| Linux Kernel      |
> +     |                       |      +-------------------+
> +     +-----------------------+      +-------------------+
> +                                    | Bootloader/BIOS   |
> +                                    +-------------------+
> +                                    +-------------------+
> +                                    | HW platform       |
> +                                    +-------------------+
> +
> +There is also communication between the bootloader and the kernel during
> +the boot process, but this diagram does not represent it explicitly. The
> +"Interfaces" box represents the various interfaces that allow
> +communication between kernel and userspace. This includes system calls,
> +kernel APIs, device drivers, etc.
> +
> +The existing Linux kernel threat model typically assumes execution on a
> +trusted HW platform with all of the firmware and bootloaders included on
> +its TCB. The primary attacker resides in the userspace, and all of the data
> +coming from there is generally considered untrusted, unless userspace is
> +privileged enough to perform trusted actions. In addition, external
> +attackers are typically considered, including those with access to enabled
> +external networks (e.g. Ethernet, Wireless, Bluetooth), exposed hardware
> +interfaces (e.g. USB, Thunderbolt), and the ability to modify the contents
> +of disks offline.
> +
> +Regarding external attack vectors, it is interesting to note that in most
> +cases external attackers will try to exploit vulnerabilities in userspace
> +first, but that it is possible for an attacker to directly target the
> +kernel; particularly if the host has physical access. Examples of direct
> +kernel attacks include the vulnerabilities CVE-2019-19524, CVE-2022-0435
> +and CVE-2020-24490.
> +
> +Confidential Computing threat model and its security objectives
> +===============================================================
> +
> +Confidential Computing adds a new type of attacker to the above list: a
> +potentially misbehaving host (which can also include some part of a
> +traditional VMM or all of it), which is typically placed outside of the
> +CoCo VM TCB due to its large SW attack surface. It is important to note
> +that this doesn’t imply that the host or VMM are intentionally
> +malicious, but that there exists a security value in having a small CoCo
> +VM TCB. This new type of adversary may be viewed as a more powerful type
> +of external attacker, as it resides locally on the same physical machine
> +-in contrast to a remote network attacker- and has control over the guest

Hyphens (dashes) are not normally used for a parenthetical phrase AFAIK.

> +kernel communication with most of the HW::

I would prefer to capitalize "kernel" above.
> +
> +                                 +------------------------+
> +                                 |    CoCo guest VM       |
> +   +-----------------------+     |  +-------------------+ |
> +   |                       |<--->|  | Userspace         | |
> +   |                       |     |  +-------------------+ |
> +   |   External attack     |     |     | Interfaces |     |
> +   |       vectors         |     |  +-------------------+ |
> +   |                       |<--->|  | Linux Kernel      | |
> +   |                       |     |  +-------------------+ |
> +   +-----------------------+     |  +-------------------+ |
> +                                 |  | Bootloader/BIOS   | |
> +   +-----------------------+     |  +-------------------+ |
> +   |                       |<--->+------------------------+
> +   |                       |          | Interfaces |
> +   |                       |     +------------------------+
> +   |     CoCo security     |<--->| Host/Host-side VMM     |
> +   |      manager          |     +------------------------+
> +   |                       |     +------------------------+
> +   |                       |<--->|   CoCo platform        |
> +   +-----------------------+     +------------------------+
> +
> +While traditionally the host has unlimited access to guest data and can
> +leverage this access to attack the guest, the CoCo systems mitigate such
> +attacks by adding security features like guest data confidentiality and
> +integrity protection. This threat model assumes that those features are
> +available and intact.
> +
> +The **Linux kernel CoCo VM security objectives** can be summarized as follows:
> +
> +1. Preserve the confidentiality and integrity of CoCo guest's private
> +memory and registers.
> +
> +2. Prevent privileged escalation from a host into a CoCo guest Linux kernel.
> +While it is true that the host (and host-side VMM) requires some level of
> +privilege to create, destroy, or pause the guest, part of the goal of
> +preventing privileged escalation is to ensure that these operations do not
> +provide a pathway for attackers to gain access to the guest's kernel.
> +
> +The above security objectives result in two primary **Linux kernel CoCo
> +VM assets**:
> +
> +1. Guest kernel execution context.
> +2. Guest kernel private memory.
> +
> +The host retains full control over the CoCo guest resources, and can deny
> +access to them at any time. Examples of resources include CPU time, memory
> +that the guest can consume, network bandwidth, etc. Because of this, the
> +host Denial of Service (DoS) attacks against CoCo guests are beyond the
> +scope of this threat model.
> +
> +The **Linux CoCo VM attack surface** is any interface exposed from a CoCo
> +guest Linux kernel towards an untrusted host that is not covered by the
> +CoCo technology SW/HW protection. This includes any possible
> +side-channels, as well as transient execution side channels. Examples of
> +explicit (not side-channel) interfaces include accesses to port I/O, MMIO
> +and DMA interfaces, access to PCI configuration space, VMM-specific
> +hypercalls (towards Host-side VMM), access to shared memory pages,
> +interrupts allowed to be injected into the guest kernel by the host, as
> +well as CoCo technology specific hypercalls, if present. Additionally, the

technology-specific

> +host in a CoCo system typically controls the process of creating a CoCo
> +guest: it has a method to load into a guest the firmware and bootloader
> +images, the kernel image together with the kernel command line. All of this
> +data should also be considered untrusted until its integrity and
> +authenticity is established via attestation.
> +
> +The table below shows a threat matrix for the CoCo guest Linux kernel with
> +the potential mitigation strategies. The matrix refers to CoCo-specific
> +versions of the guest, host and platform.
> +
> +.. list-table:: CoCo Linux guest kernel threat matrix
> +   :widths: auto
> +   :align: center
> +   :header-rows: 1
> +
> +   * - Threat name
> +     - Threat description
> +     - Mitigation strategies
> +
> +   * - Guest malicious configuration
> +     - A misbehaving host modifies one of the following guest's
> +       configuration:
> +
> +       1. Guest firmware or bootloader
> +
> +       2. Guest kernel or module binaries
> +
> +       3. Guest command line parameters
> +
> +       This allows the host to break the integrity of the code running
> +       inside a CoCo guest, and violates the CoCo security objectives.
> +     - The integrity of the guest's configuration passed via untrusted host
> +       must be ensured by methods such as remote attestation and signing.
> +       This should be largely transparent to the guest kernel, and would
> +       allow it to assume a trusted state at the time of boot.
> +
> +   * - CoCo guest data attacks
> +     - A misbehaving host retains full control of the CoCo guest's data
> +       in-transit between the guest and the host-managed physical or
> +       virtual devices. This allows any attack against confidentiality,
> +       integrity or freshness of such data.
> +     - The CoCo guest is responsible for ensuring the confidentiality,
> +       integrity and freshness of such data using well-established
> +       security mechanisms. For example, for any guest external network
> +       communications passed via the untrusted host, an end-to-end
> +       secure session must be established between a guest and a trusted
> +       remote endpoint using well-known protocols such as TLS.
> +       This requirement also applies to protection of the guest's disk
> +       image.
> +
> +   * - Malformed runtime input
> +     - A misbehaving host injects malformed input via any communication
> +       interface used by the guest's kernel code. If the code is not
> +       prepared to handle this input correctly, this can result in a host
> +       --> guest kernel privilege escalation. This includes traditional
> +       side-channel and/or transient execution attack vectors.
> +     - The attestation or signing process cannot help to mitigate this
> +       threat since this input is highly dynamic. Instead, a different set
> +       of mechanisms is required:
> +
> +       1. *Limit the exposed attack surface*. Whenever possible, disable
> +          complex kernel features and device drivers (not required for guest
> +          operation) that actively use the communication interfaces between
> +          the untrusted host and the guest. This is not a new concept for the
> +          Linux kernel, since it already has mechanisms to disable external
> +          interfaces, such as attacker's access via USB/Thunderbolt subsystem.
> +
> +       2. *Harden the exposed attack surface*. Any code that uses such
> +          interfaces must treat the input from the untrusted host as malicious,
> +          and do sanity checks before processing it. This can be ensured by
> +          performing a code audit of such device drivers as well as employing
> +          other standard techniques for testing the code robustness, such as
> +          fuzzing. This is again a well-known concept for the Linux kernel,
> +          since all its networking code has been previously analyzed under
> +          presumption of processing malformed input from a network attacker.
> +
> +   * - Malicious runtime input
> +     - A misbehaving host injects a specific input value via any
> +       communication interface used by the guest's kernel code. The
> +       difference with the previous attack vector (malformed runtime input)
> +       is that this input is not malformed, but its value is crafted to
> +       impact the guest's kernel security. Examples of such inputs include
> +       providing a malicious time to the guest or the entropy to the guest
> +       random number generator. Additionally, the timing of such events can
> +       be an attack vector on its own, if it results in a particular guest
> +       kernel action (i.e. processing of a host-injected interrupt).
> +     - Similarly, as with the previous attack vector, it is not possible to
> +       use attestation mechanisms to address this threat. Instead, such
> +       attack vectors (i.e. interfaces) must be either disabled or made
> +       resistant to supplied host input.
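
As a rough illustration of the "Harden the exposed attack surface" item in the matrix above (this is not part of the patch), the C sketch below shows one way guest code might sanity-check a length field that the untrusted host controls. The structure shared_msg, the limit MSG_MAX_PAYLOAD and the function guest_read_msg() are hypothetical names invented for this example; real kernel code would more likely use READ_ONCE() for the single fetch and the driver's own error codes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSG_MAX_PAYLOAD 64u /* hypothetical protocol limit */

/* Hypothetical message layout placed by the host in shared memory. */
struct shared_msg {
	uint32_t len;                      /* host-controlled */
	uint8_t  payload[MSG_MAX_PAYLOAD]; /* host-controlled */
};

/*
 * Copy a host-supplied message into guest-private memory.
 * Returns the payload length on success, or -1 if the input is rejected.
 */
static int guest_read_msg(const volatile struct shared_msg *shared,
			  uint8_t *dst, size_t dst_size)
{
	/* Programming errors on the guest side, not host input. */
	assert(shared != NULL && dst != NULL);

	/* Fetch the length exactly once; the host may rewrite it later. */
	uint32_t len = shared->len;

	/* Sanity-check the private copy before using it. */
	if (len > MSG_MAX_PAYLOAD || len > dst_size)
		return -1;

	/* Copy using the validated length, never re-reading shared->len. */
	for (uint32_t i = 0; i < len; i++)
		dst[i] = shared->payload[i];

	return (int)len;
}
```

Reading the length into a private variable first matters because the host can change shared memory between the check and the copy; validating one value and then copying with a freshly re-read one would reintroduce the problem.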
> +
> +As can be seen from the above table, the potential mitigation strategies
> +to secure the CoCo Linux guest kernel vary, but can be roughly split into
> +mechanisms that either require or do not require changes to the existing
> +Linux kernel code. One main goal of the CoCo security architecture is to
> +minimize changes to the Linux kernel code, while also providing usable
> +and scalable means to facilitate the security of a CoCo guest kernel.

HTH.
~Randy
On Mon, Jun 12, 2023, Carlos Bilbao wrote:
> Kernel developers working on confidential computing for virtualized
> environments in x86 operate under a set of assumptions regarding the Linux

No, "x86" isn't special, SNP and TDX and s390's UV are special. pKVM is
similar, but (a) it's not as paranoid as SNP and TDX, and (b) the known use
case for pKVM on x86 is to harden usage of hardware devices, i.e. pKVM x86
"guests" likely don't have the same "untrusted virtual device" attack
surfaces as SNP/TDX/UV guests.

> +Kernel developers working on confidential computing for virtualized
> +environments in x86 operate under a set of assumptions regarding the Linux

I don't think "virtualized environments" is the right description. IMO,
"cloud computing environments" or maybe "off-premise environments" more
accurately captures what you want to document, though the latter fails to
imply the "virtual" aspect of things.

> +The specific details of the CoCo security manager vastly diverge between
> +technologies. For example, in some cases, it will be implemented in HW
> +while in others it may be pure SW. In some cases, such as for the
> +`Protected kernel-based virtual machine (pKVM) <https://github.com/intel-staging/pKVM-IA>`,
> +the CoCo security manager is a small, isolated and highly privileged
> +(compared to the rest of SW running on the host) part of a traditional
> +VMM.

I say that "virtualized environments" isn't a good description because while
pKVM does utilize hardware virtualization, my understanding is that the
primary use cases for pKVM don't have the same threat model as SNP/TDX,
e.g. IIUC many (most? all?) pKVM guests don't require network access.

> +Confidential Computing adds a new type of attacker to the above list: a

This should be qualified as "CoCo for cloud", or whatever sublabel we land on.
> +potentially misbehaving host (which can also include some part of a > +traditional VMM or all of it), which is typically placed outside of the > +CoCo VM TCB due to its large SW attack surface. It is important to note > +that this doesn’t imply that the host or VMM are intentionally > +malicious, but that there exists a security value in having a small CoCo > +VM TCB. This new type of adversary may be viewed as a more powerful type > +of external attacker, as it resides locally on the same physical machine > +-in contrast to a remote network attacker- and has control over the guest > +kernel communication with most of the HW:: IIUC, this last statement doesn't hold true for the pKVM on x86 use case, which specifically aims to give a "guest" exclusive access to hardware resources. > +The **Linux kernel CoCo VM security objectives** can be summarized as follows: > + > +1. Preserve the confidentiality and integrity of CoCo guest's private > +memory and registers. As I complained in v1, this doesn't hold true for all of x86. My complaint goes away if the document is specific to the TDX/SNP/UV threat models, but describing the doc as "x86 specific" is misleading, as the threat model isn't x86 specific, nor do all confidential compute technologies that run on x86 share these objectives, e.g. vanilla SEV. > +well as CoCo technology specific hypercalls, if present. Additionally, the > +host in a CoCo system typically controls the process of creating a CoCo > +guest: it has a method to load into a guest the firmware and bootloader > +images, the kernel image together with the kernel command line. All of this > +data should also be considered untrusted until its integrity and > +authenticity is established via attestation. Attestation is SNP and TDX specific. AIUI, none of SEV, SEV-ES, or pKVM (which doesn't even really exist on x86 yet), have attestation of their own, e.g. the proposed pKVM support would rely on Secure Boot of the original "full" host kernel. 
> +CONFIDENTIAL COMPUTING THREAT MODEL FOR X86 VIRTUALIZATION > +M: Elena Reshetova <elena.reshetova@intel.com> > +M: Carlos Bilbao <carlos.bilbao@amd.com> > +S: Maintained > +F: Documentation/security/x86-confidential-computing.rst Throwing "x86" on the name doesn't change my objections, this is still an SNP/TDX specific doc pretending to be more generic then it actually is. I don't understand the resistance to picking a name that makes it abundantly clear the doc covers a very specific niche of confidential computing.
> On Mon, Jun 12, 2023, Carlos Bilbao wrote:
> > Kernel developers working on confidential computing for virtualized
> > environments in x86 operate under a set of assumptions regarding the Linux
>
> No, "x86" isn't special; SNP and TDX and s390's UV are special. pKVM is
> similar, but (a) it's not as paranoid as SNP and TDX, and (b) the known use
> case for pKVM on x86 is to harden usage of hardware devices, i.e. pKVM x86
> "guests" likely don't have the same "untrusted virtual device" attack
> surfaces as SNP/TDX/UV guests.

+ Jason Chen to help clarify the pKVM on x86 case.

My impression was that pKVM on x86 would similarly care about hardening its pKVM guest kernel against host attacks. Because in the security world, if you try to put something outside of your TCB (host SW stack in this case), you automatically need to prevent privilege escalation attacks from outside to inside, and that implies caring about attacks that the host can do via, let's say, malicious PCI drivers and such. What prevents the host from doing such attacks in the pKVM case?

> >
> > +Kernel developers working on confidential computing for virtualized
> > +environments in x86 operate under a set of assumptions regarding the Linux
>
> I don't think "virtualized environments" is the right description. IMO,
> "cloud computing environments" or maybe "off-premise environments" more
> accurately captures what you want to document, though the latter fails to
> imply the "virtual" aspect of things.

Hm.. "cloud computing environments" explicitly implies "cloud", which is what we were trying to get away from in v2, because it describes a *particular* use case where CoCo VMs can be used (and probably will be used a lot in practice), but we don’t want to limit this to just that use case and exclude others. "off-premise environments" is so vague imo that I would not know what it means in this context, if I were a person new to the topic of CoCo. And as you said it doesn’t even imply the virtual aspect at all.

> >
> > +The specific details of the CoCo security manager vastly diverge between
> > +technologies. For example, in some cases, it will be implemented in HW
> > +while in others it may be pure SW. In some cases, such as for the
> > +`Protected kernel-based virtual machine (pKVM) <https://github.com/intel-staging/pKVM-IA>`,
> > +the CoCo security manager is a small, isolated and highly privileged
> > +(compared to the rest of SW running on the host) part of a traditional
> > +VMM.
>
> I say that "virtualized environments" isn't a good description because while
> pKVM does utilize hardware virtualization, my understanding is that the
> primary use cases for pKVM don't have the same threat model as SNP/TDX,
> e.g. IIUC many (most? all?) pKVM guests don't require network access.

Not having a network access requirement doesn’t implicitly invalidate the separation guarantees between the host and guest; it just makes it easier since you have one interface less between the host and guest. But again I will let Jason reply on this since he knows the details.

But what you are saying more generally here and above is that you don’t want the pKVM case included in this threat model; did I understand you correctly?

> >
> > +Confidential Computing adds a new type of attacker to the above list: a
>
> This should be qualified as "CoCo for cloud", or whatever sublabel we land on.

Yes, we just need to find this label. If you remember, v1 had the name "Confidential Cloud Computing", which you were the first one to complain about ))

> >
> > +potentially misbehaving host (which can also include some part of a
> > +traditional VMM or all of it), which is typically placed outside of the
> > +CoCo VM TCB due to its large SW attack surface. It is important to note
> > +that this doesn’t imply that the host or VMM are intentionally
> > +malicious, but that there exists a security value in having a small CoCo
> > +VM TCB. This new type of adversary may be viewed as a more powerful type
> > +of external attacker, as it resides locally on the same physical machine
> > +-in contrast to a remote network attacker- and has control over the guest
> > +kernel communication with most of the HW::
>
> IIUC, this last statement doesn't hold true for the pKVM on x86 use case,
> which specifically aims to give a "guest" exclusive access to hardware
> resources.

Does it hold for *all* HW resources? If yes, indeed this would make pKVM on x86 considerably different.

> >
> > +The **Linux kernel CoCo VM security objectives** can be summarized as follows:
> > +
> > +1. Preserve the confidentiality and integrity of CoCo guest's private
> > +memory and registers.
>
> As I complained in v1, this doesn't hold true for all of x86. My complaint
> goes away if the document is specific to the TDX/SNP/UV threat models, but
> describing the doc as "x86 specific" is misleading, as the threat model
> isn't x86 specific, nor do all confidential compute technologies that run
> on x86 share these objectives, e.g. vanilla SEV.

Yes, this brings us back to the naming issue, see below.

> >
> > +well as CoCo technology specific hypercalls, if present. Additionally, the
> > +host in a CoCo system typically controls the process of creating a CoCo
> > +guest: it has a method to load into a guest the firmware and bootloader
> > +images, the kernel image together with the kernel command line. All of this
> > +data should also be considered untrusted until its integrity and
> > +authenticity is established via attestation.
>
> Attestation is SNP and TDX specific. AIUI, none of SEV, SEV-ES, or pKVM
> (which doesn't even really exist on x86 yet) have attestation of their own,
> e.g. the proposed pKVM support would rely on Secure Boot of the original
> "full" host kernel.

Agree the last phrase needs to be corrected to apply to the pKVM case (was missed in v2), so I propose to have this text instead:

"All of this data should also be considered untrusted until its integrity and authenticity is established via a CoCo technology-defined process such as attestation or variants of secure/trusted/authenticated boot."

The goal of the above sentence is only to say that the integrity/authenticity must be established via whatever method a concrete technology brings, otherwise we have a big problem in security.

> >
> > +CONFIDENTIAL COMPUTING THREAT MODEL FOR X86 VIRTUALIZATION
> > +M: Elena Reshetova <elena.reshetova@intel.com>
> > +M: Carlos Bilbao <carlos.bilbao@amd.com>
> > +S: Maintained
> > +F: Documentation/security/x86-confidential-computing.rst
>
> Throwing "x86" on the name doesn't change my objections; this is still an
> SNP/TDX specific doc pretending to be more generic than it actually is. I
> don't understand the resistance to picking a name that makes it abundantly
> clear the doc covers a very specific niche of confidential computing.

We really aren’t trying to be "overgeneric", but since no one else outside of x86 is interested in helping write this document or becoming a co-maintainer, we cannot claim to cover more than merely describing x86-specific solutions in this space.

But, let’s agree on the name and then we can plug it everywhere in v3.

v1 used "Confidential Cloud Computing"
v2 used "Confidential Computing for virtualized environments"
You proposed above "Confidential computing for cloud computing environments" and "Confidential Computing for off-premise environments".

I still don’t get what is wrong with the "Confidential Computing for virtualized environments" name: you mentioned it doesn’t describe correctly what we want to express, but you didn’t explain why. Could you please elaborate?

Also, is the name *that* important given that we have already spent a whole paragraph in v2 explaining what we mean by this name? We are all tech people here, so we don’t plan to use this name for marketing campaigns :)

Best Regards,
Elena.
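The rule discussed above (boot-time data loaded by the host stays untrusted until its integrity and authenticity are established by a technology-defined process) can be illustrated with a minimal sketch of a verifier-side check. Everything named here is an assumption for illustration only: `boot_measurements`, `boot_data_trusted`, the three measured items and the 32-byte digest size are hypothetical and are not taken from TDX, SNP, pKVM or the patch. The digests are assumed to be produced and authenticated elsewhere; the sketch only shows the final accept/reject decision.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DIGEST_LEN 32 /* assumption: a SHA-256 sized measurement */

/* Hypothetical set of host-loaded items that must be measured. */
enum boot_item { ITEM_FIRMWARE, ITEM_KERNEL, ITEM_CMDLINE, ITEM_COUNT };

/* One digest per item; used both for the policy and for reported values. */
struct boot_measurements {
	uint8_t digest[ITEM_COUNT][DIGEST_LEN];
};

/* Compare without an early exit so timing does not leak the matching prefix. */
bool digest_equal(const uint8_t *a, const uint8_t *b, size_t len)
{
	uint8_t diff = 0;

	for (size_t i = 0; i < len; i++)
		diff |= (uint8_t)(a[i] ^ b[i]);
	return diff == 0;
}

/* Trust the boot data only if every reported digest matches the policy. */
bool boot_data_trusted(const struct boot_measurements *expected,
		       const struct boot_measurements *reported)
{
	bool ok = true;

	for (int i = 0; i < ITEM_COUNT; i++)
		ok = digest_equal(expected->digest[i], reported->digest[i],
				  DIGEST_LEN) && ok;
	return ok;
}
```

A single mismatching item (for example, a modified command line) rejects the whole set, which matches the "untrusted until established" wording rather than trusting items individually.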
Hello Randy, On 6/12/23 17:43, Randy Dunlap wrote: > Hi-- > > On 6/12/23 09:47, Carlos Bilbao wrote: >> Kernel developers working on confidential computing for virtualized >> environments in x86 operate under a set of assumptions regarding the Linux >> kernel threat model that differs from the traditional view. Historically, >> the Linux threat model acknowledges attackers residing in userspace, as >> well as a limited set of external attackers that are able to interact with >> the kernel through networking or limited HW-specific exposed interfaces >> (e.g. USB, thunderbolt). The goal of this document is to explain additional >> attack vectors that arise in the virtualized confidential computing space >> and discuss the proposed protection mechanisms for the Linux kernel. >> >> Reviewed-by: Larry Dewey <larry.dewey@amd.com> >> Reviewed-by: David Kaplan <david.kaplan@amd.com> >> Co-developed-by: Elena Reshetova <elena.reshetova@intel.com> >> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> >> Signed-off-by: Carlos Bilbao <carlos.bilbao@amd.com> >> --- >> >> V1 can be found in: >> >> https://lore.kernel.org/lkml/20230327141816.2648615-1-carlos.bilbao@amd.com/ >> Changes since v1: >> >> - Apply feedback from first version of the patch >> - Clarify that the document applies only to a particular angle of >> confidential computing, namely confidential computing for virtualized >> environments. Also, state that the document is specific to x86 and >> that the main goal is to discuss the emerging threats. 
>> - Change commit message and file name accordingly >> - Replace AMD's link to AMD SEV SNP white paper >> - Minor tweaking and clarifications >> >> --- >> Documentation/security/index.rst | 1 + >> .../security/x86-confidential-computing.rst | 298 ++++++++++++++++++ >> MAINTAINERS | 6 + >> 3 files changed, 305 insertions(+) >> create mode 100644 Documentation/security/x86-confidential-computing.rst >> >> diff --git a/Documentation/security/index.rst >> b/Documentation/security/index.rst >> index 6ed8d2fa6f9e..bda919aecb37 100644 >> --- a/Documentation/security/index.rst >> +++ b/Documentation/security/index.rst >> @@ -6,6 +6,7 @@ Security Documentation >> :maxdepth: 1 >> credentials >> + x86-confidential-computing > > Does the new entry align with the others? Yes, I believe so. > >> IMA-templates >> keys/index >> lsm >> diff --git a/Documentation/security/x86-confidential-computing.rst >> b/Documentation/security/x86-confidential-computing.rst >> new file mode 100644 >> index 000000000000..5c52b8888089 >> --- /dev/null >> +++ b/Documentation/security/x86-confidential-computing.rst >> @@ -0,0 +1,298 @@ >> +====================================================== >> +Confidential Computing in Linux for x86 virtualization >> +====================================================== >> + >> +.. contents:: :local: >> + >> +By: Elena Reshetova <elena.reshetova@intel.com> and Carlos Bilbao >> <carlos.bilbao@amd.com> >> + >> +Motivation >> +========== >> + >> +Kernel developers working on confidential computing for virtualized >> +environments in x86 operate under a set of assumptions regarding the Linux >> +kernel threat model that differ from the traditional view. Historically, >> +the Linux threat model acknowledges attackers residing in userspace, as >> +well as a limited set of external attackers that are able to interact with >> +the kernel through various networking or limited HW-specific exposed >> +interfaces (USB, thunderbolt). 
The goal of this document is to explain >> +additional attack vectors that arise in the confidential computing space >> +and discuss the proposed protection mechanisms for the Linux kernel. >> + >> +Overview and terminology >> +======================== >> + >> +Confidential Computing (CoCo) is a broad term covering a wide range of >> +security technologies that aim to protect the confidentiality and integrity >> +of data in use (vs. data at rest or data in transit). At its core, CoCo >> +solutions provide a Trusted Execution Environment (TEE), where secure data >> +processing can be performed and, as a result, they are typically further >> +classified into different subtypes depending on the SW that is intended >> +to be run in TEE. This document focuses on a subclass of CoCo technologies >> +that are targeting virtualized environments and allow running Virtual >> +Machines (VM) inside TEE. From now on in this document will be referring >> +to this subclass of CoCo as 'Confidential Computing (CoCo) for the >> +virtualized environments (VE)'. >> + >> +CoCo, in the virtualization context, refers to a set of HW and/or SW >> +technologies that allow for stronger security guarantees for the SW running >> +inside a CoCo VM. Namely, confidential computing allows its users to >> +confirm the trustworthiness of all SW pieces to include in its reduced >> +Trusted Computing Base (TCB) given its ability to attest the state of these >> +trusted components. >> + >> +While the concrete implementation details differ between technologies, all >> +available mechanisms aim to provide increased confidentiality and >> +integrity for the VM's guest memory and execution state (vCPU registers), >> +more tightly controlled guest interrupt injection, as well as some >> +additional mechanisms to control guest-host page mapping. 
More details on >> +the x86-specific solutions can be found in >> +:doc:`Intel Trust Domain Extensions (TDX) </arch/x86/tdx>` and > > <Documentation/arch/x86/tdx> > or does it work without the leading subdir? It works like this. > >> +`AMD Memory Encryption >> <https://www.amd.com/system/files/techdocs/sev-snp-strengthening-vm-isolation-with-integrity-protection-and-more.pdf>`_. >> + >> +The basic CoCo guest layout includes the host, guest, the interfaces that >> +communicate guest and host, a platform capable of supporting CoCo VMs, and >> +a trusted intermediary between the guest VM and the underlying platform >> +that acts as a security manager. The host-side virtual machine monitor >> +(VMM) typically consists of a subset of traditional VMM features and >> +is still in charge of the guest lifecycle, i.e. create or destroy a CoCo >> +VM, manage its access to system resources, etc. However, since it >> +typically stays out of CoCo VM TCB, its access is limited to preserve the > > to preserving the > ? I think that using "preserving" and "preserve" here may result in two different interpretations: "limited to preserve the security objectives" suggests that the limited access is enforced to preserve the security guarantees. In other words, the act of limiting access itself, particularly from the VMM, helps to maintain the security objectives. This is what we want to say. "limited to preserving the security objectives" suggests that the access of the VMM is limited to the components that allow the VMM to preserve the security objectives. Hope that makes sense? > >> +security objectives. 
>> + >> +In the following diagram, the "<--->" lines represent bi-directional >> +communication channels or interfaces between the CoCo security manager and >> +the rest of the components (data flow for guest, host, hardware) :: >> + >> + +-------------------+ +-----------------------+ >> + | CoCo guest VM |<---->| | >> + +-------------------+ | | >> + | Interfaces | | CoCo security manager | >> + +-------------------+ | | >> + | Host VMM |<---->| | >> + +-------------------+ | | >> + | | >> + +--------------------+ | | >> + | CoCo platform |<--->| | >> + +--------------------+ +-----------------------+ >> + >> +The specific details of the CoCo security manager vastly diverge between >> +technologies. For example, in some cases, it will be implemented in HW >> +while in others it may be pure SW. In some cases, such as for the >> +`Protected kernel-based virtual machine (pKVM) >> <https://github.com/intel-staging/pKVM-IA>`, >> +the CoCo security manager is a small, isolated and highly privileged >> +(compared to the rest of SW running on the host) part of a traditional >> +VMM. >> + >> +Existing Linux kernel threat model >> +================================== >> + >> +The overall components of the current Linux kernel threat model are:: >> + >> + +-----------------------+ +-------------------+ >> + | |<---->| Userspace | >> + | | +-------------------+ >> + | External attack | | Interfaces | >> + | vectors | +-------------------+ >> + | |<---->| Linux Kernel | >> + | | +-------------------+ >> + +-----------------------+ +-------------------+ >> + | Bootloader/BIOS | >> + +-------------------+ >> + +-------------------+ >> + | HW platform | >> + +-------------------+ >> + >> +There is also communication between the bootloader and the kernel during >> +the boot process, but this diagram does not represent it explicitly. The >> +"Interfaces" box represents the various interfaces that allow >> +communication between kernel and userspace. 
This includes system calls, >> +kernel APIs, device drivers, etc. >> + >> +The existing Linux kernel threat model typically assumes execution on a >> +trusted HW platform with all of the firmware and bootloaders included on >> +its TCB. The primary attacker resides in the userspace, and all of the data >> +coming from there is generally considered untrusted, unless userspace is >> +privileged enough to perform trusted actions. In addition, external >> +attackers are typically considered, including those with access to enabled >> +external networks (e.g. Ethernet, Wireless, Bluetooth), exposed hardware >> +interfaces (e.g. USB, Thunderbolt), and the ability to modify the contents >> +of disks offline. >> + >> +Regarding external attack vectors, it is interesting to note that in most >> +cases external attackers will try to exploit vulnerabilities in userspace >> +first, but that it is possible for an attacker to directly target the >> +kernel; particularly if the host has physical access. Examples of direct >> +kernel attacks include the vulnerabilities CVE-2019-19524, CVE-2022-0435 >> +and CVE-2020-24490. >> + >> +Confidential Computing threat model and its security objectives >> +=============================================================== >> + >> +Confidential Computing adds a new type of attacker to the above list: a >> +potentially misbehaving host (which can also include some part of a >> +traditional VMM or all of it), which is typically placed outside of the >> +CoCo VM TCB due to its large SW attack surface. It is important to note >> +that this doesn’t imply that the host or VMM are intentionally >> +malicious, but that there exists a security value in having a small CoCo >> +VM TCB. 
This new type of adversary may be viewed as a more powerful type >> +of external attacker, as it resides locally on the same physical machine >> +-in contrast to a remote network attacker- and has control over the guest > > Hyphens (dashes) are not normally used for a parenthetical phrase AFAIK. Yes, parentheses would be more appropriate. > >> +kernel communication with most of the HW:: > > I would prefer to capitalize "kernel" above. I'm not sure I follow, we don't capitalize kernel elsewhere, why here? > >> + >> + +------------------------+ >> + | CoCo guest VM | >> + +-----------------------+ | +-------------------+ | >> + | |<--->| | Userspace | | >> + | | | +-------------------+ | >> + | External attack | | | Interfaces | | >> + | vectors | | +-------------------+ | >> + | |<--->| | Linux Kernel | | >> + | | | +-------------------+ | >> + +-----------------------+ | +-------------------+ | >> + | | Bootloader/BIOS | | >> + +-----------------------+ | +-------------------+ | >> + | |<--->+------------------------+ >> + | | | Interfaces | >> + | | +------------------------+ >> + | CoCo security |<--->| Host/Host-side VMM | >> + | manager | +------------------------+ >> + | | +------------------------+ >> + | |<--->| CoCo platform | >> + +-----------------------+ +------------------------+ >> + >> +While traditionally the host has unlimited access to guest data and can >> +leverage this access to attack the guest, the CoCo systems mitigate such >> +attacks by adding security features like guest data confidentiality and >> +integrity protection. This threat model assumes that those features are >> +available and intact. >> + >> +The **Linux kernel CoCo VM security objectives** can be summarized as >> follows: >> + >> +1. Preserve the confidentiality and integrity of CoCo guest's private >> +memory and registers. >> + >> +2. Prevent privileged escalation from a host into a CoCo guest Linux >> kernel. 
>> +While it is true that the host (and host-side VMM) requires some level of >> +privilege to create, destroy, or pause the guest, part of the goal of >> +preventing privileged escalation is to ensure that these operations do not >> +provide a pathway for attackers to gain access to the guest's kernel. >> + >> +The above security objectives result in two primary **Linux kernel CoCo >> +VM assets**: >> + >> +1. Guest kernel execution context. >> +2. Guest kernel private memory. >> + >> +The host retains full control over the CoCo guest resources, and can deny >> +access to them at any time. Examples of resources include CPU time, memory >> +that the guest can consume, network bandwidth, etc. Because of this, the >> +host Denial of Service (DoS) attacks against CoCo guests are beyond the >> +scope of this threat model. >> + >> +The **Linux CoCo VM attack surface** is any interface exposed from a CoCo >> +guest Linux kernel towards an untrusted host that is not covered by the >> +CoCo technology SW/HW protection. This includes any possible >> +side-channels, as well as transient execution side channels. Examples of >> +explicit (not side-channel) interfaces include accesses to port I/O, MMIO >> +and DMA interfaces, access to PCI configuration space, VMM-specific >> +hypercalls (towards Host-side VMM), access to shared memory pages, >> +interrupts allowed to be injected into the guest kernel by the host, as >> +well as CoCo technology specific hypercalls, if present. Additionally, the > > technology-specific True! > >> +host in a CoCo system typically controls the process of creating a CoCo >> +guest: it has a method to load into a guest the firmware and bootloader >> +images, the kernel image together with the kernel command line. All of this >> +data should also be considered untrusted until its integrity and >> +authenticity is established via attestation. 
>> + >> +The table below shows a threat matrix for the CoCo guest Linux kernel with >> +the potential mitigation strategies. The matrix refers to CoCo-specific >> +versions of the guest, host and platform. >> + >> +.. list-table:: CoCo Linux guest kernel threat matrix >> + :widths: auto >> + :align: center >> + :header-rows: 1 >> + >> + * - Threat name >> + - Threat description >> + - Mitigation strategies >> + >> + * - Guest malicious configuration >> + - A misbehaving host modifies one of the following guest's >> + configuration: >> + >> + 1. Guest firmware or bootloader >> + >> + 2. Guest kernel or module binaries >> + >> + 3. Guest command line parameters >> + >> + This allows the host to break the integrity of the code running >> + inside a CoCo guest, and violates the CoCo security objectives. >> + - The integrity of the guest's configuration passed via untrusted host >> + must be ensured by methods such as remote attestation and signing. >> + This should be largely transparent to the guest kernel, and would >> + allow it to assume a trusted state at the time of boot. >> + >> + * - CoCo guest data attacks >> + - A misbehaving host retains full control of the CoCo guest's data >> + in-transit between the guest and the host-managed physical or >> + virtual devices. This allows any attack against confidentiality, >> + integrity or freshness of such data. >> + - The CoCo guest is responsible for ensuring the confidentiality, >> + integrity and freshness of such data using well-established >> + security mechanisms. For example, for any guest external network >> + communications passed via the untrusted host, an end-to-end >> + secure session must be established between a guest and a trusted >> + remote endpoint using well-known protocols such as TLS. >> + This requirement also applies to protection of the guest's disk >> + image. 
>> + >> + * - Malformed runtime input >> + - A misbehaving host injects malformed input via any communication >> + interface used by the guest's kernel code. If the code is not >> + prepared to handle this input correctly, this can result in a host >> + --> guest kernel privilege escalation. This includes traditional >> + side-channel and/or transient execution attack vectors. >> + - The attestation or signing process cannot help to mitigate this >> + threat since this input is highly dynamic. Instead, a different set >> + of mechanisms is required: >> + >> + 1. *Limit the exposed attack surface*. Whenever possible, disable >> + complex kernel features and device drivers (not required for guest >> + operation) that actively use the communication interfaces between >> + the untrusted host and the guest. This is not a new concept for the >> + Linux kernel, since it already has mechanisms to disable external >> + interfaces, such as attacker's access via USB/Thunderbolt subsystem. >> + >> + 2. *Harden the exposed attack surface*. Any code that uses such >> + interfaces must treat the input from the untrusted host as >> malicious, >> + and do sanity checks before processing it. This can be ensured by >> + performing a code audit of such device drivers as well as employing >> + other standard techniques for testing the code robustness, such as >> + fuzzing. This is again a well-known concept for the Linux kernel, >> + since all its networking code has been previously analyzed under >> + presumption of processing malformed input from a network attacker. >> + >> + * - Malicious runtime input >> + - A misbehaving host injects a specific input value via any >> + communication interface used by the guest's kernel code. The >> + difference with the previous attack vector (malformed runtime input) >> + is that this input is not malformed, but its value is crafted to >> + impact the guest's kernel security. 
Examples of such inputs include >> + providing a malicious time to the guest or the entropy to the guest >> + random number generator. Additionally, the timing of such events can >> + be an attack vector on its own, if it results in a particular guest >> + kernel action (i.e. processing of a host-injected interrupt). >> + - Similarly, as with the previous attack vector, it is not possible to >> + use attestation mechanisms to address this threat. Instead, such >> + attack vectors (i.e. interfaces) must be either disabled or made >> + resistant to supplied host input. >> + >> +As can be seen from the above table, the potential mitigation strategies >> +to secure the CoCo Linux guest kernel vary, but can be roughly split into >> +mechanisms that either require or do not require changes to the existing >> +Linux kernel code. One main goal of the CoCo security architecture is to >> +minimize changes to the Linux kernel code, while also providing usable >> +and scalable means to facilitate the security of a CoCo guest kernel. > > HTH. Very helpful, thank you for the feedback. > ~Randy > Best, Carlos
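The "harden the exposed attack surface" mitigation in the threat matrix quoted above (treat input from the untrusted host as malicious and sanity-check it before processing) can be sketched roughly as follows. This is not code from the patch or from any real driver: `host_desc`, `guest_state`, `guest_consume_desc` and the size limits are hypothetical names chosen for illustration. The sketch assumes a descriptor that the host can rewrite at any time in shared memory, so each host-controlled field is read exactly once into a private copy and validated before use.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS  8u   /* hypothetical number of guest-private slots */
#define MAX_PAYLOAD 64u  /* hypothetical payload limit */

/* Hypothetical descriptor living in memory shared with the host. */
struct host_desc {
	uint32_t slot;                 /* index chosen by the host */
	uint32_t len;                  /* payload length claimed by the host */
	uint8_t  payload[MAX_PAYLOAD];
};

/* Guest-private state that the host must not be able to corrupt. */
struct guest_state {
	uint8_t  slots[RING_SLOTS][MAX_PAYLOAD];
	uint32_t slot_len[RING_SLOTS];
};

/* Returns 0 on success, -1 if the host-supplied descriptor is rejected. */
int guest_consume_desc(struct guest_state *gs,
		       const volatile struct host_desc *shared)
{
	/* Snapshot each field once; the host may change it after the check. */
	uint32_t slot = shared->slot;
	uint32_t len = shared->len;

	/* Validate the private copies, never the shared originals. */
	if (slot >= RING_SLOTS || len > MAX_PAYLOAD)
		return -1;

	for (uint32_t i = 0; i < len; i++)
		gs->slots[slot][i] = shared->payload[i];
	gs->slot_len[slot] = len;
	return 0;
}
```

Reading `slot` and `len` into locals before the bounds check is the important design choice: checking the shared field and then reading it again would let the host swap in an out-of-range value between the two accesses.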
On Wed, Jun 14, 2023, Elena Reshetova wrote:
> > > +The specific details of the CoCo security manager vastly diverge between
> > > +technologies. For example, in some cases, it will be implemented in HW
> > > +while in others it may be pure SW. In some cases, such as for the
> > > +`Protected kernel-based virtual machine (pKVM) <https://github.com/intel-staging/pKVM-IA>`,
> > > +the CoCo security manager is a small, isolated and highly privileged
> > > +(compared to the rest of SW running on the host) part of a traditional
> > > +VMM.
> >
> > I say that "virtualized environments" isn't a good description because
> > while pKVM does utilize hardware virtualization, my understanding is that
> > the primary use cases for pKVM don't have the same threat model as SNP/TDX,
> > e.g. IIUC many (most? all?) pKVM guests don't require network access.
>
> Not having a network access requirement doesn’t implicitly invalidate the
> separation guarantees between the host and guest; it just makes it easier
> since you have one interface less between the host and guest.

My point is that if the protected guest doesn't need any I/O beyond the hardware device that it accesses, then the threat model is different because many of the new/novel attack surfaces that come with the TDX/SNP threat model don't exist. E.g. the hardening that people want to do for VirtIO drivers may not be at all relevant to pKVM.

> But again I will let Jason reply on this since he knows the details.
>
> But what you are saying more generally here and above is that you don’t want
> the pKVM case included in this threat model; did I understand you correctly?

More or less. I think the threat models for pKVM versus TDX/SNP are different enough that accurately capturing the nuances and novelties of the TDX/SNP threat model will be unnecessarily difficult if you also try to lump in pKVM. E.g. pKVM is intended to run on portable client hardware, likely without memory encryption, versus TDX/SNP being almost exclusively server oriented with the hardware being owned and hosted by a third party that is benign (perhaps trusted even), but not necessarily physically isolated enough to satisfy the end user's security requirements.

One of the points I (and others) was trying to get across in v1 feedback is that security requirements for CoCo are not the same across all use cases, and that there are subtle but meaningful differences even when use cases are built on common underlying technology.

In other words, describing the TDX/SNP threat model with sufficient detail and nuance is difficult enough without throwing pKVM into the mix. And I don't see any need to formally document pKVM's threat model right *now*. pKVM on x86 is little more than a proposal at this point, and while I would love to see documentation for pKVM on ARM's threat model, that obviously doesn't belong in a doc that's x86 specific.

> > > +potentially misbehaving host (which can also include some part of a
> > > +traditional VMM or all of it), which is typically placed outside of the
> > > +CoCo VM TCB due to its large SW attack surface. It is important to note
> > > +that this doesn’t imply that the host or VMM are intentionally
> > > +malicious, but that there exists a security value in having a small CoCo
> > > +VM TCB. This new type of adversary may be viewed as a more powerful type
> > > +of external attacker, as it resides locally on the same physical machine
> > > +-in contrast to a remote network attacker- and has control over the guest
> > > +kernel communication with most of the HW::
> >
> > IIUC, this last statement doesn't hold true for the pKVM on x86 use case,
> > which specifically aims to give a "guest" exclusive access to hardware
> > resources.
>
> Does it hold for *all* HW resources? If yes, indeed this would make pKVM on
> x86 considerably different.

Heh, the original says "most", so it doesn't have to hold for all hardware resources, just a simple majority.
Hi Carlos, On 6/14/23 06:55, Carlos Bilbao wrote: > Hello Randy, > > On 6/12/23 17:43, Randy Dunlap wrote: >> Hi-- >> >> On 6/12/23 09:47, Carlos Bilbao wrote: >>> Kernel developers working on confidential computing for virtualized >>> environments in x86 operate under a set of assumptions regarding the Linux >>> kernel threat model that differs from the traditional view. Historically, >>> the Linux threat model acknowledges attackers residing in userspace, as >>> well as a limited set of external attackers that are able to interact with >>> the kernel through networking or limited HW-specific exposed interfaces >>> (e.g. USB, thunderbolt). The goal of this document is to explain additional >>> attack vectors that arise in the virtualized confidential computing space >>> and discuss the proposed protection mechanisms for the Linux kernel. >>> >>> Reviewed-by: Larry Dewey <larry.dewey@amd.com> >>> Reviewed-by: David Kaplan <david.kaplan@amd.com> >>> Co-developed-by: Elena Reshetova <elena.reshetova@intel.com> >>> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> >>> Signed-off-by: Carlos Bilbao <carlos.bilbao@amd.com> >>> --- >>> >>> --- >>> Documentation/security/index.rst | 1 + >>> .../security/x86-confidential-computing.rst | 298 ++++++++++++++++++ >>> MAINTAINERS | 6 + >>> 3 files changed, 305 insertions(+) >>> create mode 100644 Documentation/security/x86-confidential-computing.rst >>> >>> diff --git a/Documentation/security/x86-confidential-computing.rst b/Documentation/security/x86-confidential-computing.rst >>> new file mode 100644 >>> index 000000000000..5c52b8888089 >>> --- /dev/null >>> +++ b/Documentation/security/x86-confidential-computing.rst >>> @@ -0,0 +1,298 @@ >>> +====================================================== >>> +Confidential Computing in Linux for x86 virtualization >>> +====================================================== >>> + >>> +.. 
contents:: :local: >>> + >>> +By: Elena Reshetova <elena.reshetova@intel.com> and Carlos Bilbao <carlos.bilbao@amd.com> >>> + >>> +The basic CoCo guest layout includes the host, guest, the interfaces that >>> +communicate guest and host, a platform capable of supporting CoCo VMs, and >>> +a trusted intermediary between the guest VM and the underlying platform >>> +that acts as a security manager. The host-side virtual machine monitor >>> +(VMM) typically consists of a subset of traditional VMM features and >>> +is still in charge of the guest lifecycle, i.e. create or destroy a CoCo >>> +VM, manage its access to system resources, etc. However, since it >>> +typically stays out of CoCo VM TCB, its access is limited to preserve the >> >> to preserving the >> ? > > I think that using "preserving" and "preserve" here may result in two > different interpretations: > > "limited to preserve the security objectives" suggests that the limited > access is enforced to preserve the security guarantees. In other words, the > act of limiting access itself, particularly from the VMM, helps to maintain > the security objectives. This is what we want to say. > > "limited to preserving the security objectives" suggests that the access of > the VMM is limited to the components that allow the VMM to preserve the > security objectives. > > Hope that makes sense? Yes, I get it, thanks. >> >>> +security objectives. 
>>> + >>> +In the following diagram, the "<--->" lines represent bi-directional >>> +communication channels or interfaces between the CoCo security manager and >>> +the rest of the components (data flow for guest, host, hardware) :: >>> + >>> + +-------------------+ +-----------------------+ >>> + | CoCo guest VM |<---->| | >>> + +-------------------+ | | >>> + | Interfaces | | CoCo security manager | >>> + +-------------------+ | | >>> + | Host VMM |<---->| | >>> + +-------------------+ | | >>> + | | >>> + +--------------------+ | | >>> + | CoCo platform |<--->| | >>> + +--------------------+ +-----------------------+ >>> + >>> +The specific details of the CoCo security manager vastly diverge between >>> +technologies. For example, in some cases, it will be implemented in HW >>> +while in others it may be pure SW. In some cases, such as for the >>> +`Protected kernel-based virtual machine (pKVM) <https://github.com/intel-staging/pKVM-IA>`, >>> +the CoCo security manager is a small, isolated and highly privileged >>> +(compared to the rest of SW running on the host) part of a traditional >>> +VMM. >>> + >>> +Confidential Computing threat model and its security objectives >>> +=============================================================== >>> + >>> +Confidential Computing adds a new type of attacker to the above list: a >>> +potentially misbehaving host (which can also include some part of a >>> +traditional VMM or all of it), which is typically placed outside of the >>> +CoCo VM TCB due to its large SW attack surface. It is important to note >>> +that this doesn’t imply that the host or VMM are intentionally >>> +malicious, but that there exists a security value in having a small CoCo >>> +VM TCB. 
This new type of adversary may be viewed as a more powerful type >>> +of external attacker, as it resides locally on the same physical machine >>> +-in contrast to a remote network attacker- and has control over the guest >> >> Hyphens (dashes) are not normally used for a parenthetical phrase AFAIK. > > Yes, parentheses would be more appropriate. > >> >>> +kernel communication with most of the HW:: >> >> I would prefer to capitalize "kernel" above. > > I'm not sure I follow, we don't capitalize kernel elsewhere, why here? > My mistake in reading. :( Thanks.
On 6/13/23 19:03, Sean Christopherson wrote: > On Mon, Jun 12, 2023, Carlos Bilbao wrote: >> +well as CoCo technology specific hypercalls, if present. Additionally, the >> +host in a CoCo system typically controls the process of creating a CoCo >> +guest: it has a method to load into a guest the firmware and bootloader >> +images, the kernel image together with the kernel command line. All of this >> +data should also be considered untrusted until its integrity and >> +authenticity is established via attestation. > > Attestation is SNP and TDX specific. AIUI, none of SEV, SEV-ES, or pKVM (which > doesn't even really exist on x86 yet), have attestation of their own, e.g. the > proposed pKVM support would rely on Secure Boot of the original "full" host kernel. Seems to be a bit of misunderstanding here. Secure Boot verifies the host kernel, which is indeed also important, since the pKVM hypervisor is a part of the host kernel image. But when it comes to verifying the guests, it's a different story: a protected pKVM guest is started by the (untrusted) host at an arbitrary moment in time, not before the early kernel deprivileging when the host is still considered trusted. (Moreover, in practice the guest is started by a userspace VMM, i.e. not exactly the most trusted part of the host stack.) So the host can maliciously or mistakenly load a wrong guest image for running as a protected guest, so we do need attestation for protected guests. This attestation is not implemented in pKVM on x86 yet (you are right that pKVM on x86 is little more than a proposal at this point). But in pKVM on ARM it is afaik already working, it is software based (ensured by pKVM hypervisor + a tiny generic guest bootloader which verifies the guest image before jumping to the guest) and architecture-independent, so it should be possible to adopt it for x86 as is. 
Furthermore, since pKVM on x86 use cases also need physical secure hardware devices to be assigned to the protected guest, we need attestation not just for the guest image itself but also for the secure devices assigned to it by the host.
On 6/14/23 16:15, Sean Christopherson wrote: > On Wed, Jun 14, 2023, Elena Reshetova wrote: >>>> +The specific details of the CoCo security manager vastly diverge between >>>> +technologies. For example, in some cases, it will be implemented in HW >>>> +while in others it may be pure SW. In some cases, such as for the >>>> +`Protected kernel-based virtual machine (pKVM) <https://github.com/intel- >>> staging/pKVM-IA>`, >>>> +the CoCo security manager is a small, isolated and highly privileged >>>> +(compared to the rest of SW running on the host) part of a traditional >>>> +VMM. >>> >>> I say that "virtualized environments" isn't a good description because >>> while pKVM does utilize hardware virtualization, my understanding is that >>> the primary use cases for pKVM don't have the same threat model as SNP/TDX, >>> e.g. IIUC many (most? all?) pKVM guests don't require network access. >> >> Not having a network access requirement doesn’t implicitly invalidate the >> separation guarantees between the host and guest, it just makes it easier >> since you have one interface less between the host and guest. > > My point is that if the protected guest doesn't need any I/O beyond the hardware > device that it accesses, then the threat model is different because many of the > new/novel attack surfaces that come with the TDX/SNP threat model don't exist. > E.g. the hardening that people want to do for VirtIO drivers may not be at all > relevant to pKVM. Strictly speaking, the protected pKVM guest does need some I/O beyond that, e.g. for some (limited and specialized) communication between the host and the guest, e.g. vsock-based. For example, in the fingerprint use case, the guest receives requests from the host to capture fingerprint data from the sensor, sends encrypted fingerprint templates to the host, and so on. Additionally, speaking of the hardware device, the guest does not entirely own it. 
It has direct exclusive access to the data communication with the device (ensured by its exclusive access to MMIO and DMA buffers), but e.g. the device interrupts are forwarded to the guest by the host, and the PCI config space is virtualized by the host. But I think I get what you mean: there is no data transfer whereby the host is not an endpoint but an intermediary between the guest and some device. In simple words, things like virtio-net or virtio-blk are out of scope. Yes, I think that's correct for pKVM-on-x86 use cases (and I suppose it is correct for pKVM-on-ARM use cases as well). I guess it means that "guest data attacks" may not be relevant to pKVM, and perhaps this makes its threat model substantially different from cloud use cases. However, other kinds of threats described in the doc do seem to be relevant to pKVM. "Malformed/malicious runtime input" is relevant since communication channels between the host and the guest do exist, the host may arbitrarily inject interrupts into the guest, etc. "Guest malicious configuration" is relevant too, and guest attestation is required, as I wrote in [1]. Cc'ing android-kvm and some ChromeOS folks to correct me if needed. > And I don't see any need to formally document pKVM's threat model right *now*. > pKVM on x86 is little more than a proposal at this point, and while I would love > to see documentation for pKVM on ARM's threat model, that obviously doesn't belong > in a doc that's x86 specific. Agree, and I don't think it makes sense to mention pKVM-on-x86 without mentioning pKVM-on-ARM, as if pKVM-on-x86 had more in common with cloud use cases than with pKVM-on-ARM, while quite the opposite is true. It seems there is no reason why pKVM-on-x86 threat model should be different from pKVM-on-ARM. 
The use cases on ARM (for Android) and on x86 (for ChromeOS) are somewhat different at this moment (in that in ChromeOS use cases the protected guest's sensitive data includes also data coming directly from a physical device), but IIUC they are converging now, i.e. Android is getting interested in use cases with physical devices too. >>>> +potentially misbehaving host (which can also include some part of a >>>> +traditional VMM or all of it), which is typically placed outside of the >>>> +CoCo VM TCB due to its large SW attack surface. It is important to note >>>> +that this doesn’t imply that the host or VMM are intentionally >>>> +malicious, but that there exists a security value in having a small CoCo >>>> +VM TCB. This new type of adversary may be viewed as a more powerful type >>>> +of external attacker, as it resides locally on the same physical machine >>>> +-in contrast to a remote network attacker- and has control over the guest >>>> +kernel communication with most of the HW:: >>> >>> IIUC, this last statement doesn't hold true for the pKVM on x86 use case, which >>> specifically aims to give a "guest" exclusive access to hardware resources. >> >> Does it hold for *all* HW resources? If yes, indeed this would make pKVM on >> x86 considerably different. > > Heh, the original says "most", so it doesn't have to hold for all hardware resources, > just a simple majority. Again, pedantic mode on, I find it difficult to agree with the wording that the guest owns "most of" the HW resources it uses. It controls the data communication with its hardware device, but other resources (e.g. CPU time, interrupts, timers, PCI config space, ACPI) are owned by the host and virtualized by it for the guest. [1] https://lore.kernel.org/all/2cfa3122-6b54-aab5-8a61-41c08853286b@semihalf.com/
On Fri, Jun 16, 2023, Dmytro Maluka wrote: > On 6/14/23 16:15, Sean Christopherson wrote: > > On Wed, Jun 14, 2023, Elena Reshetova wrote: > >> Not having a network access requirement doesn’t implicitly invalidate the > >> separation guarantees between the host and guest, it just makes it easier > >> since you have one interface less between the host and guest. > > > > My point is that if the protected guest doesn't need any I/O beyond the hardware > > device that it accesses, then the threat model is different because many of the > > new/novel attack surfaces that come with the TDX/SNP threat model don't exist. > > E.g. the hardening that people want to do for VirtIO drivers may not be at all > > relevant to pKVM. ... > But I think I get what you mean: there is no data transfer whereby the > host is not an endpoint but an intermediary between the guest and some > device. In simple words, things like virtio-net or virtio-blk are out of > scope. Yes, I think that's correct for pKVM-on-x86 use cases (and I > suppose it is correct for pKVM-on-ARM use cases as well). I guess it > means that "guest data attacks" may not be relevant to pKVM, and perhaps > this makes its threat model substantially different from cloud use > cases. Yes. > >>>> +This new type of adversary may be viewed as a more powerful type > >>>> +of external attacker, as it resides locally on the same physical machine > >>>> +-in contrast to a remote network attacker- and has control over the guest > >>>> +kernel communication with most of the HW:: > >>> > >>> IIUC, this last statement doesn't hold true for the pKVM on x86 use case, which > >>> specifically aims to give a "guest" exclusive access to hardware resources. > >> > >> Does it hold for *all* HW resources? If yes, indeed this would make pKVM on > >> x86 considerably different. > > > > Heh, the original says "most", so it doesn't have to hold for all hardware resources, > > just a simple majority. 
>
> Again, pedantic mode on, I find it difficult to agree with the wording
> that the guest owns "most of" the HW resources it uses. It controls the
> data communication with its hardware device, but other resources (e.g.
> CPU time, interrupts, timers, PCI config space, ACPI) are owned by the
> host and virtualized by it for the guest.

I wasn't saying that the guest owns most resources, I was saying that the *untrusted*
host does *not* own most resources that are exposed to the guest. My understanding
is that everything in your list is owned by the trusted hypervisor in the pKVM model.

What I was pointing out is related to the above discussion about the guest needing
access to hardware that is effectively owned by the untrusted host, e.g. network
access.
On Fri, Jun 16, 2023 at 8:56 AM Sean Christopherson <seanjc@google.com> wrote: > > On Fri, Jun 16, 2023, Dmytro Maluka wrote: > > On 6/14/23 16:15, Sean Christopherson wrote: > > > On Wed, Jun 14, 2023, Elena Reshetova wrote: > > >> Not having a network access requirement doesn’t implicitly invalidate the > > >> separation guarantees between the host and guest, it just makes it easier > > >> since you have one interface less between the host and guest. > > > > > > My point is that if the protected guest doesn't need any I/O beyond the hardware > > > device that it accesses, then the threat model is different because many of the > > > new/novel attack surfaces that come with the TDX/SNP threat model don't exist. > > > E.g. the hardening that people want to do for VirtIO drivers may not be at all > > > relevant to pKVM. > > ... > > > But I think I get what you mean: there is no data transfer whereby the > > host is not an endpoint but an intermediary between the guest and some > > device. In simple words, things like virtio-net or virtio-blk are out of > > scope. Yes, I think that's correct for pKVM-on-x86 use cases (and I > > suppose it is correct for pKVM-on-ARM use cases as well). I guess it > > means that "guest data attacks" may not be relevant to pKVM, and perhaps > > this makes its threat model substantially different from cloud use > > cases. > > Yes. > > > >>>> +This new type of adversary may be viewed as a more powerful type > > >>>> +of external attacker, as it resides locally on the same physical machine > > >>>> +-in contrast to a remote network attacker- and has control over the guest > > >>>> +kernel communication with most of the HW:: > > >>> > > >>> IIUC, this last statement doesn't hold true for the pKVM on x86 use case, which > > >>> specifically aims to give a "guest" exclusive access to hardware resources. > > >> > > >> Does it hold for *all* HW resources? If yes, indeed this would make pKVM on > > >> x86 considerably different. 
> > >
> > > Heh, the original says "most", so it doesn't have to hold for all hardware resources,
> > > just a simple majority.
> >
> > Again, pedantic mode on, I find it difficult to agree with the wording
> > that the guest owns "most of" the HW resources it uses. It controls the
> > data communication with its hardware device, but other resources (e.g.
> > CPU time, interrupts, timers, PCI config space, ACPI) are owned by the
> > host and virtualized by it for the guest.
>
> I wasn't saying that the guest owns most resources, I was saying that the *untrusted*
> host does *not* own most resources that are exposed to the guest. My understanding
> is that everything in your list is owned by the trusted hypervisor in the pKVM model.
>
> What I was pointing out is related to the above discussion about the guest needing
> access to hardware that is effectively owned by the untrusted host, e.g. network
> access.

The network case isn't a great example because it is common for user
space applications not to trust the network and to use verification
schemes like TLS where trust of the network is not required, so the
trusted guest could use these strategies when needed. There wouldn't be
any availability guarantees, but my understanding is that isn't in scope
for pKVM.

Consider the case where the host owns a TPM and the guest has to
cooperate with the host to communicate with it: there are schemes for
establishing trust between the TPM and the trusted guest with various
properties (authentication, confidentiality, integrity, etc.). This does
have the downside of additional complexity, but comes with the benefit
of also being resistant to attacks like monitoring the SPI lines going
to the TPM.

Did you have particular situations in mind for resources that would be
owned by the host and needed by the trusted guest?
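
To make the TPM point a bit more concrete: if the guest and the device share a secret that the host never sees, the host can relay messages but cannot alter them without being noticed. The following is only a minimal Python sketch of that idea, not how a real TPM session works. The shared key, the `seal`/`open_sealed` helpers, the two "host" functions and the message contents are all hypothetical, and the sketch covers integrity only (no confidentiality, no replay protection).

```python
import hashlib
import hmac
import os

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def seal(key: bytes, payload: bytes) -> bytes:
    """Append an HMAC-SHA256 tag so the receiver can detect tampering."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def open_sealed(key: bytes, message: bytes) -> bytes:
    """Return the payload if the tag verifies, otherwise raise ValueError."""
    if len(message) < TAG_LEN:
        raise ValueError("message too short")
    payload, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return payload


def honest_host(message: bytes) -> bytes:
    """Untrusted relay that happens to forward the message unchanged."""
    return message


def tampering_host(message: bytes) -> bytes:
    """Untrusted relay that flips one bit of the payload in transit."""
    return bytes([message[0] ^ 0x01]) + message[1:]


# Hypothetical: the key is known to the guest and the device, not to the host.
key = os.urandom(32)
request = seal(key, b"read-pcr 7")
```

With this, `open_sealed(key, honest_host(request))` returns the original payload, while the same call on `tampering_host(request)` raises `ValueError`. The host can still drop messages entirely, which matches the remark above that availability is not guaranteed.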
On Fri, Jun 16, 2023, Dmytro Maluka wrote: > On 6/13/23 19:03, Sean Christopherson wrote: > > On Mon, Jun 12, 2023, Carlos Bilbao wrote: > >> +well as CoCo technology specific hypercalls, if present. Additionally, the > >> +host in a CoCo system typically controls the process of creating a CoCo > >> +guest: it has a method to load into a guest the firmware and bootloader > >> +images, the kernel image together with the kernel command line. All of this > >> +data should also be considered untrusted until its integrity and > >> +authenticity is established via attestation. > > > > Attestation is SNP and TDX specific. AIUI, none of SEV, SEV-ES, or pKVM (which > > doesn't even really exist on x86 yet), have attestation of their own, e.g. the > > proposed pKVM support would rely on Secure Boot of the original "full" host kernel. > > Seems to be a bit of misunderstanding here. Secure Boot verifies the > host kernel, which is indeed also important, since the pKVM hypervisor > is a part of the host kernel image. But when it comes to verifying the > guests, it's a different story: a protected pKVM guest is started by the > (untrusted) host at an arbitrary moment in time, not before the early > kernel deprivileging when the host is still considered trusted. > (Moreover, in practice the guest is started by a userspace VMM, i.e. not > exactly the most trusted part of the host stack.) So the host can > maliciously or mistakenly load a wrong guest image for running as a > protected guest, so we do need attestation for protected guests. > > This attestation is not implemented in pKVM on x86 yet (you are right > that pKVM on x86 is little more than a proposal at this point). But in > pKVM on ARM it is afaik already working, it is software based (ensured > by pKVM hypervisor + a tiny generic guest bootloader which verifies the > guest image before jumping to the guest) and architecture-independent, > so it should be possible to adopt it for x86 as is. 
Sorry, instead of "Attestation is SNP and TDX specific", I should have said, "The
form of attestation described here is SNP and TDX specific".

pKVM's "attestation" effectively has its root of trust in the pKVM hypervisor,
which is in turn attested via Secure Boot. I.e. the guest payload is verified
*before* it is launched.

That is different from SNP and TDX, where guest code and data are controlled by the
*untrusted* host. The initial payload is measured by trusted firmware, but it is
not verified, and so that measurement must be attested after the guest boots,
before any sensitive data is provisioned to the guest.

Specifically, with "untrusted" inserted by me for clarification, my understanding
is that this doesn't hold true for pKVM when splitting hairs:

  Additionally, the **untrusted** host in a CoCo system typically controls the
  process of creating a CoCo guest: it has a method to load into a guest the
  firmware and bootloader images, the kernel image together with the kernel
  command line. All of this data should also be considered untrusted until its
  integrity and authenticity is established via attestation.

because the guest firmware comes from a trusted entity, not the untrusted host.
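
The distinction drawn here (verify before launch versus measure at launch and attest afterwards) can be sketched in a few lines of Python. This is an illustration under heavy assumptions and not a description of either implementation: all function names are made up, a bare SHA-256 digest stands in for the measurement, and the signature that a real attestation report would carry is omitted entirely.

```python
import hashlib


def digest(image: bytes) -> str:
    """Stand-in for a launch measurement: a SHA-256 digest of the image."""
    return hashlib.sha256(image).hexdigest()


def verify_then_launch(image: bytes, trusted_digest: str) -> bool:
    """Verify-before-launch: trusted code refuses to run an unknown payload."""
    return digest(image) == trusted_digest


def launch_and_measure(image: bytes) -> str:
    """Measure-at-launch: any image is started; only its measurement is recorded."""
    return digest(image)


def relying_party_accepts(measurement: str, expected: str) -> bool:
    """After boot, a verifier decides whether to provision secrets to the guest."""
    return measurement == expected


good_image = b"guest firmware v1"
bad_image = b"guest firmware v1 (modified by the host)"
expected = digest(good_image)

# Verify-before-launch: the modified image never runs.
launched_good = verify_then_launch(good_image, expected)
launched_bad = verify_then_launch(bad_image, expected)

# Measure-at-launch: the modified image does run, but it is not given secrets.
secrets_for_good = relying_party_accepts(launch_and_measure(good_image), expected)
secrets_for_bad = relying_party_accepts(launch_and_measure(bad_image), expected)
```

In both flows the modified image ends up without access to sensitive data. The difference is where the check happens and who performs it, which is why the wording about data being "untrusted until attested" fits one model better than the other.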
On Fri, Jun 16, 2023, Allen Webb wrote: > On Fri, Jun 16, 2023 at 8:56 AM Sean Christopherson <seanjc@google.com> wrote: > > > > On Fri, Jun 16, 2023, Dmytro Maluka wrote: > > > On 6/14/23 16:15, Sean Christopherson wrote: > > > > On Wed, Jun 14, 2023, Elena Reshetova wrote: > > > >>>> +This new type of adversary may be viewed as a more powerful type > > > >>>> +of external attacker, as it resides locally on the same physical machine > > > >>>> +-in contrast to a remote network attacker- and has control over the guest > > > >>>> +kernel communication with most of the HW:: > > > >>> > > > >>> IIUC, this last statement doesn't hold true for the pKVM on x86 use case, which > > > >>> specifically aims to give a "guest" exclusive access to hardware resources. > > > >> > > > >> Does it hold for *all* HW resources? If yes, indeed this would make pKVM on > > > >> x86 considerably different. > > > > > > > > Heh, the original says "most", so it doesn't have to hold for all hardware resources, > > > > just a simple majority. > > > > > > Again, pedantic mode on, I find it difficult to agree with the wording > > > that the guest owns "most of" the HW resources it uses. It controls the > > > data communication with its hardware device, but other resources (e.g. > > > CPU time, interrupts, timers, PCI config space, ACPI) are owned by the > > > host and virtualized by it for the guest. > > > > I wasn't saying that the guest owns most resources, I was saying that the *untrusted* > > host does *not* own most resources that are exposed to the guest. My understanding > > is that everything in your list is owned by the trusted hypervisor in the pKVM model. > > > > What I was pointing out is related to the above discussion about the guest needing > > access to hardware that is effectively owned by the untrusted host, e.g. network > > access. 
>
> The network case isn't a great example because it is common for user
> space applications not to trust the network and to use verification
> schemes like TLS where trust of the network is not required, so the
> trusted guest could use these strategies when needed.

There's a bit of context/history that isn't captured here. The network being
untrusted isn't new/novel in the SNP/TDX threat model, what's new is that the
network *device* is untrusted.

In the SNP/TDX world, the NIC is likely to be a synthetic, virtual device that is
provided by the untrusted VMM. Pre-SNP/TDX, input from the device, i.e. the VMM,
is trusted; the guest still needs to use e.g. TLS to secure network traffic, but
the device configuration and whatnot is fully trusted. When the VMM is no longer
trusted, the device itself is no longer trusted.

To address that, the folks working on SNP and TDX started posting patches[1][2]
to harden kernel drivers against bad device configurations and whatnot, but without
first getting community buy-in on this new threat model, which led us here[3].

There is no equivalent in existing userspace applications, because userspace's
memory is not private, i.e. the kernel doesn't need to do Iago attacks to compromise
userspace, the kernel can simply read whatever memory it wants.

And for pKVM, my understanding is that devices and configuration information that
are exposed to the guest are trusted and/or verified in some way, i.e. the points
of contention that led to this doc don't necessarily apply to the pKVM use case.

[1] https://lore.kernel.org/linux-iommu/20210603004133.4079390-1-ak@linux.intel.com
[2] https://lore.kernel.org/all/20230119170633.40944-1-alexander.shishkin@linux.intel.com
[3] https://lore.kernel.org/lkml/DM8PR11MB57505481B2FE79C3D56C9201E7CE9@DM8PR11MB5750.namprd11.prod.outlook.com
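
For readers who have not followed the hardening patches: the general shape of "do not trust device configuration" is that every value reported by the device is range-checked before the driver acts on it. Below is a minimal Python sketch of that pattern, not taken from any of the linked series. The `DeviceConfig` fields, the limits and `validate_config` are hypothetical and do not correspond to a real device or driver.

```python
from dataclasses import dataclass

# Hypothetical limits chosen by the driver, not by the device.
MAX_QUEUE_SIZE = 1024
MIN_MTU = 68
MAX_MTU = 65535


@dataclass(frozen=True)
class DeviceConfig:
    """Values as reported by a (possibly malicious) virtual device."""
    queue_size: int
    mtu: int


def validate_config(cfg: DeviceConfig) -> DeviceConfig:
    """Reject values a well-behaved device would never report."""
    if not (0 < cfg.queue_size <= MAX_QUEUE_SIZE):
        raise ValueError(f"queue_size out of range: {cfg.queue_size}")
    if cfg.queue_size & (cfg.queue_size - 1):
        raise ValueError(f"queue_size is not a power of two: {cfg.queue_size}")
    if not (MIN_MTU <= cfg.mtu <= MAX_MTU):
        raise ValueError(f"mtu out of range: {cfg.mtu}")
    return cfg


# A plausible configuration passes through unchanged.
accepted = validate_config(DeviceConfig(queue_size=256, mtu=1500))
```

Before the threat model changed, a driver could reasonably skip such checks because the entity providing the values was trusted. Whether a check like this belongs in every driver is exactly the kind of question the thread is about.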
On Fri, Jun 16, 2023 at 9:42 AM Sean Christopherson <seanjc@google.com> wrote: > > On Fri, Jun 16, 2023, Allen Webb wrote: > > On Fri, Jun 16, 2023 at 8:56 AM Sean Christopherson <seanjc@google.com> wrote: > > > > > > On Fri, Jun 16, 2023, Dmytro Maluka wrote: > > > > On 6/14/23 16:15, Sean Christopherson wrote: > > > > > On Wed, Jun 14, 2023, Elena Reshetova wrote: > > > > >>>> +This new type of adversary may be viewed as a more powerful type > > > > >>>> +of external attacker, as it resides locally on the same physical machine > > > > >>>> +-in contrast to a remote network attacker- and has control over the guest > > > > >>>> +kernel communication with most of the HW:: > > > > >>> > > > > >>> IIUC, this last statement doesn't hold true for the pKVM on x86 use case, which > > > > >>> specifically aims to give a "guest" exclusive access to hardware resources. > > > > >> > > > > >> Does it hold for *all* HW resources? If yes, indeed this would make pKVM on > > > > >> x86 considerably different. > > > > > > > > > > Heh, the original says "most", so it doesn't have to hold for all hardware resources, > > > > > just a simple majority. > > > > > > > > Again, pedantic mode on, I find it difficult to agree with the wording > > > > that the guest owns "most of" the HW resources it uses. It controls the > > > > data communication with its hardware device, but other resources (e.g. > > > > CPU time, interrupts, timers, PCI config space, ACPI) are owned by the > > > > host and virtualized by it for the guest. > > > > > > I wasn't saying that the guest owns most resources, I was saying that the *untrusted* > > > host does *not* own most resources that are exposed to the guest. My understanding > > > is that everything in your list is owned by the trusted hypervisor in the pKVM model. > > > > > > What I was pointing out is related to the above discussion about the guest needing > > > access to hardware that is effectively owned by the untrusted host, e.g. network > > > access. 
> > > > The network case isn't a great example because it is common for user > > space applications not to trust the network and to use verification > > schemes like TLS where trust of the network is not required, so the > > trusted guest could use these strategies when needed. > > There's a bit of context/history that isn't captured here. The network being > untrusted isn't new/novel in the SNP/TDX threat model, what's new is that the > network *device* is untrusted. > > In the SNP/TDX world, the NIC is likely to be a synthetic, virtual device that is > provided by the untrusted VMM. Pre-SNP/TDX, input from the device, i.e. the VMM, > is trusted; the guest still needs to use e.g. TLS to secure network traffic, but > the device configuration and whatnot is fully trusted. When the VMM is no longer > trusted, the device itself is no longer trusted. > > To address that, the folks working on SNP and TDX started posting patches[1][2] > to harden kernel drivers against bad device configurations and whanot, but without > first getting community buy-in on this new threat model, which led us here[3]. > > There is no equivalent in existing userspace applications, because userspace's > memory is not private, i.e. the kernel doesn't need to do Iago attacks to compromise > userspace, the kernel can simply read whatever memory it wants. > > And for pKVM, my understanding is that devices and configuration information that > are exposed to the guest are trusted and/or verified in some way, i.e. the points > of contention that led to this doc don't necessarily apply to the pKVM use case. That extra context helps, so the hardening is on the side of the guest kernel since the host kernel isn't trusted? My biggest concerns would be around situations where devices have memory access for things like DMA. 
In such cases the guest would need to be protected from the devices, so bounce buffers or some limited shared memory might need to be set up to facilitate these devices without breaking the goals of pKVM.

The minimum starting point for something like this would be a shared memory region visible to both the guest and the host. Given that, it should be possible to build communication primitives on top, but yes, ideally something like vsock or virtio would just work without introducing risk of exploitation, and typically the hypervisor is trusted. Maybe this could be modeled as sibling-to-sibling virtio/vsock?

>
> [1] https://lore.kernel.org/linux-iommu/20210603004133.4079390-1-ak@linux.intel.com
> [2] https://lore.kernel.org/all/20230119170633.40944-1-alexander.shishkin@linux.intel.com
> [3] https://lore.kernel.org/lkml/DM8PR11MB57505481B2FE79C3D56C9201E7CE9@DM8PR11MB5750.namprd11.prod.outlook.com
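
One way to picture the bounce-buffer idea: the guest copies the shared region into private memory once, and validates and parses only that private copy, so the other side cannot change the data between the check and the use. A minimal Python sketch follows, with a `bytearray` standing in for the shared region. The message layout (4-byte little-endian length followed by the payload), `MAX_PAYLOAD` and `read_message` are assumptions made up for this example.

```python
import struct

HEADER = struct.Struct("<I")  # hypothetical layout: 4-byte little-endian length
MAX_PAYLOAD = 64              # hypothetical upper bound accepted by the guest


def read_message(shared: bytearray) -> bytes:
    """Copy the shared region once, then validate and parse only the copy."""
    private = bytes(shared)  # snapshot: later writes to `shared` do not affect it
    if len(private) < HEADER.size:
        raise ValueError("truncated header")
    (length,) = HEADER.unpack_from(private, 0)
    if length > MAX_PAYLOAD or HEADER.size + length > len(private):
        raise ValueError("length out of bounds")
    return private[HEADER.size:HEADER.size + length]


# The host places a well-formed message in the shared region.
shared = bytearray(HEADER.pack(5) + b"hello" + bytes(16))
msg = read_message(shared)

# The host rewrites the length afterwards; the already-parsed message is unaffected.
shared[0:4] = HEADER.pack(0xFFFFFFFF)
```

After the last line `msg` is still `b"hello"`, and a fresh `read_message(shared)` raises `ValueError` because the new length fails the bounds check. The copy costs time and memory, which is part of why limiting how much goes through shared memory matters.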
On 6/16/23 15:56, Sean Christopherson wrote: > On Fri, Jun 16, 2023, Dmytro Maluka wrote: >> On 6/14/23 16:15, Sean Christopherson wrote: >>> On Wed, Jun 14, 2023, Elena Reshetova wrote: >>>>>> +This new type of adversary may be viewed as a more powerful type >>>>>> +of external attacker, as it resides locally on the same physical machine >>>>>> +-in contrast to a remote network attacker- and has control over the guest >>>>>> +kernel communication with most of the HW:: >>>>> >>>>> IIUC, this last statement doesn't hold true for the pKVM on x86 use case, which >>>>> specifically aims to give a "guest" exclusive access to hardware resources. >>>> >>>> Does it hold for *all* HW resources? If yes, indeed this would make pKVM on >>>> x86 considerably different. >>> >>> Heh, the original says "most", so it doesn't have to hold for all hardware resources, >>> just a simple majority. >> >> Again, pedantic mode on, I find it difficult to agree with the wording >> that the guest owns "most of" the HW resources it uses. It controls the >> data communication with its hardware device, but other resources (e.g. >> CPU time, interrupts, timers, PCI config space, ACPI) are owned by the >> host and virtualized by it for the guest. > > I wasn't saying that the guest owns most resources, I was saying that the *untrusted* > host does *not* own most resources that are exposed to the guest. My understanding > is that everything in your list is owned by the trusted hypervisor in the pKVM model. Heh, no. Most of these resources are owned by the untrusted host, that's the point. Basically for two reasons: 1. we want to keep the trusted hypervisor as simple as possible. 2. we don't need availability guarantees. The trusted hypervisor owns only: 2nd-stage MMU, IOMMU, VMCS (or its counterparts on non-Intel), physical PCI config space (merely for controlling a few critical registers like BARs and MSI address registers), perhaps a few more things that don't come to my mind now. 
The untrusted host schedules its guests on physical CPUs (i.e. the
host's L1 vCPUs are 1:1 mapped onto pCPUs), while the trusted hypervisor
has no scheduling, it only handles vmexits from the host and guests. The
untrusted host fully controls the physical interrupt controllers (I
think we realize that is not perfectly fine, but here we are), etc.

> What I was pointing out is related to the above discussion about the guest needing
> access to hardware that is effectively owned by the untrusted host, e.g. network
> access.
On 6/16/23 16:20, Sean Christopherson wrote: > On Fri, Jun 16, 2023, Dmytro Maluka wrote: >> On 6/13/23 19:03, Sean Christopherson wrote: >>> On Mon, Jun 12, 2023, Carlos Bilbao wrote: >>>> +well as CoCo technology specific hypercalls, if present. Additionally, the >>>> +host in a CoCo system typically controls the process of creating a CoCo >>>> +guest: it has a method to load into a guest the firmware and bootloader >>>> +images, the kernel image together with the kernel command line. All of this >>>> +data should also be considered untrusted until its integrity and >>>> +authenticity is established via attestation. >>> >>> Attestation is SNP and TDX specific. AIUI, none of SEV, SEV-ES, or pKVM (which >>> doesn't even really exist on x86 yet), have attestation of their own, e.g. the >>> proposed pKVM support would rely on Secure Boot of the original "full" host kernel. >> >> Seems to be a bit of misunderstanding here. Secure Boot verifies the >> host kernel, which is indeed also important, since the pKVM hypervisor >> is a part of the host kernel image. But when it comes to verifying the >> guests, it's a different story: a protected pKVM guest is started by the >> (untrusted) host at an arbitrary moment in time, not before the early >> kernel deprivileging when the host is still considered trusted. >> (Moreover, in practice the guest is started by a userspace VMM, i.e. not >> exactly the most trusted part of the host stack.) So the host can >> maliciously or mistakenly load a wrong guest image for running as a >> protected guest, so we do need attestation for protected guests. >> >> This attestation is not implemented in pKVM on x86 yet (you are right >> that pKVM on x86 is little more than a proposal at this point). 
But in >> pKVM on ARM it is afaik already working, it is software based (ensured >> by pKVM hypervisor + a tiny generic guest bootloader which verifies the >> guest image before jumping to the guest) and architecture-independent, >> so it should be possible to adopt it for x86 as is. > > Sorry, instead of "Attestation is SNP and TDX specific", I should have said, "The > form of attestation described here is SNP and TDX specific". > > pKVM's "attestation", effectively has its root of trust in the pKVM hypervisor, > which is in turn attested via Secure Boot. I.e. the guest payload is verified > *before* it is launched. Got it, fair point. Yep, I think this understanding is fully correct. > That is different from SNP and TDX where guest code and data is controlled by the > *untrusted* host. The initial payload is measured by trusted firmware, but it is > not verified, and so that measurement must be attested after the guest boots, > before any sensitive data is provisioned to the guest. > > Specifically, with "untrusted" inserted by me for clarification, my understanding > is that this doesn't hold true for pKVM when splitting hairs: > > Additionally, the **untrusted** host in a CoCo system typically controls the > process of creating a CoCo guest: it has a method to load into a guest the > firmware and bootloader images, the kernel image together with the kernel > command line. All of this data should also be considered untrusted until its > integrity and authenticity is established via attestation. > > because the guest firmware comes from a trusted entity, not the untrusted host.
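Editorial sketch, not part of the original thread: the distinction drawn above between "verified before launch" (the pKVM model) and "measured at launch, attested afterwards" (the SNP/TDX model) might be illustrated roughly as follows. All function names are hypothetical, SHA-384 is used only as a convenient hash for the example, and real attestation also involves signed reports and a hardware root of trust, none of which is modeled here.

```python
import hashlib
import hmac


def digest(image: bytes) -> str:
    """Hash of a guest image (the hash choice is illustrative only)."""
    return hashlib.sha384(image).hexdigest()


def verified_launch(image: bytes, trusted_digest: str) -> str:
    """pKVM-style model: a trusted component checks the payload *before* launch."""
    if not hmac.compare_digest(digest(image), trusted_digest):
        raise ValueError("guest image rejected before launch")
    return "running"


def measured_launch(image: bytes) -> dict:
    """SNP/TDX-style model: the host-supplied payload is launched as-is and
    trusted firmware only records its measurement."""
    return {"state": "running", "measurement": digest(image)}


def may_provision_secrets(report: dict, trusted_digest: str) -> bool:
    """Relying party checks the measurement *after* boot, before any
    sensitive data is released to the guest."""
    return hmac.compare_digest(report["measurement"], trusted_digest)
```

In the measured model a tampered image still boots; what protects the guest owner is that `may_provision_secrets()` fails, so no secrets ever reach that guest. In the verified model the tampered image never runs at all.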
On Fri, Jun 16, 2023, Dmytro Maluka wrote:
> On 6/16/23 15:56, Sean Christopherson wrote:
> > On Fri, Jun 16, 2023, Dmytro Maluka wrote:
> >> On 6/14/23 16:15, Sean Christopherson wrote:
> >>> On Wed, Jun 14, 2023, Elena Reshetova wrote:
> >>>>>> +This new type of adversary may be viewed as a more powerful type
> >>>>>> +of external attacker, as it resides locally on the same physical machine
> >>>>>> +-in contrast to a remote network attacker- and has control over the guest
> >>>>>> +kernel communication with most of the HW::
> >>>>>
> >>>>> IIUC, this last statement doesn't hold true for the pKVM on x86 use case, which
> >>>>> specifically aims to give a "guest" exclusive access to hardware resources.
> >>>>
> >>>> Does it hold for *all* HW resources? If yes, indeed this would make pKVM on
> >>>> x86 considerably different.
> >>>
> >>> Heh, the original says "most", so it doesn't have to hold for all hardware resources,
> >>> just a simple majority.
> >>
> >> Again, pedantic mode on, I find it difficult to agree with the wording
> >> that the guest owns "most of" the HW resources it uses. It controls the
> >> data communication with its hardware device, but other resources (e.g.
> >> CPU time, interrupts, timers, PCI config space, ACPI) are owned by the
> >> host and virtualized by it for the guest.
> >
> > I wasn't saying that the guest owns most resources, I was saying that the *untrusted*
> > host does *not* own most resources that are exposed to the guest. My understanding
> > is that everything in your list is owned by the trusted hypervisor in the pKVM model.
>
> Heh, no. Most of these resources are owned by the untrusted host, that's
> the point.

Ah, I was overloading "owned", probably wrongly. What I'm trying to call out is
that in pKVM, while the untrusted host can withhold resources, it can't subvert
most of those resources.
Taking scheduling as an example, a pKVM vCPU may be
migrated to a different pCPU by the untrusted host, but pKVM ensures that it is
safe to run on the new pCPU, e.g. on Intel, pKVM (presumably) does any necessary
VMCLEAR, IBPB, INVEPT, etc. to ensure the vCPU doesn't consume stale data.

> Basically for two reasons: 1. we want to keep the trusted hypervisor as
> simple as possible. 2. we don't need availability guarantees.
>
> The trusted hypervisor owns only: 2nd-stage MMU, IOMMU, VMCS (or its
> counterparts on non-Intel), physical PCI config space (merely for
> controlling a few critical registers like BARs and MSI address
> registers), perhaps a few more things that don't come to my mind now.

The "physical PCI config space" is a key difference, and is very relevant to this
doc (see my response to Allen).

> The untrusted host schedules its guests on physical CPUs (i.e. the
> host's L1 vCPUs are 1:1 mapped onto pCPUs), while the trusted hypervisor
> has no scheduling, it only handles vmexits from the host and guests. The
> untrusted host fully controls the physical interrupt controllers (I
> think we realize that is not perfectly fine, but here we are), etc.

Yeah, IRQs are a tough nut to crack.
On 6/16/23 20:07, Sean Christopherson wrote: > On Fri, Jun 16, 2023, Dmytro Maluka wrote: >> On 6/16/23 15:56, Sean Christopherson wrote: >>> On Fri, Jun 16, 2023, Dmytro Maluka wrote: >>>> Again, pedantic mode on, I find it difficult to agree with the wording >>>> that the guest owns "most of" the HW resources it uses. It controls the >>>> data communication with its hardware device, but other resources (e.g. >>>> CPU time, interrupts, timers, PCI config space, ACPI) are owned by the >>>> host and virtualized by it for the guest. >>> >>> I wasn't saying that the guest owns most resources, I was saying that the *untrusted* >>> host does *not* own most resources that are exposed to the guest. My understanding >>> is that everything in your list is owned by the trusted hypervisor in the pKVM model. >> >> Heh, no. Most of these resources are owned by the untrusted host, that's >> the point. > > Ah, I was overloading "owned", probably wrongly. What I'm trying to call out is > that in pKVM, while the untrusted host can withold resources, it can't subvert > most of those resources. Taking scheduling as an example, a pKVM vCPU may be > migrated to a different pCPU by the untrusted host, but pKVM ensures that it is > safe to run on the new pCPU, e.g. on Intel, pKVM (presumably) does any necessary > VMCLEAR, IBPB, INVEPT, etc. to ensure the vCPU doesn't consume stale data. Yep, agree. >> Basically for two reasons: 1. we want to keep the trusted hypervisor as >> simple as possible. 2. we don't need availability guarantees. >> >> The trusted hypervisor owns only: 2nd-stage MMU, IOMMU, VMCS (or its >> counterparts on non-Intel), physical PCI config space (merely for >> controlling a few critical registers like BARs and MSI address >> registers), perhaps a few more things that don't come to my mind now. > > The "physical PCI config space" is a key difference, and is very relevant to this > doc (see my response to Allen). Yeah, thanks for the links and the context, BTW. 
But let me clarify that we have 2 things here that should not be confused with each other. We have 2 levels of virtualization of the PCI config space in pKVM. The hypervisor traps the host's accesses to the config space, but mostly it simply passes them through to hardware. Most importantly, when the host reprograms a BAR, the hypervisor makes sure to update the corresponding MMIO mappings in the host's and the guest's 2nd-level page tables (that is what makes protection of the protected guest's passthrough PCI devices possible at all). But essentially it's the host that manages the physical config space. And the host, in turn, virtualizes it for the guest, using vfio-pci, like it is traditionally done for passthrough PCI devices. This latter, emulated config space is the concern. Looking at the patches [1] and thinking if those MSI-X misconfiguration attacks are possible in pKVM, I come to the conclusion that yes, they are. Device attestation helps with trusting/verifying static information, but the dynamically changing config space is something different. So it seems that such "emulated PCI config misconfiguration attacks" need to be included in the threat model for pKVM as well, i.e. need to be hardened on the guest side. Unless we revisit our current design assumptions for device assignment in pKVM on x86 and manage the physical PCI config in the trusted hypervisor, not in the host (with all the increasing complexity that comes with that, related to power management and other things). Also, thinking more about it: irrespectively of passthrough devices, I guess that the protected pKVM guest may well want to use virtio with PCI transport (not for things like networking, but that's not the point), thus be prone to the same attacks. >> The untrusted host schedules its guests on physical CPUs (i.e. the >> host's L1 vCPUs are 1:1 mapped onto pCPUs), while the trusted hypervisor >> has no scheduling, it only handles vmexits from the host and guests. 
>> The untrusted host fully controls the physical interrupt controllers (I
>> think we realize that is not perfectly fine, but here we are), etc.
>
> Yeah, IRQs are a tough nut to crack.

And BTW, doesn't it mean that interrupts also need to be hardened in the
guest (if we don't want the complexity of interrupt controllers in the
trusted hypervisor)? At least sensitive ones like IPIs, but I guess we
should also consider interrupt-based timing attacks, which could use
any type of interrupt. (I have no idea how to harden either of the two
cases, but I'm no expert.)

[1] https://lore.kernel.org/all/20230119170633.40944-1-alexander.shishkin@linux.intel.com/
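Editorial sketch, not part of the original thread: one flavor of the guest-side hardening against "emulated PCI config misconfiguration" mentioned above is to sanity-check values read from the untrusted, emulated config space before acting on them. The example below checks that an MSI-X table described by the capability registers lies entirely within the BAR it points to. It relies only on the standard MSI-X capability layout (table size in bits 10:0 of Message Control, BIR in bits 2:0 of the Table Offset/BIR register, 16-byte table entries). The function name and the `bar_sizes` mapping are hypothetical, and this is not a description of what the referenced patches actually do.

```python
MSIX_ENTRY_SIZE = 16          # bytes per MSI-X table entry
MSIX_CTRL_SIZE_MASK = 0x07FF  # Message Control bits 10:0 hold (entries - 1)
MSIX_BIR_MASK = 0x7           # Table Offset/BIR bits 2:0 select the BAR
MSIX_OFFSET_MASK = 0xFFFFFFF8 # remaining bits hold the 8-byte-aligned offset


def msix_table_in_bounds(msg_ctrl: int, table_reg: int, bar_sizes: dict) -> bool:
    """Return True only if the advertised MSI-X table fits inside its BAR.

    msg_ctrl and table_reg are raw values read from (untrusted) config
    space; bar_sizes maps a BAR index to the size in bytes that the guest
    has already established for that BAR (hypothetical bookkeeping).
    """
    n_entries = (msg_ctrl & MSIX_CTRL_SIZE_MASK) + 1
    bir = table_reg & MSIX_BIR_MASK
    offset = table_reg & MSIX_OFFSET_MASK
    if bir not in bar_sizes:
        return False  # table claims to live in a BAR the guest does not know
    return offset + n_entries * MSIX_ENTRY_SIZE <= bar_sizes[bir]
```

A check like this only covers a single static inconsistency. As noted above, the config space can change dynamically, so a real guest would also have to decide when values are re-read and re-validated.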
On 6/16/23 17:16, Allen Webb wrote:
> That extra context helps, so the hardening is on the side of the guest
> kernel since the host kernel isn't trusted?
>
> My biggest concerns would be around situations where devices have
> memory access for things like DMA. In such cases the guest would need
> to be protected from the devices so bounce buffers or some limited
> shared memory might need to be set up to facilitate these devices
> without breaking the goals of pKVM.

I'm assuming you are talking about cases when we want a host-owned
device, e.g. a TPM from your example, to be able to DMA to the guest
memory (please correct me if you mean something different).

I think with pKVM it should already be possible to do this securely and
without extra hardening in the guest (modulo establishing trust between
the guest and the TPM, which you mentioned, but that is needed anyway?).

The hypervisor in any case ensures protection of the guest memory from
the host devices' DMA via the IOMMU. Also, the hypervisor allows the guest
to explicitly share its memory pages with the host via a hypercall.
Those shared pages, and only those, become accessible to the host
devices' DMA as well.

P.S. I know that on Chromebooks the TPM can't possibly do DMA. :)

> The minimum starting point for something like this would be a shared
> memory region visible to both the guest and the host. Given that it
> should be possible to build communication primitives on top, but yes
> ideally something like vsock or virtio would just work without
> introducing risk of exploitation and typically the hypervisor is
> trusted. Maybe this could be modeled as sibling to sibling
> virtio/vsock?
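Editorial sketch, not part of the original thread: the sharing rule described above (guest memory is private by default, and only pages the guest explicitly shares via a hypercall become reachable by host-device DMA) can be modeled as a toy state machine. The class and method names are hypothetical and do not correspond to real pKVM hypercalls; in a real system the check would be enforced by IOMMU page tables rather than by a lookup.

```python
class HypervisorModel:
    """Toy model of the trusted hypervisor's view of one protected guest."""

    def __init__(self):
        # Guest frame numbers the guest has explicitly shared with the host.
        self._shared_gfns = set()

    def hc_share_page(self, gfn):
        """Guest-issued hypercall (hypothetical name): share one page."""
        self._shared_gfns.add(gfn)

    def hc_unshare_page(self, gfn):
        """Guest-issued hypercall (hypothetical name): make a page private again."""
        self._shared_gfns.discard(gfn)

    def host_dma_allowed(self, gfn):
        """Stand-in for the IOMMU check applied to DMA from host-owned devices."""
        return gfn in self._shared_gfns
```

Under this model a bounce buffer is simply a set of pages the guest has shared: the guest copies data in and out of them, and everything else stays unreachable to host devices.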
> On 6/16/23 20:07, Sean Christopherson wrote: > > On Fri, Jun 16, 2023, Dmytro Maluka wrote: > >> On 6/16/23 15:56, Sean Christopherson wrote: > >>> On Fri, Jun 16, 2023, Dmytro Maluka wrote: > >>>> Again, pedantic mode on, I find it difficult to agree with the wording > >>>> that the guest owns "most of" the HW resources it uses. It controls the > >>>> data communication with its hardware device, but other resources (e.g. > >>>> CPU time, interrupts, timers, PCI config space, ACPI) are owned by the > >>>> host and virtualized by it for the guest. > >>> > >>> I wasn't saying that the guest owns most resources, I was saying that the > *untrusted* > >>> host does *not* own most resources that are exposed to the guest. My > understanding > >>> is that everything in your list is owned by the trusted hypervisor in the pKVM > model. > >> > >> Heh, no. Most of these resources are owned by the untrusted host, that's > >> the point. > > > > Ah, I was overloading "owned", probably wrongly. What I'm trying to call out is > > that in pKVM, while the untrusted host can withold resources, it can't subvert > > most of those resources. Taking scheduling as an example, a pKVM vCPU may > be > > migrated to a different pCPU by the untrusted host, but pKVM ensures that it is > > safe to run on the new pCPU, e.g. on Intel, pKVM (presumably) does any > necessary > > VMCLEAR, IBPB, INVEPT, etc. to ensure the vCPU doesn't consume stale data. > > Yep, agree. > > >> Basically for two reasons: 1. we want to keep the trusted hypervisor as > >> simple as possible. 2. we don't need availability guarantees. > >> > >> The trusted hypervisor owns only: 2nd-stage MMU, IOMMU, VMCS (or its > >> counterparts on non-Intel), physical PCI config space (merely for > >> controlling a few critical registers like BARs and MSI address > >> registers), perhaps a few more things that don't come to my mind now. 
> > > > The "physical PCI config space" is a key difference, and is very relevant to this > > doc (see my response to Allen). > > Yeah, thanks for the links and the context, BTW. > > But let me clarify that we have 2 things here that should not be > confused with each other. We have 2 levels of virtualization of the PCI > config space in pKVM. The hypervisor traps the host's accesses to the > config space, but mostly it simply passes them through to hardware. Most > importantly, when the host reprograms a BAR, the hypervisor makes sure > to update the corresponding MMIO mappings in the host's and the guest's > 2nd-level page tables (that is what makes protection of the protected > guest's passthrough PCI devices possible at all). But essentially it's > the host that manages the physical config space. And the host, in turn, > virtualizes it for the guest, using vfio-pci, like it is traditionally > done for passthrough PCI devices. > > This latter, emulated config space is the concern. Looking at the > patches [1] and thinking if those MSI-X misconfiguration attacks are > possible in pKVM, I come to the conclusion that yes, they are. > > Device attestation helps with trusting/verifying static information, but > the dynamically changing config space is something different. > > So it seems that such "emulated PCI config misconfiguration attacks" > need to be included in the threat model for pKVM as well, i.e. need to > be hardened on the guest side. Unless we revisit our current design > assumptions for device assignment in pKVM on x86 and manage the physical > PCI config in the trusted hypervisor, not in the host (with all the > increasing complexity that comes with that, related to power management > and other things). Thank you very much for clarification Dmytro on this and many other points when it comes to pKVM. It does help greatly to bring us on the same page. 
>
> Also, thinking more about it: irrespectively of passthrough devices, I
> guess that the protected pKVM guest may well want to use virtio with PCI
> transport (not for things like networking, but that's not the point),
> thus be prone to the same attacks.
>
> >> The untrusted host schedules its guests on physical CPUs (i.e. the
> >> host's L1 vCPUs are 1:1 mapped onto pCPUs), while the trusted hypervisor
> >> has no scheduling, it only handles vmexits from the host and guests. The
> >> untrusted host fully controls the physical interrupt controllers (I
> >> think we realize that is not perfectly fine, but here we are), etc.
> >
> > Yeah, IRQs are a tough nut to crack.
>
> And BTW, doesn't it mean that interrupts also need to be hardened in the
> guest (if we don't want the complexity of interrupt controllers in the
> trusted hypervisor)? At least sensitive ones like IPIs, but I guess we
> should also consider interrupt-based timing attacks, which could use
> any type of interrupt. (I have no idea how to harden either of the two
> cases, but I'm no expert.)

We have been thinking about it a bit, at least when it comes to our
TDX case. Two main issues were identified: interrupts contributing
to the state of Linux PRNG [1] and potential implications of missing
interrupts for reliable panic and other kernel use cases [2].

[1] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#randomness-inside-tdx-guest
[2] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#reliable-panic

For the first one, in addition to simply enforcing the use of RDSEED
for TDX guests, we still want to do a proper evaluation of the security
of Linux PRNG under our threat model. The second one is
harder to reliably assess imo, but so far we were not able to find any
concrete attack vectors. But it would be good if people who
have expertise in this could take a look at the assessment we did.
The logic was to go over all core kernel callers of the various
smp_call_function* and on_each_cpu* functions and check the implications
if such an IPI is never delivered.

Best Regards,
Elena.
On 6/19/23 13:23, Reshetova, Elena wrote:
>> And BTW, doesn't it mean that interrupts also need to be hardened in the
>> guest (if we don't want the complexity of interrupt controllers in the
>> trusted hypervisor)? At least sensitive ones like IPIs, but I guess we
>> should also consider interrupt-based timing attacks, which could use
>> any type of interrupt. (I have no idea how to harden either of the two
>> cases, but I'm no expert.)
>
> We have been thinking about it a bit, at least when it comes to our
> TDX case. Two main issues were identified: interrupts contributing
> to the state of Linux PRNG [1] and potential implications of missing
> interrupts for reliable panic and other kernel use cases [2].
>
> [1] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#randomness-inside-tdx-guest
> [2] https://intel.github.io/ccc-linux-guest-hardening-docs/security-spec.html#reliable-panic
>
> For the first one, in addition to simply enforcing the use of RDSEED
> for TDX guests, we still want to do a proper evaluation of the security
> of Linux PRNG under our threat model. The second one is
> harder to reliably assess imo, but so far we were not able to find any
> concrete attack vectors. But it would be good if people who
> have expertise in this could take a look at the assessment we did.
> The logic was to go over all core kernel callers of the various
> smp_call_function* and on_each_cpu* functions and check the implications
> if such an IPI is never delivered.

Thanks. I also had in mind, for example, [1].

[1] https://people.cs.kuleuven.be/~jo.vanbulck/ccs18.pdf
On 6/12/23 11:47, Carlos Bilbao wrote:
> Kernel developers working on confidential computing for virtualized
> environments in x86 operate under a set of assumptions regarding the Linux
> kernel threat model that differs from the traditional view. Historically,
> the Linux threat model acknowledges attackers residing in userspace, as
> well as a limited set of external attackers that are able to interact with
> the kernel through networking or limited HW-specific exposed interfaces
> (e.g. USB, thunderbolt). The goal of this document is to explain additional
> attack vectors that arise in the virtualized confidential computing space
> and discuss the proposed protection mechanisms for the Linux kernel.

To expedite things, I'm going to outline the changes to make for v3 based on
the given feedback. Please take a look and let me know if I'm missing
something.

Changes for v3:

- Remove pKVM from the document. Although there are clear overlaps in the
  threat models (as the discussions have shown), it might be good to omit
  pKVM for now to avoid further complexity. In the future, when pKVM is
  more mature, we can revisit and discuss its inclusion.
- Change file name to "snp-tdx-threat-model.rst".
- Replace hyphens (dashes) with parentheses in a parenthetical sentence.
- Change "technology specific" to "technology-specific".

>
> Reviewed-by: Larry Dewey <larry.dewey@amd.com>
> Reviewed-by: David Kaplan <david.kaplan@amd.com>
> Co-developed-by: Elena Reshetova <elena.reshetova@intel.com>
> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> Signed-off-by: Carlos Bilbao <carlos.bilbao@amd.com>
> ---
>
> V1 can be found in:
> https://lore.kernel.org/lkml/20230327141816.2648615-1-carlos.bilbao@amd.com/
> Changes since v1:
>
> - Apply feedback from first version of the patch
> - Clarify that the document applies only to a particular angle of
> confidential computing, namely confidential computing for virtualized
> environments.
Also, state that the document is specific to x86 and > that the main goal is to discuss the emerging threats. > - Change commit message and file name accordingly > - Replace AMD's link to AMD SEV SNP white paper > - Minor tweaking and clarifications > > --- > Documentation/security/index.rst | 1 + > .../security/x86-confidential-computing.rst | 298 ++++++++++++++++++ > MAINTAINERS | 6 + > 3 files changed, 305 insertions(+) > create mode 100644 Documentation/security/x86-confidential-computing.rst > > diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst > index 6ed8d2fa6f9e..bda919aecb37 100644 > --- a/Documentation/security/index.rst > +++ b/Documentation/security/index.rst > @@ -6,6 +6,7 @@ Security Documentation > :maxdepth: 1 > > credentials > + x86-confidential-computing > IMA-templates > keys/index > lsm > diff --git a/Documentation/security/x86-confidential-computing.rst b/Documentation/security/x86-confidential-computing.rst > new file mode 100644 > index 000000000000..5c52b8888089 > --- /dev/null > +++ b/Documentation/security/x86-confidential-computing.rst > @@ -0,0 +1,298 @@ > +====================================================== > +Confidential Computing in Linux for x86 virtualization > +====================================================== > + > +.. contents:: :local: > + > +By: Elena Reshetova <elena.reshetova@intel.com> and Carlos Bilbao <carlos.bilbao@amd.com> > + > +Motivation > +========== > + > +Kernel developers working on confidential computing for virtualized > +environments in x86 operate under a set of assumptions regarding the Linux > +kernel threat model that differ from the traditional view. Historically, > +the Linux threat model acknowledges attackers residing in userspace, as > +well as a limited set of external attackers that are able to interact with > +the kernel through various networking or limited HW-specific exposed > +interfaces (USB, thunderbolt). 
The goal of this document is to explain > +additional attack vectors that arise in the confidential computing space > +and discuss the proposed protection mechanisms for the Linux kernel. > + > +Overview and terminology > +======================== > + > +Confidential Computing (CoCo) is a broad term covering a wide range of > +security technologies that aim to protect the confidentiality and integrity > +of data in use (vs. data at rest or data in transit). At its core, CoCo > +solutions provide a Trusted Execution Environment (TEE), where secure data > +processing can be performed and, as a result, they are typically further > +classified into different subtypes depending on the SW that is intended > +to be run in TEE. This document focuses on a subclass of CoCo technologies > +that are targeting virtualized environments and allow running Virtual > +Machines (VM) inside TEE. From now on in this document will be referring > +to this subclass of CoCo as 'Confidential Computing (CoCo) for the > +virtualized environments (VE)'. > + > +CoCo, in the virtualization context, refers to a set of HW and/or SW > +technologies that allow for stronger security guarantees for the SW running > +inside a CoCo VM. Namely, confidential computing allows its users to > +confirm the trustworthiness of all SW pieces to include in its reduced > +Trusted Computing Base (TCB) given its ability to attest the state of these > +trusted components. > + > +While the concrete implementation details differ between technologies, all > +available mechanisms aim to provide increased confidentiality and > +integrity for the VM's guest memory and execution state (vCPU registers), > +more tightly controlled guest interrupt injection, as well as some > +additional mechanisms to control guest-host page mapping. 
More details on > +the x86-specific solutions can be found in > +:doc:`Intel Trust Domain Extensions (TDX) </arch/x86/tdx>` and > +`AMD Memory Encryption <https://www.amd.com/system/files/techdocs/sev-snp-strengthening-vm-isolation-with-integrity-protection-and-more.pdf>`_. > + > +The basic CoCo guest layout includes the host, guest, the interfaces that > +communicate guest and host, a platform capable of supporting CoCo VMs, and > +a trusted intermediary between the guest VM and the underlying platform > +that acts as a security manager. The host-side virtual machine monitor > +(VMM) typically consists of a subset of traditional VMM features and > +is still in charge of the guest lifecycle, i.e. create or destroy a CoCo > +VM, manage its access to system resources, etc. However, since it > +typically stays out of CoCo VM TCB, its access is limited to preserve the > +security objectives. > + > +In the following diagram, the "<--->" lines represent bi-directional > +communication channels or interfaces between the CoCo security manager and > +the rest of the components (data flow for guest, host, hardware) :: > + > + +-------------------+ +-----------------------+ > + | CoCo guest VM |<---->| | > + +-------------------+ | | > + | Interfaces | | CoCo security manager | > + +-------------------+ | | > + | Host VMM |<---->| | > + +-------------------+ | | > + | | > + +--------------------+ | | > + | CoCo platform |<--->| | > + +--------------------+ +-----------------------+ > + > +The specific details of the CoCo security manager vastly diverge between > +technologies. For example, in some cases, it will be implemented in HW > +while in others it may be pure SW. In some cases, such as for the > +`Protected kernel-based virtual machine (pKVM) <https://github.com/intel-staging/pKVM-IA>`, > +the CoCo security manager is a small, isolated and highly privileged > +(compared to the rest of SW running on the host) part of a traditional > +VMM. 
> + > +Existing Linux kernel threat model > +================================== > + > +The overall components of the current Linux kernel threat model are:: > + > + +-----------------------+ +-------------------+ > + | |<---->| Userspace | > + | | +-------------------+ > + | External attack | | Interfaces | > + | vectors | +-------------------+ > + | |<---->| Linux Kernel | > + | | +-------------------+ > + +-----------------------+ +-------------------+ > + | Bootloader/BIOS | > + +-------------------+ > + +-------------------+ > + | HW platform | > + +-------------------+ > + > +There is also communication between the bootloader and the kernel during > +the boot process, but this diagram does not represent it explicitly. The > +"Interfaces" box represents the various interfaces that allow > +communication between kernel and userspace. This includes system calls, > +kernel APIs, device drivers, etc. > + > +The existing Linux kernel threat model typically assumes execution on a > +trusted HW platform with all of the firmware and bootloaders included on > +its TCB. The primary attacker resides in the userspace, and all of the data > +coming from there is generally considered untrusted, unless userspace is > +privileged enough to perform trusted actions. In addition, external > +attackers are typically considered, including those with access to enabled > +external networks (e.g. Ethernet, Wireless, Bluetooth), exposed hardware > +interfaces (e.g. USB, Thunderbolt), and the ability to modify the contents > +of disks offline. > + > +Regarding external attack vectors, it is interesting to note that in most > +cases external attackers will try to exploit vulnerabilities in userspace > +first, but that it is possible for an attacker to directly target the > +kernel; particularly if the host has physical access. Examples of direct > +kernel attacks include the vulnerabilities CVE-2019-19524, CVE-2022-0435 > +and CVE-2020-24490. 
> + > +Confidential Computing threat model and its security objectives > +=============================================================== > + > +Confidential Computing adds a new type of attacker to the above list: a > +potentially misbehaving host (which can also include some part of a > +traditional VMM or all of it), which is typically placed outside of the > +CoCo VM TCB due to its large SW attack surface. It is important to note > +that this doesn’t imply that the host or VMM are intentionally > +malicious, but that there exists a security value in having a small CoCo > +VM TCB. This new type of adversary may be viewed as a more powerful type > +of external attacker, as it resides locally on the same physical machine > +-in contrast to a remote network attacker- and has control over the guest > +kernel communication with most of the HW:: > + > + +------------------------+ > + | CoCo guest VM | > + +-----------------------+ | +-------------------+ | > + | |<--->| | Userspace | | > + | | | +-------------------+ | > + | External attack | | | Interfaces | | > + | vectors | | +-------------------+ | > + | |<--->| | Linux Kernel | | > + | | | +-------------------+ | > + +-----------------------+ | +-------------------+ | > + | | Bootloader/BIOS | | > + +-----------------------+ | +-------------------+ | > + | |<--->+------------------------+ > + | | | Interfaces | > + | | +------------------------+ > + | CoCo security |<--->| Host/Host-side VMM | > + | manager | +------------------------+ > + | | +------------------------+ > + | |<--->| CoCo platform | > + +-----------------------+ +------------------------+ > + > +While traditionally the host has unlimited access to guest data and can > +leverage this access to attack the guest, the CoCo systems mitigate such > +attacks by adding security features like guest data confidentiality and > +integrity protection. This threat model assumes that those features are > +available and intact. 
> + > +The **Linux kernel CoCo VM security objectives** can be summarized as follows: > + > +1. Preserve the confidentiality and integrity of CoCo guest's private > +memory and registers. > + > +2. Prevent privileged escalation from a host into a CoCo guest Linux kernel. > +While it is true that the host (and host-side VMM) requires some level of > +privilege to create, destroy, or pause the guest, part of the goal of > +preventing privileged escalation is to ensure that these operations do not > +provide a pathway for attackers to gain access to the guest's kernel. > + > +The above security objectives result in two primary **Linux kernel CoCo > +VM assets**: > + > +1. Guest kernel execution context. > +2. Guest kernel private memory. > + > +The host retains full control over the CoCo guest resources, and can deny > +access to them at any time. Examples of resources include CPU time, memory > +that the guest can consume, network bandwidth, etc. Because of this, the > +host Denial of Service (DoS) attacks against CoCo guests are beyond the > +scope of this threat model. > + > +The **Linux CoCo VM attack surface** is any interface exposed from a CoCo > +guest Linux kernel towards an untrusted host that is not covered by the > +CoCo technology SW/HW protection. This includes any possible > +side-channels, as well as transient execution side channels. Examples of > +explicit (not side-channel) interfaces include accesses to port I/O, MMIO > +and DMA interfaces, access to PCI configuration space, VMM-specific > +hypercalls (towards Host-side VMM), access to shared memory pages, > +interrupts allowed to be injected into the guest kernel by the host, as > +well as CoCo technology specific hypercalls, if present. Additionally, the > +host in a CoCo system typically controls the process of creating a CoCo > +guest: it has a method to load into a guest the firmware and bootloader > +images, the kernel image together with the kernel command line. 
All of this > +data should also be considered untrusted until its integrity and > +authenticity is established via attestation. > + > +The table below shows a threat matrix for the CoCo guest Linux kernel with > +the potential mitigation strategies. The matrix refers to CoCo-specific > +versions of the guest, host and platform. > + > +.. list-table:: CoCo Linux guest kernel threat matrix > + :widths: auto > + :align: center > + :header-rows: 1 > + > + * - Threat name > + - Threat description > + - Mitigation strategies > + > + * - Guest malicious configuration > + - A misbehaving host modifies one of the following guest's > + configuration: > + > + 1. Guest firmware or bootloader > + > + 2. Guest kernel or module binaries > + > + 3. Guest command line parameters > + > + This allows the host to break the integrity of the code running > + inside a CoCo guest, and violates the CoCo security objectives. > + - The integrity of the guest's configuration passed via untrusted host > + must be ensured by methods such as remote attestation and signing. > + This should be largely transparent to the guest kernel, and would > + allow it to assume a trusted state at the time of boot. > + > + * - CoCo guest data attacks > + - A misbehaving host retains full control of the CoCo guest's data > + in-transit between the guest and the host-managed physical or > + virtual devices. This allows any attack against confidentiality, > + integrity or freshness of such data. > + - The CoCo guest is responsible for ensuring the confidentiality, > + integrity and freshness of such data using well-established > + security mechanisms. For example, for any guest external network > + communications passed via the untrusted host, an end-to-end > + secure session must be established between a guest and a trusted > + remote endpoint using well-known protocols such as TLS. > + This requirement also applies to protection of the guest's disk > + image. 
> + > + * - Malformed runtime input > + - A misbehaving host injects malformed input via any communication > + interface used by the guest's kernel code. If the code is not > + prepared to handle this input correctly, this can result in a host > + --> guest kernel privilege escalation. This includes traditional > + side-channel and/or transient execution attack vectors. > + - The attestation or signing process cannot help to mitigate this > + threat since this input is highly dynamic. Instead, a different set > + of mechanisms is required: > + > + 1. *Limit the exposed attack surface*. Whenever possible, disable > + complex kernel features and device drivers (not required for guest > + operation) that actively use the communication interfaces between > + the untrusted host and the guest. This is not a new concept for the > + Linux kernel, since it already has mechanisms to disable external > + interfaces, such as attacker's access via USB/Thunderbolt subsystem. > + > + 2. *Harden the exposed attack surface*. Any code that uses such > + interfaces must treat the input from the untrusted host as malicious, > + and do sanity checks before processing it. This can be ensured by > + performing a code audit of such device drivers as well as employing > + other standard techniques for testing the code robustness, such as > + fuzzing. This is again a well-known concept for the Linux kernel, > + since all its networking code has been previously analyzed under > + presumption of processing malformed input from a network attacker. > + > + * - Malicious runtime input > + - A misbehaving host injects a specific input value via any > + communication interface used by the guest's kernel code. The > + difference with the previous attack vector (malformed runtime input) > + is that this input is not malformed, but its value is crafted to > + impact the guest's kernel security. 
Examples of such inputs include > + providing a malicious time to the guest or the entropy to the guest > + random number generator. Additionally, the timing of such events can > + be an attack vector on its own, if it results in a particular guest > + kernel action (i.e. processing of a host-injected interrupt). > + - Similarly, as with the previous attack vector, it is not possible to > + use attestation mechanisms to address this threat. Instead, such > + attack vectors (i.e. interfaces) must be either disabled or made > + resistant to supplied host input. > + > +As can be seen from the above table, the potential mitigation strategies > +to secure the CoCo Linux guest kernel vary, but can be roughly split into > +mechanisms that either require or do not require changes to the existing > +Linux kernel code. One main goal of the CoCo security architecture is to > +minimize changes to the Linux kernel code, while also providing usable > +and scalable means to facilitate the security of a CoCo guest kernel. > diff --git a/MAINTAINERS b/MAINTAINERS > index a73486c4aa6e..1d4ae60cdee9 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -5197,6 +5197,12 @@ S: Orphan > W: http://accessrunner.sourceforge.net/ > F: drivers/usb/atm/cxacru.c > > +CONFIDENTIAL COMPUTING THREAT MODEL FOR X86 VIRTUALIZATION > +M: Elena Reshetova <elena.reshetova@intel.com> > +M: Carlos Bilbao <carlos.bilbao@amd.com> > +S: Maintained > +F: Documentation/security/x86-confidential-computing.rst > + > CONFIGFS > M: Joel Becker <jlbec@evilplan.org> > M: Christoph Hellwig <hch@lst.de> Thanks, Carlos
diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst index 6ed8d2fa6f9e..bda919aecb37 100644 --- a/Documentation/security/index.rst +++ b/Documentation/security/index.rst @@ -6,6 +6,7 @@ Security Documentation :maxdepth: 1 credentials + x86-confidential-computing IMA-templates keys/index lsm diff --git a/Documentation/security/x86-confidential-computing.rst b/Documentation/security/x86-confidential-computing.rst new file mode 100644 index 000000000000..5c52b8888089 --- /dev/null +++ b/Documentation/security/x86-confidential-computing.rst @@ -0,0 +1,298 @@ +====================================================== +Confidential Computing in Linux for x86 virtualization +====================================================== + +.. contents:: :local: + +By: Elena Reshetova <elena.reshetova@intel.com> and Carlos Bilbao <carlos.bilbao@amd.com> + +Motivation +========== + +Kernel developers working on confidential computing for virtualized +environments in x86 operate under a set of assumptions regarding the Linux +kernel threat model that differ from the traditional view. Historically, +the Linux threat model acknowledges attackers residing in userspace, as +well as a limited set of external attackers that are able to interact with +the kernel through various networking or limited HW-specific exposed +interfaces (USB, thunderbolt). The goal of this document is to explain +additional attack vectors that arise in the confidential computing space +and discuss the proposed protection mechanisms for the Linux kernel. + +Overview and terminology +======================== + +Confidential Computing (CoCo) is a broad term covering a wide range of +security technologies that aim to protect the confidentiality and integrity +of data in use (vs. data at rest or data in transit). 
At their core, CoCo +solutions provide a Trusted Execution Environment (TEE), where secure data +processing can be performed and, as a result, they are typically further +classified into different subtypes depending on the SW that is intended +to be run in the TEE. This document focuses on a subclass of CoCo technologies +that target virtualized environments and allow running Virtual +Machines (VM) inside a TEE. From now on, this document refers +to this subclass of CoCo as 'Confidential Computing (CoCo) for the +virtualized environments (VE)'. + +CoCo, in the virtualization context, refers to a set of HW and/or SW +technologies that allow for stronger security guarantees for the SW running +inside a CoCo VM. Namely, confidential computing allows its users to +confirm the trustworthiness of all SW pieces to include in its reduced +Trusted Computing Base (TCB) given its ability to attest the state of these +trusted components. + +While the concrete implementation details differ between technologies, all +available mechanisms aim to provide increased confidentiality and +integrity for the VM's guest memory and execution state (vCPU registers), +more tightly controlled guest interrupt injection, as well as some +additional mechanisms to control guest-host page mapping. More details on +the x86-specific solutions can be found in +:doc:`Intel Trust Domain Extensions (TDX) </arch/x86/tdx>` and +`AMD Memory Encryption <https://www.amd.com/system/files/techdocs/sev-snp-strengthening-vm-isolation-with-integrity-protection-and-more.pdf>`_. + +The basic CoCo guest layout includes the host, guest, the interfaces +through which guest and host communicate, a platform capable of supporting CoCo VMs, and +a trusted intermediary between the guest VM and the underlying platform +that acts as a security manager. The host-side virtual machine monitor +(VMM) typically consists of a subset of traditional VMM features and +is still in charge of the guest lifecycle, i.e.
create or destroy a CoCo +VM, manage its access to system resources, etc. However, since it +typically stays out of CoCo VM TCB, its access is limited to preserve the +security objectives. + +In the following diagram, the "<--->" lines represent bi-directional +communication channels or interfaces between the CoCo security manager and +the rest of the components (data flow for guest, host, hardware) :: + + +-------------------+ +-----------------------+ + | CoCo guest VM |<---->| | + +-------------------+ | | + | Interfaces | | CoCo security manager | + +-------------------+ | | + | Host VMM |<---->| | + +-------------------+ | | + | | + +--------------------+ | | + | CoCo platform |<--->| | + +--------------------+ +-----------------------+ + +The specific details of the CoCo security manager vastly diverge between +technologies. For example, in some cases, it will be implemented in HW +while in others it may be pure SW. In some cases, such as for the +`Protected kernel-based virtual machine (pKVM) <https://github.com/intel-staging/pKVM-IA>`_, +the CoCo security manager is a small, isolated and highly privileged +(compared to the rest of SW running on the host) part of a traditional +VMM. + +Existing Linux kernel threat model +================================== + +The overall components of the current Linux kernel threat model are:: + + +-----------------------+ +-------------------+ + | |<---->| Userspace | + | | +-------------------+ + | External attack | | Interfaces | + | vectors | +-------------------+ + | |<---->| Linux Kernel | + | | +-------------------+ + +-----------------------+ +-------------------+ + | Bootloader/BIOS | + +-------------------+ + +-------------------+ + | HW platform | + +-------------------+ + +There is also communication between the bootloader and the kernel during +the boot process, but this diagram does not represent it explicitly.
The +"Interfaces" box represents the various interfaces that allow +communication between kernel and userspace. This includes system calls, +kernel APIs, device drivers, etc. + +The existing Linux kernel threat model typically assumes execution on a +trusted HW platform with all of the firmware and bootloaders included on +its TCB. The primary attacker resides in the userspace, and all of the data +coming from there is generally considered untrusted, unless userspace is +privileged enough to perform trusted actions. In addition, external +attackers are typically considered, including those with access to enabled +external networks (e.g. Ethernet, Wireless, Bluetooth), exposed hardware +interfaces (e.g. USB, Thunderbolt), and the ability to modify the contents +of disks offline. + +Regarding external attack vectors, it is interesting to note that in most +cases external attackers will try to exploit vulnerabilities in userspace +first, but that it is possible for an attacker to directly target the +kernel; particularly if the host has physical access. Examples of direct +kernel attacks include the vulnerabilities CVE-2019-19524, CVE-2022-0435 +and CVE-2020-24490. + +Confidential Computing threat model and its security objectives +=============================================================== + +Confidential Computing adds a new type of attacker to the above list: a +potentially misbehaving host (which can also include some part of a +traditional VMM or all of it), which is typically placed outside of the +CoCo VM TCB due to its large SW attack surface. It is important to note +that this doesn’t imply that the host or VMM are intentionally +malicious, but that there exists a security value in having a small CoCo +VM TCB. 
This new type of adversary may be viewed as a more powerful type +of external attacker, as it resides locally on the same physical machine +(in contrast to a remote network attacker) and has control over the guest +kernel communication with most of the HW:: + + +------------------------+ + | CoCo guest VM | + +-----------------------+ | +-------------------+ | + | |<--->| | Userspace | | + | | | +-------------------+ | + | External attack | | | Interfaces | | + | vectors | | +-------------------+ | + | |<--->| | Linux Kernel | | + | | | +-------------------+ | + +-----------------------+ | +-------------------+ | + | | Bootloader/BIOS | | + +-----------------------+ | +-------------------+ | + | |<--->+------------------------+ + | | | Interfaces | + | | +------------------------+ + | CoCo security |<--->| Host/Host-side VMM | + | manager | +------------------------+ + | | +------------------------+ + | |<--->| CoCo platform | + +-----------------------+ +------------------------+ + +While traditionally the host has unlimited access to guest data and can +leverage this access to attack the guest, the CoCo systems mitigate such +attacks by adding security features like guest data confidentiality and +integrity protection. This threat model assumes that those features are +available and intact. + +The **Linux kernel CoCo VM security objectives** can be summarized as follows: + +1. Preserve the confidentiality and integrity of the CoCo guest's private +memory and registers. + +2. Prevent privilege escalation from a host into a CoCo guest Linux kernel. +While it is true that the host (and host-side VMM) requires some level of +privilege to create, destroy, or pause the guest, part of the goal of +preventing privilege escalation is to ensure that these operations do not +provide a pathway for attackers to gain access to the guest's kernel. + +The above security objectives result in two primary **Linux kernel CoCo +VM assets**: + +1. Guest kernel execution context. +2.
Guest kernel private memory. + +The host retains full control over the CoCo guest resources, and can deny +access to them at any time. Examples of resources include CPU time, memory +that the guest can consume, network bandwidth, etc. Because of this, +host Denial of Service (DoS) attacks against CoCo guests are beyond the +scope of this threat model. + +The **Linux CoCo VM attack surface** is any interface exposed from a CoCo +guest Linux kernel towards an untrusted host that is not covered by the +CoCo technology SW/HW protection. This includes any possible +side-channels, as well as transient execution side channels. Examples of +explicit (not side-channel) interfaces include accesses to port I/O, MMIO +and DMA interfaces, access to PCI configuration space, VMM-specific +hypercalls (towards Host-side VMM), access to shared memory pages, +interrupts allowed to be injected into the guest kernel by the host, as +well as CoCo technology specific hypercalls, if present. Additionally, the +host in a CoCo system typically controls the process of creating a CoCo +guest: it has a method to load into a guest the firmware and bootloader +images, the kernel image together with the kernel command line. All of this +data should also be considered untrusted until its integrity and +authenticity are established via attestation. + +The table below shows a threat matrix for the CoCo guest Linux kernel with +the potential mitigation strategies. The matrix refers to CoCo-specific +versions of the guest, host and platform. + +.. list-table:: CoCo Linux guest kernel threat matrix + :widths: auto + :align: center + :header-rows: 1 + + * - Threat name + - Threat description + - Mitigation strategies + + * - Guest malicious configuration + - A misbehaving host modifies one of the following parts of the guest's + configuration: + + 1. Guest firmware or bootloader + + 2. Guest kernel or module binaries + + 3.
Guest command line parameters + + This allows the host to break the integrity of the code running + inside a CoCo guest, and violates the CoCo security objectives. + - The integrity of the guest's configuration passed via untrusted host + must be ensured by methods such as remote attestation and signing. + This should be largely transparent to the guest kernel, and would + allow it to assume a trusted state at the time of boot. + + * - CoCo guest data attacks + - A misbehaving host retains full control of the CoCo guest's data + in-transit between the guest and the host-managed physical or + virtual devices. This allows any attack against confidentiality, + integrity or freshness of such data. + - The CoCo guest is responsible for ensuring the confidentiality, + integrity and freshness of such data using well-established + security mechanisms. For example, for any guest external network + communications passed via the untrusted host, an end-to-end + secure session must be established between a guest and a trusted + remote endpoint using well-known protocols such as TLS. + This requirement also applies to protection of the guest's disk + image. + + * - Malformed runtime input + - A misbehaving host injects malformed input via any communication + interface used by the guest's kernel code. If the code is not + prepared to handle this input correctly, this can result in a host + --> guest kernel privilege escalation. This includes traditional + side-channel and/or transient execution attack vectors. + - The attestation or signing process cannot help to mitigate this + threat since this input is highly dynamic. Instead, a different set + of mechanisms is required: + + 1. *Limit the exposed attack surface*. Whenever possible, disable + complex kernel features and device drivers (not required for guest + operation) that actively use the communication interfaces between + the untrusted host and the guest. 
This is not a new concept for the + Linux kernel, since it already has mechanisms to disable external + interfaces, such as an attacker's access via the USB/Thunderbolt subsystem. + + 2. *Harden the exposed attack surface*. Any code that uses such + interfaces must treat the input from the untrusted host as malicious, + and do sanity checks before processing it. This can be ensured by + performing a code audit of such device drivers as well as employing + other standard techniques for testing the code robustness, such as + fuzzing. This is again a well-known concept for the Linux kernel, + since all its networking code has been previously analyzed under the + presumption of processing malformed input from a network attacker. + + * - Malicious runtime input + - A misbehaving host injects a specific input value via any + communication interface used by the guest's kernel code. The + difference from the previous attack vector (malformed runtime input) + is that this input is not malformed, but its value is crafted to + impact the guest's kernel security. Examples of such inputs include + providing a malicious time to the guest or the entropy to the guest + random number generator. Additionally, the timing of such events can + be an attack vector on its own, if it results in a particular guest + kernel action (e.g. processing of a host-injected interrupt). + - Similarly, as with the previous attack vector, it is not possible to + use attestation mechanisms to address this threat. Instead, such + attack vectors (i.e. interfaces) must be either disabled or made + resistant to supplied host input. + +As can be seen from the above table, the potential mitigation strategies +to secure the CoCo Linux guest kernel vary, but can be roughly split into +mechanisms that either require or do not require changes to the existing +Linux kernel code.
One main goal of the CoCo security architecture is to +minimize changes to the Linux kernel code, while also providing usable +and scalable means to facilitate the security of a CoCo guest kernel. diff --git a/MAINTAINERS b/MAINTAINERS index a73486c4aa6e..1d4ae60cdee9 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5197,6 +5197,12 @@ S: Orphan W: http://accessrunner.sourceforge.net/ F: drivers/usb/atm/cxacru.c +CONFIDENTIAL COMPUTING THREAT MODEL FOR X86 VIRTUALIZATION +M: Elena Reshetova <elena.reshetova@intel.com> +M: Carlos Bilbao <carlos.bilbao@amd.com> +S: Maintained +F: Documentation/security/x86-confidential-computing.rst + CONFIGFS M: Joel Becker <jlbec@evilplan.org> M: Christoph Hellwig <hch@lst.de>
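For illustration only, the "Harden the exposed attack surface" mitigation from the threat matrix could look roughly like the userspace C sketch below. It is not kernel code and is not part of this patch. The descriptor layout (`struct host_desc`), the buffer size (`SHARED_BUF_SIZE`) and the helper name (`read_host_desc`) are all hypothetical. The idea is to read each host-writable field exactly once into guest-private memory and validate that snapshot, so that the host cannot change a value between the check and its use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size of a buffer shared between guest and untrusted host. */
#define SHARED_BUF_SIZE 4096u

/* Hypothetical descriptor that the untrusted host writes into shared memory. */
struct host_desc {
	uint32_t offset;	/* start of the payload inside the shared buffer */
	uint32_t len;		/* payload length in bytes */
};

/*
 * Snapshot the host-writable descriptor into private memory, then validate
 * the snapshot. Returns true and fills *out only if the described range
 * lies entirely inside the shared buffer.
 */
static bool read_host_desc(const volatile struct host_desc *shared,
			   struct host_desc *out)
{
	struct host_desc snap;

	/* Read each field exactly once; later checks use the private copy. */
	snap.offset = shared->offset;
	snap.len = shared->len;

	/* Reject empty or oversized payloads. */
	if (snap.len == 0 || snap.len > SHARED_BUF_SIZE)
		return false;

	/* Overflow-safe bounds check: offset + len <= SHARED_BUF_SIZE. */
	if (snap.offset > SHARED_BUF_SIZE - snap.len)
		return false;

	*out = snap;
	return true;
}
```

A caller would use only the validated copy in `*out` and never re-read the shared descriptor. The bounds check is written as a subtraction so that a host-chosen `offset` close to `UINT32_MAX` cannot wrap the sum around.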