From patchwork Mon Nov 21 00:26:23 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23478
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 01/20] x86/tdx: Define TDX supported page sizes as macros
Date: Mon, 21 Nov 2022 13:26:23 +1300
Message-Id: 
 <d6c6e664c445e9ccf1528625f0e21bbb8471d35f.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

TDX supports 4K, 2M and 1G page sizes.  The corresponding values are
defined by the TDX module spec and are part of the TDX module ABI.
Currently they are used in try_accept_one() when the TDX guest accepts
a page, but try_accept_one() hard-codes them as magic values.

Define TDX supported page sizes as macros and get rid of the hard-coded
values in try_accept_one().  TDX host support will need to use them too.

Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:

 - Removed the helper to convert kernel page level to TDX page level.
 - Changed to use macros to define TDX supported page sizes.
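
For illustration only (not part of the patch): a minimal userspace sketch
of the mapping that try_accept_one() performs once the macros exist.  The
TDX_PS_* values are the ones defined by this patch.  The enum
demo_pg_level and the helper demo_tdx_page_size() are hypothetical
stand-ins, assumed to mirror the kernel's enum pg_level so the sketch
builds outside the kernel tree.

```c
#include <assert.h>

/* Values from this patch; they are part of the TDX module ABI. */
#define TDX_PS_4K	0
#define TDX_PS_2M	1
#define TDX_PS_1G	2

/* Hypothetical userspace stand-in, assumed to mirror enum pg_level. */
enum demo_pg_level {
	DEMO_PG_LEVEL_NONE,
	DEMO_PG_LEVEL_4K,
	DEMO_PG_LEVEL_2M,
	DEMO_PG_LEVEL_1G,
	DEMO_PG_LEVEL_512G,
};

/*
 * Hypothetical helper: return the TDX page size encoding for a page
 * level, or -1 when TDX does not support that level.
 */
static int demo_tdx_page_size(enum demo_pg_level level)
{
	switch (level) {
	case DEMO_PG_LEVEL_4K:
		return TDX_PS_4K;
	case DEMO_PG_LEVEL_2M:
		return TDX_PS_2M;
	case DEMO_PG_LEVEL_1G:
		return TDX_PS_1G;
	default:
		return -1;
	}
}
```

Spelling the mapping out as a switch keeps the ABI values independent of
how the kernel happens to number its own page levels.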

---
 arch/x86/coco/tdx/tdx.c    | 6 +++---
 arch/x86/include/asm/tdx.h | 9 +++++++++
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index cfd4c95b9f04..7fa7fb54f438 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -722,13 +722,13 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len,
 	 */
 	switch (pg_level) {
 	case PG_LEVEL_4K:
-		page_size = 0;
+		page_size = TDX_PS_4K;
 		break;
 	case PG_LEVEL_2M:
-		page_size = 1;
+		page_size = TDX_PS_2M;
 		break;
 	case PG_LEVEL_1G:
-		page_size = 2;
+		page_size = TDX_PS_1G;
 		break;
 	default:
 		return false;
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 28d889c9aa16..e9a3f4a6fba1 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -20,6 +20,15 @@
 
 #ifndef __ASSEMBLY__
 
+/*
+ * TDX supported page sizes (4K/2M/1G).
+ *
+ * These values are part of the TDX module ABI.  Do not change them.
+ */
+#define TDX_PS_4K	0
+#define TDX_PS_2M	1
+#define TDX_PS_1G	2
+
 /*
  * Used to gather the output registers values of the TDCALL and SEAMCALL
  * instructions when requesting services from the TDX module.

From patchwork Mon Nov 21 00:26:24 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23479
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 02/20] x86/virt/tdx: Detect TDX during kernel boot
Date: Mon, 21 Nov 2022 13:26:24 +1300
Message-Id: 
 <aaee2d5332a97c840ad401ba935842a998a877ec.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Intel Trust Domain Extensions (TDX) protects guest VMs from a malicious
host and certain physical attacks.  A CPU-attested software module
called 'the TDX module' runs inside a new isolated memory range as a
trusted hypervisor to manage and run protected VMs.

Pre-TDX Intel hardware has support for a memory encryption architecture
called MKTME.  The memory encryption hardware underpinning MKTME is also
used for Intel TDX.  TDX ends up "stealing" some of the physical address
space from the MKTME architecture to provide crypto-protection to VMs.
The BIOS is responsible for partitioning the "KeyID" space between
legacy MKTME and TDX.  The KeyIDs reserved for TDX are called 'TDX
private KeyIDs' or 'TDX KeyIDs' for short.

TDX doesn't trust the BIOS.  During machine boot, TDX verifies that the
TDX private KeyIDs are consistently and correctly programmed by the BIOS
across all CPU packages before it enables TDX on any CPU core.  A valid
TDX private KeyID range on the BSP indicates that TDX has been enabled
by the BIOS; otherwise the BIOS is buggy.

The TDX module is expected to be loaded by the BIOS when it enables TDX,
but the kernel needs to properly initialize it before it can be used to
create and run any TDX guests.  The TDX module will be initialized at
runtime by the user (i.e. KVM) on demand.

Add a new early_initcall(tdx_init) to do TDX early boot initialization.
Only detect TDX private KeyIDs for now.  Some other early checks will
follow.  Also add a new function to report whether TDX has been enabled
by the BIOS (i.e. the TDX private KeyID range is valid).  kexec will
also need it to determine whether it needs to flush dirty cachelines
associated with any TDX private KeyIDs before booting the new kernel.

To start supporting TDX, create a new arch/x86/virt/vmx/tdx/tdx.c for
TDX host kernel support.  Add a new Kconfig option CONFIG_INTEL_TDX_HOST
to opt in to TDX host kernel support (as distinct from TDX guest kernel
support).  So far KVM is the only user of TDX, so make the new config
option depend on KVM_INTEL.

Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - No change.

v5 -> v6:
 - Removed SEAMRR detection to make code simpler.
 - Removed the 'default N' in the INTEL_TDX_HOST Kconfig (Kirill).
 - Changed to use 'obj-y' in arch/x86/virt/vmx/tdx/Makefile (Kirill).
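
For illustration only (not part of the patch): a userspace sketch of how
the KeyID partitioning value is interpreted.  It assumes the raw 64-bit
MSR value has already been read.  The struct and function names are
hypothetical, and unlike the patch the sketch folds the "fewer than two
TDX private KeyIDs" check into the same helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded TDX private KeyID range. */
struct demo_tdx_keyids {
	uint32_t start;	/* first TDX private KeyID */
	uint32_t num;	/* number of TDX private KeyIDs */
};

/*
 * Decode a raw IA32_MKTME_KEYID_PARTITIONING value:
 *   bits [31:0]:  number of MKTME KeyIDs
 *   bits [63:32]: number of TDX private KeyIDs
 *
 * KeyID 0 is for TME and MKTME KeyIDs start from 1, so the TDX private
 * KeyIDs start at (number of MKTME KeyIDs + 1).  Returns false, with a
 * cleared range, when fewer than two TDX private KeyIDs are available.
 */
static bool demo_decode_keyid_partitioning(uint64_t msr,
					   struct demo_tdx_keyids *out)
{
	uint32_t nr_mktme = (uint32_t)msr;
	uint32_t nr_tdx = (uint32_t)(msr >> 32);

	out->start = 0;
	out->num = 0;

	if (nr_tdx < 2)
		return false;

	out->start = nr_mktme + 1;
	out->num = nr_tdx;
	return true;
}
```

With 31 MKTME KeyIDs and 32 TDX private KeyIDs, for example, the TDX
private KeyID range is [32, 64).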


---
 arch/x86/Kconfig               | 12 +++++
 arch/x86/Makefile              |  2 +
 arch/x86/include/asm/tdx.h     |  7 +++
 arch/x86/virt/Makefile         |  2 +
 arch/x86/virt/vmx/Makefile     |  2 +
 arch/x86/virt/vmx/tdx/Makefile |  2 +
 arch/x86/virt/vmx/tdx/tdx.c    | 95 ++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h    | 15 ++++++
 8 files changed, 137 insertions(+)
 create mode 100644 arch/x86/virt/Makefile
 create mode 100644 arch/x86/virt/vmx/Makefile
 create mode 100644 arch/x86/virt/vmx/tdx/Makefile
 create mode 100644 arch/x86/virt/vmx/tdx/tdx.c
 create mode 100644 arch/x86/virt/vmx/tdx/tdx.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 67745ceab0db..cced4ef3bfb2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1953,6 +1953,18 @@ config X86_SGX
 
 	  If unsure, say N.
 
+config INTEL_TDX_HOST
+	bool "Intel Trust Domain Extensions (TDX) host support"
+	depends on CPU_SUP_INTEL
+	depends on X86_64
+	depends on KVM_INTEL
+	help
+	  Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
+	  host and certain physical attacks.  This option enables necessary TDX
+	  support in host kernel to run protected VMs.
+
+	  If unsure, say N.
+
 config EFI
 	bool "EFI runtime service support"
 	depends on ACPI
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 415a5d138de4..38d3e8addc5f 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -246,6 +246,8 @@ archheaders:
 
 libs-y  += arch/x86/lib/
 
+core-y += arch/x86/virt/
+
 # drivers-y are linked after core-y
 drivers-$(CONFIG_MATH_EMULATION) += arch/x86/math-emu/
 drivers-$(CONFIG_PCI)            += arch/x86/pci/
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index e9a3f4a6fba1..51c4222a13ae 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -98,5 +98,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 	return -ENODEV;
 }
 #endif /* CONFIG_INTEL_TDX_GUEST && CONFIG_KVM_GUEST */
+
+#ifdef CONFIG_INTEL_TDX_HOST
+bool platform_tdx_enabled(void);
+#else	/* !CONFIG_INTEL_TDX_HOST */
+static inline bool platform_tdx_enabled(void) { return false; }
+#endif	/* CONFIG_INTEL_TDX_HOST */
+
 #endif /* !__ASSEMBLY__ */
 #endif /* _ASM_X86_TDX_H */
diff --git a/arch/x86/virt/Makefile b/arch/x86/virt/Makefile
new file mode 100644
index 000000000000..1e36502cd738
--- /dev/null
+++ b/arch/x86/virt/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-y	+= vmx/
diff --git a/arch/x86/virt/vmx/Makefile b/arch/x86/virt/vmx/Makefile
new file mode 100644
index 000000000000..feebda21d793
--- /dev/null
+++ b/arch/x86/virt/vmx/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_INTEL_TDX_HOST)	+= tdx/
diff --git a/arch/x86/virt/vmx/tdx/Makefile b/arch/x86/virt/vmx/tdx/Makefile
new file mode 100644
index 000000000000..93ca8b73e1f1
--- /dev/null
+++ b/arch/x86/virt/vmx/tdx/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-y += tdx.o
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
new file mode 100644
index 000000000000..982d9c453b6b
--- /dev/null
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright(c) 2022 Intel Corporation.
+ *
+ * Intel Trust Domain Extensions (TDX) support
+ */
+
+#define pr_fmt(fmt)	"tdx: " fmt
+
+#include <linux/types.h>
+#include <linux/init.h>
+#include <linux/printk.h>
+#include <asm/msr-index.h>
+#include <asm/msr.h>
+#include <asm/tdx.h>
+#include "tdx.h"
+
+static u32 tdx_keyid_start __ro_after_init;
+static u32 tdx_keyid_num __ro_after_init;
+
+/*
+ * Detect TDX private KeyIDs to see whether TDX has been enabled by the
+ * BIOS.  Both initializing the TDX module and running a TDX guest
+ * require a TDX private KeyID.
+ *
+ * TDX doesn't trust the BIOS.  TDX verifies that all configuration done
+ * by the BIOS is correct before enabling TDX on any core.  TDX requires
+ * the BIOS to correctly and consistently program TDX private KeyIDs on
+ * all CPU packages.  Unless there is a BIOS bug, detecting a valid TDX
+ * private KeyID range on the BSP indicates TDX has been enabled by the
+ * BIOS.  If there is such a BIOS bug, it will be caught later when
+ * initializing the TDX module.
+ */
+static int __init detect_tdx(void)
+{
+	int ret;
+
+	/*
+	 * IA32_MKTME_KEYID_PARTITIONING:
+	 *   Bits [31:0]:	Number of MKTME KeyIDs.
+	 *   Bits [63:32]:	Number of TDX private KeyIDs.
+	 */
+	ret = rdmsr_safe(MSR_IA32_MKTME_KEYID_PARTITIONING, &tdx_keyid_start,
+			&tdx_keyid_num);
+	if (ret)
+		return -ENODEV;
+
+	if (!tdx_keyid_num)
+		return -ENODEV;
+
+	/*
+	 * KeyID 0 is for TME.  MKTME KeyIDs start from 1.  TDX private
+	 * KeyIDs start after the last MKTME KeyID.
+	 */
+	tdx_keyid_start++;
+
+	pr_info("TDX enabled by BIOS. TDX private KeyID range: [%u, %u)\n",
+			tdx_keyid_start, tdx_keyid_start + tdx_keyid_num);
+
+	return 0;
+}
+
+static void __init clear_tdx(void)
+{
+	tdx_keyid_start = tdx_keyid_num = 0;
+}
+
+static int __init tdx_init(void)
+{
+	if (detect_tdx())
+		return -ENODEV;
+
+	/*
+	 * Initializing the TDX module requires one TDX private KeyID.
+	 * If there's only one TDX KeyID then after module initialization
+	 * KVM won't be able to run any TDX guest, which makes the whole
+	 * thing worthless.  Just disable TDX in this case.
+	 */
+	if (tdx_keyid_num < 2) {
+		pr_info("Disable TDX as there's only one TDX private KeyID available.\n");
+		goto no_tdx;
+	}
+
+	return 0;
+no_tdx:
+	clear_tdx();
+	return -ENODEV;
+}
+early_initcall(tdx_init);
+
+/* Return whether the BIOS has enabled TDX */
+bool platform_tdx_enabled(void)
+{
+	return !!tdx_keyid_num;
+}
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
new file mode 100644
index 000000000000..d00074abcb20
--- /dev/null
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _X86_VIRT_TDX_H
+#define _X86_VIRT_TDX_H
+
+/*
+ * This file contains both macros and data structures defined by the TDX
+ * architecture and Linux defined software data structures and functions.
+ * The two should not be mixed together for better readability.  The
+ * architectural definitions come first.
+ */
+
+/* MSR to report KeyID partitioning between MKTME and TDX */
+#define MSR_IA32_MKTME_KEYID_PARTITIONING	0x00000087
+
+#endif

From patchwork Mon Nov 21 00:26:25 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23481
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 03/20] x86/virt/tdx: Disable TDX if X2APIC is not enabled
Date: Mon, 21 Nov 2022 13:26:25 +1300
Message-Id: 
 <c5f484c1a87ee052597fd5f539cf021f158755b9.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

The MMIO/xAPIC interface has some problems, most notably AEPIC Leak
[1].  This bug allows an attacker to use the APIC MMIO interface to
extract data from SGX enclaves.

TDX is not immune to this either.  Check for X2APIC early and disable
TDX if X2APIC is not enabled, and make INTEL_TDX_HOST depend on
X86_X2APIC.

[1]: https://aepicleak.com/aepicleak.pdf

Link: https://lore.kernel.org/lkml/d6ffb489-7024-ff74-bd2f-d1e06573bb82@intel.com/
Link: https://lore.kernel.org/lkml/ba80b303-31bf-d44a-b05d-5c0f83038798@intel.com/
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - Changed to use "Link" for the two lore links to get rid of checkpatch
   warning.
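
For illustration only (not part of the patch): a userspace sketch of the
early-check chain in tdx_init() after this patch.  The demo_* names are
hypothetical, and the KeyID values and the X2APIC state are passed in as
parameters instead of being read from the MSR and from x2apic_enabled().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for tdx_keyid_start and tdx_keyid_num. */
static uint32_t demo_keyid_start;
static uint32_t demo_keyid_num;

/*
 * Run the early checks in order.  Any failure after detection clears
 * the recorded KeyID range, so a later "is TDX enabled" query reports
 * false.
 */
static int demo_tdx_init(uint32_t start, uint32_t num, bool x2apic_on)
{
	demo_keyid_start = start;
	demo_keyid_num = num;

	/* No TDX private KeyIDs: the BIOS has not enabled TDX. */
	if (!demo_keyid_num)
		return -ENODEV;

	/* Module initialization itself consumes one private KeyID. */
	if (demo_keyid_num < 2)
		goto no_tdx;

	/* Without X2APIC, data may leak via APIC MMIO registers. */
	if (!x2apic_on)
		goto no_tdx;

	return 0;
no_tdx:
	demo_keyid_start = demo_keyid_num = 0;
	return -ENODEV;
}

/* Mirrors platform_tdx_enabled(): a non-zero KeyID count means enabled. */
static bool demo_platform_tdx_enabled(void)
{
	return !!demo_keyid_num;
}
```

Routing every failed check through one label keeps the cleanup in a
single place as more early checks are added.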

---
 arch/x86/Kconfig            |  1 +
 arch/x86/virt/vmx/tdx/tdx.c | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cced4ef3bfb2..dd333b46fafb 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1958,6 +1958,7 @@ config INTEL_TDX_HOST
 	depends on CPU_SUP_INTEL
 	depends on X86_64
 	depends on KVM_INTEL
+	depends on X86_X2APIC
 	help
 	  Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
 	  host and certain physical attacks.  This option enables necessary TDX
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 982d9c453b6b..8d943bdc8335 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -12,6 +12,7 @@
 #include <linux/printk.h>
 #include <asm/msr-index.h>
 #include <asm/msr.h>
+#include <asm/apic.h>
 #include <asm/tdx.h>
 #include "tdx.h"
 
@@ -81,6 +82,16 @@ static int __init tdx_init(void)
 		goto no_tdx;
 	}
 
+	/*
+	 * TDX requires X2APIC to be enabled to prevent potential data
+	 * leaks via APIC MMIO registers.  Just disable TDX if X2APIC is
+	 * not in use.
+	 */
+	if (!x2apic_enabled()) {
+		pr_info("Disable TDX as X2APIC is not enabled.\n");
+		goto no_tdx;
+	}
+
 	return 0;
 no_tdx:
 	clear_tdx();

From patchwork Mon Nov 21 00:26:26 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23480
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1322475wrr;
        Sun, 20 Nov 2022 16:28:31 -0800 (PST)
X-Google-Smtp-Source: 
 AA0mqf7Bf11I/xUP/azIebXJMzmACwD1w0fr0V5rWNb4rCe23ctBzyuFdREEnUT/h3GTxQY3Qi/0
X-Received: by 2002:a62:1c05:0:b0:56a:af55:629c with SMTP id
 c5-20020a621c05000000b0056aaf55629cmr18008177pfc.82.1668990511185;
        Sun, 20 Nov 2022 16:28:31 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1668990511; cv=none;
        d=google.com; s=arc-20160816;
        b=YWeU6dBnoD1RMO6x3Q87UnucwDKxZbB1fK+ZSx0n/6xtfrFqZU3DnxjDqgZrYmybGn
         Gw6ycwvj1nsWZQbCbPO50PP1vjDJAaCbSvlsUjpI7+QLaWmST+Kz7y8Ng5JtCCjWRQvd
         uH7J/KuMku1EDPsuedFOXq7+IhrFMNzHbZDmq3IhiI+K/FOPg3dl9eYXZPGkisXZ9nRh
         M8bI5q9iVZHs/8iGRiKIP0GVcfebSggNj00hurfRHJyxgxEw36p3YfR8MaMY51Ql8Jj+
         L/oVgxuSJZ8iXDnn56VCpHsQaY+U7kPIIqNk8SYFvj3nSQyRyYlmZRylbqjzoI6MNxZe
         ayTw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from
         :dkim-signature;
        bh=7ur5RcuQkWcjjwGZvulljBWYNnnq4yuWImrMgWC3aF4=;
        b=y0okKFf6V452tXF4DtAiy8yMr8enI8U8zqD70Lk93Ualqk/cK4YQ4+YnIfxXwHYY0N
         24J+qRZVuem7cnMC2lK0yY87VYHPP4l5ozh+XdIpldZ6RNJ2bDABuazxz6Y1PDxTkARC
         8X+z+OAw9amrqkyYaV4az1mbABdgJSQPSREueI4jCVafaLKDrFxRU/Vcr0a5XmVtO/ge
         nLbWlbNCAbCRxN/vHea40YVxSzoYPQceO4Xq5hZz7s9QszNxK96stKWI3sK7gHTwqdky
         22zf49orlcDjA8KqHTzng1+auvicj0sOuTLa4CS//7tYV1deFCIPAA7RT982aG4eeGdO
         z+XQ==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=aZV3cbXl;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 z4-20020a63e544000000b0047730522d4esi7260349pgj.95.2022.11.20.16.28.17;
        Sun, 20 Nov 2022 16:28:31 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=aZV3cbXl;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229932AbiKUA1o (ORCPT <rfc822;leviz.kernel.dev@gmail.com>
        + 99 others); Sun, 20 Nov 2022 19:27:44 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57432 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229708AbiKUA1T (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Sun, 20 Nov 2022 19:27:19 -0500
Received: from mga05.intel.com (mga05.intel.com [192.55.52.43])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1DA592D1F3;
        Sun, 20 Nov 2022 16:27:14 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1668990434; x=1700526434;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=3s5S0WEvIVlYJHz6MygetQEQmwV83ab1apvR2JvNNVM=;
  b=aZV3cbXlFSLo9SFtMyaU+aTBZZMtAY5m8DID6ZDwvSBV1lRbfXyZCyFE
   m6pVABMaGZeyk1rzPF4gfGshidY/y1BYWvZKp1wuLXqHqW6jDGC6Hle/t
   ZKgFhw/vGVdQFd6F4gJAqtlr8LCws3lDs90Mpbd8oZ3F2aylO6vpKL9NJ
   lKYZC87UDEiNwlcU2nTHjHOqjgP8/F5OkYsm2MgazEYus+jyiQL/59wDD
   HRfesOLlzlvzQTCxRkk1mJFfb8bCu3Rljw0U4PTAI8i0WJCyC4jZ5s99o
   EPSaElWK8dx4n8s73BQ2+4HAJoOlgFrIb3SUFNr3FdEEiZRNkL2YxyCU/
   w==;
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="399732292"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="399732292"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
  by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:27:12 -0800
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="729825228"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="729825228"
Received: from tomnavar-mobl.amr.corp.intel.com (HELO
 khuang2-desk.gar.corp.intel.com) ([10.209.176.15])
  by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:27:08 -0800
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 04/20] x86/virt/tdx: Add skeleton to initialize TDX on
 demand
Date: Mon, 21 Nov 2022 13:26:26 +1300
Message-Id: 
 <d26254af8e5b3dcca8a070703c5d6d04f48d47a9.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,
        SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1750063394372940491?=
X-GMAIL-MSGID: =?utf-8?q?1750063394372940491?=

Before the TDX module can be used to create and run TDX guests, it must
be loaded and properly initialized.  The TDX module is expected to be
loaded by the BIOS, and to be initialized by the kernel.

TDX introduces a new CPU mode: Secure Arbitration Mode (SEAM).  The host
kernel communicates with the TDX module via a new SEAMCALL instruction.
The TDX module implements a set of SEAMCALL leaf functions to allow the
host kernel to initialize it.

The TDX module can be initialized only once in its lifetime.  Instead
of always initializing it at boot time, this implementation takes an
"on demand" approach and defers initializing TDX until there is a real
need (e.g. when requested by KVM).  This approach has the following
advantages:

1) It avoids consuming the memory that must be allocated by kernel and
given to the TDX module as metadata (~1/256th of the TDX-usable memory),
and also saves the CPU cycles of initializing the TDX module (and the
metadata) when TDX is not used at all.

2) It makes it easier to support runtime updates of the TDX module in
the future (after an update, the TDX module needs to be initialized
again).

3) It avoids the need for a "temporary" solution to handle VMXON in the
core (non-KVM) kernel for now.  This is because SEAMCALL requires the
CPU to be in VMX operation (VMXON is done), but currently only KVM
handles VMXON.  Adding VMXON support to the core kernel isn't trivial.
More importantly, in the long term a reference-based approach is likely
needed in the core kernel, as more kernel components are likely to need
to support TDX as well.  Allowing KVM to initialize the TDX module
avoids having to handle VMXON during kernel boot for now.

Add a placeholder tdx_enable() to detect and initialize the TDX module
on demand, with a state machine protected by a mutex to support
concurrent calls from multiple callers.
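
As a rough illustration of that state machine, here is a minimal
user-space sketch, not the kernel code.  It assumes a pthread mutex in
place of the kernel mutex; fake_init_result, init_attempts and the
model_* names are hypothetical and exist only for this example.  The
platform_tdx_enabled() check and the CPU hotplug handling are left out.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* User-space model only: the enum mirrors the patch, the rest is illustrative. */
enum tdx_module_status_t {
	TDX_MODULE_UNKNOWN,	/* not yet detected or initialized */
	TDX_MODULE_NONE,	/* module is not loaded */
	TDX_MODULE_INITIALIZED,	/* module is initialized */
	TDX_MODULE_SHUTDOWN,	/* module was shut down after a failure */
};

static enum tdx_module_status_t tdx_module_status = TDX_MODULE_UNKNOWN;
static pthread_mutex_t tdx_module_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical knobs: result the fake init reports, and how often it ran. */
static int fake_init_result;
static int init_attempts;

static int model_init_tdx_module(void)
{
	init_attempts++;
	return fake_init_result;
}

static int model_tdx_enable(void)
{
	int ret;

	pthread_mutex_lock(&tdx_module_lock);

	switch (tdx_module_status) {
	case TDX_MODULE_UNKNOWN:
		/* First caller: try to initialize and record the outcome. */
		ret = model_init_tdx_module();
		if (ret == -ENODEV) {
			tdx_module_status = TDX_MODULE_NONE;
		} else if (ret) {
			tdx_module_status = TDX_MODULE_SHUTDOWN;
			ret = -EFAULT;
		} else {
			tdx_module_status = TDX_MODULE_INITIALIZED;
		}
		break;
	case TDX_MODULE_NONE:
		ret = -ENODEV;
		break;
	case TDX_MODULE_INITIALIZED:
		ret = 0;
		break;
	default:
		/* Only the shutdown state is left. */
		assert(tdx_module_status == TDX_MODULE_SHUTDOWN);
		ret = -EFAULT;
		break;
	}

	pthread_mutex_unlock(&tdx_module_lock);

	return ret;
}
```

In this model the initialization runs at most once: later callers only
read the recorded state under the lock, so a failed first attempt keeps
returning -EFAULT without retrying.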

The TDX module is initialized in multiple steps defined by the TDX
module:

  1) Global initialization;
  2) Logical-CPU scope initialization;
  3) Enumerate the TDX module capabilities and platform configuration;
  4) Configure the TDX module about TDX usable memory ranges and global
     KeyID information;
  5) Package-scope configuration for the global KeyID;
  6) Initialize usable memory ranges based on 4).

The TDX module can also be shut down at any time during its lifetime.
In case of any error during the initialization process, shut down the
module.  It's pointless to leave the module in any intermediate state
during the initialization.
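
The "run the steps in order, shut down on the first failure" flow can be
sketched in user space as below.  This is only an assumption about the
overall shape: model_do_step(), model_shutdown(), fail_at and the
counters are hypothetical stand-ins, and the real steps are SEAMCALLs
that this sketch does not attempt to model.

```c
#include <stdio.h>

enum { NR_INIT_STEPS = 6 };

/* Step descriptions taken from the list in the changelog. */
static const char *const step_names[NR_INIT_STEPS] = {
	"global initialization",
	"logical-CPU scope initialization",
	"enumerate capabilities and platform configuration",
	"configure usable memory ranges and global KeyID",
	"package-scope global KeyID configuration",
	"initialize usable memory ranges",
};

/* Hypothetical test knobs: which step fails (-1 = none), plus counters. */
static int fail_at = -1;
static int steps_run;
static int shutdown_calls;

/* Stand-in for one initialization step; fails only at index fail_at. */
static int model_do_step(int idx)
{
	steps_run++;
	return idx == fail_at ? -1 : 0;
}

/* Stand-in for shutting down the module. */
static void model_shutdown(void)
{
	shutdown_calls++;
}

/* Run all steps in order; on the first failure shut down and stop. */
static int model_init_sequence(void)
{
	for (int i = 0; i < NR_INIT_STEPS; i++) {
		if (model_do_step(i)) {
			fprintf(stderr, "step %d (%s) failed, shutting down\n",
				i + 1, step_names[i]);
			model_shutdown();
			return -1;
		}
	}

	return 0;
}
```

No later step runs after a failure, so the modelled module is never left
in an intermediate state.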

Both logical CPU scope initialization and shutting down the TDX module
require calling SEAMCALL on all boot-time present CPUs.  For simplicity
just temporarily disable CPU hotplug during the module initialization.

Note TDX architecturally doesn't support physical CPU hot-add/removal.
A non-buggy BIOS should never support ACPI CPU hot-add/removal.  This
implementation doesn't explicitly handle ACPI CPU hot-add/removal but
depends on the BIOS to do the right thing.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - No change.

v5 -> v6:
 - Added code to set status to TDX_MODULE_NONE if TDX module is not
   loaded (Chao)
 - Added Chao's Reviewed-by.
 - Improved comments around cpus_read_lock().

- v3->v5 (no feedback on v4):
 - Removed the check that SEAMRR and TDX KeyID have been detected on
   all present cpus.
 - Removed tdx_detect().
 - Added num_online_cpus() to MADT-enabled CPUs check within the CPU
   hotplug lock and return early with error message.
 - Improved dmesg printing for TDX module detection and initialization.

---
 arch/x86/include/asm/tdx.h  |   2 +
 arch/x86/virt/vmx/tdx/tdx.c | 150 ++++++++++++++++++++++++++++++++++++
 2 files changed, 152 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 51c4222a13ae..05fc89d9742a 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -101,8 +101,10 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 
 #ifdef CONFIG_INTEL_TDX_HOST
 bool platform_tdx_enabled(void);
+int tdx_enable(void);
 #else	/* !CONFIG_INTEL_TDX_HOST */
 static inline bool platform_tdx_enabled(void) { return false; }
+static inline int tdx_enable(void)  { return -ENODEV; }
 #endif	/* CONFIG_INTEL_TDX_HOST */
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 8d943bdc8335..28c187b8726f 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -10,15 +10,34 @@
 #include <linux/types.h>
 #include <linux/init.h>
 #include <linux/printk.h>
+#include <linux/mutex.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
 #include <asm/msr-index.h>
 #include <asm/msr.h>
 #include <asm/apic.h>
 #include <asm/tdx.h>
 #include "tdx.h"
 
+/* TDX module status during initialization */
+enum tdx_module_status_t {
+	/* TDX module hasn't been detected and initialized */
+	TDX_MODULE_UNKNOWN,
+	/* TDX module is not loaded */
+	TDX_MODULE_NONE,
+	/* TDX module is initialized */
+	TDX_MODULE_INITIALIZED,
+	/* TDX module is shut down due to initialization error */
+	TDX_MODULE_SHUTDOWN,
+};
+
 static u32 tdx_keyid_start __ro_after_init;
 static u32 tdx_keyid_num __ro_after_init;
 
+static enum tdx_module_status_t tdx_module_status;
+/* Prevent concurrent attempts on TDX detection and initialization */
+static DEFINE_MUTEX(tdx_module_lock);
+
 /*
  * Detect TDX private KeyIDs to see whether TDX has been enabled by the
  * BIOS.  Both initializing the TDX module and running TDX guest require
@@ -104,3 +123,134 @@ bool platform_tdx_enabled(void)
 {
 	return !!tdx_keyid_num;
 }
+
+/*
+ * Detect and initialize the TDX module.
+ *
+ * Return -ENODEV when the TDX module is not loaded, 0 when it
+ * is successfully initialized, or other error when it fails to
+ * initialize.
+ */
+static int init_tdx_module(void)
+{
+	/* The TDX module hasn't been detected */
+	return -ENODEV;
+}
+
+static void shutdown_tdx_module(void)
+{
+	/* TODO: Shut down the TDX module */
+}
+
+static int __tdx_enable(void)
+{
+	int ret;
+
+	/*
+	 * Initializing the TDX module requires doing SEAMCALL on all
+	 * boot-time present CPUs.  For simplicity temporarily disable
+	 * CPU hotplug to prevent any CPU from going offline during
+	 * the initialization.
+	 */
+	cpus_read_lock();
+
+	/*
+	 * Check whether all boot-time present CPUs are online and
+	 * return early with a message so the user can be aware.
+	 *
+	 * Note a non-buggy BIOS should never support physical (ACPI)
+	 * CPU hotplug when TDX is enabled, and all boot-time present
+	 * CPU should be enabled in MADT, so there should be no
+	 * disabled_cpus and num_processors won't change at runtime
+	 * either.
+	 */
+	if (disabled_cpus || num_online_cpus() != num_processors) {
+		pr_err("Unable to initialize the TDX module when there's offline CPU(s).\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = init_tdx_module();
+	if (ret == -ENODEV) {
+		pr_info("TDX module is not loaded.\n");
+		tdx_module_status = TDX_MODULE_NONE;
+		goto out;
+	}
+
+	/*
+	 * Shut down the TDX module in case of any error during the
+	 * initialization process.  It's meaningless to leave the TDX
+	 * module in any middle state of the initialization process.
+	 *
+	 * Shutting down the module also requires doing SEAMCALL on all
+	 * MADT-enabled CPUs.  Do it while CPU hotplug is disabled.
+	 *
+	 * Return all errors during the initialization as -EFAULT as the
+	 * module is always shut down.
+	 */
+	if (ret) {
+		pr_info("Failed to initialize TDX module. Shut it down.\n");
+		shutdown_tdx_module();
+		tdx_module_status = TDX_MODULE_SHUTDOWN;
+		ret = -EFAULT;
+		goto out;
+	}
+
+	pr_info("TDX module initialized.\n");
+	tdx_module_status = TDX_MODULE_INITIALIZED;
+out:
+	cpus_read_unlock();
+
+	return ret;
+}
+
+/**
+ * tdx_enable - Enable TDX by initializing the TDX module
+ *
+ * The caller must make sure all CPUs are online and in VMX operation
+ * before calling this function.  CPU hotplug is temporarily disabled
+ * internally to prevent any CPU from going offline.
+ *
+ * This function can be called in parallel by multiple callers.
+ *
+ * Return:
+ *
+ * * 0:		The TDX module has been successfully initialized.
+ * * -ENODEV:	The TDX module is not loaded, or TDX is not supported.
+ * * -EINVAL:	The TDX module cannot be initialized because certain
+ *		conditions are not met (e.g. when not all MADT-enabled
+ *		CPUs are online).
+ * * -EFAULT:	Other internal fatal errors, or the TDX module is in
+ *		shutdown mode because it failed to initialize in a
+ *		previous attempt.
+ */
+int tdx_enable(void)
+{
+	int ret;
+
+	if (!platform_tdx_enabled())
+		return -ENODEV;
+
+	mutex_lock(&tdx_module_lock);
+
+	switch (tdx_module_status) {
+	case TDX_MODULE_UNKNOWN:
+		ret = __tdx_enable();
+		break;
+	case TDX_MODULE_NONE:
+		ret = -ENODEV;
+		break;
+	case TDX_MODULE_INITIALIZED:
+		ret = 0;
+		break;
+	default:
+		WARN_ON_ONCE(tdx_module_status != TDX_MODULE_SHUTDOWN);
+		ret = -EFAULT;
+		break;
+	}
+
+	mutex_unlock(&tdx_module_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(tdx_enable);

From patchwork Mon Nov 21 00:26:27 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23487
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1322754wrr;
        Sun, 20 Nov 2022 16:29:39 -0800 (PST)
X-Google-Smtp-Source: 
 AA0mqf4/e49NEy/cBYbDQRQBP1ScsC2yd01/U4Yfb2ux5C/EEAHr6dXoFt4WqPhve83YoqfrcFxP
X-Received: by 2002:a17:90a:f690:b0:218:abaa:14b8 with SMTP id
 cl16-20020a17090af69000b00218abaa14b8mr3790906pjb.40.1668990578755;
        Sun, 20 Nov 2022 16:29:38 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1668990578; cv=none;
        d=google.com; s=arc-20160816;
        b=NgbK9b5I/7UHJIV867RY1dJzld3FZ2zTJ1WSDqXWarQpZ2mYo/Ez2p28QJC3r6GTe8
         sgbStw1QuTUvF4/eP3jezT7UJxoZdIJLlsDb0q/rOf9pqu8TN8cQ6pUMEk4nTnIM9H5P
         aKPmiWsDzJLZ4TD2VmSNJU3a6CxLolIpadEQOHu6lwjglKKgDYhoZ1LHMfhcRgFkD6EE
         wwBbrMLOwljbJUg5JTFUbDANwF5R3YAw5+zZqc7ATfBNt38FbSTsrROpo0Qes/8nTFgp
         psVWEKsxVydnTEI98X8AG0kVnguxsAZ+VQFI0kKG8DyT0V5rtiWmTGmby6wKvG7nigSv
         dnhw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from
         :dkim-signature;
        bh=SNiwe2Zb0cjC3OZBJmtPpgsz1LMJa5pkeUfTE9hPttE=;
        b=0qUbYK5iLu+rMw3817b/W73VfIxabenMl23Vf5y6TM2iVIT7JHRszR9KQ3W7GoF1Q0
         scoukzfAU3AkX+oZSyggNjcVoFJ4tc5cis77Al0x8JxZgSmSpoBNFXk3yMv1Rc0lYENh
         VIBF3iOQJwjtk6HJrMz/vfzUe5S7SAfs0UvIRt0UwK1y7qtl+6NDjGGLZa54lUq7WuJz
         CcznON/y0XcZu4cd1BUuEFICn8zr1hkn1B841fDiK7LqPIDT6a8tKyPn23GB8pM6Ye7N
         CZ1X3xeSzCdyO1eAZwQAaAaZHCSOQANBst1xrAZ3Bdep95/nRiXbNLfDnP5yyTn4recD
         FEUA==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=gUlO6rJ4;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 w3-20020a634743000000b0046fe2443137si5426803pgk.190.2022.11.20.16.29.26;
        Sun, 20 Nov 2022 16:29:38 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=gUlO6rJ4;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229991AbiKUA15 (ORCPT <rfc822;leviz.kernel.dev@gmail.com>
        + 99 others); Sun, 20 Nov 2022 19:27:57 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57674 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229853AbiKUA1U (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Sun, 20 Nov 2022 19:27:20 -0500
Received: from mga05.intel.com (mga05.intel.com [192.55.52.43])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 06EA12D749;
        Sun, 20 Nov 2022 16:27:16 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1668990437; x=1700526437;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=MudlMe0lvY5bsPcDekedIAIQH5VXs6WlLHycx03oiQw=;
  b=gUlO6rJ44mJjwCLLEYEldrFB/OnnNfTKuALi7wMCXTLp76EC1QdXnCcF
   2AVgTtHu9aEIp7ENypx/8tAOhaXj00JytJ0lcIHslGsw/nC8zJ77KnqRN
   e7zZXRXaeWx/ji6POwW+3v7pm6jKAgzubDV81k1NoXQJj4pkUz2aoaqrD
   Zxsqmk8/8NPNqYzUoLShuAt67MfOSkmiaflOFjVafvapi/qT08HWKRjSx
   MB+uEWGhQLW6Q8M9DWfTw2zPQzILr3ncqnQ3aM4augwlPPO2Nz1abzR9n
   tWUtD8PWptYQVBgyEf+EZObyiKSs7E84eJaL9w4lOUDgCsNSap1f7hVOw
   A==;
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="399732296"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="399732296"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
  by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:27:16 -0800
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="729825246"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="729825246"
Received: from tomnavar-mobl.amr.corp.intel.com (HELO
 khuang2-desk.gar.corp.intel.com) ([10.209.176.15])
  by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:27:12 -0800
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 05/20] x86/virt/tdx: Implement functions to make SEAMCALL
Date: Mon, 21 Nov 2022 13:26:27 +1300
Message-Id: 
 <5977ec3c2e682e6927ce1c33e7fcac7fcfe2d346.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,
        SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1750063465487045450?=
X-GMAIL-MSGID: =?utf-8?q?1750063465487045450?=

TDX introduces a new CPU mode: Secure Arbitration Mode (SEAM).  This
mode runs only the TDX module itself or other code to load the TDX
module.

The host kernel communicates with SEAM software via a new SEAMCALL
instruction.  This is conceptually similar to a guest->host hypercall,
except it is made from the host to SEAM software instead.

The TDX module defines a set of SEAMCALL leaf functions to allow the
host to initialize it, and to create and run protected VMs.  SEAMCALL
leaf functions use an ABI different from the x86-64 system-v ABI.
Instead, they share the same ABI with the TDCALL leaf functions.

Implement a function __seamcall() to allow the host to make SEAMCALL
to SEAM software using the TDX_MODULE_CALL macro which is the common
assembly for both SEAMCALL and TDCALL.

The SEAMCALL instruction causes #GP when SEAMRR isn't enabled, and #UD
when the CPU is not in VMX operation.  The current TDX_MODULE_CALL macro
doesn't handle either of them.  There's no way to check whether the CPU
is in VMX operation or not.

Initializing the TDX module is done at runtime on demand, and it is up
to the caller to ensure the CPU is in VMX operation before making a
SEAMCALL.  To avoid an Oops when the caller mistakenly tries to
initialize the TDX module when the CPU is not in VMX operation, extend
the TDX_MODULE_CALL macro to handle #UD (and also #GP, which can
theoretically still happen when TDX isn't actually enabled by the BIOS,
e.g. due to a BIOS bug).

Introduce two new TDX error codes for #UD and #GP respectively so the
caller can distinguish them.  Also, opportunistically put the new TDX
error codes and the existing TDX_SEAMCALL_VMFAILINVALID under the
INTEL_TDX_HOST Kconfig option as they are only used when it is on.
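
For reference, the resulting error code values can be worked out in a
small user-space sketch.  It assumes TDX_ERROR is bit 63 (its definition
is not part of this patch) and uses the architectural vector numbers 6
for #UD and 13 for #GP; the MODEL_* names and model_fault_to_tdx_error()
are hypothetical and only mirror the macros and the fixup path.

```c
#include <stdint.h>

/* Same bit range semantics as the kernel's GENMASK_ULL(h, l). */
#define MODEL_GENMASK_ULL(h, l) \
	(((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/* Assumption: TDX_ERROR is bit 63; it is defined outside this patch. */
#define MODEL_TDX_ERROR		(1ULL << 63)
#define MODEL_TDX_SW_ERROR	(MODEL_TDX_ERROR | MODEL_GENMASK_ULL(47, 40))

/* Architectural exception vector numbers. */
#define MODEL_X86_TRAP_UD	6
#define MODEL_X86_TRAP_GP	13

#define MODEL_TDX_SEAMCALL_VMFAILINVALID \
	(MODEL_TDX_SW_ERROR | 0xFFFF0000ULL)
#define MODEL_TDX_SEAMCALL_GP	(MODEL_TDX_SW_ERROR | MODEL_X86_TRAP_GP)
#define MODEL_TDX_SEAMCALL_UD	(MODEL_TDX_SW_ERROR | MODEL_X86_TRAP_UD)

/*
 * Models the fixup path: the low 32 bits hold the trap number and the
 * software-defined error bits are ORed in on top.
 */
static uint64_t model_fault_to_tdx_error(uint32_t trapnr)
{
	return MODEL_TDX_SW_ERROR | (uint64_t)trapnr;
}
```

Under those assumptions TDX_SW_ERROR is 0x8000FF0000000000, so the #GP
and #UD codes differ from it only in the low byte.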

Because __seamcall() can return several error codes besides the actual
SEAMCALL leaf function return code, also introduce a wrapper function
seamcall() that converts the __seamcall() error code to a kernel error
code, so callers don't need to duplicate the code that checks the
return value of __seamcall() and returns the matching kernel error code.
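
The mapping done by the wrapper can be modelled in user space as below.
This is a sketch under assumptions: the constants are hard-coded values
that assume TDX_ERROR is bit 63, and model_seamcall() is a hypothetical
helper that takes the raw return value directly instead of executing
SEAMCALL.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values of the software-defined codes (TDX_ERROR taken as bit 63). */
#define MODEL_TDX_SEAMCALL_VMFAILINVALID	0x8000FF00FFFF0000ULL
#define MODEL_TDX_SEAMCALL_GP			0x8000FF000000000DULL
#define MODEL_TDX_SEAMCALL_UD			0x8000FF0000000006ULL

/*
 * Hypothetical model of the wrapper: @sret is what __seamcall() would
 * have returned, @seamcall_ret optionally receives it unchanged.
 */
static int model_seamcall(uint64_t sret, uint64_t *seamcall_ret)
{
	/* Save the raw return code if the caller wants it. */
	if (seamcall_ret)
		*seamcall_ret = sret;

	/* The SEAMCALL was successful. */
	if (!sret)
		return 0;

	switch (sret) {
	case MODEL_TDX_SEAMCALL_GP:		/* TDX not enabled by the BIOS */
	case MODEL_TDX_SEAMCALL_VMFAILINVALID:	/* TDX module not loaded */
		return -ENODEV;
	case MODEL_TDX_SEAMCALL_UD:		/* CPU not in VMX operation */
		return -EINVAL;
	default:				/* the leaf function itself failed */
		return -EIO;
	}
}
```

A caller that only cares about success or failure can pass NULL for
@seamcall_ret; one that needs the leaf-specific status reads it back
after an -EIO return.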

Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - No change.

v5 -> v6:
 - Added code to handle #UD and #GP (Dave).
 - Moved the seamcall() wrapper function to this patch, and used a
   temporary __always_unused to avoid compile warning (Dave).

- v3 -> v5 (no feedback on v4):
 - Explicitly tell TDX_SEAMCALL_VMFAILINVALID is returned if the
   SEAMCALL itself fails.
 - Improve the changelog.

---
 arch/x86/include/asm/tdx.h       |  9 ++++++
 arch/x86/virt/vmx/tdx/Makefile   |  2 +-
 arch/x86/virt/vmx/tdx/seamcall.S | 52 ++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.c      | 42 ++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h      |  8 +++++
 arch/x86/virt/vmx/tdx/tdxcall.S  | 19 ++++++++++--
 6 files changed, 129 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/virt/vmx/tdx/seamcall.S

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 05fc89d9742a..d688228f3151 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -8,6 +8,10 @@
 #include <asm/ptrace.h>
 #include <asm/shared/tdx.h>
 
+#ifdef CONFIG_INTEL_TDX_HOST
+
+#include <asm/trapnr.h>
+
 /*
  * SW-defined error codes.
  *
@@ -18,6 +22,11 @@
 #define TDX_SW_ERROR			(TDX_ERROR | GENMASK_ULL(47, 40))
 #define TDX_SEAMCALL_VMFAILINVALID	(TDX_SW_ERROR | _UL(0xFFFF0000))
 
+#define TDX_SEAMCALL_GP			(TDX_SW_ERROR | X86_TRAP_GP)
+#define TDX_SEAMCALL_UD			(TDX_SW_ERROR | X86_TRAP_UD)
+
+#endif
+
 #ifndef __ASSEMBLY__
 
 /*
diff --git a/arch/x86/virt/vmx/tdx/Makefile b/arch/x86/virt/vmx/tdx/Makefile
index 93ca8b73e1f1..38d534f2c113 100644
--- a/arch/x86/virt/vmx/tdx/Makefile
+++ b/arch/x86/virt/vmx/tdx/Makefile
@@ -1,2 +1,2 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y += tdx.o
+obj-y += tdx.o seamcall.o
diff --git a/arch/x86/virt/vmx/tdx/seamcall.S b/arch/x86/virt/vmx/tdx/seamcall.S
new file mode 100644
index 000000000000..f81be6b9c133
--- /dev/null
+++ b/arch/x86/virt/vmx/tdx/seamcall.S
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/linkage.h>
+#include <asm/frame.h>
+
+#include "tdxcall.S"
+
+/*
+ * __seamcall() - Host-side interface functions to SEAM software module
+ *		  (the P-SEAMLDR or the TDX module).
+ *
+ * Transform function call register arguments into the SEAMCALL register
+ * ABI.  Return TDX_SEAMCALL_VMFAILINVALID if the SEAMCALL itself fails,
+ * or the completion status of the SEAMCALL leaf function.  Additional
+ * output operands are saved in @out (if it is provided by the caller).
+ *
+ *-------------------------------------------------------------------------
+ * SEAMCALL ABI:
+ *-------------------------------------------------------------------------
+ * Input Registers:
+ *
+ * RAX                 - SEAMCALL Leaf number.
+ * RCX,RDX,R8-R9       - SEAMCALL Leaf specific input registers.
+ *
+ * Output Registers:
+ *
+ * RAX                 - SEAMCALL completion status code.
+ * RCX,RDX,R8-R11      - SEAMCALL Leaf specific output registers.
+ *
+ *-------------------------------------------------------------------------
+ *
+ * __seamcall() function ABI:
+ *
+ * @fn  (RDI)          - SEAMCALL Leaf number, moved to RAX
+ * @rcx (RSI)          - Input parameter 1, moved to RCX
+ * @rdx (RDX)          - Input parameter 2, moved to RDX
+ * @r8  (RCX)          - Input parameter 3, moved to R8
+ * @r9  (R8)           - Input parameter 4, moved to R9
+ *
+ * @out (R9)           - struct tdx_module_output pointer
+ *			 stored temporarily in R12 (not
+ *			 used by the P-SEAMLDR or the TDX
+ *			 module). It can be NULL.
+ *
+ * Return (via RAX) the completion status of the SEAMCALL, or
+ * TDX_SEAMCALL_VMFAILINVALID.
+ */
+SYM_FUNC_START(__seamcall)
+	FRAME_BEGIN
+	TDX_MODULE_CALL host=1
+	FRAME_END
+	RET
+SYM_FUNC_END(__seamcall)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 28c187b8726f..b06c1a2bc9cb 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -124,6 +124,48 @@ bool platform_tdx_enabled(void)
 	return !!tdx_keyid_num;
 }
 
+/*
+ * Wrapper of __seamcall() to convert SEAMCALL leaf function error code
+ * to kernel error code.  @seamcall_ret and @out contain the SEAMCALL
+ * leaf function return code and the additional output respectively if
+ * not NULL.
+ */
+static int __always_unused seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
+				    u64 *seamcall_ret,
+				    struct tdx_module_output *out)
+{
+	u64 sret;
+
+	sret = __seamcall(fn, rcx, rdx, r8, r9, out);
+
+	/* Save SEAMCALL return code if caller wants it */
+	if (seamcall_ret)
+		*seamcall_ret = sret;
+
+	/* SEAMCALL was successful */
+	if (!sret)
+		return 0;
+
+	switch (sret) {
+	case TDX_SEAMCALL_GP:
+		/*
+		 * platform_tdx_enabled() is checked to be true
+		 * before making any SEAMCALL.
+		 */
+		WARN_ON_ONCE(1);
+		fallthrough;
+	case TDX_SEAMCALL_VMFAILINVALID:
+		/* Return -ENODEV if the TDX module is not loaded. */
+		return -ENODEV;
+	case TDX_SEAMCALL_UD:
+		/* Return -EINVAL if CPU isn't in VMX operation. */
+		return -EINVAL;
+	default:
+		/* Return -EIO if the actual SEAMCALL leaf failed. */
+		return -EIO;
+	}
+}
+
 /*
  * Detect and initialize the TDX module.
  *
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index d00074abcb20..92a8de957dc7 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -12,4 +12,12 @@
 /* MSR to report KeyID partitioning between MKTME and TDX */
 #define MSR_IA32_MKTME_KEYID_PARTITIONING	0x00000087
 
+/*
+ * Do not put any hardware-defined TDX structure representations below
+ * this comment!
+ */
+
+struct tdx_module_output;
+u64 __seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
+	       struct tdx_module_output *out);
 #endif
diff --git a/arch/x86/virt/vmx/tdx/tdxcall.S b/arch/x86/virt/vmx/tdx/tdxcall.S
index 49a54356ae99..757b0c34be10 100644
--- a/arch/x86/virt/vmx/tdx/tdxcall.S
+++ b/arch/x86/virt/vmx/tdx/tdxcall.S
@@ -1,6 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #include <asm/asm-offsets.h>
 #include <asm/tdx.h>
+#include <asm/asm.h>
 
 /*
  * TDCALL and SEAMCALL are supported in Binutils >= 2.36.
@@ -45,6 +46,7 @@
 	/* Leave input param 2 in RDX */
 
 	.if \host
+1:
 	seamcall
 	/*
 	 * SEAMCALL instruction is essentially a VMExit from VMX root
@@ -57,10 +59,23 @@
 	 * This value will never be used as actual SEAMCALL error code as
 	 * it is from the Reserved status code class.
 	 */
-	jnc .Lno_vmfailinvalid
+	jnc .Lseamcall_out
 	mov $TDX_SEAMCALL_VMFAILINVALID, %rax
-.Lno_vmfailinvalid:
+	jmp .Lseamcall_out
+2:
+	/*
+	 * SEAMCALL caused #GP or #UD.  By reaching here %eax contains
+	 * the trap number.  Convert the trap number to the TDX error
+	 * code by setting TDX_SW_ERROR to the high 32-bits of %rax.
+	 *
+	 * Note cannot OR TDX_SW_ERROR directly to %rax as OR instruction
+	 * only accepts 32-bit immediate at most.
+	 */
+	mov $TDX_SW_ERROR, %r12
+	orq %r12, %rax
 
+	_ASM_EXTABLE_FAULT(1b, 2b)
+.Lseamcall_out:
 	.else
 	tdcall
 	.endif

From patchwork Mon Nov 21 00:26:28 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23482
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 06/20] x86/virt/tdx: Shut down TDX module in case of error
Date: Mon, 21 Nov 2022 13:26:28 +1300
Message-Id: 
 <48505089b645019a734d85c2c29f3c8ae2dbd6bd.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

TDX supports shutting down the TDX module at any time during its
lifetime.  After the module is shut down, no further TDX module SEAMCALL
leaf functions can be called on any logical cpu.

Shut down the TDX module in case of any error during the initialization
process.  It's pointless to leave the TDX module in a half-initialized
state.

Shutting down the TDX module requires calling TDH.SYS.LP.SHUTDOWN on all
BIOS-enabled CPUs, and the SEAMCALL can run concurrently on different
CPUs.  Implement a mechanism to run a SEAMCALL concurrently on all online
CPUs and use it to shut down the module.  The later logical-cpu scope
module initialization will use it too.
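
The shape of that mechanism can be sketched in user space as follows.
This is only an illustration under stated assumptions: the loop is a
sequential stand-in for on_each_cpu(), NR_FAKE_CPUS and the cpu_op_t
callback are hypothetical, and the context is trimmed to the fields the
sketch needs.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NR_FAKE_CPUS	4	/* hypothetical CPU count for the sketch */

struct seamcall_ctx {
	uint64_t fn;		/* leaf function number */
	atomic_int err;		/* set to -EFAULT if the call fails on any CPU */
};

/* Hypothetical per-CPU operation: 0 on success, non-zero on failure. */
typedef int (*cpu_op_t)(int cpu, uint64_t fn);

static void seamcall_on_each_cpu(struct seamcall_ctx *sc, cpu_op_t op)
{
	assert(sc != NULL && op != NULL);

	/* Sequential stand-in; the kernel helper runs the CPUs in parallel. */
	for (int cpu = 0; cpu < NR_FAKE_CPUS; cpu++) {
		if (op(cpu, sc->fn))
			atomic_store(&sc->err, -EFAULT);
	}
}

/* Two fake operations to exercise the success and failure paths. */
static int op_ok(int cpu, uint64_t fn)
{
	(void)cpu;
	(void)fn;
	return 0;
}

static int op_fail_on_cpu2(int cpu, uint64_t fn)
{
	(void)fn;
	return cpu == 2;
}
```

A single shared error field is enough here because callers only need to
know whether any CPU failed, not which one.  Note that
shutdown_tdx_module() in this patch does not inspect the error at all.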

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - No change.

v5 -> v6:
 - Moved the seamcall() wrapper to the previous patch (Dave).

v3 -> v5 (no feedback on v4):
 - Added a wrapper of __seamcall() to print error code if SEAMCALL fails.
 - Made seamcall_on_each_cpu() return void.
 - Removed 'seamcall_ret' and 'tdx_module_out' from
   'struct seamcall_ctx', as they must be local variables.
 - Added comments to tdx_init() and one paragraph to the changelog to
   explain that the caller should handle VMXON.
 - Called out that after shutdown, no "TDX module" SEAMCALL can be made.

---
 arch/x86/virt/vmx/tdx/tdx.c | 43 +++++++++++++++++++++++++++++++++----
 arch/x86/virt/vmx/tdx/tdx.h |  5 +++++
 2 files changed, 44 insertions(+), 4 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index b06c1a2bc9cb..5db1a05cb4bd 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -13,6 +13,8 @@
 #include <linux/mutex.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
+#include <linux/smp.h>
+#include <linux/atomic.h>
 #include <asm/msr-index.h>
 #include <asm/msr.h>
 #include <asm/apic.h>
@@ -124,15 +126,27 @@ bool platform_tdx_enabled(void)
 	return !!tdx_keyid_num;
 }
 
+/*
+ * Data structure to make SEAMCALL on multiple CPUs concurrently.
+ * @err is set to -EFAULT when SEAMCALL fails on any cpu.
+ */
+struct seamcall_ctx {
+	u64 fn;
+	u64 rcx;
+	u64 rdx;
+	u64 r8;
+	u64 r9;
+	atomic_t err;
+};
+
 /*
  * Wrapper of __seamcall() to convert SEAMCALL leaf function error code
  * to kernel error code.  @seamcall_ret and @out contain the SEAMCALL
  * leaf function return code and the additional output respectively if
  * not NULL.
  */
-static int __always_unused seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
-				    u64 *seamcall_ret,
-				    struct tdx_module_output *out)
+static int seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
+		    u64 *seamcall_ret, struct tdx_module_output *out)
 {
 	u64 sret;
 
@@ -166,6 +180,25 @@ static int __always_unused seamcall(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9,
 	}
 }
 
+static void seamcall_smp_call_function(void *data)
+{
+	struct seamcall_ctx *sc = data;
+	int ret;
+
+	ret = seamcall(sc->fn, sc->rcx, sc->rdx, sc->r8, sc->r9, NULL, NULL);
+	if (ret)
+		atomic_set(&sc->err, -EFAULT);
+}
+
+/*
+ * Call the SEAMCALL on all online CPUs concurrently.  Caller to check
+ * @sc->err to determine whether any SEAMCALL failed on any cpu.
+ */
+static void seamcall_on_each_cpu(struct seamcall_ctx *sc)
+{
+	on_each_cpu(seamcall_smp_call_function, sc, true);
+}
+
 /*
  * Detect and initialize the TDX module.
  *
@@ -181,7 +214,9 @@ static int init_tdx_module(void)
 
 static void shutdown_tdx_module(void)
 {
-	/* TODO: Shut down the TDX module */
+	struct seamcall_ctx sc = { .fn = TDH_SYS_LP_SHUTDOWN };
+
+	seamcall_on_each_cpu(&sc);
 }
 
 static int __tdx_enable(void)
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 92a8de957dc7..215cc1065d78 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -12,6 +12,11 @@
 /* MSR to report KeyID partitioning between MKTME and TDX */
 #define MSR_IA32_MKTME_KEYID_PARTITIONING	0x00000087
 
+/*
+ * TDX module SEAMCALL leaf functions
+ */
+#define TDH_SYS_LP_SHUTDOWN	44
+
 /*
  * Do not put any hardware-defined TDX structure representations below
  * this comment!

From patchwork Mon Nov 21 00:26:29 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23484
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 07/20] x86/virt/tdx: Do TDX module global initialization
Date: Mon, 21 Nov 2022 13:26:29 +1300
Message-Id: 
 <40824ec3e3dc759705dcfa1cb2929d18c12b417a.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

The first step of initializing the module is to call TDH.SYS.INIT once
on any logical cpu to do the module global initialization.  Make that
call in init_tdx_module().

The call also serves to detect the TDX module, as seamcall() returns
-ENODEV when the module is not loaded.
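
The resulting control flow at this point in the series can be sketched
as below.  This is a user-space illustration, not the kernel function:
sys_init_fn is a hypothetical stand-in for the
seamcall(TDH_SYS_INIT, ...) call, and the two stub functions only model
the "module absent" and "module present" cases.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL, NULL). */
typedef int (*sys_init_fn)(void);

static int init_tdx_module_sketch(sys_init_fn sys_init)
{
	int ret;

	assert(sys_init != NULL);

	/* Global initialization; -ENODEV here means no module is loaded. */
	ret = sys_init();
	if (ret)
		return ret;

	/* Remaining steps are not implemented yet, so still report failure. */
	return -EINVAL;
}

/* Stubs modelling the two interesting outcomes of the first SEAMCALL. */
static int module_absent(void)
{
	return -ENODEV;
}

static int module_present(void)
{
	return 0;
}
```

The two return values let a caller tell "no module" (-ENODEV) apart
from "module found but initialization not complete" (-EINVAL).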

Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - Improved changelog.

---
 arch/x86/virt/vmx/tdx/tdx.c | 19 +++++++++++++++++--
 arch/x86/virt/vmx/tdx/tdx.h |  1 +
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 5db1a05cb4bd..f292292313bd 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -208,8 +208,23 @@ static void seamcall_on_each_cpu(struct seamcall_ctx *sc)
  */
 static int init_tdx_module(void)
 {
-	/* The TDX module hasn't been detected */
-	return -ENODEV;
+	int ret;
+
+	/*
+	 * Call TDH.SYS.INIT to do the global initialization of
+	 * the TDX module.  It also detects the module.
+	 */
+	ret = seamcall(TDH_SYS_INIT, 0, 0, 0, 0, NULL, NULL);
+	if (ret)
+		goto out;
+
+	/*
+	 * Return -EINVAL until all steps of TDX module initialization
+	 * process are done.
+	 */
+	ret = -EINVAL;
+out:
+	return ret;
 }
 
 static void shutdown_tdx_module(void)
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 215cc1065d78..0b415805c921 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -15,6 +15,7 @@
 /*
  * TDX module SEAMCALL leaf functions
  */
+#define TDH_SYS_INIT		33
 #define TDH_SYS_LP_SHUTDOWN	44
 
 /*

From patchwork Mon Nov 21 00:26:30 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23483
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 08/20] x86/virt/tdx: Do logical-cpu scope TDX module
 initialization
Date: Mon, 21 Nov 2022 13:26:30 +1300
Message-Id: 
 <083f32ce0721611b7fc9297641cc8f5f222b937e.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

After the global module initialization, the next step is logical-cpu
scope module initialization.  Logical-cpu initialization requires
calling TDH.SYS.LP.INIT on all BIOS-enabled CPUs.  This SEAMCALL can run
concurrently on all CPUs.

Use the helper introduced for shutting down the module to do logical-cpu
scope initialization.
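
The ordering described above (one global step, then one per-CPU step on
every CPU) can be sketched as follows.  This is a sequential user-space
model under stated assumptions: SKETCH_MAX_CPUS, struct init_trace and
the failing_cpu parameter are hypothetical and exist only to make the
order observable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_CPUS	8	/* hypothetical bound for the sketch */

struct init_trace {
	int global_calls;		/* times the global step ran */
	int lp_calls[SKETCH_MAX_CPUS];	/* times the per-CPU step ran, per CPU */
};

/* Global step once, then the per-CPU step everywhere; -EFAULT if any fails. */
static int module_init_order_sketch(int nr_cpus, int failing_cpu,
				    struct init_trace *t)
{
	int err = 0;

	assert(t != NULL && nr_cpus > 0 && nr_cpus <= SKETCH_MAX_CPUS);

	t->global_calls++;		/* stands in for TDH.SYS.INIT */

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		t->lp_calls[cpu]++;	/* stands in for TDH.SYS.LP.INIT */
		if (cpu == failing_cpu)
			err = -EFAULT;	/* remember the failure, keep going */
	}

	return err;
}
```

As with the shutdown case, a failure on one CPU does not stop the call
from being issued on the others; the caller only sees the aggregated
result afterwards.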

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/virt/vmx/tdx/tdx.c | 14 ++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index f292292313bd..2cf7090667aa 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -199,6 +199,15 @@ static void seamcall_on_each_cpu(struct seamcall_ctx *sc)
 	on_each_cpu(seamcall_smp_call_function, sc, true);
 }
 
+static int tdx_module_init_cpus(void)
+{
+	struct seamcall_ctx sc = { .fn = TDH_SYS_LP_INIT };
+
+	seamcall_on_each_cpu(&sc);
+
+	return atomic_read(&sc.err);
+}
+
 /*
  * Detect and initialize the TDX module.
  *
@@ -218,6 +227,11 @@ static int init_tdx_module(void)
 	if (ret)
 		goto out;
 
+	/* Logical-cpu scope initialization */
+	ret = tdx_module_init_cpus();
+	if (ret)
+		goto out;
+
 	/*
 	 * Return -EINVAL until all steps of TDX module initialization
 	 * process are done.
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 0b415805c921..9ba11808bd45 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -16,6 +16,7 @@
  * TDX module SEAMCALL leaf functions
  */
 #define TDH_SYS_INIT		33
+#define TDH_SYS_LP_INIT		35
 #define TDH_SYS_LP_SHUTDOWN	44
 
 /*

From patchwork Mon Nov 21 00:26:31 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23485
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 09/20] x86/virt/tdx: Get information about TDX module and
 TDX-capable memory
Date: Mon, 21 Nov 2022 13:26:31 +1300
Message-Id: 
 <cd23a9583edcfa85e11612d94ecfd2d5e862c1d5.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,
        SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1750063451511306314?=
X-GMAIL-MSGID: =?utf-8?q?1750063451511306314?=

TDX provides increased levels of memory confidentiality and integrity.
This requires special hardware support for features like memory
encryption and storage of memory integrity checksums.  Not all memory
satisfies these requirements.

As a result, TDX introduced the concept of a "Convertible Memory Region"
(CMR).  During boot, the firmware builds a list of all of the memory
ranges which can provide the TDX security guarantees.  The list of these
ranges, along with TDX module information, is available to the kernel by
querying the TDX module via TDH.SYS.INFO SEAMCALL.

The host kernel can choose whether or not to use all convertible memory
regions as TDX-usable memory.  Before the TDX module is ready to create
any TDX guests, the kernel needs to configure the TDX-usable memory
regions by passing an array of "TD Memory Regions" (TDMRs) to the TDX
module.  Constructing the TDMR array requires information about both the
TDX module (TDSYSINFO_STRUCT) and the Convertible Memory Regions.  Call
TDH.SYS.INFO to get this information in preparation.

Use static variables for both TDSYSINFO_STRUCT and the CMR array to avoid
having to pass them as function arguments when constructing the TDMR
array.  They are also too big to be put on the stack.  In addition, KVM
needs to use the TDSYSINFO_STRUCT to create TDX guests.

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - Simplified the check of CMRs due to the fact that TDX actually
   verifies CMRs (that are passed by the BIOS) before enabling TDX.
 - Changed the function name from check_cmrs() -> trim_empty_cmrs().
 - Added CMR page aligned check so that later patch can just get the PFN
   using ">> PAGE_SHIFT".
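
For illustration only, the simplified check described above can be
modelled in user space roughly as follows.  This is a sketch, not the
kernel code: 'struct cmr_sketch', SKETCH_PAGE_SIZE and
count_valid_cmrs() are hypothetical names, and a 4K page size is
assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4K pages */

/* Hypothetical user-space stand-in for the kernel's 'struct cmr_info'. */
struct cmr_sketch {
	uint64_t base;
	uint64_t size;
};

/* Both base and size must be page aligned. */
static int cmr_is_aligned(const struct cmr_sketch *cmr)
{
	return (cmr->base % SKETCH_PAGE_SIZE) == 0 &&
	       (cmr->size % SKETCH_PAGE_SIZE) == 0;
}

/*
 * Return the number of leading valid CMRs, or -1 if the array is
 * malformed: an empty or misaligned first entry, a misaligned later
 * entry, or entries that overlap or are out of order.  Scanning stops
 * at the first empty (size == 0) entry.
 */
static int count_valid_cmrs(const struct cmr_sketch *cmrs, int nr)
{
	int i;

	assert(cmrs != NULL);

	if (nr < 1 || cmrs[0].size == 0 || !cmr_is_aligned(&cmrs[0]))
		return -1;

	for (i = 1; i < nr; i++) {
		const struct cmr_sketch *prev = &cmrs[i - 1];

		if (cmrs[i].size == 0)
			break;
		if (!cmr_is_aligned(&cmrs[i]))
			return -1;
		if (prev->base + prev->size > cmrs[i].base)
			return -1;
	}

	return i;
}
```

Because both base and size are page aligned, a later step can derive
PFNs with a plain right shift by the page shift without losing bits.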

v5 -> v6:
 - Also print the TDX module's attributes (Isaku).
 - Removed all arguments in tdx_get_sysinfo() to use static variables
   of 'tdx_sysinfo' and 'tdx_cmr_array' directly as they are all used
   directly in other functions in later patches.
 - Added Isaku's Reviewed-by.

v3 -> v5 (no feedback on v4):
 - Renamed sanitize_cmrs() to check_cmrs().
 - Removed unnecessary sanity check against tdx_sysinfo and tdx_cmr_array
   actual size returned by TDH.SYS.INFO.
 - Changed -EFAULT to -EINVAL in a couple of places.
 - Added comments around tdx_sysinfo and tdx_cmr_array saying they are
   used by TDH.SYS.INFO ABI.
 - Changed to pass 'tdx_sysinfo' and 'tdx_cmr_array' as function
   arguments in tdx_get_sysinfo().
 - Changed to only print BIOS-CMR when check_cmrs() fails.
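
For illustration only, the size arithmetic behind the 1024-byte
TDSYSINFO_STRUCT layout in this patch can be checked in user space with
a sketch like the one below.  The field order and sizes are copied from
the 'struct tdsysinfo_struct' added here; the '_sketch' names and the
helper functions are hypothetical, and the GCC/Clang 'packed' attribute
is assumed to be available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SYSINFO_SIZE 1024

/* Hypothetical mirror of 'struct cpuid_config': six 32-bit fields. */
struct cpuid_config_sketch {
	uint32_t leaf, sub_leaf, eax, ebx, ecx, edx;
};

/* Hypothetical mirror of 'struct tdsysinfo_struct', without the union. */
struct sysinfo_sketch {
	/* Module info: 32 bytes */
	uint32_t attributes;
	uint32_t vendor_id;
	uint32_t build_date;
	uint16_t build_num;
	uint16_t minor_version;
	uint16_t major_version;
	uint8_t  reserved0[14];
	/* Memory info: 16 bytes */
	uint16_t max_tdmrs;
	uint16_t max_reserved_per_tdmr;
	uint16_t pamt_entry_size;
	uint8_t  reserved1[10];
	/* Control struct info: 16 bytes */
	uint16_t tdcs_base_size;
	uint8_t  reserved2[2];
	uint16_t tdvps_base_size;
	uint8_t  tdvps_xfam_dependent_size;
	uint8_t  reserved3[9];
	/* TD capabilities: 68 bytes including num_cpuid_config */
	uint64_t attributes_fixed0;
	uint64_t attributes_fixed1;
	uint64_t xfam_fixed0;
	uint64_t xfam_fixed1;
	uint8_t  reserved4[32];
	uint32_t num_cpuid_config;
	/* Remaining bytes that hold the CPUID_CONFIG entries */
	uint8_t  cpuid_area[892];
} __attribute__((packed));

/* Compile-time check, mirroring the BUILD_BUG_ON() in the patch. */
static_assert(sizeof(struct sysinfo_sketch) == SKETCH_SYSINFO_SIZE,
	      "sysinfo sketch must be exactly 1024 bytes");

/* Offset at which the CPUID_CONFIG area starts. */
static size_t sysinfo_cpuid_area_offset(void)
{
	return offsetof(struct sysinfo_sketch, cpuid_area);
}

/* Upper bound on CPUID_CONFIG entries that fit in the fixed-size area. */
static size_t sysinfo_max_cpuid_configs(void)
{
	return sizeof(((struct sysinfo_sketch *)0)->cpuid_area) /
	       sizeof(struct cpuid_config_sketch);
}
```

The fixed fields add up to 132 bytes, which is why the padding member
of the union in the patch is 892 bytes (1024 - 132).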

---
 arch/x86/virt/vmx/tdx/tdx.c | 125 ++++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  61 ++++++++++++++++++
 2 files changed, 186 insertions(+)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 2cf7090667aa..43227af25e44 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -15,6 +15,7 @@
 #include <linux/cpumask.h>
 #include <linux/smp.h>
 #include <linux/atomic.h>
+#include <linux/align.h>
 #include <asm/msr-index.h>
 #include <asm/msr.h>
 #include <asm/apic.h>
@@ -40,6 +41,11 @@ static enum tdx_module_status_t tdx_module_status;
 /* Prevent concurrent attempts on TDX detection and initialization */
 static DEFINE_MUTEX(tdx_module_lock);
 
+/* Below two are used in TDH.SYS.INFO SEAMCALL ABI */
+static struct tdsysinfo_struct tdx_sysinfo;
+static struct cmr_info tdx_cmr_array[MAX_CMRS] __aligned(CMR_INFO_ARRAY_ALIGNMENT);
+static int tdx_cmr_num;
+
 /*
  * Detect TDX private KeyIDs to see whether TDX has been enabled by the
  * BIOS.  Both initializing the TDX module and running TDX guest require
@@ -208,6 +214,121 @@ static int tdx_module_init_cpus(void)
 	return atomic_read(&sc.err);
 }
 
+static inline bool is_cmr_empty(struct cmr_info *cmr)
+{
+	return !cmr->size;
+}
+
+static inline bool is_cmr_ok(struct cmr_info *cmr)
+{
+	/* CMR must be page aligned */
+	return IS_ALIGNED(cmr->base, PAGE_SIZE) &&
+		IS_ALIGNED(cmr->size, PAGE_SIZE);
+}
+
+static void print_cmrs(struct cmr_info *cmr_array, int cmr_num,
+		       const char *name)
+{
+	int i;
+
+	for (i = 0; i < cmr_num; i++) {
+		struct cmr_info *cmr = &cmr_array[i];
+
+		pr_info("%s : [0x%llx, 0x%llx)\n", name,
+				cmr->base, cmr->base + cmr->size);
+	}
+}
+
+/* Check CMRs reported by TDH.SYS.INFO, and trim tail empty CMRs. */
+static int trim_empty_cmrs(struct cmr_info *cmr_array, int *actual_cmr_num)
+{
+	struct cmr_info *cmr;
+	int i, cmr_num;
+
+	/*
+	 * Intel TDX module spec, 20.7.3 CMR_INFO:
+	 *
+	 *   TDH.SYS.INFO leaf function returns a MAX_CMRS (32) entry
+	 *   array of CMR_INFO entries. The CMRs are sorted from the
+	 *   lowest base address to the highest base address, and they
+	 *   are non-overlapping.
+	 *
+	 * This implies that the BIOS may generate empty entries if
+	 * there are fewer than 32 CMRs.  Skip them manually.
+	 *
+	 * CMRs must also be 4K aligned.  TDX doesn't trust the BIOS
+	 * and verifies CMRs before it gets enabled, so anything that
+	 * violates the above means a kernel bug (or TDX is broken).
+	 */
+	cmr_num = *actual_cmr_num;
+	cmr = &cmr_array[0];
+	/* There must be at least one valid CMR */
+	if (WARN_ON_ONCE(is_cmr_empty(cmr) || !is_cmr_ok(cmr)))
+		goto err;
+
+	for (i = 1; i < cmr_num; i++) {
+		struct cmr_info *cmr = &cmr_array[i];
+		struct cmr_info *prev_cmr = NULL;
+
+		/* The first empty CMR ends the list of valid CMRs */
+		if (is_cmr_empty(cmr))
+			break;
+
+		/*
+		 * Do sanity check anyway to make sure CMRs:
+		 *  - are 4K aligned
+		 *  - don't overlap
+		 *  - are in address ascending order.
+		 */
+		if (WARN_ON_ONCE(!is_cmr_ok(cmr)))
+			goto err;
+
+		prev_cmr = &cmr_array[i - 1];
+		if (WARN_ON_ONCE((prev_cmr->base + prev_cmr->size) >
+					cmr->base))
+			goto err;
+	}
+
+	/* Update the actual number of CMRs */
+	*actual_cmr_num = i;
+
+	/* Print kernel checked CMRs */
+	print_cmrs(cmr_array, *actual_cmr_num, "Kernel-checked-CMR");
+
+	return 0;
+err:
+	pr_info("[TDX broken ?]: Invalid CMRs detected\n");
+	print_cmrs(cmr_array, cmr_num, "BIOS-CMR");
+	return -EINVAL;
+}
+
+static int tdx_get_sysinfo(void)
+{
+	struct tdx_module_output out;
+	int ret;
+
+	BUILD_BUG_ON(sizeof(struct tdsysinfo_struct) != TDSYSINFO_STRUCT_SIZE);
+
+	ret = seamcall(TDH_SYS_INFO, __pa(&tdx_sysinfo), TDSYSINFO_STRUCT_SIZE,
+			__pa(tdx_cmr_array), MAX_CMRS, NULL, &out);
+	if (ret)
+		return ret;
+
+	/* R9 contains the actual number of entries written to the CMR array. */
+	tdx_cmr_num = out.r9;
+
+	pr_info("TDX module: attributes 0x%x, vendor_id 0x%x, major_version %u, minor_version %u, build_date %u, build_num %u\n",
+		tdx_sysinfo.attributes, tdx_sysinfo.vendor_id,
+		tdx_sysinfo.major_version, tdx_sysinfo.minor_version,
+		tdx_sysinfo.build_date, tdx_sysinfo.build_num);
+
+	/*
+	 * trim_empty_cmrs() updates the actual number of CMRs by
+	 * dropping all tail empty CMRs.
+	 */
+	return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num);
+}
+
 /*
  * Detect and initialize the TDX module.
  *
@@ -232,6 +353,10 @@ static int init_tdx_module(void)
 	if (ret)
 		goto out;
 
+	ret = tdx_get_sysinfo();
+	if (ret)
+		goto out;
+
 	/*
 	 * Return -EINVAL until all steps of TDX module initialization
 	 * process are done.
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 9ba11808bd45..8e273756098c 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -15,10 +15,71 @@
 /*
  * TDX module SEAMCALL leaf functions
  */
+#define TDH_SYS_INFO		32
 #define TDH_SYS_INIT		33
 #define TDH_SYS_LP_INIT		35
 #define TDH_SYS_LP_SHUTDOWN	44
 
+struct cmr_info {
+	u64	base;
+	u64	size;
+} __packed;
+
+#define MAX_CMRS			32
+#define CMR_INFO_ARRAY_ALIGNMENT	512
+
+struct cpuid_config {
+	u32	leaf;
+	u32	sub_leaf;
+	u32	eax;
+	u32	ebx;
+	u32	ecx;
+	u32	edx;
+} __packed;
+
+#define TDSYSINFO_STRUCT_SIZE		1024
+#define TDSYSINFO_STRUCT_ALIGNMENT	1024
+
+struct tdsysinfo_struct {
+	/* TDX-SEAM Module Info */
+	u32	attributes;
+	u32	vendor_id;
+	u32	build_date;
+	u16	build_num;
+	u16	minor_version;
+	u16	major_version;
+	u8	reserved0[14];
+	/* Memory Info */
+	u16	max_tdmrs;
+	u16	max_reserved_per_tdmr;
+	u16	pamt_entry_size;
+	u8	reserved1[10];
+	/* Control Struct Info */
+	u16	tdcs_base_size;
+	u8	reserved2[2];
+	u16	tdvps_base_size;
+	u8	tdvps_xfam_dependent_size;
+	u8	reserved3[9];
+	/* TD Capabilities */
+	u64	attributes_fixed0;
+	u64	attributes_fixed1;
+	u64	xfam_fixed0;
+	u64	xfam_fixed1;
+	u8	reserved4[32];
+	u32	num_cpuid_config;
+	/*
+	 * The actual number of CPUID_CONFIG depends on above
+	 * 'num_cpuid_config'.  The size of 'struct tdsysinfo_struct'
+	 * is 1024B defined by TDX architecture.  Use a union with
+	 * specific padding to make 'sizeof(struct tdsysinfo_struct)'
+	 * equal to 1024.
+	 */
+	union {
+		struct cpuid_config	cpuid_configs[0];
+		u8			reserved5[892];
+	};
+} __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
+
 /*
  * Do not put any hardware-defined TDX structure representations below
  * this comment!

From patchwork Mon Nov 21 00:26:32 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23486
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1322745wrr;
        Sun, 20 Nov 2022 16:29:34 -0800 (PST)
X-Google-Smtp-Source: 
 AA0mqf4mZzA1DjPEsf94B+MXPDVGyQLexc504laC4iaUBaHhOUY/JRQjGbbgs2CdgiFbrpOOyIBO
X-Received: by 2002:a17:903:2642:b0:186:99ef:891c with SMTP id
 je2-20020a170903264200b0018699ef891cmr9590050plb.169.1668990573928;
        Sun, 20 Nov 2022 16:29:33 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1668990573; cv=none;
        d=google.com; s=arc-20160816;
        b=Gw1oDRKIU9seZcUOkC8iTgCqkYr6QSuyFXxafl+VKtrggYGo6pFMM176Ut0UCiFcU0
         GjF6zX7buNBKS50N/eCmYUxI5YYynQsJhtLzkqTrULH4IGcbzAZhnB2/TzL59gv0xQzd
         26AQmF3fhMrZNIneCb0dn5P6VJCRl+tJunjhhiQuGxrpBZg8R5c6gUFA8eW1R/rp9bxy
         B7AbdeePxd57IrcBiSCH0b+E98ZvqXmygK7ked+N8W6I75n7+EEt9rlyIFiibHYwhtt1
         OLWQ7VU9l8w1CcO0H0TsxniV7/wM6LyQWUG9EDD6VTgWIgJdMcTtB7gbjMPTNUIfaTsg
         q0sA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from
         :dkim-signature;
        bh=gJ+5X3JEGP1sm7qRluYBRppMuKG810qRYdjZgW7jZGA=;
        b=hOn+jBpxYxJ3qCN3Q1d4Z2vHXclj+gdsOyaT6NDAGZYHxklRv5WcAAvkQIP5dr401q
         hEFKgmu9zfCD8iyi4KUZGxDiKqC6wqKihrfebrvFl+CSMDsBsCwK7sIEMnmhVSc6C3OE
         j+IVwbsk/SV09MJW7wdNlt1u1xAckMPzLY3Ohlu4/YamkM2Zc7FJAiJjnsaXIAKmZqIk
         3jmdxCP7NfNcfHxo1TfO+ietatAZFR/tBC3Jan4l2ESOXeXGun/Tl+fozcCreyQWE7jd
         OSLXjcpRIi36yqzNQFUjd+Em0mSLPPGKYzYXRLwkSsne0L4jEMOlcv5xiEyT1u3blxdy
         dDkg==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=eV8f2n2Q;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 x185-20020a6331c2000000b004639c772888si10024222pgx.225.2022.11.20.16.29.18;
        Sun, 20 Nov 2022 16:29:33 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=eV8f2n2Q;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229708AbiKUA2v (ORCPT <rfc822;leviz.kernel.dev@gmail.com>
        + 99 others); Sun, 20 Nov 2022 19:28:51 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57286 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229814AbiKUA1n (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Sun, 20 Nov 2022 19:27:43 -0500
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7F73E1743C;
        Sun, 20 Nov 2022 16:27:39 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1668990460; x=1700526460;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=tTM6f9ZtDSECP7CTls1i410chojXO/BfWVbWBowQUQE=;
  b=eV8f2n2Q2AKNiPl6TiQVQ7EZDr0TXdEeZNV0lCIwCghrm3Y+tY80LcjL
   x8+N3LJL0qFcQ6z9f6yiqxwS4n7jehm87LGlUKoym4gB4W9fxTkNaUEzq
   mZQQsrqcG+5Z9IBBgVLBHQiTTkjIkEGTbg6B9+dINi+QnIb/FodL4DibA
   W5pG4WgbXFov9ROhYGszt5AkRWcFpIGpUC6N1CKYXF3eWWgXR406CPSSR
   cxI0QcTCvC7bT1HVDIKpkFGZGf/X+BblSfFDTpYkI+nkiEOFMg0lnZZ8g
   e8fzGgkg78HGDnGfPbDPnQf7p36F0RcacYGiyX+pRA0xp3jSFi+yKBYNK
   Q==;
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="377705705"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="377705705"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
  by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:27:38 -0800
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="729825362"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="729825362"
Received: from tomnavar-mobl.amr.corp.intel.com (HELO
 khuang2-desk.gar.corp.intel.com) ([10.209.176.15])
  by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:27:34 -0800
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 10/20] x86/virt/tdx: Use all system memory when
 initializing TDX module as TDX memory
Date: Mon, 21 Nov 2022 13:26:32 +1300
Message-Id: 
 <9b545148275b14a8c7edef1157f8ec44dc8116ee.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,
        SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1750063459829070488?=
X-GMAIL-MSGID: =?utf-8?q?1750063459829070488?=

TDX reports a list of "Convertible Memory Regions" (CMRs) to indicate all
memory regions that can possibly be used by the TDX module, but they are
not automatically usable by the TDX module.  As a step of initializing
the TDX module, the kernel needs to choose a list of memory regions (out
of the convertible memory regions) that the TDX module can use and pass
those regions to the TDX module.  Once this is done, those "TDX-usable"
memory regions are fixed for the module's lifetime.  No more TDX-usable
memory can be added to the TDX module after that.

The initial support of TDX guests will only allocate TDX guest memory
from the global page allocator.  To keep things simple, this initial
implementation simply guarantees all pages in the page allocator are TDX
memory.  To achieve this, use all system memory in the core-mm at the
time of initializing the TDX module as TDX memory, and in the meantime,
refuse to add any non-TDX memory via memory hotplug.

Specifically, walk through all memory regions managed by memblock and
add them to a global list of "TDX-usable" memory regions, which is a
fixed list after the module initialization (or empty if initialization
fails).  To reject non-TDX-memory in memory hotplug, add an additional
check in arch_add_memory() to check whether the new region is covered by
any region in the "TDX-usable" memory region list.

Note this requires all memory regions in memblock to be TDX convertible
memory when initializing the TDX module.  This is true in practice if no
new memory has been hot-added before initializing the TDX module, since
in practice all boot-time present DIMMs are TDX convertible memory.  If
any new memory has been hot-added, then initializing the TDX module will
fail because that memory region is not covered by any CMR.

This can be enhanced in the future, e.g. by allowing adding non-TDX
memory to a separate NUMA node.  In this case, the "TDX-capable" nodes
and the "non-TDX-capable" nodes can co-exist, but the kernel/userspace
needs to guarantee memory pages for TDX guests are always allocated from
the "TDX-capable" nodes.

Note TDX assumes convertible memory is always physically present during
the machine's runtime.  A non-buggy BIOS should never support hot-removal of
any convertible memory.  This implementation doesn't handle ACPI memory
removal but depends on the BIOS to behave correctly.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - Changed to use all system memory in memblock at the time of
   initializing the TDX module as TDX memory
 - Added memory hotplug support
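
For illustration only, the hotplug check described above boils down to
a "fully covered by one block" test.  The user-space sketch below uses
an array instead of the kernel's linked list; 'struct pfn_block' and
range_is_tdx_memory() are hypothetical names, and ranges are treated
as half-open [start_pfn, end_pfn).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one TDX memory block: [start_pfn, end_pfn). */
struct pfn_block {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/*
 * Return true if [start_pfn, end_pfn) may be added to the system:
 * either no TDX memory blocks were recorded (TDX was not initialized),
 * or the range lies entirely inside a single recorded block.
 */
static bool range_is_tdx_memory(const struct pfn_block *blocks, size_t nr,
				unsigned long start_pfn,
				unsigned long end_pfn)
{
	size_t i;

	assert(start_pfn <= end_pfn);

	/* An empty set of blocks means there is nothing to enforce. */
	if (nr == 0)
		return true;

	for (i = 0; i < nr; i++) {
		if (start_pfn >= blocks[i].start_pfn &&
		    end_pfn <= blocks[i].end_pfn)
			return true;
	}

	return false;
}
```

Note that coverage is tested per block, so in this sketch a range that
straddles two blocks is rejected even if the blocks are adjacent.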

---
 arch/x86/Kconfig            |   1 +
 arch/x86/include/asm/tdx.h  |   3 +
 arch/x86/mm/init_64.c       |  10 ++
 arch/x86/virt/vmx/tdx/tdx.c | 183 ++++++++++++++++++++++++++++++++++++
 4 files changed, 197 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index dd333b46fafb..b36129183035 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1959,6 +1959,7 @@ config INTEL_TDX_HOST
 	depends on X86_64
 	depends on KVM_INTEL
 	depends on X86_X2APIC
+	select ARCH_KEEP_MEMBLOCK
 	help
 	  Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
 	  host and certain physical attacks.  This option enables necessary TDX
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index d688228f3151..71169ecefabf 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -111,9 +111,12 @@ static inline long tdx_kvm_hypercall(unsigned int nr, unsigned long p1,
 #ifdef CONFIG_INTEL_TDX_HOST
 bool platform_tdx_enabled(void);
 int tdx_enable(void);
+bool tdx_cc_memory_compatible(unsigned long start_pfn, unsigned long end_pfn);
 #else	/* !CONFIG_INTEL_TDX_HOST */
 static inline bool platform_tdx_enabled(void) { return false; }
 static inline int tdx_enable(void)  { return -ENODEV; }
+static inline bool tdx_cc_memory_compatible(unsigned long start_pfn,
+		unsigned long end_pfn) { return true; }
 #endif	/* CONFIG_INTEL_TDX_HOST */
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 3f040c6e5d13..900341333d7e 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -55,6 +55,7 @@
 #include <asm/uv/uv.h>
 #include <asm/setup.h>
 #include <asm/ftrace.h>
+#include <asm/tdx.h>
 
 #include "mm_internal.h"
 
@@ -968,6 +969,15 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
+	/*
+	 * For now if TDX is enabled, all pages in the page allocator
+	 * must be TDX memory, which is a fixed set of memory regions
+	 * that are passed to the TDX module.  Reject the new region
+	 * if it is not TDX memory to guarantee above is true.
+	 */
+	if (!tdx_cc_memory_compatible(start_pfn, start_pfn + nr_pages))
+		return -EINVAL;
+
 	init_memory_mapping(start, start + size, params->pgprot);
 
 	return add_pages(nid, start_pfn, nr_pages, params);
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 43227af25e44..32af86e31c47 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -16,6 +16,11 @@
 #include <linux/smp.h>
 #include <linux/atomic.h>
 #include <linux/align.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/memblock.h>
+#include <linux/minmax.h>
+#include <linux/sizes.h>
 #include <asm/msr-index.h>
 #include <asm/msr.h>
 #include <asm/apic.h>
@@ -34,6 +39,13 @@ enum tdx_module_status_t {
 	TDX_MODULE_SHUTDOWN,
 };
 
+struct tdx_memblock {
+	struct list_head list;
+	unsigned long start_pfn;
+	unsigned long end_pfn;
+	int nid;
+};
+
 static u32 tdx_keyid_start __ro_after_init;
 static u32 tdx_keyid_num __ro_after_init;
 
@@ -46,6 +58,9 @@ static struct tdsysinfo_struct tdx_sysinfo;
 static struct cmr_info tdx_cmr_array[MAX_CMRS] __aligned(CMR_INFO_ARRAY_ALIGNMENT);
 static int tdx_cmr_num;
 
+/* All TDX-usable memory regions */
+static LIST_HEAD(tdx_memlist);
+
 /*
  * Detect TDX private KeyIDs to see whether TDX has been enabled by the
  * BIOS.  Both initializing the TDX module and running TDX guest require
@@ -329,6 +344,107 @@ static int tdx_get_sysinfo(void)
 	return trim_empty_cmrs(tdx_cmr_array, &tdx_cmr_num);
 }
 
+/* Check whether the given pfn range is covered by any CMR or not. */
+static bool pfn_range_covered_by_cmr(unsigned long start_pfn,
+				     unsigned long end_pfn)
+{
+	int i;
+
+	for (i = 0; i < tdx_cmr_num; i++) {
+		struct cmr_info *cmr = &tdx_cmr_array[i];
+		unsigned long cmr_start_pfn;
+		unsigned long cmr_end_pfn;
+
+		cmr_start_pfn = cmr->base >> PAGE_SHIFT;
+		cmr_end_pfn = (cmr->base + cmr->size) >> PAGE_SHIFT;
+
+		if (start_pfn >= cmr_start_pfn && end_pfn <= cmr_end_pfn)
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * Add a memory region on a given node as a TDX memory block.  The caller
+ * must make sure all memory regions are added in address ascending order
+ * and don't overlap.
+ */
+static int add_tdx_memblock(unsigned long start_pfn, unsigned long end_pfn,
+			    int nid)
+{
+	struct tdx_memblock *tmb;
+
+	tmb = kmalloc(sizeof(*tmb), GFP_KERNEL);
+	if (!tmb)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&tmb->list);
+	tmb->start_pfn = start_pfn;
+	tmb->end_pfn = end_pfn;
+	tmb->nid = nid;
+
+	list_add_tail(&tmb->list, &tdx_memlist);
+	return 0;
+}
+
+static void free_tdx_memory(void)
+{
+	while (!list_empty(&tdx_memlist)) {
+		struct tdx_memblock *tmb = list_first_entry(&tdx_memlist,
+				struct tdx_memblock, list);
+
+		list_del(&tmb->list);
+		kfree(tmb);
+	}
+}
+
+/*
+ * Add all memblock memory regions to the @tdx_memlist as TDX memory.
+ * The caller must hold the memory hotplug read lock (get_online_mems()).
+ */
+static int build_tdx_memory(void)
+{
+	unsigned long start_pfn, end_pfn;
+	int i, nid, ret;
+
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
+		/*
+		 * The first 1MB may not be reported as TDX convertible
+		 * memory.  Manually exclude it from TDX memory.
+		 *
+		 * This is fine as the first 1MB is already reserved in
+		 * reserve_real_mode() and won't end up in ZONE_DMA as
+		 * free pages anyway.
+		 */
+		start_pfn = max(start_pfn, (unsigned long)SZ_1M >> PAGE_SHIFT);
+		if (start_pfn >= end_pfn)
+			continue;
+
+		/* Verify memory is truly TDX convertible memory */
+		if (!pfn_range_covered_by_cmr(start_pfn, end_pfn)) {
+			pr_info("Memory region [0x%lx, 0x%lx) is not TDX convertible memory.\n",
+				start_pfn << PAGE_SHIFT, end_pfn << PAGE_SHIFT);
+			ret = -EINVAL;
+			goto err;
+		}
+
+		/*
+		 * Add the memory region as TDX memory.  The regions in
+		 * memblock are already guaranteed to be in address
+		 * ascending order and not to overlap.
+		 */
+		ret = add_tdx_memblock(start_pfn, end_pfn, nid);
+		if (ret)
+			goto err;
+	}
+
+	return 0;
+err:
+	free_tdx_memory();
+	return ret;
+}
+
 /*
  * Detect and initialize the TDX module.
  *
@@ -357,12 +473,56 @@ static int init_tdx_module(void)
 	if (ret)
 		goto out;
 
+	/*
+	 * All memory regions that can be used by the TDX module must be
+	 * passed to the TDX module during the module initialization.
+	 * Once this is done, all "TDX-usable" memory regions are fixed
+	 * during module's runtime.
+	 *
+	 * The initial support of TDX guests only allocates memory from
+	 * the global page allocator.  To keep things simple, for now
+	 * just make sure all pages in the page allocator are TDX memory.
+	 *
+	 * To achieve this, use all system memory in the core-mm at the
+	 * time of initializing the TDX module as TDX memory, and in the
+	 * meantime, reject any non-TDX memory in memory hot-add.
+	 *
+	 * This works because, in practice, all boot-time present DIMMs
+	 * are TDX convertible memory.  However, if any new memory is
+	 * hot-added before initializing the TDX module, initialization
+	 * will fail because that memory is not covered by any CMR.
+	 *
+	 * This can be enhanced in the future, e.g. by allowing adding or
+	 * onlining non-TDX memory to a separate node, in which case the
+	 * "TDX-capable" nodes and the "non-TDX-capable" nodes can exist
+	 * together -- the userspace/kernel just needs to make sure pages
+	 * for TDX guests must come from those "TDX-capable" nodes.
+	 *
+	 * Build the list of TDX memory regions as mentioned above so
+	 * they can be passed to the TDX module later.
+	 */
+	get_online_mems();
+
+	ret = build_tdx_memory();
+	if (ret)
+		goto out;
 	/*
 	 * Return -EINVAL until all steps of TDX module initialization
 	 * process are done.
 	 */
 	ret = -EINVAL;
 out:
+	/*
+	 * Memory hotplug checks the hot-added memory region against the
+	 * @tdx_memlist to see if the region is TDX memory.
+	 *
+	 * Do put_online_mems() here to make sure any modification to
+	 * @tdx_memlist is done while holding the memory hotplug read
+	 * lock, so that the memory hotplug path can just check the
+	 * @tdx_memlist w/o holding the @tdx_module_lock which may cause
+	 * deadlock.
+	 */
+	put_online_mems();
 	return ret;
 }
 
@@ -485,3 +645,26 @@ int tdx_enable(void)
 	return ret;
 }
 EXPORT_SYMBOL_GPL(tdx_enable);
+
+/*
+ * Check whether the given range is TDX memory.  Must be called between
+ * mem_hotplug_begin()/mem_hotplug_done().
+ */
+bool tdx_cc_memory_compatible(unsigned long start_pfn, unsigned long end_pfn)
+{
+	struct tdx_memblock *tmb;
+
+	/* Empty list means TDX isn't enabled successfully */
+	if (list_empty(&tdx_memlist))
+		return true;
+
+	list_for_each_entry(tmb, &tdx_memlist, list) {
+		/*
+		 * The new range is TDX memory if it is fully covered
+		 * by any TDX memory block.
+		 */
+		if (start_pfn >= tmb->start_pfn && end_pfn <= tmb->end_pfn)
+			return true;
+	}
+	return false;
+}

From patchwork Mon Nov 21 00:26:33 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23488
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1323032wrr;
        Sun, 20 Nov 2022 16:30:47 -0800 (PST)
X-Google-Smtp-Source: 
 AA0mqf4l3iK825Jkl9FOxpz/+UZYkIP9DhdEJ+pK1NFE1AnD7nSe8//s3JBef0CB4L7vBYOPAd5Z
X-Received: by 2002:a17:90a:a405:b0:218:8416:7268 with SMTP id
 y5-20020a17090aa40500b0021884167268mr13818714pjp.194.1668990647068;
        Sun, 20 Nov 2022 16:30:47 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1668990647; cv=none;
        d=google.com; s=arc-20160816;
        b=U8wu0RvfgGhTlmPqd6ONZCSss4xWMIP+7/NCgraIQKbBkgT3Stb42/t1+7I0V5CDFs
         9afq4Lt+92djiLf4MB+gPmClAxiB+ALB5ETnkZuJCJC4xqIOrKlZVRv2Y0emxKrAPigk
         yMpU4GMJ6VDlKBPbvK1PBoCLTRfeMnvqXNzzVTyM+gJKtAJLxhEVcz93ogo0eD3P4PaQ
         t1sknSnQcDWsrFRaQhL5w3LCVSmPqRBh/vO4ZrkAmppB6UjDZn9eyXFL82iCraMzIy5i
         R/acC0d9s5GKQOsIOu98rWsySZQOwEx9ci23eLXsTBwDQct9PP0nK4Kjqet6/I/TEApz
         TKRQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from
         :dkim-signature;
        bh=6ly/tW/l8sxVF+1POXPZy0u43a5lyFM1xw+ryHowF2c=;
        b=b3qBTibk2Vk6ciZSTq/9UKQ84N+Wogq7IbRCR721e8489qSUBHySor64w/vAhUfEds
         DLhOPTEiNgTvKpv2fN5CDItvMjdml8dsldOhtGzshP/a8d+Pn7SBqAo9x0aWAwqTC0dE
         DKPrmWTzf/y+v/c/RC5wvtpm1F6eUF8U+JsLm7IaII+NCO6AFVP0wAwQZbziMzPUwrej
         MtJMmSkOtsMa1Ig8awjSH/boDri4jkRgwVF2jnVmGbTQ3UWDUn9Ar5ywjKOHyTKwVe8M
         BIZ2mdiLT30s2VWqVUmih3i3LLVME+9/Ti0RgQU5syxKA1Za19vEhfCE4NQWO4RS3K5B
         gJoA==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=lIG1YbKV;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 lb7-20020a17090b4a4700b00205da45d8a5si13422266pjb.124.2022.11.20.16.30.34;
        Sun, 20 Nov 2022 16:30:47 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=lIG1YbKV;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230040AbiKUA2z (ORCPT <rfc822;leviz.kernel.dev@gmail.com>
        + 99 others); Sun, 20 Nov 2022 19:28:55 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58426 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229939AbiKUA1p (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Sun, 20 Nov 2022 19:27:45 -0500
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 329ED1DDC4;
        Sun, 20 Nov 2022 16:27:43 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1668990464; x=1700526464;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=znmF9BwMgILl7Jshlgv9LDu7pEGe6gH0wYN4oKNnxSU=;
  b=lIG1YbKVek7SRmRs2ulPJr/54zdTj04/YEKb14bLim0gDl7HAFBqoe7J
   ByBLoEQ+Sl7wMu4A3c+PN9058Uq/p3aWQoqBPVUBQ0mat6pxuysS+2UiU
   ON33luCIE6KYOZHRuvxCjFz+HjYXbPLtP2HtkR3/4OhduRbju2gqyCKWC
   IYqaxiSllS/rtDrLSv2nQHjsxOe80y1LDwE65pS1aJ3K4B3/Z6tAIxR5H
   fmxdhMnl0j05De0/f0SEZMaZoGlnqkDeBz2ZzWk0kOY2cqXNoq1jlhVkh
   CuGeUiD6kRWsafWUZGq6t+M2rWIJwSnPU2isakMZkBsuXZP6dClrOQSXs
   w==;
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="377705706"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="377705706"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
  by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:27:42 -0800
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="729825390"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="729825390"
Received: from tomnavar-mobl.amr.corp.intel.com (HELO
 khuang2-desk.gar.corp.intel.com) ([10.209.176.15])
  by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:27:38 -0800
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 11/20] x86/virt/tdx: Add placeholder to construct TDMRs to
 cover all TDX memory regions
Date: Mon, 21 Nov 2022 13:26:33 +1300
Message-Id: 
 <32c1968fe34c8cf3cb834e3a9966cd2a201efc5b.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,
        SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1750063536793000198?=
X-GMAIL-MSGID: =?utf-8?q?1750063536793000198?=

TDX provides increased levels of memory confidentiality and integrity.
This requires special hardware support for features like memory
encryption and storage of memory integrity checksums.  Not all memory
satisfies these requirements.

As a result, TDX introduces the concept of a "Convertible Memory
Region" (CMR).  During boot, the firmware builds a list of all of the
memory ranges which can provide the TDX security guarantees.  The list
of these ranges is available to the kernel by querying the TDX module.

The TDX architecture needs additional metadata to record things like
which TD guest "owns" a given page of memory.  This metadata essentially
serves as the 'struct page' for the TDX module.  The space for this
metadata is not reserved by the hardware up front and must be allocated
by the kernel and given to the TDX module.

Since this metadata consumes space, the VMM can choose whether or not to
allocate it for a given area of convertible memory.  If it chooses not
to, the memory cannot receive TDX protections and cannot be used by TDX
guests as private memory.

For every memory region that the VMM wants to use as TDX memory, it sets
up a "TD Memory Region" (TDMR).  Each TDMR represents a physically
contiguous convertible range and must also have its own physically
contiguous metadata table, referred to as a Physical Address Metadata
Table (PAMT), to track status for each page in the TDMR range.

Unlike a CMR, each TDMR requires 1G granularity and alignment.  To
support physical RAM areas that don't meet those strict requirements,
each TDMR permits a number of internal "reserved areas" which can be
placed over memory holes.  If PAMT metadata is placed within a TDMR it
must be covered by one of these reserved areas.
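
As an illustration of the alignment rule above, here is a minimal
userspace sketch (not kernel code) that widens one RAM range to 1G
boundaries and records the two resulting holes as reserved areas.  The
names rsvd_area, tdmr_span and cover_range() are hypothetical, and the
sketch assumes reserved-area offsets are relative to the TDMR base.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)

/* Hypothetical userspace stand-in for a reserved area.
 * Assumption: 'offset' is relative to the TDMR base. */
struct rsvd_area {
	uint64_t offset;
	uint64_t size;
};

/* Hypothetical result type: one TDMR plus the holes around the RAM. */
struct tdmr_span {
	uint64_t base;
	uint64_t size;
	struct rsvd_area head;	/* hole below the RAM range, size 0 if none */
	struct rsvd_area tail;	/* hole above the RAM range, size 0 if none */
};

/* Cover the RAM range [start, end) with one 1G-aligned TDMR. */
static struct tdmr_span cover_range(uint64_t start, uint64_t end)
{
	struct tdmr_span t;
	uint64_t base = start & ~(SZ_1G - 1);			/* align down */
	uint64_t top = (end + SZ_1G - 1) & ~(SZ_1G - 1);	/* align up */

	t.base = base;
	t.size = top - base;
	t.head.offset = 0;
	t.head.size = start - base;
	t.tail.offset = end - base;
	t.tail.size = top - end;
	return t;
}
```

For example, a RAM range from 256M to 1.25G ends up in a 2G TDMR at
base 0, with a 256M hole at the front and a 768M hole at the end.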

Let's summarize the concepts:

 CMR - Firmware-enumerated physical ranges that support TDX.  CMRs are
       4K aligned.
TDMR - Physical address range which is chosen by the kernel to support
       TDX.  1G granularity and alignment required.  Each TDMR has
       reserved areas which can cover memory holes and overlapping
       PAMTs.
PAMT - Physically contiguous TDX metadata.  One table for each page size
       per TDMR.  Roughly 1/256th of TDMR in size.  256G TDMR = ~1G
       PAMT.
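
The "256G TDMR = ~1G PAMT" figure can be checked with a short userspace
sketch.  It assumes 16 bytes per PAMT entry, which is what a 1/256 ratio
implies for 4K pages; the real entry size is reported by the TDX module,
so treat the constant and the helper names as assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K (1ULL << 12)

/* Assumption: 16 bytes per PAMT entry.  The kernel reads the real
 * value from the TDX module instead of hard-coding it. */
#define ASSUMED_PAMT_ENTRY_SIZE 16ULL

/* PAMT bytes needed to track @tdmr_size at one page size, 4K-aligned. */
static uint64_t pamt_bytes(uint64_t tdmr_size, unsigned int page_shift)
{
	uint64_t sz = (tdmr_size >> page_shift) * ASSUMED_PAMT_ENTRY_SIZE;

	return (sz + SZ_4K - 1) & ~(SZ_4K - 1);
}

/* Sum over the three page sizes: 4K (shift 12), 2M (21), 1G (30). */
static uint64_t pamt_total_bytes(uint64_t tdmr_size)
{
	return pamt_bytes(tdmr_size, 12) +
	       pamt_bytes(tdmr_size, 21) +
	       pamt_bytes(tdmr_size, 30);
}
```

Under that assumption a 256G TDMR needs 1G for the 4K table, 2M for the
2M table and 4K for the 1G table, so the 4K table dominates the total.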

As one step of initializing the TDX module, the kernel configures
TDX-usable memory regions by passing an array of TDMRs to the TDX module.

Constructing the array of TDMRs consists of the following steps:

1) Create TDMRs to cover all memory regions that the TDX module can use;
2) Allocate and set up PAMT for each TDMR;
3) Set up reserved areas for each TDMR.

Add a placeholder to construct TDMRs to do the above steps after all
TDX memory regions are verified to be truly convertible.  Always free
TDMRs at the end of the initialization, whether it succeeds or not,
as TDMRs are only used during the initialization.

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - Improved commit message to explain 'int' overflow cannot happen
   in cal_tdmr_size() and alloc_tdmr_array(). -- Andy/Dave.

v5 -> v6:
 - construct_tdmrs_memblock() -> construct_tdmrs() as 'tdx_memblock' is
   used instead of memblock.
 - Added Isaku's Reviewed-by.

v3 -> v5 (no feedback on v4):
 - Moved calculating TDMR size to this patch.
 - Changed to use alloc_pages_exact() to allocate buffer for all TDMRs
   once, instead of allocating each TDMR individually.
 - Removed "crypto protection" in the changelog.
 - -EFAULT -> -EINVAL in a couple of places.
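
The 'int' overflow argument for cal_tdmr_size() and alloc_tdmr_array()
can be checked by hand.  The sketch below is a userspace approximation
with hypothetical helper names; it counts 64 bytes of fixed fields
(eight u64s) and 16 bytes per reserved area.  Note that sizeof() of a
struct declared with an alignment attribute is padded to a multiple of
that alignment, so the kernel's own figure can come out larger than this
hand count.

```c
#include <assert.h>
#include <stdint.h>

#define TDMR_INFO_ALIGNMENT 512ULL

/* Round @v up to a multiple of @a; @a must be a power of two. */
static uint64_t align_up(uint64_t v, uint64_t a)
{
	return (v + a - 1) & ~(a - 1);
}

/* Size of one TDMR_INFO: fixed fields plus reserved areas, rounded up. */
static uint64_t tdmr_info_bytes(uint64_t fixed_bytes, uint64_t rsvd_bytes,
				uint64_t max_rsvd)
{
	return align_up(fixed_bytes + rsvd_bytes * max_rsvd,
			TDMR_INFO_ALIGNMENT);
}

/* Size of the whole array handed to alloc_pages_exact(). */
static uint64_t tdmr_array_bytes(uint64_t one_tdmr_bytes, uint64_t max_tdmrs)
{
	return one_tdmr_bytes * max_tdmrs;
}
```

With the TDX 1.0 limits quoted in the code comments (16 reserved areas,
64 TDMRs) this gives 512 bytes per TDMR_INFO and 32K for the array,
far below what an 'int' can hold.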


---
 arch/x86/virt/vmx/tdx/tdx.c | 83 +++++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h | 23 ++++++++++
 2 files changed, 106 insertions(+)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 32af86e31c47..26048c6b0170 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -445,6 +445,63 @@ static int build_tdx_memory(void)
 	return ret;
 }
 
+/* Calculate the actual TDMR_INFO size */
+static inline int cal_tdmr_size(void)
+{
+	int tdmr_sz;
+
+	/*
+	 * The actual size of TDMR_INFO depends on the maximum number
+	 * of reserved areas.
+	 *
+	 * Note: for TDX1.0 the max_reserved_per_tdmr is 16, and
+	 * TDMR_INFO size is aligned up to 512-byte.  Even if it is
+	 * extended in the future, it would be insane if TDMR_INFO
+	 * becomes larger than 4K.  The tdmr_sz here should never
+	 * overflow.
+	 */
+	tdmr_sz = sizeof(struct tdmr_info);
+	tdmr_sz += sizeof(struct tdmr_reserved_area) *
+		   tdx_sysinfo.max_reserved_per_tdmr;
+
+	/*
+	 * TDX requires each TDMR_INFO to be 512-byte aligned.  Always
+	 * round up TDMR_INFO size to the 512-byte boundary.
+	 */
+	return ALIGN(tdmr_sz, TDMR_INFO_ALIGNMENT);
+}
+
+static struct tdmr_info *alloc_tdmr_array(int *array_sz)
+{
+	/*
+	 * TDX requires each TDMR_INFO to be 512-byte aligned.
+	 * Use alloc_pages_exact() to allocate all TDMRs at once.
+	 * Each TDMR_INFO will still be 512-byte aligned since
+	 * cal_tdmr_size() always returns 512-byte aligned size.
+	 */
+	*array_sz = cal_tdmr_size() * tdx_sysinfo.max_tdmrs;
+
+	/*
+	 * Zero the buffer so 'struct tdmr_info::size' can be
+	 * used to determine whether a TDMR is valid.
+	 *
+	 * Note: for TDX1.0 the max_tdmrs is 64 and TDMR_INFO size
+	 * is 512-byte.  Even if they are extended in the future, it
+	 * would be insane if the total size exceeds 4MB.
+	 */
+	return alloc_pages_exact(*array_sz, GFP_KERNEL | __GFP_ZERO);
+}
+
+/*
+ * Construct an array of TDMRs to cover all TDX memory ranges.
+ * The actual number of TDMRs is kept to @tdmr_num.
+ */
+static int construct_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num)
+{
+	/* Return -EINVAL until constructing TDMRs is done */
+	return -EINVAL;
+}
+
 /*
  * Detect and initialize the TDX module.
  *
@@ -454,6 +511,9 @@ static int build_tdx_memory(void)
  */
 static int init_tdx_module(void)
 {
+	struct tdmr_info *tdmr_array;
+	int tdmr_array_sz;
+	int tdmr_num;
 	int ret;
 
 	/*
@@ -506,11 +566,34 @@ static int init_tdx_module(void)
 	ret = build_tdx_memory();
 	if (ret)
 		goto out;
+
+	/* Prepare enough space to construct TDMRs */
+	tdmr_array = alloc_tdmr_array(&tdmr_array_sz);
+	if (!tdmr_array) {
+		ret = -ENOMEM;
+		goto out_free_tdx_mem;
+	}
+
+	/* Construct TDMRs to cover all TDX memory ranges */
+	ret = construct_tdmrs(tdmr_array, &tdmr_num);
+	if (ret)
+		goto out_free_tdmrs;
+
 	/*
 	 * Return -EINVAL until all steps of TDX module initialization
 	 * process are done.
 	 */
 	ret = -EINVAL;
+out_free_tdmrs:
+	/*
+	 * The array of TDMRs is freed no matter the initialization is
+	 * successful or not.  They are not needed anymore after the
+	 * module initialization.
+	 */
+	free_pages_exact(tdmr_array, tdmr_array_sz);
+out_free_tdx_mem:
+	if (ret)
+		free_tdx_memory();
 out:
 	/*
 	 * Memory hotplug checks the hot-added memory region against the
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 8e273756098c..a737f2b51474 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -80,6 +80,29 @@ struct tdsysinfo_struct {
 	};
 } __packed __aligned(TDSYSINFO_STRUCT_ALIGNMENT);
 
+struct tdmr_reserved_area {
+	u64 offset;
+	u64 size;
+} __packed;
+
+#define TDMR_INFO_ALIGNMENT	512
+
+struct tdmr_info {
+	u64 base;
+	u64 size;
+	u64 pamt_1g_base;
+	u64 pamt_1g_size;
+	u64 pamt_2m_base;
+	u64 pamt_2m_size;
+	u64 pamt_4k_base;
+	u64 pamt_4k_size;
+	/*
+	 * Actual number of reserved areas depends on
+	 * 'struct tdsysinfo_struct'::max_reserved_per_tdmr.
+	 */
+	struct tdmr_reserved_area reserved_areas[0];
+} __packed __aligned(TDMR_INFO_ALIGNMENT);
+
 /*
  * Do not put any hardware-defined TDX structure representations below
  * this comment!

From patchwork Mon Nov 21 00:26:34 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23489
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1323038wrr;
        Sun, 20 Nov 2022 16:30:48 -0800 (PST)
X-Google-Smtp-Source: 
 AA0mqf53E02uAzkQJDgAPP/SGfgE2rVbcrgE4aeV6VJuB87Uiz4fpRtQCIOw+KiiltfX9GH8smSF
X-Received: by 2002:a17:90b:2688:b0:218:b9e1:ebef with SMTP id
 pl8-20020a17090b268800b00218b9e1ebefmr1131103pjb.65.1668990648299;
        Sun, 20 Nov 2022 16:30:48 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1668990648; cv=none;
        d=google.com; s=arc-20160816;
        b=XM/eaVSxeEMDOnpHPOMV0JvEhwnvHYWTG3BvNNRiWcwCTjQengXq3VI2/u2D6RTH4O
         +LG1QbfIecJCOLfR92NKAu89eMcKD3/FUDL0Ql0SBNLwSdLFzGS1+QZtkpX3Lt9YTQhp
         Wucry2OrEtWx7roHdujJqLN3Xqd33KWjMRV6A4WLNOSo7Vp3U4H4zZjsaYUGcdp9MCq/
         s5ioZPkkskkUtWZ9vlyrWcxmoXMNYGuIefAjdb4yyeduuEmCZVL2zxNtxWX9fimMYDTM
         M+AU2Vkl0PXsBuPy+bS/ulPS3gzjMMYgIpbwKQ8tfQyPrg0oIV9ZKq/I3W13dfVgV0WJ
         BRAQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from
         :dkim-signature;
        bh=jHNQ7/yL06M/T50RVxsQUWyWLSm0bBCn37vpvdDxts4=;
        b=PstLaZqzaLZH4q4NMEWA6pBNEZvSyeEOacIcoBIW7u53oOjiUt2GUeu0OSdqHBiQt6
         WR8uMEq+kqhmpyXAKVcGguupzogwfr/9nkVLsflLuz8NOfpj79uylo4A8Dg3EptdkpZJ
         jvHBgngeHMF3vFYDoLAtc86OuWMELYKyYeH0CFMqVjrmNFPjUCMl9kY8YLgcLRulftbZ
         1rA5ExzTVf6BafRGfrMK4A0moWA9DTrb940EPlYoAug8hSnty1an53KSSgISue9u/8a4
         bnhnxhi9hDyHfwtV15QOaBst0/odZd4TYcN56rXlI+WWJtWY2CjeMQN6qrYmFdD2Qk/I
         JHdw==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b="YWK/LEpx";
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 s10-20020a170902ea0a00b00172f8a4b3e1si10826710plg.81.2022.11.20.16.30.35;
        Sun, 20 Nov 2022 16:30:48 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b="YWK/LEpx";
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230049AbiKUA3K (ORCPT <rfc822;leviz.kernel.dev@gmail.com>
        + 99 others); Sun, 20 Nov 2022 19:29:10 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58700 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229985AbiKUA1w (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Sun, 20 Nov 2022 19:27:52 -0500
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 158D31FCCC;
        Sun, 20 Nov 2022 16:27:48 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1668990468; x=1700526468;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=lQ+GUoAJqvHedg+d+kfODMpHJrhD7+ALFsnhNJ3753Q=;
  b=YWK/LEpxzXQ2aVSmi1U76NwlzB5WwN1BMiA+IPa4IKYvqXVDGSK+WxYG
   FeXsldcW6gz6SGxqYbPWWWMQquTzZbiwmW4o1xZWjP8NwdJ1dNRRLJWPT
   oBrRQXzIGIyQNolh3VAJoR/OSXrsvHy0lo3j6pQ97WvmZ9Og8nXfE4gXt
   Q/kjh1uAkzvE3KEJw71XrZPgOT0yJeQ02sgjGZ+eAPwT6edZxwHhH4PkD
   md4KKi6OKq8Ul5gU8qHuZlNO2eO/kqOm6pl5HPTH795A6MLDyex7cPi3i
   jh2A0l43zhrSrZCb7wOLLL2l5gNXiYSkfiooAszCxh4pp3PpzUjNr1XlC
   w==;
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="377705713"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="377705713"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
  by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:27:47 -0800
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="729825418"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="729825418"
Received: from tomnavar-mobl.amr.corp.intel.com (HELO
 khuang2-desk.gar.corp.intel.com) ([10.209.176.15])
  by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:27:42 -0800
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 12/20] x86/virt/tdx: Create TDMRs to cover all TDX memory
 regions
Date: Mon, 21 Nov 2022 13:26:34 +1300
Message-Id: 
 <4db59b4a87f0309c29e61a79892b9fa6645754a8.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,
        SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1750063537747385272?=
X-GMAIL-MSGID: =?utf-8?q?1750063537747385272?=

The kernel configures TDX-usable memory regions by passing an array of
"TD Memory Regions" (TDMRs) to the TDX module.  Each TDMR contains the
information of the base/size of a memory region, the base/size of the
associated Physical Address Metadata Table (PAMT) and a list of reserved
areas in the region.

Create a number of TDMRs to cover all TDX memory regions.  To keep it
simple, always try to create one TDMR for each memory region.  As the
first step only set up the base/size for each TDMR.

Each TDMR must be 1G aligned and the size must be in 1G granularity.
This implies that one TDMR could cover multiple memory regions.  If a
memory region crosses a 1G boundary and its first part is already
covered by the previous TDMR, just create a new TDMR for the remaining
part.

TDX only supports a limited number of TDMRs.  Disable TDX if all TDMRs
are consumed but there are more memory regions to cover.
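
The covering rule described above can be tried out in userspace.  The
sketch below mirrors the loop in create_tdmrs() on plain arrays; the
region and tdmr types and cover_regions() are hypothetical stand-ins,
regions are assumed to be sorted by address, and -1 stands in for the
kernel's -E2BIG.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)

struct region { uint64_t start, end; };	/* [start, end), sorted */
struct tdmr   { uint64_t base, size; };

/* Returns the number of TDMRs used, or -1 if @max_tdmrs is too small. */
static int cover_regions(const struct region *r, int nr,
			 struct tdmr *t, int max_tdmrs)
{
	int idx = 0;

	if (nr <= 0 || max_tdmrs <= 0)
		return -1;

	/* A zero size marks a TDMR slot as not yet used. */
	for (int i = 0; i < max_tdmrs; i++)
		t[i].size = 0;

	for (int i = 0; i < nr; i++) {
		uint64_t start = r[i].start & ~(SZ_1G - 1);
		uint64_t end = (r[i].end + SZ_1G - 1) & ~(SZ_1G - 1);

		if (t[idx].size) {
			uint64_t cur_end = t[idx].base + t[idx].size;

			/* Region already fully covered by current TDMR. */
			if (end <= cur_end)
				continue;
			/* Skip the part that is already covered. */
			if (start < cur_end)
				start = cur_end;
			/* Move on to a new TDMR, if one is left. */
			if (++idx >= max_tdmrs)
				return -1;
		}
		t[idx].base = start;
		t[idx].size = end - start;
	}
	return idx + 1;
}
```

For regions [0, 512M), [768M, 1.5G) and [4G, 5G) this yields three
TDMRs: [0, 1G), [1G, 2G) and [4G, 5G).  With only two TDMR slots
available the same input fails.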

Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - No change.

v5 -> v6:
 - Rebase due to using 'tdx_memblock' instead of memblock.

v3 -> v5 (no feedback on v4):
 - Removed allocating TDMR individually.
 - Improved changelog by using Dave's words.
 - Made TDMR_START() and TDMR_END() static inline functions.

---
 arch/x86/virt/vmx/tdx/tdx.c | 104 +++++++++++++++++++++++++++++++++++-
 1 file changed, 103 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 26048c6b0170..57b448de59a0 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -445,6 +445,24 @@ static int build_tdx_memory(void)
 	return ret;
 }
 
+/* TDMR must be 1gb aligned */
+#define TDMR_ALIGNMENT		BIT_ULL(30)
+#define TDMR_PFN_ALIGNMENT	(TDMR_ALIGNMENT >> PAGE_SHIFT)
+
+/* Align up and down the address to TDMR boundary */
+#define TDMR_ALIGN_DOWN(_addr)	ALIGN_DOWN((_addr), TDMR_ALIGNMENT)
+#define TDMR_ALIGN_UP(_addr)	ALIGN((_addr), TDMR_ALIGNMENT)
+
+static inline u64 tdmr_start(struct tdmr_info *tdmr)
+{
+	return tdmr->base;
+}
+
+static inline u64 tdmr_end(struct tdmr_info *tdmr)
+{
+	return tdmr->base + tdmr->size;
+}
+
 /* Calculate the actual TDMR_INFO size */
 static inline int cal_tdmr_size(void)
 {
@@ -492,14 +510,98 @@ static struct tdmr_info *alloc_tdmr_array(int *array_sz)
 	return alloc_pages_exact(*array_sz, GFP_KERNEL | __GFP_ZERO);
 }
 
+static struct tdmr_info *tdmr_array_entry(struct tdmr_info *tdmr_array,
+					  int idx)
+{
+	return (struct tdmr_info *)((unsigned long)tdmr_array +
+			cal_tdmr_size() * idx);
+}
+
+/*
+ * Create TDMRs to cover all TDX memory regions.  The actual number
+ * of TDMRs is set to @tdmr_num.
+ */
+static int create_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num)
+{
+	struct tdx_memblock *tmb;
+	int tdmr_idx = 0;
+
+	/*
+	 * Loop over TDX memory regions and create TDMRs to cover them.
+	 * To keep it simple, always try to use one TDMR to cover
+	 * one memory region.
+	 */
+	list_for_each_entry(tmb, &tdx_memlist, list) {
+		struct tdmr_info *tdmr;
+		u64 start, end;
+
+		tdmr = tdmr_array_entry(tdmr_array, tdmr_idx);
+		start = TDMR_ALIGN_DOWN(tmb->start_pfn << PAGE_SHIFT);
+		end = TDMR_ALIGN_UP(tmb->end_pfn << PAGE_SHIFT);
+
+		/*
+		 * If the current TDMR's size hasn't been initialized,
+		 * it is a new TDMR to cover the new memory region.
+		 * Otherwise, the current TDMR has already covered the
+		 * previous memory region.  In the latter case, check
+		 * whether the current memory region has been fully or
+		 * partially covered by the current TDMR, since TDMR is
+		 * 1G aligned.
+		 */
+		if (tdmr->size) {
+			/*
+			 * Loop to the next memory region if the current
+			 * block has already been fully covered by the
+			 * current TDMR.
+			 */
+			if (end <= tdmr_end(tdmr))
+				continue;
+
+			/*
+			 * If part of the current memory region has
+			 * already been covered by the current TDMR,
+			 * skip the already covered part.
+			 */
+			if (start < tdmr_end(tdmr))
+				start = tdmr_end(tdmr);
+
+			/*
+			 * Create a new TDMR to cover the current memory
+			 * region, or the remaining part of it.
+			 */
+			tdmr_idx++;
+			if (tdmr_idx >= tdx_sysinfo.max_tdmrs)
+				return -E2BIG;
+
+			tdmr = tdmr_array_entry(tdmr_array, tdmr_idx);
+		}
+
+		tdmr->base = start;
+		tdmr->size = end - start;
+	}
+
+	/* @tdmr_idx is always the index of last valid TDMR. */
+	*tdmr_num = tdmr_idx + 1;
+
+	return 0;
+}
+
 /*
  * Construct an array of TDMRs to cover all TDX memory ranges.
  * The actual number of TDMRs is kept to @tdmr_num.
  */
 static int construct_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num)
 {
+	int ret;
+
+	ret = create_tdmrs(tdmr_array, tdmr_num);
+	if (ret)
+		goto err;
+
 	/* Return -EINVAL until constructing TDMRs is done */
-	return -EINVAL;
+	ret = -EINVAL;
+err:
+	return ret;
 }
 
 /*

From patchwork Mon Nov 21 00:26:35 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23490
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1323063wrr;
        Sun, 20 Nov 2022 16:30:53 -0800 (PST)
X-Google-Smtp-Source: 
 AA0mqf6BEHprvsFl2HaBI1ko4Xio4om5kU7x3nkeQCMPR7tZztEkugFOUPHPddLb4y4rWhspODrP
X-Received: by 2002:a17:902:ead1:b0:187:31da:a260 with SMTP id
 p17-20020a170902ead100b0018731daa260mr9279637pld.64.1668990652736;
        Sun, 20 Nov 2022 16:30:52 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1668990652; cv=none;
        d=google.com; s=arc-20160816;
        b=YssBf7jtDr4j+GeKVz9jb6UpodHC/o7vHrTHll++SBfDBlRsOOGiR+Vfd6ZqAluz/q
         zcB/f6tRpOYZXsp2H3s3FONUkT1IC11k69bbENHbo1wnQplWT4OI4bUz2v3xnOai527v
         oOEsHg2Ndu4dx9gAXwGnGFhbrPZ1DnwGFFNNgmXtl0WPa9xeA9/X1lAT/fJf/zDm4/ac
         J3w+felR9uKAJk8XqC1jXtUAVeN7LXHbVU6U6TdRRp358dKxW/hcRwUvWzasha2Esxun
         DYs8iI8VjSv0EaerTQq+RYDtlJh6QlorDhGra8Ux3XjB0nmchGIfc9+/FP4IN0OXqqQl
         HWOQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from
         :dkim-signature;
        bh=tygx++zkp4b12mDPrLAD4EkrF/tM2ITsIdQZLjpd6Vc=;
        b=vEGusIbU47//2JQczexoecJvlw6v0DW+2Q2eKEuAbTT4fS60q7lz0GuYX0w/bNw4LM
         1jTafAt7QKistBu1LcPN8bDl2dGK8oOhWRSrAxmmm2PXKwfHJTw2V6TTmp+YYG040hrk
         a13EaARAUewTCQ7o44UNnUV4tWiWPZl2HDW9Y6Hj14MjGZVg5lAKIFV5DPr1yFM/OfZR
         Y+FJhv5AM1ybWdlq4ugI9e52jBrGtxr9L++CeILzIsTE+oMBdAKnKHcxU7fYGyS8i8Ey
         CAyKgcMmvGnzj29MmkbFHYXiLetErZr9UPehF7CUu85k3f6nZk6EiesmQhDzlVDLsMBZ
         XzMw==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=NO2Neigf;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 w3-20020a17090aaf8300b00215f01f79a4si12469328pjq.62.2022.11.20.16.30.39;
        Sun, 20 Nov 2022 16:30:52 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=NO2Neigf;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229827AbiKUA3R (ORCPT <rfc822;leviz.kernel.dev@gmail.com>
        + 99 others); Sun, 20 Nov 2022 19:29:17 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57306 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229848AbiKUA2o (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Sun, 20 Nov 2022 19:28:44 -0500
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 365E72DA96;
        Sun, 20 Nov 2022 16:27:52 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1668990472; x=1700526472;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=USq/NN8kgkdf9mW4R0hp3+fsrsI45C1gLZr7XotHn6c=;
  b=NO2NeigfGmh2RjuvhPVrgDp/n8PaPOEdJf3uHQfN+gS/qyPX7A2F2imv
   y0bLkjT19YimGW2t29jKDwbG7R/+CeHVIAUDGgrjbPsNlzZIAd0o6i4VI
   Gz/kO2J+so720BIWditXBTThw5Dp+WWziGmXJmzACATbGbvsEyZOv7VVb
   Hz/8BjMuPONVfwHc6HQfGRTMQHj1kBR1MnChYVrcq4ykXbpZz4cPJ6ZBl
   Qu/aAQAEHLxIUiVq0+wgLL55NuTJ9nJ/UnAsG/2+9h/2kdZ0MRwro5hMd
   btWc4OQ6musfc4/Qy56HfFkeAxon2ZIiqctBEQyU+sdI74S7mE/8xKQTF
   Q==;
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="377705718"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="377705718"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
  by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:27:51 -0800
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="729825444"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="729825444"
Received: from tomnavar-mobl.amr.corp.intel.com (HELO
 khuang2-desk.gar.corp.intel.com) ([10.209.176.15])
  by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:27:47 -0800
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 13/20] x86/virt/tdx: Allocate and set up PAMTs for TDMRs
Date: Mon, 21 Nov 2022 13:26:35 +1300
Message-Id: 
 <ef6cdab2c371b9f068f2b4bf493b1dd0c9bb3c99.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,
        SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1750063542649410629?=
X-GMAIL-MSGID: =?utf-8?q?1750063542649410629?=

The TDX module uses additional metadata to record things like which
guest "owns" a given page of memory.  This metadata, referred to as the
Physical Address Metadata Table (PAMT), essentially serves as the
'struct page' for the TDX module.  PAMTs are not reserved by hardware
up front.  They must be allocated by the kernel and then given to the
TDX module.

TDX supports 3 page sizes: 4K, 2M, and 1G.  Each "TD Memory Region"
(TDMR) has 3 PAMTs to track the 3 supported page sizes.  Each PAMT must
be a physically contiguous area from a Convertible Memory Region (CMR).
However, the PAMTs which track pages in one TDMR do not need to reside
within that TDMR but can be anywhere in CMRs.  If one PAMT overlaps with
any TDMR, the overlapping part must be reported as a reserved area in
that particular TDMR.
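
To illustrate the overlap rule, the following userspace sketch computes
which part of a PAMT falls inside a TDMR and would therefore need a
reserved area.  pamt_overlap() and rsvd_area are hypothetical names, and
the offset is assumed to be relative to the TDMR base; this is not the
code that sets up reserved areas in this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in; 'offset' assumed relative to the TDMR base. */
struct rsvd_area {
	uint64_t offset;
	uint64_t size;
};

/*
 * Intersect the PAMT range with the TDMR range.  Returns 1 and fills
 * @out when they overlap, 0 when the PAMT lies entirely outside.
 */
static int pamt_overlap(uint64_t tdmr_base, uint64_t tdmr_size,
			uint64_t pamt_base, uint64_t pamt_size,
			struct rsvd_area *out)
{
	uint64_t tdmr_end = tdmr_base + tdmr_size;
	uint64_t pamt_end = pamt_base + pamt_size;
	uint64_t lo = pamt_base > tdmr_base ? pamt_base : tdmr_base;
	uint64_t hi = pamt_end < tdmr_end ? pamt_end : tdmr_end;

	if (lo >= hi)
		return 0;

	out->offset = lo - tdmr_base;
	out->size = hi - lo;
	return 1;
}
```

A 4M PAMT that starts 1M below the end of a TDMR overlaps it by 1M, so
only that 1M would have to be reported as reserved in that TDMR.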

Use alloc_contig_pages() since PAMT must be a physically contiguous area
and it may be potentially large (~1/256th of the size of the given TDMR).
The downside is that alloc_contig_pages() may fail at runtime.  One (bad)
mitigation is to launch a TD guest early during system boot so that the
PAMTs are allocated early, but the only real fix is to add a boot option
to allocate or reserve PAMTs during kernel boot.

TDX only supports a limited number of reserved areas per TDMR to cover
both PAMTs and memory holes within the given TDMR.  If many PAMTs are
allocated within a single TDMR, the reserved areas may not be sufficient
to cover all of them.

Adopt the following policies when allocating PAMTs for a given TDMR:

  - Allocate three PAMTs of the TDMR in one contiguous chunk to minimize
    the total number of reserved areas consumed for PAMTs.
  - Try to first allocate PAMT from the local node of the TDMR for better
    NUMA locality.
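
The first policy can be sketched in userspace as follows: one
contiguous chunk is allocated for the sum of the three PAMT sizes and
then carved up back to back.  The pamt_layout type and split_pamt() are
hypothetical, and the 4K, 2M, 1G order is an assumption for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PAMT 3	/* index 0 = 4K, 1 = 2M, 2 = 1G (assumed order) */

struct pamt_layout {
	uint64_t base[NR_PAMT];
	uint64_t size[NR_PAMT];
	uint64_t total;		/* bytes of the single contiguous chunk */
};

/* Place the three PAMTs back to back, starting at @chunk_base. */
static struct pamt_layout split_pamt(uint64_t chunk_base,
				     const uint64_t size[NR_PAMT])
{
	struct pamt_layout l;
	uint64_t cur = chunk_base;

	for (int i = 0; i < NR_PAMT; i++) {
		l.base[i] = cur;
		l.size[i] = size[i];
		cur += size[i];
	}
	l.total = cur - chunk_base;
	return l;
}
```

Because the three tables are adjacent, a chunk that lands inside a TDMR
needs only one reserved area rather than three.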

Also dump out how many pages are allocated for PAMTs when the TDX module
is initialized successfully.

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - Changes due to using macros instead of 'enum' for TDX supported page
   sizes.

v5 -> v6:
 - Rebase due to using 'tdx_memblock' instead of memblock.
 - 'int pamt_entry_nr' -> 'unsigned long nr_pamt_entries' (Dave/Sagis).
 - Improved comment around tdmr_get_nid() (Dave).
 - Improved comment in tdmr_set_up_pamt() around breaking the PAMT
   into PAMTs for 4K/2M/1G (Dave).
 - tdmrs_get_pamt_pages() -> tdmrs_count_pamt_pages() (Dave).

v3 -> v5 (no feedback on v4):
 - Used memblock to get the NUMA node for given TDMR.
 - Removed the tdmr_get_pamt_sz() helper and open-coded it instead.
 - Changed to use 'switch .. case..' for each TDX supported page size in
   tdmr_get_pamt_sz() (the original __tdmr_get_pamt_sz()).
 - Added printing out memory used for PAMT allocation when TDX module is
   initialized successfully.
 - Explained downside of alloc_contig_pages() in changelog.
 - Addressed other minor comments.


---
 arch/x86/Kconfig            |   1 +
 arch/x86/virt/vmx/tdx/tdx.c | 191 ++++++++++++++++++++++++++++++++++++
 2 files changed, 192 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b36129183035..b86a333b860f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1960,6 +1960,7 @@ config INTEL_TDX_HOST
 	depends on KVM_INTEL
 	depends on X86_X2APIC
 	select ARCH_KEEP_MEMBLOCK
+	depends on CONTIG_ALLOC
 	help
 	  Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
 	  host and certain physical attacks.  This option enables necessary TDX
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 57b448de59a0..9d76e70de46e 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -586,6 +586,187 @@ static int create_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num)
 	return 0;
 }
 
+/*
+ * Calculate PAMT size given a TDMR and a page size.  The returned
+ * PAMT size is always aligned up to 4K page boundary.
+ */
+static unsigned long tdmr_get_pamt_sz(struct tdmr_info *tdmr, int pgsz)
+{
+	unsigned long pamt_sz, nr_pamt_entries;
+
+	switch (pgsz) {
+	case TDX_PS_4K:
+		nr_pamt_entries = tdmr->size >> PAGE_SHIFT;
+		break;
+	case TDX_PS_2M:
+		nr_pamt_entries = tdmr->size >> PMD_SHIFT;
+		break;
+	case TDX_PS_1G:
+		nr_pamt_entries = tdmr->size >> PUD_SHIFT;
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return 0;
+	}
+
+	pamt_sz = nr_pamt_entries * tdx_sysinfo.pamt_entry_size;
+	/* TDX requires the PAMT size to be 4K aligned */
+	pamt_sz = ALIGN(pamt_sz, PAGE_SIZE);
+
+	return pamt_sz;
+}
+
+/*
+ * Pick a NUMA node on which to allocate this TDMR's metadata.
+ *
+ * This is imprecise since TDMRs are 1G aligned and NUMA nodes might
+ * not be.  If the TDMR covers more than one node, just use the _first_
+ * one.  This can lead to small areas of off-node metadata for some
+ * memory.
+ */
+static int tdmr_get_nid(struct tdmr_info *tdmr)
+{
+	struct tdx_memblock *tmb;
+
+	/* Find the first memory region covered by the TDMR */
+	list_for_each_entry(tmb, &tdx_memlist, list) {
+		if (tmb->end_pfn > (tdmr_start(tdmr) >> PAGE_SHIFT))
+			return tmb->nid;
+	}
+
+	/*
+	 * Fall back to allocating the TDMR's metadata from node 0 when
+	 * no TDX memory block can be found.  This should never happen
+	 * since TDMRs originate from TDX memory blocks.
+	 */
+	WARN_ON_ONCE(1);
+	return 0;
+}
+
+static int tdmr_set_up_pamt(struct tdmr_info *tdmr)
+{
+	unsigned long pamt_base[TDX_PS_1G + 1];
+	unsigned long pamt_size[TDX_PS_1G + 1];
+	unsigned long tdmr_pamt_base;
+	unsigned long tdmr_pamt_size;
+	struct page *pamt;
+	int pgsz, nid;
+
+	nid = tdmr_get_nid(tdmr);
+
+	/*
+	 * Calculate the PAMT size for each TDX supported page size
+	 * and the total PAMT size.
+	 */
+	tdmr_pamt_size = 0;
+	for (pgsz = TDX_PS_4K; pgsz <= TDX_PS_1G ; pgsz++) {
+		pamt_size[pgsz] = tdmr_get_pamt_sz(tdmr, pgsz);
+		tdmr_pamt_size += pamt_size[pgsz];
+	}
+
+	/*
+	 * Allocate one chunk of physically contiguous memory for all
+	 * PAMTs.  This helps minimize the PAMT's use of reserved areas
+	 * in overlapped TDMRs.
+	 */
+	pamt = alloc_contig_pages(tdmr_pamt_size >> PAGE_SHIFT, GFP_KERNEL,
+			nid, &node_online_map);
+	if (!pamt)
+		return -ENOMEM;
+
+	/*
+	 * Break the contiguous allocation back up into the
+	 * individual PAMTs for each page size.
+	 */
+	tdmr_pamt_base = page_to_pfn(pamt) << PAGE_SHIFT;
+	for (pgsz = TDX_PS_4K; pgsz <= TDX_PS_1G; pgsz++) {
+		pamt_base[pgsz] = tdmr_pamt_base;
+		tdmr_pamt_base += pamt_size[pgsz];
+	}
+
+	tdmr->pamt_4k_base = pamt_base[TDX_PS_4K];
+	tdmr->pamt_4k_size = pamt_size[TDX_PS_4K];
+	tdmr->pamt_2m_base = pamt_base[TDX_PS_2M];
+	tdmr->pamt_2m_size = pamt_size[TDX_PS_2M];
+	tdmr->pamt_1g_base = pamt_base[TDX_PS_1G];
+	tdmr->pamt_1g_size = pamt_size[TDX_PS_1G];
+
+	return 0;
+}
+
+static void tdmr_get_pamt(struct tdmr_info *tdmr, unsigned long *pamt_pfn,
+			  unsigned long *pamt_npages)
+{
+	unsigned long pamt_base, pamt_sz;
+
+	/*
+	 * The PAMT was allocated in one contiguous unit.  The 4K PAMT
+	 * should always point to the beginning of that allocation.
+	 */
+	pamt_base = tdmr->pamt_4k_base;
+	pamt_sz = tdmr->pamt_4k_size + tdmr->pamt_2m_size + tdmr->pamt_1g_size;
+
+	*pamt_pfn = pamt_base >> PAGE_SHIFT;
+	*pamt_npages = pamt_sz >> PAGE_SHIFT;
+}
+
+static void tdmr_free_pamt(struct tdmr_info *tdmr)
+{
+	unsigned long pamt_pfn, pamt_npages;
+
+	tdmr_get_pamt(tdmr, &pamt_pfn, &pamt_npages);
+
+	/* Do nothing if PAMT hasn't been allocated for this TDMR */
+	if (!pamt_npages)
+		return;
+
+	if (WARN_ON_ONCE(!pamt_pfn))
+		return;
+
+	free_contig_range(pamt_pfn, pamt_npages);
+}
+
+static void tdmrs_free_pamt_all(struct tdmr_info *tdmr_array, int tdmr_num)
+{
+	int i;
+
+	for (i = 0; i < tdmr_num; i++)
+		tdmr_free_pamt(tdmr_array_entry(tdmr_array, i));
+}
+
+/* Allocate and set up PAMTs for all TDMRs */
+static int tdmrs_set_up_pamt_all(struct tdmr_info *tdmr_array, int tdmr_num)
+{
+	int i, ret = 0;
+
+	for (i = 0; i < tdmr_num; i++) {
+		ret = tdmr_set_up_pamt(tdmr_array_entry(tdmr_array, i));
+		if (ret)
+			goto err;
+	}
+
+	return 0;
+err:
+	tdmrs_free_pamt_all(tdmr_array, tdmr_num);
+	return ret;
+}
+
+static unsigned long tdmrs_count_pamt_pages(struct tdmr_info *tdmr_array,
+					  int tdmr_num)
+{
+	unsigned long pamt_npages = 0;
+	int i;
+
+	for (i = 0; i < tdmr_num; i++) {
+		unsigned long pfn, npages;
+
+		tdmr_get_pamt(tdmr_array_entry(tdmr_array, i), &pfn, &npages);
+		pamt_npages += npages;
+	}
+
+	return pamt_npages;
+}
+
 /*
  * Construct an array of TDMRs to cover all TDX memory ranges.
  * The actual number of TDMRs is kept to @tdmr_num.
@@ -598,8 +779,13 @@ static int construct_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num)
 	if (ret)
 		goto err;
 
+	ret = tdmrs_set_up_pamt_all(tdmr_array, *tdmr_num);
+	if (ret)
+		goto err;
+
 	/* Return -EINVAL until constructing TDMRs is done */
 	ret = -EINVAL;
+	tdmrs_free_pamt_all(tdmr_array, *tdmr_num);
 err:
 	return ret;
 }
@@ -686,6 +872,11 @@ static int init_tdx_module(void)
 	 * process are done.
 	 */
 	ret = -EINVAL;
+	if (ret)
+		tdmrs_free_pamt_all(tdmr_array, tdmr_num);
+	else
+		pr_info("%lu pages allocated for PAMT.\n",
+				tdmrs_count_pamt_pages(tdmr_array, tdmr_num));
 out_free_tdmrs:
 	/*
 	 * The array of TDMRs is freed no matter the initialization is

From patchwork Mon Nov 21 00:26:36 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23491
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 14/20] x86/virt/tdx: Set up reserved areas for all TDMRs
Date: Mon, 21 Nov 2022 13:26:36 +1300
Message-Id: 
 <5a5644e691134dc72c5e3fb0fc22fa40d4aa0b34.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

As the last step of constructing TDMRs, set up reserved areas for all
TDMRs.  For each TDMR, put all memory holes within this TDMR into its
reserved areas.  Likewise, for every PAMT that overlaps this TDMR, put
the overlapping part into the reserved areas too.
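
The PAMT case boils down to clamping each PAMT range to the TDMR
boundaries.  A minimal stand-alone sketch, assuming half-open
[start, end) physical ranges; struct range and pamt_overlap() are
hypothetical names used only for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open physical address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Compute the part of a PAMT that falls inside a TDMR.  Returns false
 * when the two do not overlap; otherwise stores the clamped overlap in
 * *out and returns true.
 */
static bool pamt_overlap(struct range tdmr, struct range pamt,
			 struct range *out)
{
	/* Skip PAMTs entirely outside of the TDMR. */
	if (pamt.end <= tdmr.start || pamt.start >= tdmr.end)
		return false;

	/* Only the part within the TDMR becomes a reserved area. */
	out->start = pamt.start < tdmr.start ? tdmr.start : pamt.start;
	out->end = pamt.end > tdmr.end ? tdmr.end : pamt.end;
	return true;
}
```

A PAMT that straddles the end of a TDMR therefore contributes only its
leading part to that TDMR's reserved areas; the remainder is handled
when the next TDMR is processed.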

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - No change.

v5 -> v6:
 - Rebase due to using 'tdx_memblock' instead of memblock.
 - Split tdmr_set_up_rsvd_areas() into two functions to handle memory
   holes and PAMTs respectively.
 - Added Isaku's Reviewed-by.
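
For reference, the memory-hole half of that split can be sketched in
stand-alone C as below.  This is a simplified illustration and not the
kernel code: it takes a sorted array instead of walking tdx_memlist,
and struct range and find_holes() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open physical address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Collect the holes inside 'tdmr' that are not covered by 'mem', an
 * array of 'n' memory regions sorted by address and non-overlapping.
 * Writes at most 'max' holes to 'out' and returns the number written,
 * or -1 if more than 'max' holes would be needed.
 */
static int find_holes(struct range tdmr, const struct range *mem, size_t n,
		      struct range *out, int max)
{
	uint64_t prev_end = tdmr.start;
	int nr = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (mem[i].start >= tdmr.end)
			break;		/* region is after the TDMR */
		if (mem[i].end <= tdmr.start)
			continue;	/* region is before the TDMR */

		/* A gap before this region is a hole. */
		if (mem[i].start > prev_end) {
			if (nr >= max)
				return -1;
			out[nr].start = prev_end;
			out[nr].end = mem[i].start;
			nr++;
		}
		prev_end = mem[i].end;
	}

	/* Trailing hole between the last region and the TDMR end. */
	if (prev_end < tdmr.end) {
		if (nr >= max)
			return -1;
		out[nr].start = prev_end;
		out[nr].end = tdmr.end;
		nr++;
	}

	return nr;
}
```

The 'max' limit plays the role of max_reserved_per_tdmr here: running
out of slots is reported to the caller rather than silently dropping a
hole.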


---
 arch/x86/virt/vmx/tdx/tdx.c | 190 +++++++++++++++++++++++++++++++++++-
 1 file changed, 188 insertions(+), 2 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 9d76e70de46e..1fbf33f2f210 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -21,6 +21,7 @@
 #include <linux/memblock.h>
 #include <linux/minmax.h>
 #include <linux/sizes.h>
+#include <linux/sort.h>
 #include <asm/msr-index.h>
 #include <asm/msr.h>
 #include <asm/apic.h>
@@ -767,6 +768,187 @@ static unsigned long tdmrs_count_pamt_pages(struct tdmr_info *tdmr_array,
 	return pamt_npages;
 }
 
+static int tdmr_add_rsvd_area(struct tdmr_info *tdmr, int *p_idx,
+			      u64 addr, u64 size)
+{
+	struct tdmr_reserved_area *rsvd_areas = tdmr->reserved_areas;
+	int idx = *p_idx;
+
+	/* Reserved area must be 4K aligned in offset and size */
+	if (WARN_ON(addr & ~PAGE_MASK || size & ~PAGE_MASK))
+		return -EINVAL;
+
+	/* Cannot exceed maximum reserved areas supported by TDX */
+	if (idx >= tdx_sysinfo.max_reserved_per_tdmr)
+		return -E2BIG;
+
+	rsvd_areas[idx].offset = addr - tdmr->base;
+	rsvd_areas[idx].size = size;
+
+	*p_idx = idx + 1;
+
+	return 0;
+}
+
+static int tdmr_set_up_memory_hole_rsvd_areas(struct tdmr_info *tdmr,
+					      int *rsvd_idx)
+{
+	struct tdx_memblock *tmb;
+	u64 prev_end;
+	int ret;
+
+	/* Mark holes between memory regions as reserved */
+	prev_end = tdmr_start(tdmr);
+	list_for_each_entry(tmb, &tdx_memlist, list) {
+		u64 start, end;
+
+		start = tmb->start_pfn << PAGE_SHIFT;
+		end = tmb->end_pfn << PAGE_SHIFT;
+
+		/* Break if this region is after the TDMR */
+		if (start >= tdmr_end(tdmr))
+			break;
+
+		/* Exclude regions before this TDMR */
+		if (end < tdmr_start(tdmr))
+			continue;
+
+		/*
+		 * Skip if no hole exists before this region. "<=" is
+		 * used because one memory region might span two TDMRs
+		 * (when the previous TDMR covers part of this region).
+		 * In this case the start address of this region is
+		 * smaller than the start address of the second TDMR.
+		 *
+		 * Update the prev_end to the end of this region where
+		 * the possible memory hole starts.
+		 */
+		if (start <= prev_end) {
+			prev_end = end;
+			continue;
+		}
+
+		/* Add the hole before this region */
+		ret = tdmr_add_rsvd_area(tdmr, rsvd_idx, prev_end,
+				start - prev_end);
+		if (ret)
+			return ret;
+
+		prev_end = end;
+	}
+
+	/* Add the hole after the last region if it exists. */
+	if (prev_end < tdmr_end(tdmr)) {
+		ret = tdmr_add_rsvd_area(tdmr, rsvd_idx, prev_end,
+				tdmr_end(tdmr) - prev_end);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int tdmr_set_up_pamt_rsvd_areas(struct tdmr_info *tdmr, int *rsvd_idx,
+				       struct tdmr_info *tdmr_array,
+				       int tdmr_num)
+{
+	int i, ret;
+
+	/*
+	 * If any PAMT overlaps with this TDMR, the overlapping part
+	 * must be put into the reserved areas too.  Walk over all
+	 * TDMRs to find the overlapping PAMTs and put them into the
+	 * reserved areas.
+	 */
+	for (i = 0; i < tdmr_num; i++) {
+		struct tdmr_info *tmp = tdmr_array_entry(tdmr_array, i);
+		unsigned long pamt_start_pfn, pamt_npages;
+		u64 pamt_start, pamt_end;
+
+		tdmr_get_pamt(tmp, &pamt_start_pfn, &pamt_npages);
+		/* Each TDMR must already have PAMT allocated */
+		WARN_ON_ONCE(!pamt_npages || !pamt_start_pfn);
+
+		pamt_start = pamt_start_pfn << PAGE_SHIFT;
+		pamt_end = pamt_start + (pamt_npages << PAGE_SHIFT);
+
+		/* Skip PAMTs outside of the given TDMR */
+		if ((pamt_end <= tdmr_start(tdmr)) ||
+				(pamt_start >= tdmr_end(tdmr)))
+			continue;
+
+		/* Only mark the part within the TDMR as reserved */
+		if (pamt_start < tdmr_start(tdmr))
+			pamt_start = tdmr_start(tdmr);
+		if (pamt_end > tdmr_end(tdmr))
+			pamt_end = tdmr_end(tdmr);
+
+		ret = tdmr_add_rsvd_area(tdmr, rsvd_idx, pamt_start,
+				pamt_end - pamt_start);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+/* Compare function called by sort() for TDMR reserved areas */
+static int rsvd_area_cmp_func(const void *a, const void *b)
+{
+	struct tdmr_reserved_area *r1 = (struct tdmr_reserved_area *)a;
+	struct tdmr_reserved_area *r2 = (struct tdmr_reserved_area *)b;
+
+	if (r1->offset + r1->size <= r2->offset)
+		return -1;
+	if (r1->offset >= r2->offset + r2->size)
+		return 1;
+
+	/* Reserved areas cannot overlap.  The caller must guarantee this. */
+	WARN_ON_ONCE(1);
+	return -1;
+}
+
+/* Set up reserved areas for a TDMR, including memory holes and PAMTs */
+static int tdmr_set_up_rsvd_areas(struct tdmr_info *tdmr,
+				  struct tdmr_info *tdmr_array,
+				  int tdmr_num)
+{
+	int ret, rsvd_idx = 0;
+
+	/* Put all memory holes within the TDMR into reserved areas */
+	ret = tdmr_set_up_memory_hole_rsvd_areas(tdmr, &rsvd_idx);
+	if (ret)
+		return ret;
+
+	/* Put all (overlapping) PAMTs within the TDMR into reserved areas */
+	ret = tdmr_set_up_pamt_rsvd_areas(tdmr, &rsvd_idx, tdmr_array, tdmr_num);
+	if (ret)
+		return ret;
+
+	/* TDX requires reserved areas listed in address ascending order */
+	sort(tdmr->reserved_areas, rsvd_idx, sizeof(struct tdmr_reserved_area),
+			rsvd_area_cmp_func, NULL);
+
+	return 0;
+}
+
+static int tdmrs_set_up_rsvd_areas_all(struct tdmr_info *tdmr_array,
+				       int tdmr_num)
+{
+	int i;
+
+	for (i = 0; i < tdmr_num; i++) {
+		int ret;
+
+		ret = tdmr_set_up_rsvd_areas(tdmr_array_entry(tdmr_array, i),
+				tdmr_array, tdmr_num);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 /*
  * Construct an array of TDMRs to cover all TDX memory ranges.
  * The actual number of TDMRs is kept to @tdmr_num.
@@ -783,8 +965,12 @@ static int construct_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num)
 	if (ret)
 		goto err;
 
-	/* Return -EINVAL until constructing TDMRs is done */
-	ret = -EINVAL;
+	ret = tdmrs_set_up_rsvd_areas_all(tdmr_array, *tdmr_num);
+	if (ret)
+		goto err_free_pamts;
+
+	return 0;
+err_free_pamts:
 	tdmrs_free_pamt_all(tdmr_array, *tdmr_num);
 err:
 	return ret;

From patchwork Mon Nov 21 00:26:37 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23492
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 15/20] x86/virt/tdx: Reserve TDX module global KeyID
Date: Mon, 21 Nov 2022 13:26:37 +1300
Message-Id: 
 <fec007c0193e5f0509450de78052346da1045b23.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

TDX module initialization requires one TDX private KeyID to be used as
the global KeyID, which protects the TDX module metadata.  The global
KeyID is configured into the TDX module along with the TDMRs.

Just reserve the first TDX private KeyID as the global KeyID.  Keep the
global KeyID in a static variable, as KVM will need to use it too.

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/virt/vmx/tdx/tdx.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 1fbf33f2f210..e2cbeeb7f0dc 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -62,6 +62,9 @@ static int tdx_cmr_num;
 /* All TDX-usable memory regions */
 static LIST_HEAD(tdx_memlist);
 
+/* TDX module global KeyID.  Used in TDH.SYS.CONFIG ABI. */
+static u32 tdx_global_keyid;
+
 /*
  * Detect TDX private KeyIDs to see whether TDX has been enabled by the
  * BIOS.  Both initializing the TDX module and running TDX guest require
@@ -1053,6 +1056,12 @@ static int init_tdx_module(void)
 	if (ret)
 		goto out_free_tdmrs;
 
+	/*
+	 * Reserve the first TDX KeyID as global KeyID to protect
+	 * TDX module metadata.
+	 */
+	tdx_global_keyid = tdx_keyid_start;
+
 	/*
 	 * Return -EINVAL until all steps of TDX module initialization
 	 * process are done.

From patchwork Mon Nov 21 00:26:38 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23493
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 16/20] x86/virt/tdx: Configure TDX module with TDMRs and
 global KeyID
Date: Mon, 21 Nov 2022 13:26:38 +1300
Message-Id: 
 <344234642a5eb9dc1aa34410f641f596ec428ea5.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,
        SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1750063579075793938?=
X-GMAIL-MSGID: =?utf-8?q?1750063579075793938?=

After the TDX-usable memory regions are constructed in an array of TDMRs
and the global KeyID is reserved, pass them to the TDX module using the
TDH.SYS.CONFIG SEAMCALL.  TDH.SYS.CONFIG can only be called once, and it
can be called on any logical cpu.
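
The code in this patch sizes the array of TDMR_INFO physical addresses
by rounding its byte size up to a multiple of 512.  A minimal user-space
sketch of that arithmetic follows.  The names align_up(),
pa_array_size() and PA_ARRAY_ALIGNMENT are hypothetical stand-ins for
the kernel's ALIGN() and TDMR_INFO_PA_ARRAY_ALIGNMENT, not kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror TDMR_INFO_PA_ARRAY_ALIGNMENT from this patch. */
#define PA_ARRAY_ALIGNMENT 512u

/* Round x up to the next multiple of a; a must be a power of two. */
size_t align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Hypothetical helper: bytes to allocate for tdmr_num 64-bit addresses. */
size_t pa_array_size(size_t tdmr_num)
{
	return align_up(tdmr_num * sizeof(uint64_t), PA_ARRAY_ALIGNMENT);
}
```

Under these assumptions, 20 TDMRs need 160 bytes of addresses and
pa_array_size(20) gives 512, while 65 TDMRs (520 bytes) give 1024.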

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/virt/vmx/tdx/tdx.c | 37 +++++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  2 ++
 2 files changed, 39 insertions(+)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index e2cbeeb7f0dc..3a032930e58a 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -979,6 +979,37 @@ static int construct_tdmrs(struct tdmr_info *tdmr_array, int *tdmr_num)
 	return ret;
 }
 
+static int config_tdx_module(struct tdmr_info *tdmr_array, int tdmr_num,
+			     u64 global_keyid)
+{
+	u64 *tdmr_pa_array;
+	int i, array_sz;
+	u64 ret;
+
+	/*
+	 * TDMR_INFO entries are configured to the TDX module via an
+	 * array of the physical address of each TDMR_INFO.  TDX module
+	 * requires the array itself to be 512-byte aligned.  Round up
+	 * the array size to 512-byte aligned so the buffer allocated
+	 * by kzalloc() will meet the alignment requirement.
+	 */
+	array_sz = ALIGN(tdmr_num * sizeof(u64), TDMR_INFO_PA_ARRAY_ALIGNMENT);
+	tdmr_pa_array = kzalloc(array_sz, GFP_KERNEL);
+	if (!tdmr_pa_array)
+		return -ENOMEM;
+
+	for (i = 0; i < tdmr_num; i++)
+		tdmr_pa_array[i] = __pa(tdmr_array_entry(tdmr_array, i));
+
+	ret = seamcall(TDH_SYS_CONFIG, __pa(tdmr_pa_array), tdmr_num,
+				global_keyid, 0, NULL, NULL);
+
+	/* Free the array as it is not required anymore. */
+	kfree(tdmr_pa_array);
+
+	return ret;
+}
+
 /*
  * Detect and initialize the TDX module.
  *
@@ -1062,11 +1093,17 @@ static int init_tdx_module(void)
 	 */
 	tdx_global_keyid = tdx_keyid_start;
 
+	/* Pass the TDMRs and the global KeyID to the TDX module */
+	ret = config_tdx_module(tdmr_array, tdmr_num, tdx_global_keyid);
+	if (ret)
+		goto out_free_pamts;
+
 	/*
 	 * Return -EINVAL until all steps of TDX module initialization
 	 * process are done.
 	 */
 	ret = -EINVAL;
+out_free_pamts:
 	if (ret)
 		tdmrs_free_pamt_all(tdmr_array, tdmr_num);
 	else
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index a737f2b51474..c26bab2555ca 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -19,6 +19,7 @@
 #define TDH_SYS_INIT		33
 #define TDH_SYS_LP_INIT		35
 #define TDH_SYS_LP_SHUTDOWN	44
+#define TDH_SYS_CONFIG		45
 
 struct cmr_info {
 	u64	base;
@@ -86,6 +87,7 @@ struct tdmr_reserved_area {
 } __packed;
 
 #define TDMR_INFO_ALIGNMENT	512
+#define TDMR_INFO_PA_ARRAY_ALIGNMENT	512
 
 struct tdmr_info {
 	u64 base;

From patchwork Mon Nov 21 00:26:39 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23494
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1325612wrr;
        Sun, 20 Nov 2022 16:40:37 -0800 (PST)
X-Google-Smtp-Source: 
 AA0mqf6sTXLBH/fu9RcRpzJFY+I0Co82x36M7wXX6EG7IIS2ZxBGoXM5tGleAaTsMKSF7MGISEiZ
X-Received: by 2002:a05:6a00:24c1:b0:56e:a001:8cb0 with SMTP id
 d1-20020a056a0024c100b0056ea0018cb0mr17780052pfv.60.1668991236612;
        Sun, 20 Nov 2022 16:40:36 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1668991236; cv=none;
        d=google.com; s=arc-20160816;
        b=1JFJ2VFkcEW+e4XX7DyOySlwWrDd7R1/4FFDaEcSwQ/4GqOMfXCiIApfo9AVtjsjXy
         sQna5mktKOsS9rWGUN2FNmHdNQcd0kx/Sq92mHPeGcLtfz2psZK1csQnSB6+MiCPqCOZ
         1h4RZmFz9NsyCPBSCLJ8dj7vr/Kj8T3FMpr6S1f0UpA8JfiHZEgCNrKqYBatQfUG0Lev
         Ht34e3W6q03B8XXdvPfiL6D9OhIWG7saaF4DUEosh5di3FhsIAo0WzJQP5pc5/xFaP27
         hMLKLvCCj5JK/TF3IHSgpDOJVo6ZZhxW7K/e9Y81djuSKDudq6lFlI3yrvdsZ1BEA77j
         NXYw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from
         :dkim-signature;
        bh=1ZDrmZlNcxV6f+e5/5Q37GQIdNjaxpt17aZEM/WACYk=;
        b=EJ/Rs4JJ+TorNImNThIMmSUDYMxk9a0qcAW0mL5wMt3FSHCkBUU3WLCuvSrdFqgEbR
         ZUKrAAF2ox8mfpOWx1y9w1n0KF9a0mc26FsYpgco2RXleTYp82ecvomUJjU2ewORndhD
         LrR/b8LgH8bShTVOC9ZkUz+O7G5iz1/0JG6g55wnPVjX+rz3OEtv9KiBwqS0Uo3P94I/
         yo9BeoINQefCG2vErvweBwkb4iIwsoMiQU/YpdnOhFmljmcJ+hdR8XQceNBTCq+ADxh8
         ZDWMk+Po/LLCjnonNqd/RYk4xlzTLppT6C8H393RiEQCkOr9L6KEDC3PGZlNpNH2piNZ
         fMjQ==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=T3Vg7vCu;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 j13-20020a170903028d00b00186b59eba21si10211423plr.574.2022.11.20.16.40.20;
        Sun, 20 Nov 2022 16:40:36 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=T3Vg7vCu;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230113AbiKUAbE (ORCPT <rfc822;leviz.kernel.dev@gmail.com>
        + 99 others); Sun, 20 Nov 2022 19:31:04 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58626 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229929AbiKUAa2 (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Sun, 20 Nov 2022 19:30:28 -0500
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E65991E708;
        Sun, 20 Nov 2022 16:28:53 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1668990537; x=1700526537;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=jIBcb5cVMwmxp99dDnDGXF9eYbfG3mWlVut9+yUNmLc=;
  b=T3Vg7vCuQSdz5CNeHk1ybLQkedvecjgDu3L+33rOOaE5zbN81zHyls4w
   0MA6UySZb7sSsoOTizQ1K5mRux81qCxeK2w7PmjjP75Y/lTk0ufiL0u3n
   5oNBKyc02m+uPl88OI0g4kuN0eljC7TI1LqU+Rpsi2+gu3ewvg4Kh3uHJ
   JqnOIlqaiM0KsZzJe6IjNhhfJ1Odw5cWf8pkN2C7Eofb74ihh5ejkoWUV
   Pe2eStsG3Z4pwxbNBRuatz/am5jBL003yO3nOaw1MT/7IMbgxZTbLOGi4
   ibrn2GKqMWx6v6fHsSc1a9OFiQlTzqL9iNhSBmGW+QXlYUkBvlF5jAEdc
   A==;
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="377705756"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="377705756"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
  by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:28:09 -0800
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="729825533"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="729825533"
Received: from tomnavar-mobl.amr.corp.intel.com (HELO
 khuang2-desk.gar.corp.intel.com) ([10.209.176.15])
  by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:28:05 -0800
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 17/20] x86/virt/tdx: Configure global KeyID on all packages
Date: Mon, 21 Nov 2022 13:26:39 +1300
Message-Id: 
 <8d8285cc5efa6302cf42a3fe2c9153d1a9dbcdac.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,
        SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1750064154423130313?=
X-GMAIL-MSGID: =?utf-8?q?1750064154423130313?=

After the array of TDMRs and the global KeyID are configured to the TDX
module, use TDH.SYS.KEY.CONFIG to configure the key of the global KeyID
on all packages.

TDH.SYS.KEY.CONFIG must be done on one (any) cpu for each package, and
it cannot run concurrently on different CPUs.  Implement a helper to
run a SEAMCALL on one cpu for each package, one package at a time, and
use it to configure the global KeyID on all packages.
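
The "one cpu per package, one at a time" selection can be sketched in
user space as below.  This is only an illustration: the cpu-to-package
table, call_once_per_package(), record_cpu() and struct call_log are
hypothetical, and a plain function call stands in for running a
SEAMCALL on a remote cpu:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PACKAGES 64

/* Hypothetical callback type: returns 0 on success, non-zero on error. */
typedef int (*per_cpu_fn)(int cpu, void *data);

/* Records which CPUs were called; fail_cpu < 0 means "never fail". */
struct call_log {
	int cpus[MAX_PACKAGES];
	size_t count;
	int fail_cpu;
};

/* Sample callback: log the cpu, or return an error on fail_cpu. */
int record_cpu(int cpu, void *data)
{
	struct call_log *log = data;

	if (cpu == log->fail_cpu || log->count >= MAX_PACKAGES)
		return -1;
	log->cpus[log->count++] = cpu;
	return 0;
}

/*
 * Walk the CPUs in order and call fn on the first cpu seen in each
 * package, one call at a time.  Stop at the first error.
 */
int call_once_per_package(const int *cpu_to_pkg, size_t ncpus,
			  per_cpu_fn fn, void *data)
{
	bool seen[MAX_PACKAGES] = { false };
	size_t cpu;

	for (cpu = 0; cpu < ncpus; cpu++) {
		int pkg = cpu_to_pkg[cpu];
		int ret;

		if (pkg < 0 || pkg >= MAX_PACKAGES)
			return -1;
		if (seen[pkg])
			continue;	/* package already handled */
		seen[pkg] = true;

		ret = fn((int)cpu, data);
		if (ret)
			return ret;
	}
	return 0;
}
```

With a table such as {0, 0, 1, 1, 0, 1}, the callback runs on cpu 0 and
cpu 2 only, in that order.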

Intel hardware doesn't guarantee cache coherency across different
KeyIDs.  The kernel needs to flush PAMT's dirty cachelines (associated
with KeyID 0) before the TDX module uses the global KeyID to access the
PAMT.  Following the TDX module specification, flush cache before
configuring the global KeyID on all packages.

Given the PAMT size can be large (~1/256th of system RAM), just use
WBINVD on all CPUs to flush.
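
To put the ~1/256th figure in concrete terms, here is a small worked
calculation.  The helper pamt_estimate_bytes() and the constant
PAMT_RAM_DIVISOR are hypothetical and only encode the ratio quoted
above; the real PAMT size depends on values reported by the TDX module:

```c
#include <stdint.h>

/* Ratio taken from the text above: PAMT is roughly 1/256th of RAM. */
#define PAMT_RAM_DIVISOR 256u

/* Hypothetical helper: rough PAMT footprint for ram_bytes of RAM. */
uint64_t pamt_estimate_bytes(uint64_t ram_bytes)
{
	return ram_bytes / PAMT_RAM_DIVISOR;
}
```

By this estimate a 64 GiB machine carries about 256 MiB of PAMT and a
1 TiB machine about 4 GiB, which is why a line-by-line flush is not
attempted here.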

Note if any TDH.SYS.KEY.CONFIG fails, the TDX module may already have
used the global KeyID to write to the PAMTs.  Therefore, use WBINVD to
flush the cache before freeing the PAMTs back to the kernel.  Note that
using MOVDIR64B (which changes the page's associated KeyID from the old
TDX private KeyID back to KeyID 0, which is used by the kernel) to clear
the PAMTs isn't needed, as KeyID 0 doesn't support integrity check.

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - Improved changelog and comment to explain why MOVDIR64B isn't used
   when returning PAMTs back to the kernel.

---
 arch/x86/virt/vmx/tdx/tdx.c | 89 ++++++++++++++++++++++++++++++++++++-
 arch/x86/virt/vmx/tdx/tdx.h |  1 +
 2 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 3a032930e58a..99d1be5941a7 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -224,6 +224,46 @@ static void seamcall_on_each_cpu(struct seamcall_ctx *sc)
 	on_each_cpu(seamcall_smp_call_function, sc, true);
 }
 
+/*
+ * Call one SEAMCALL on one (any) cpu for each physical package in a
+ * serialized way.  Return immediately if the SEAMCALL fails on any
+ * cpu.
+ *
+ * Note for serialized calls 'struct seamcall_ctx::err' doesn't have
+ * to be atomic, but for simplicity just reuse it instead of adding
+ * a new one.
+ */
+static int seamcall_on_each_package_serialized(struct seamcall_ctx *sc)
+{
+	cpumask_var_t packages;
+	int cpu, ret = 0;
+
+	if (!zalloc_cpumask_var(&packages, GFP_KERNEL))
+		return -ENOMEM;
+
+	for_each_online_cpu(cpu) {
+		if (cpumask_test_and_set_cpu(topology_physical_package_id(cpu),
+					packages))
+			continue;
+
+		ret = smp_call_function_single(cpu, seamcall_smp_call_function,
+				sc, true);
+		if (ret)
+			break;
+
+		/*
+		 * Doesn't have to use atomic_read(), but it doesn't
+		 * hurt either.
+		 */
+		ret = atomic_read(&sc->err);
+		if (ret)
+			break;
+	}
+
+	free_cpumask_var(packages);
+	return ret;
+}
+
 static int tdx_module_init_cpus(void)
 {
 	struct seamcall_ctx sc = { .fn = TDH_SYS_LP_INIT };
@@ -1010,6 +1050,22 @@ static int config_tdx_module(struct tdmr_info *tdmr_array, int tdmr_num,
 	return ret;
 }
 
+static int config_global_keyid(void)
+{
+	struct seamcall_ctx sc = { .fn = TDH_SYS_KEY_CONFIG };
+
+	/*
+	 * Configure the key of the global KeyID on all packages by
+	 * calling TDH.SYS.KEY.CONFIG on all packages in a serialized
+	 * way as it cannot run concurrently on different CPUs.
+	 *
+	 * TDH.SYS.KEY.CONFIG may fail with entropy error (which is
+	 * a recoverable error).  Assume this is exceedingly rare and
+	 * just return error if encountered instead of retrying.
+	 */
+	return seamcall_on_each_package_serialized(&sc);
+}
+
 /*
  * Detect and initialize the TDX module.
  *
@@ -1098,15 +1154,44 @@ static int init_tdx_module(void)
 	if (ret)
 		goto out_free_pamts;
 
+	/*
+	 * Hardware doesn't guarantee cache coherency across different
+	 * KeyIDs.  The kernel needs to flush PAMT's dirty cachelines
+	 * (associated with KeyID 0) before the TDX module can use the
+	 * global KeyID to access the PAMT.  Given PAMTs are potentially
+	 * large (~1/256th of system RAM), just use WBINVD on all cpus
+	 * to flush the cache.
+	 *
+	 * Follow the TDX spec to flush cache before configuring the
+	 * global KeyID on all packages.
+	 */
+	wbinvd_on_all_cpus();
+
+	/* Config the key of global KeyID on all packages */
+	ret = config_global_keyid();
+	if (ret)
+		goto out_free_pamts;
+
 	/*
 	 * Return -EINVAL until all steps of TDX module initialization
 	 * process are done.
 	 */
 	ret = -EINVAL;
 out_free_pamts:
-	if (ret)
+	if (ret) {
+		/*
+		 * Part of PAMT may already have been initialized by
+		 * TDX module.  Flush cache before returning PAMT back
+		 * to the kernel.
+		 *
+		 * Note there's no need to do MOVDIR64B (which changes
+		 * the page's associated KeyID from the old TDX private
+		 * KeyID back to KeyID 0, which is used by the kernel),
+		 * as KeyID 0 doesn't support integrity check.
+		 */
+		wbinvd_on_all_cpus();
 		tdmrs_free_pamt_all(tdmr_array, tdmr_num);
-	else
+	} else
 		pr_info("%lu pages allocated for PAMT.\n",
 				tdmrs_count_pamt_pages(tdmr_array, tdmr_num));
 out_free_tdmrs:
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index c26bab2555ca..768d097412ab 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -15,6 +15,7 @@
 /*
  * TDX module SEAMCALL leaf functions
  */
+#define TDH_SYS_KEY_CONFIG	31
 #define TDH_SYS_INFO		32
 #define TDH_SYS_INIT		33
 #define TDH_SYS_LP_INIT		35

From patchwork Mon Nov 21 00:26:40 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23496
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1327420wrr;
        Sun, 20 Nov 2022 16:48:04 -0800 (PST)
X-Google-Smtp-Source: 
 AA0mqf6hA6MdwuNmcXl1EZdbz79m024NnM+eGRaNFV1txu0T0SIXWHzzxQW894+yuQNn7VvCUP1w
X-Received: by 2002:a17:902:dac2:b0:188:f5c7:4d23 with SMTP id
 q2-20020a170902dac200b00188f5c74d23mr9586414plx.125.1668991684487;
        Sun, 20 Nov 2022 16:48:04 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1668991684; cv=none;
        d=google.com; s=arc-20160816;
        b=q+YemIQFhOt9bqOeM8rXqc2KCDvk0f/BPVvXNA8lfrQuoZ5gPM8yEi38dL9zgqG+yU
         c0rOjSVSkvOE4bdM1AVEYT2UszChrfWEYiOxmNX4RuQcVR1t37N6sTVNWJsxegVLYxo1
         9BpwohgsYToh8fSeKJdnnwcCsB3SRu4LvUkuJBN7E5ERSGNhb1RC8sONwAJmVMKjni7m
         U3Gi7XRKtJnxBbDhQCIh1GwSoy+tlcvqIrgtg+S9BSlBoZef/j5DKjzmCXdpPJeI5vdO
         tnw6GcTd4mGfjcPs4C1BaBZxMHeeHNlFa2Wse02KHsUDJAm4oO2FJcAOcXoIqqdICb+X
         yQrA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from
         :dkim-signature;
        bh=RZ2vana1oXRQxzEh9aeGfPQU1eY3ER67EJl735YSoOo=;
        b=r6Lr1WXj6Hj0V//25CFTQUkYmHkDrYrA4EDmxH4uiylPm4X/A8oWQxJxRwHGhgEsnE
         radrDP40g6etN9otimWVkYreIrTFYiRbtTvl4gSBP/z+lJgmwSSMjrxZKosjGz7upT/9
         XrtpYEezhQJ5ZqeHu4C6ruOh2KgFSx/KqSGzNXdxsh9b1254YlAgIjFnjAw0sXTbN9rS
         xyhAk4+fIeLKi6S0eInKpVSbeQYrhc86XwIXRqC4XmDegwvQ26EJm3zxTJWQgZC2GJkU
         RCllFoJUTYdWnQk2qd4UpnzFWveBnM2U2Au7ucBSbOIdrFLfd0P/7fX9LVaXt2pEb5IN
         nLUQ==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=MEWNWzv7;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 c11-20020a170902d48b00b001769fce8c2fsi10814545plg.485.2022.11.20.16.47.51;
        Sun, 20 Nov 2022 16:48:04 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=MEWNWzv7;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230124AbiKUAbM (ORCPT <rfc822;leviz.kernel.dev@gmail.com>
        + 99 others); Sun, 20 Nov 2022 19:31:12 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58622 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229934AbiKUAa2 (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Sun, 20 Nov 2022 19:30:28 -0500
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F3F035E9F5;
        Sun, 20 Nov 2022 16:28:54 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1668990537; x=1700526537;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=3LwZ/2jAQ0M9UrYz1pvBJwdZZg6HBZnO8RRBVvS9No4=;
  b=MEWNWzv7JAUBF48uSWNG2sSZ3wzkfCIr540XtHyg5JHJDcgYWp9O7RDR
   q9Jgh7t58bqjgrnVtkf9e1wOyU7CetgPLG3bso9IjTH/IeVBgksl8dzmp
   Cfb72x7Uu1ZNlmR2gsG48uFzMhfg4sRyYbbrrLP2RL4NfGbD7IO2v+2de
   zw2j9eD0kB4hi0MmD1o4LlF1L1iX2k5/UFLYUo6yVPMnW62uzdpOPanhq
   u4xy1gYyVEPhb/wgPxHXVI5UGQmDQYdfL68E7z7MOn4ZLax52oTjf1d0V
   K7vWjSTaguOAgf/OmifHTjidF0Za9wU9r1eKCB7VOAYwfnObwLHxmjrys
   Q==;
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="377705762"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="377705762"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
  by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:28:13 -0800
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="729825550"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="729825550"
Received: from tomnavar-mobl.amr.corp.intel.com (HELO
 khuang2-desk.gar.corp.intel.com) ([10.209.176.15])
  by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:28:09 -0800
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 18/20] x86/virt/tdx: Initialize all TDMRs
Date: Mon, 21 Nov 2022 13:26:40 +1300
Message-Id: 
 <2337c8e9086a006aaa2c4b99caf478420d1fc640.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,
        SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1750064624533406583?=
X-GMAIL-MSGID: =?utf-8?q?1750064624533406583?=

Initialize TDMRs via TDH.SYS.TDMR.INIT as the last step to complete the
TDX initialization.

All TDMRs need to be initialized using TDH.SYS.TDMR.INIT SEAMCALL before
the memory pages can be used by the TDX module.  The time to initialize
TDMR is proportional to the size of the TDMR because TDH.SYS.TDMR.INIT
internally initializes the PAMT entries using the global KeyID.

To avoid long latency caused in one SEAMCALL, TDH.SYS.TDMR.INIT only
initializes an (implementation-specific) subset of PAMT entries of one
TDMR in one invocation.  The caller needs to call TDH.SYS.TDMR.INIT
iteratively until all PAMT entries of the given TDMR are initialized.
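
The calling pattern can be sketched in user space as follows.  Nothing
here is the real SEAMCALL: struct fake_tdmr, fake_tdmr_init_step() and
init_tdmr_sketch() are hypothetical, and the fixed 'step' merely models
the implementation-specific amount of work done per invocation:

```c
#include <stdint.h>

/* Hypothetical model of one TDMR and its initialization progress. */
struct fake_tdmr {
	uint64_t base;	/* start address of the TDMR */
	uint64_t size;	/* size of the TDMR in bytes */
	uint64_t next;	/* next-to-initialize address, starts at base */
	uint64_t step;	/* bytes covered by one simulated invocation */
};

/*
 * Stand-in for one TDH.SYS.TDMR.INIT call: advance by at most 'step'
 * bytes and report the next-to-initialize address.
 */
int fake_tdmr_init_step(struct fake_tdmr *t, uint64_t *next_out)
{
	uint64_t end = t->base + t->size;

	if (t->step == 0)
		return -1;
	t->next = (end - t->next < t->step) ? end : t->next + t->step;
	*next_out = t->next;
	return 0;
}

/* Repeat until 'next' reaches the end; return the number of calls. */
int init_tdmr_sketch(struct fake_tdmr *t)
{
	uint64_t next;
	int calls = 0;

	do {
		int ret = fake_tdmr_init_step(t, &next);

		if (ret)
			return ret;
		calls++;
	} while (next < t->base + t->size);

	return calls;
}
```

For a 1 GiB region and a simulated step of 256 MiB, the loop makes four
calls before the next-to-initialize address equals the end address.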

TDH.SYS.TDMR.INITs can run concurrently on multiple CPUs as long as they
are initializing different TDMRs.  To keep it simple, just initialize
all TDMRs one by one.  On a 2-socket machine with 2.2G CPUs and 64GB
memory, each TDH.SYS.TDMR.INIT takes roughly a couple of microseconds on
average, and it takes roughly dozens of milliseconds to complete the
initialization of all TDMRs while the system is idle.

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - Removed need_resched() check. -- Andi.

---
 arch/x86/virt/vmx/tdx/tdx.c | 69 ++++++++++++++++++++++++++++++++++---
 arch/x86/virt/vmx/tdx/tdx.h |  1 +
 2 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 99d1be5941a7..9bcdb30b7a80 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1066,6 +1066,65 @@ static int config_global_keyid(void)
 	return seamcall_on_each_package_serialized(&sc);
 }
 
+/* Initialize one TDMR */
+static int init_tdmr(struct tdmr_info *tdmr)
+{
+	u64 next;
+
+	/*
+	 * Initializing PAMT entries might be time-consuming (in
+	 * proportion to the size of the requested TDMR).  To avoid long
+	 * latency in one SEAMCALL, TDH.SYS.TDMR.INIT only initializes
+	 * an (implementation-defined) subset of PAMT entries in one
+	 * invocation.
+	 *
+	 * Call TDH.SYS.TDMR.INIT iteratively until all PAMT entries
+	 * of the requested TDMR are initialized (if next-to-initialize
+	 * address matches the end address of the TDMR).
+	 */
+	do {
+		struct tdx_module_output out;
+		int ret;
+
+		ret = seamcall(TDH_SYS_TDMR_INIT, tdmr->base, 0, 0, 0, NULL,
+				&out);
+		if (ret)
+			return ret;
+		/*
+		 * RDX contains 'next-to-initialize' address if
+		 * TDH.SYS.TDMR.INIT succeeded.
+		 */
+		next = out.rdx;
+		/* Allow scheduling when needed */
+		cond_resched();
+	} while (next < tdmr->base + tdmr->size);
+
+	return 0;
+}
+
+/* Initialize all TDMRs */
+static int init_tdmrs(struct tdmr_info *tdmr_array, int tdmr_num)
+{
+	int i;
+
+	/*
+	 * Initialize TDMRs one-by-one for simplicity, though the TDX
+	 * architecture does allow different TDMRs to be initialized in
+	 * parallel on multiple CPUs.  Parallel initialization could
+	 * be added later when the time spent in the serialized scheme
+	 * becomes a real concern.
+	 */
+	for (i = 0; i < tdmr_num; i++) {
+		int ret;
+
+		ret = init_tdmr(tdmr_array_entry(tdmr_array, i));
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 /*
  * Detect and initialize the TDX module.
  *
@@ -1172,11 +1231,11 @@ static int init_tdx_module(void)
 	if (ret)
 		goto out_free_pamts;
 
-	/*
-	 * Return -EINVAL until all steps of TDX module initialization
-	 * process are done.
-	 */
-	ret = -EINVAL;
+	/* Initialize TDMRs to complete the TDX module initialization */
+	ret = init_tdmrs(tdmr_array, tdmr_num);
+	if (ret)
+		goto out_free_pamts;
+
 out_free_pamts:
 	if (ret) {
 		/*
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 768d097412ab..891691b1ea50 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -19,6 +19,7 @@
 #define TDH_SYS_INFO		32
 #define TDH_SYS_INIT		33
 #define TDH_SYS_LP_INIT		35
+#define TDH_SYS_TDMR_INIT	36
 #define TDH_SYS_LP_SHUTDOWN	44
 #define TDH_SYS_CONFIG		45
 

From patchwork Mon Nov 21 00:26:41 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23495
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1325654wrr;
        Sun, 20 Nov 2022 16:40:45 -0800 (PST)
X-Google-Smtp-Source: 
 AA0mqf73hMo1nT1c7txYs5FVMNqSsSbcwMf6FPH1Nb2T8VBYHGAyYSVj6KxpERlRiPUP5v5wDAdl
X-Received: by 2002:a65:5b44:0:b0:477:1bf6:73b6 with SMTP id
 y4-20020a655b44000000b004771bf673b6mr7183670pgr.36.1668991245600;
        Sun, 20 Nov 2022 16:40:45 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1668991245; cv=none;
        d=google.com; s=arc-20160816;
        b=BHmJhMbV27jBKc/LiCt140aXqLXwuj3FaHD/e23SJ0NH3gKrreWoI8+CWiTCm40xFn
         8lo3OkhH9riuZgbrsjczqONmJzp37YdtwRhfTc6BcpYVu/3T+JzRzXeg3Fo/l/21Lj83
         2yO3pfhTFeEAAvypxrSUPeijXSXb6ZGvGwyvWq0WE6GU5FlGBPSyX9o9J/XBZ5eQwRYe
         ctiUgLNuOjHddXIsiQUUE3fPbvGwufla2iIQyDXpwwl3T3/ZIFqtSUyfX570sjjeYG6N
         cBsYlxZmGsBZJ2cX1UxLaO2qqdwiPLQIQJF23LarIlr3ouAhUptxEcgAus7rpQPjzc1H
         yvwg==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from
         :dkim-signature;
        bh=0HsaI+F+Q6SMDl5C9timeO1Ls7uwIsud4GJwh7yr45s=;
        b=BbfaIJG6Vdivq47Msu6FsMA6u7HWHVDljCL7PfQjZ5inEEvfvyEBLUk32Xmr4qm9LM
         LGtItmQl17XXRwTb2Tn34h9c/ZfBZ4jLQxWFMoj+R1A3ngRUQqv4t+P9O4L0mhJK6EdI
         LsDgRUBVnbyXlGk18vajHz317F5vaY6vLlSxrVULSt736t+cNqSEyQmAPGtCdyp3Guo6
         9q7tsvAGiAIvExKFBvpsG6ohLy58lNB37McxzKKPt8Ko3bNoccU5nHFkBTO/C0Ot6NPR
         X0KmqwwCn/1ZODdBMVzKTPLSs2JradxG/FKj26bhInXQBbv37euEC0XPFJrQ0H9f3nrj
         BmRw==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=SzzD6Q46;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 lw6-20020a17090b180600b0020d65f31df8si16318238pjb.143.2022.11.20.16.40.32;
        Sun, 20 Nov 2022 16:40:45 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=SzzD6Q46;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229964AbiKUAbH (ORCPT <rfc822;leviz.kernel.dev@gmail.com>
        + 99 others); Sun, 20 Nov 2022 19:31:07 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58620 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S230047AbiKUAa2 (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Sun, 20 Nov 2022 19:30:28 -0500
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6822E1E3F2;
        Sun, 20 Nov 2022 16:28:55 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1668990537; x=1700526537;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=B4X5vcXsxt6uAK36/XCY9Qp+zNxFIBT0eQmTkgHAVxg=;
  b=SzzD6Q46Q4FD3NF7YWB6fd0YbJTjY0CHQ3HvPkv2bH8XNUftTY5gaSMV
   MagRbX4pNvR3qNLUDpmE6n/0w0hbfwXyX0F8lNDkMZXwOT8f7p3Y9XKvE
   OMOHNSSpvjtquVaHHE2x48bxkq/9ODmqHVicFR92manNvDpe72B5aUEbQ
   Lo76wGTUdHo9d5Hj6hmtwW+RxagQKI8ZMX8qg9Dvx+1J7N0z+0dRKT0lo
   Oo8RsH5mrmUGdSrcyp3ZoAxABdhb0iCdR5VWI7TfJQIRAuO5zezIiBV77
   c4N1AqrSSAgec4FIvIeTzvjCsuvzgwQLcpIMGziFtZClPx4awmfoSAMhD
   Q==;
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="377705768"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="377705768"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
  by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:28:17 -0800
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="729825557"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="729825557"
Received: from tomnavar-mobl.amr.corp.intel.com (HELO
 khuang2-desk.gar.corp.intel.com) ([10.209.176.15])
  by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:28:13 -0800
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 19/20] x86/virt/tdx: Flush cache in kexec() when TDX is
 enabled
Date: Mon, 21 Nov 2022 13:26:41 +1300
Message-Id: 
 <a8a097dfa03704b95f0169b8e84f385a8dd3dc30.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,
        SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1750064164133493377?=
X-GMAIL-MSGID: =?utf-8?q?1750064164133493377?=

There are two problems with using kexec() to boot to a new kernel
when the old kernel has enabled TDX: 1) Part of the memory pages are
still TDX private pages (i.e. metadata used by the TDX module, and any
TDX guest memory if kexec() happens while any TDX guest is alive).
2) There might be dirty cachelines associated with TDX private pages.

Because the hardware doesn't guarantee cache coherency among different
KeyIDs, the old kernel needs to flush the cache (of those TDX private
pages) before booting to the new kernel.  Also, reading a TDX private
page using any shared non-TDX KeyID with integrity-check enabled can
trigger #MC.  Therefore ideally, the kernel should convert all TDX
private pages back to normal before booting to the new kernel.

However, this implementation doesn't convert TDX private pages back to
normal in kexec() for the following reasons:

1) The kernel doesn't have existing infrastructure to track which pages
   are TDX private pages.
2) The number of TDX private pages can be large, and converting all of
   them (cache flush + using MOVDIR64B to clear the page) in kexec() can
   be time consuming.
3) The new kernel will almost exclusively use KeyID 0 to access memory.
   KeyID 0 doesn't support integrity-check, so it's OK.
4) The kernel doesn't (and may never) support MKTME.  If any 3rd party
   kernel ever supports MKTME, it should do MOVDIR64B to clear the page
   with the new MKTME KeyID (just like TDX does) before using it.

Therefore, this implementation just flushes cache to make sure there are
no stale dirty cachelines associated with any TDX private KeyIDs before
booting to the new kernel, otherwise they may silently corrupt the new
kernel.

Following SME support, use wbinvd() to flush cache in stop_this_cpu().
Theoretically, cache flush is only needed when the TDX module has been
initialized.  However, initializing the TDX module is done on demand at
runtime, and reading the module status takes a mutex.  Instead, just
check whether TDX is enabled by the BIOS to decide whether to flush the
cache.

Also, the current TDX module doesn't play nicely with kexec().  The TDX
module can only be initialized once during its lifetime, and there is no
ABI to reset the module to give a new clean slate to the new kernel.
Therefore ideally, if the TDX module is ever initialized, it's better
to shut it down.  The new kernel won't be able to use TDX anyway (as it
needs to go through the TDX module initialization process which will
fail immediately at the first step).

However, shutting down the TDX module requires all CPUs to be in VMX
operation, but there's no such guarantee as kexec() can happen at any
time (e.g. when KVM is not even loaded).  So just do nothing and leave
the TDX module open.

Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - Improved changelog to explain why TDX private pages are not converted
   back to normal.
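
Note for reviewers: the flush decision described in the changelog can be
modelled in plain userspace C.  This is only a sketch under stated
assumptions: struct platform_state, need_cache_flush() and self_check()
are hypothetical names made up for illustration, and the boolean fields
merely stand in for cpuid_eax(0x8000001f) & BIT(0) and
platform_tdx_enabled().  The module's initialization state is
deliberately not consulted, since reading it takes a mutex.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the platform state; not a kernel structure. */
struct platform_state {
	bool sme_cpuid_bit;          /* models cpuid_eax(0x8000001f) & BIT(0) */
	bool tdx_enabled_by_bios;    /* models platform_tdx_enabled() */
	bool tdx_module_initialized; /* needs a mutex to read; not consulted */
};

/* Flush if either feature may have left dirty cachelines behind. */
static bool need_cache_flush(const struct platform_state *s)
{
	return s->sme_cpuid_bit || s->tdx_enabled_by_bios;
}

/* Walk through the cases discussed in the changelog. */
static void self_check(void)
{
	struct platform_state s = { false, false, false };

	assert(!need_cache_flush(&s));	/* neither SME nor TDX */

	s.tdx_enabled_by_bios = true;	/* TDX enabled, module not initialized */
	assert(need_cache_flush(&s));	/* conservative: flush anyway */

	s.tdx_module_initialized = true;	/* module state changes nothing */
	assert(need_cache_flush(&s));

	s.tdx_enabled_by_bios = false;
	s.tdx_module_initialized = false;
	s.sme_cpuid_bit = true;		/* SME alone still triggers the flush */
	assert(need_cache_flush(&s));
}
```

The trade-off is an occasional unnecessary WBINVD on machines where the
BIOS enabled TDX but the module was never initialized, in exchange for
not taking a lock on the CPU-stop path.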

---
 arch/x86/kernel/process.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c21b7347a26d..0cc84977dc62 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -765,8 +765,14 @@ void __noreturn stop_this_cpu(void *dummy)
 	 *
 	 * Test the CPUID bit directly because the machine might've cleared
 	 * X86_FEATURE_SME due to cmdline options.
+	 *
+	 * Similar to SME, if the TDX module is ever initialized, the
+	 * cachelines associated with any TDX private KeyID must be flushed
+	 * before transiting to the new kernel.  The TDX module is initialized
+	 * on demand, and it takes the mutex to read its status.  Just check
+	 * whether TDX is enabled by BIOS instead to flush cache.
 	 */
-	if (cpuid_eax(0x8000001f) & BIT(0))
+	if (cpuid_eax(0x8000001f) & BIT(0) || platform_tdx_enabled())
 		native_wbinvd();
 	for (;;) {
 		/*

From patchwork Mon Nov 21 00:26:42 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kai Huang <kai.huang@intel.com>
X-Patchwork-Id: 23498
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1329149wrr;
        Sun, 20 Nov 2022 16:55:02 -0800 (PST)
X-Google-Smtp-Source: 
 AA0mqf6xdeGrEsHHYwG88T8N720X2CZoCFX8aGYBJ4EYmSERjbFXFXk6tSSkqYi3ZkXMKWCjYQBk
X-Received: by 2002:a63:5122:0:b0:464:3f16:e296 with SMTP id
 f34-20020a635122000000b004643f16e296mr2513889pgb.526.1668992102456;
        Sun, 20 Nov 2022 16:55:02 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1668992102; cv=none;
        d=google.com; s=arc-20160816;
        b=czwxbyBfTpEXxz2LHuViO8289raxJKqGBNTm3o/Fp0bPv/0ESWyjzR+UfQMVeiibuh
         yf7hJxz4O0Wi3JLEeWxlwZWnTy9HWmc15zVQihbJz6dEXjI/6e1TtJGvZirvuaWFpNKS
         EV48lMyalemNf+6w/3/fjMly4N3+elKwtV62qbKWFapW7zI9mX9MqlBfvVNCO+QOvItx
         3RWtCZSahohQ9jhW46VgXa1H+3IXaflREb4i7NiS8G2DbJxUVm9Zc/aQMv3EUaJtJx4f
         H371LyVcVHAcMlD12AhBGoc0UAnidpSJShETtZMLu+pyPqewa7y53xHsO1B/dPNnJekn
         /FgA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from
         :dkim-signature;
        bh=EcyGtgJbhtPPZic72nqXCWIC1h2PBlxnFg7z3W63EM4=;
        b=EAaWuolHZQddSgoNt8hc3TBXSyiiOL3tS7O0poq8kq8QTvSWUXcqUz5fJQDb6RtWsd
         Uxm+3RxFYh9jSKUffqkPGq4v31NCjWyiHaAeenpwgxjKa1ep4fUJhGvV6Lqb5DMG9uRH
         TPjX6l5QJzVlHTsYG3y6lMYcxFHbNGaN38hZLAqpruED7ojpjlzoDrqt3LMrIeZQwHx5
         EOWIwKLfTniClYhpwTo1ktFjDlfWEHBhlbCjo387h+LEXUzt3AvrL1PpxusuUh3U5w3o
         5WVLqMGcIjKWDEC182MutMLFrThwnSKeqK5FL4EWMWzAFoBG3TSrXidv0SK17LLqBGJh
         Wl3g==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=FgD3jlXE;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 c9-20020a170903234900b00186989178b0si10024517plh.132.2022.11.20.16.54.49;
        Sun, 20 Nov 2022 16:55:02 -0800 (PST)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=FgD3jlXE;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230106AbiKUAb3 (ORCPT <rfc822;leviz.kernel.dev@gmail.com>
        + 99 others); Sun, 20 Nov 2022 19:31:29 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58360 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S230104AbiKUAaz (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Sun, 20 Nov 2022 19:30:55 -0500
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6F5DD68C55;
        Sun, 20 Nov 2022 16:29:00 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1668990540; x=1700526540;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=zsDhKWAI30wSvAoA4u3acs9vgLnFrD59dq8SeAEe4Qk=;
  b=FgD3jlXEf4Yn+8hknXYDXDG0t22dwsMxAXuCagsoPH/INewm6SSyBNjE
   Fl7vbOajGijri7uYRBQeTw4Gmop4qWr4dRpTxHz2B7ckMiPttfxyNHbc5
   RKvU6wszJqSthixVjcK2xx/zshi0VgaLPMto6ZcQC52RcrItMGPtFpmvf
   LNgXgngcCNKgqJcmsxootnstjYA1sxkvib1aYH4mm4C1xw6L+/KzyZzEd
   X8PJAMhzNM8zZzcFcoK1GFU3jRUWFZzlU8KKdaQcSI0elbka1wYLxHWxR
   rQ+AxjlylnpkAIrdlc4Axs9Rn/FGsXibG1SrKqQHbFEFzfx4a/j4RwKlH
   g==;
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="377705773"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="377705773"
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
  by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:28:22 -0800
X-IronPort-AV: E=McAfee;i="6500,9779,10537"; a="729825561"
X-IronPort-AV: E=Sophos;i="5.96,180,1665471600";
   d="scan'208";a="729825561"
Received: from tomnavar-mobl.amr.corp.intel.com (HELO
 khuang2-desk.gar.corp.intel.com) ([10.209.176.15])
  by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 20 Nov 2022 16:28:17 -0800
From: Kai Huang <kai.huang@intel.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: linux-mm@kvack.org, seanjc@google.com, pbonzini@redhat.com,
        dave.hansen@intel.com, dan.j.williams@intel.com,
        rafael.j.wysocki@intel.com, kirill.shutemov@linux.intel.com,
        ying.huang@intel.com, reinette.chatre@intel.com,
        len.brown@intel.com, tony.luck@intel.com, peterz@infradead.org,
        ak@linux.intel.com, isaku.yamahata@intel.com, chao.gao@intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com, bagasdotme@gmail.com,
        sagis@google.com, imammedo@redhat.com, kai.huang@intel.com
Subject: [PATCH v7 20/20] Documentation/x86: Add documentation for TDX host
 support
Date: Mon, 21 Nov 2022 13:26:42 +1300
Message-Id: 
 <661183935202155894bb669930d483a555a73a7b.1668988357.git.kai.huang@intel.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <cover.1668988357.git.kai.huang@intel.com>
References: <cover.1668988357.git.kai.huang@intel.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,
        SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1750065063076106661?=
X-GMAIL-MSGID: =?utf-8?q?1750065063076106661?=

Add documentation for TDX host kernel support.  Documentation/x86/tdx.rst
already contains documentation for TDX guest internals; reuse it for TDX
host kernel support as well.

Introduce a new section "TDX Guest Support" and move the existing
material under it, and add a new section for TDX host kernel support.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v6 -> v7:
 - Changed "TDX Memory Policy" and "Kexec()" sections.

---
 Documentation/x86/tdx.rst | 181 +++++++++++++++++++++++++++++++++++---
 1 file changed, 170 insertions(+), 11 deletions(-)

diff --git a/Documentation/x86/tdx.rst b/Documentation/x86/tdx.rst
index dc8d9fd2c3f7..35092e7c60f7 100644
--- a/Documentation/x86/tdx.rst
+++ b/Documentation/x86/tdx.rst
@@ -10,6 +10,165 @@ encrypting the guest memory. In TDX, a special module running in a special
 mode sits between the host and the guest and manages the guest/host
 separation.
 
+TDX Host Kernel Support
+=======================
+
+TDX introduces a new CPU mode called Secure Arbitration Mode (SEAM) and
+a new isolated range pointed to by the SEAM Range Register (SEAMRR).  A
+CPU-attested software module called 'the TDX module' runs inside the new
+isolated range to provide the functionalities to manage and run protected
+VMs.
+
+TDX also leverages Intel Multi-Key Total Memory Encryption (MKTME) to
+provide crypto-protection to the VMs.  TDX reserves part of MKTME KeyIDs
+as TDX private KeyIDs, which are only accessible within the SEAM mode.
+BIOS is responsible for partitioning legacy MKTME KeyIDs and TDX KeyIDs.
+
+Before the TDX module can be used to create and run protected VMs, it
+must be loaded into the isolated range and properly initialized.  The TDX
+architecture doesn't require the BIOS to load the TDX module, but the
+kernel assumes it is loaded by the BIOS.
+
+TDX boot-time detection
+-----------------------
+
+The kernel detects TDX by detecting TDX private KeyIDs during kernel
+boot.  dmesg shows the following when TDX is enabled by the BIOS::
+
+  [..] tdx: TDX enabled by BIOS. TDX private KeyID range: [16, 64).
+
+TDX module detection and initialization
+---------------------------------------
+
+There is no CPUID or MSR to detect the TDX module.  The kernel detects it
+by initializing it.
+
+The kernel talks to the TDX module via the new SEAMCALL instruction.  The
+TDX module implements SEAMCALL leaf functions to allow the kernel to
+initialize it.
+
+Initializing the TDX module consumes roughly 1/256th of system RAM as
+'metadata' for the TDX memory.  It also takes additional CPU time to
+initialize that metadata along with the TDX module itself.  Neither
+cost is trivial, so the kernel initializes the TDX module at runtime
+on demand.  The caller needs to call tdx_enable() to initialize it::
+
+        ret = tdx_enable();
+        if (ret)
+                goto no_tdx;
+        // TDX is ready to use
+
+Initializing the TDX module requires all logical CPUs to be online.
+tdx_enable() temporarily disables CPU hotplug internally to prevent any
+CPU from going offline, but the caller still needs to guarantee all
+present CPUs are online before calling tdx_enable().
+
+Also, tdx_enable() requires all CPUs to already be in VMX operation
+(a requirement for making SEAMCALLs).  Currently, tdx_enable() doesn't
+handle VMXON internally, but depends on the caller to guarantee that.
+So far KVM is the only user of TDX and KVM already handles VMXON.
+
+Users can consult dmesg to see whether the TDX module is present and
+whether it has been initialized.
+
+If the TDX module is not loaded, dmesg shows below::
+
+  [..] tdx: TDX module is not loaded.
+
+If the TDX module is initialized successfully, dmesg shows something
+like below::
+
+  [..] tdx: TDX module: attributes 0x0, vendor_id 0x8086, major_version 1, minor_version 0, build_date 20211209, build_num 160
+  [..] tdx: 65667 pages allocated for PAMT.
+  [..] tdx: TDX module initialized.
+
+If the TDX module failed to initialize, dmesg shows below::
+
+  [..] tdx: Failed to initialize TDX module. Shut it down.
+
+TDX Interaction with Other Kernel Components
+--------------------------------------------
+
+TDX Memory Policy
+~~~~~~~~~~~~~~~~~
+
+TDX reports a list of "Convertible Memory Regions" (CMRs) to indicate all
+memory regions that can possibly be used by the TDX module, but they are
+not automatically usable by the TDX module.  As a step of initializing
+the TDX module, the kernel needs to choose a list of memory regions (from
+the convertible memory regions) that the TDX module can use and pass
+those regions to the TDX module.  Once this is done, those "TDX-usable"
+memory regions are fixed during the module's lifetime.  No more
+TDX-usable memory can be added to the TDX module after that.
+
+To keep things simple, currently the kernel simply guarantees all pages
+in the page allocator are TDX memory.  Specifically, the kernel uses all
+system memory in the core-mm at the time of initializing the TDX module
+as TDX memory, and in the meantime, refuses to add any non-TDX memory
+via memory hotplug.
+
+This can be enhanced in the future, e.g. by allowing adding non-TDX
+memory to a separate NUMA node.  In this case, the "TDX-capable" nodes
+and the "non-TDX-capable" nodes can co-exist, but the kernel/userspace
+needs to guarantee memory pages for TDX guests are always allocated from
+the "TDX-capable" nodes.
+
+Note TDX assumes convertible memory is always physically present during
+the machine's runtime.  A non-buggy BIOS should never support hot-removal
+of any convertible memory.  This implementation doesn't handle ACPI
+memory removal but depends on the BIOS to behave correctly.
+
+CPU Hotplug
+~~~~~~~~~~~
+
+TDX doesn't support physical (ACPI) CPU hotplug.  During machine boot,
+TDX verifies all boot-time present logical CPUs are TDX compatible before
+enabling TDX.  A non-buggy BIOS should never support hot-add/removal of
+physical CPUs.  Currently the kernel doesn't handle physical CPU hotplug,
+but depends on the BIOS to behave correctly.
+
+Note TDX works with logical CPU online/offline, thus the kernel still
+allows a logical CPU to be offlined and onlined again.
+
+Kexec()
+~~~~~~~
+
+There are two problems with using kexec() to boot to a new kernel
+when the old kernel has enabled TDX: 1) Part of the memory pages are
+still TDX private pages (i.e. metadata used by the TDX module, and any
+TDX guest memory if kexec() is executed when there are live TDX guests).
+2) There might be dirty cachelines associated with TDX private pages.
+
+Because the hardware doesn't guarantee cache coherency among different
+KeyIDs, the old kernel needs to flush cache (of TDX private pages)
+before booting to the new kernel.  Also, the kernel doesn't convert all
+TDX private pages back to normal for the following reasons:
+
+1) The kernel doesn't have existing infrastructure to track which pages
+   are TDX private pages.
+2) The number of TDX private pages can be large, and converting all of
+   them (cache flush + using MOVDIR64B to clear the page) can be time
+   consuming.
+3) The new kernel will almost exclusively use KeyID 0 to access memory.
+   KeyID 0 doesn't support integrity-check, so it's OK.
+4) The kernel doesn't (and may never) support MKTME.  If any 3rd party
+   kernel ever supports MKTME, it should do MOVDIR64B to clear the page
+   with the new MKTME KeyID (just like TDX does) before using it.
+
+The current TDX module architecture doesn't play nicely with kexec().
+The TDX module can only be initialized once during its lifetime, and
+there is no SEAMCALL to reset the module to give a new clean slate to
+the new kernel.  Therefore, ideally, if the module is ever initialized,
+it's better to shut down the module.  The new kernel won't be able to
+use TDX anyway (as it needs to go through the TDX module initialization
+process which will fail immediately at the first step).
+
+However, there's no guarantee that CPUs are in VMX operation during
+kexec(), so it's impractical to shut down the module.  Currently, the
+kernel just leaves the module open.
+
+TDX Guest Support
+=================
 Since the host cannot directly access guest registers or memory, much
 normal functionality of a hypervisor must be moved into the guest. This is
 implemented using a Virtualization Exception (#VE) that is handled by the
@@ -20,7 +179,7 @@ TDX includes new hypercall-like mechanisms for communicating from the
 guest to the hypervisor or the TDX module.
 
 New TDX Exceptions
-==================
+------------------
 
 TDX guests behave differently from bare-metal and traditional VMX guests.
 In TDX guests, otherwise normal instructions or memory accesses can cause
@@ -30,7 +189,7 @@ Instructions marked with an '*' conditionally cause exceptions.  The
 details for these instructions are discussed below.
 
 Instruction-based #VE
----------------------
+~~~~~~~~~~~~~~~~~~~~~
 
 - Port I/O (INS, OUTS, IN, OUT)
 - HLT
@@ -41,7 +200,7 @@ Instruction-based #VE
 - CPUID*
 
 Instruction-based #GP
----------------------
+~~~~~~~~~~~~~~~~~~~~~
 
 - All VMX instructions: INVEPT, INVVPID, VMCLEAR, VMFUNC, VMLAUNCH,
   VMPTRLD, VMPTRST, VMREAD, VMRESUME, VMWRITE, VMXOFF, VMXON
@@ -52,7 +211,7 @@ Instruction-based #GP
 - RDMSR*,WRMSR*
 
 RDMSR/WRMSR Behavior
---------------------
+~~~~~~~~~~~~~~~~~~~~
 
 MSR access behavior falls into three categories:
 
@@ -73,7 +232,7 @@ trapping and handling in the TDX module.  Other than possibly being slow,
 these MSRs appear to function just as they would on bare metal.
 
 CPUID Behavior
---------------
+~~~~~~~~~~~~~~
 
 For some CPUID leaves and sub-leaves, the virtualized bit fields of CPUID
 return values (in guest EAX/EBX/ECX/EDX) are configurable by the
@@ -93,7 +252,7 @@ not know how to handle. The guest kernel may ask the hypervisor for the
 value with a hypercall.
 
 #VE on Memory Accesses
-======================
+----------------------
 
 There are essentially two classes of TDX memory: private and shared.
 Private memory receives full TDX protections.  Its content is protected
@@ -107,7 +266,7 @@ entries.  This helps ensure that a guest does not place sensitive
 information in shared memory, exposing it to the untrusted hypervisor.
 
 #VE on Shared Memory
---------------------
+~~~~~~~~~~~~~~~~~~~~
 
 Access to shared mappings can cause a #VE.  The hypervisor ultimately
 controls whether a shared memory access causes a #VE, so the guest must be
@@ -127,7 +286,7 @@ be careful not to access device MMIO regions unless it is also prepared to
 handle a #VE.
 
 #VE on Private Pages
---------------------
+~~~~~~~~~~~~~~~~~~~~
 
 An access to private mappings can also cause a #VE.  Since all kernel
 memory is also private memory, the kernel might theoretically need to
@@ -145,7 +304,7 @@ The hypervisor is permitted to unilaterally move accepted pages to a
 to handle the exception.
 
 Linux #VE handler
-=================
+-----------------
 
 Just like page faults or #GP's, #VE exceptions can be either handled or be
 fatal.  Typically, an unhandled userspace #VE results in a SIGSEGV.
@@ -167,7 +326,7 @@ While the block is in place, any #VE is elevated to a double fault (#DF)
 which is not recoverable.
 
 MMIO handling
-=============
+-------------
 
 In non-TDX VMs, MMIO is usually implemented by giving a guest access to a
 mapping which will cause a VMEXIT on access, and then the hypervisor
@@ -189,7 +348,7 @@ MMIO access via other means (like structure overlays) may result in an
 oops.
 
 Shared Memory Conversions
-=========================
+-------------------------
 
 All TDX guest memory starts out as private at boot.  This memory can not
 be accessed by the hypervisor.  However, some kernel users like device