Message ID | 20230213234836.3683-1-kirill.shutemov@linux.intel.com |
---|---|
Headers |
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> To: Dave Hansen <dave.hansen@intel.com>, Borislav Petkov <bp@alien8.de> Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>, Thomas Gleixner <tglx@linutronix.de>, Isaku Yamahata <isaku.yamahata@intel.com>, x86@kernel.org, linux-coco@lists.linux.dev, kexec@lists.infradead.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Subject: [PATCH 0/2] Kexec enabling in TDX guest Date: Tue, 14 Feb 2023 02:48:34 +0300 Message-Id: <20230213234836.3683-1-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.39.1 List-ID: <linux-kernel.vger.kernel.org> |
Series | Kexec enabling in TDX guest |
Message
Kirill A. Shutemov
Feb. 13, 2023, 11:48 p.m. UTC
The patch brings basic enabling of kexec in TDX guests.

By "basic enabling" I mean, kexec in the guests with a single CPU.
TDX guests use ACPI MADT MPWK to bring up secondary CPUs. The mechanism
doesn't allow to put a CPU back offline if it has woken up.

We are looking into this, but it might take time.

Kirill A. Shutemov (2):
  x86/kexec: Preserve CR4.MCE during kexec
  x86/tdx: Convert shared memory back to private on kexec

 arch/x86/coco/tdx/Makefile           |  1 +
 arch/x86/coco/tdx/kexec.c            | 82 ++++++++++++++++++++++++++++
 arch/x86/include/asm/tdx.h           |  4 ++
 arch/x86/kernel/machine_kexec_64.c   |  2 +
 arch/x86/kernel/relocate_kernel_64.S |  6 +-
 5 files changed, 94 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/coco/tdx/kexec.c

Comments
On 2/13/23 15:48, Kirill A. Shutemov wrote:
> The patch brings basic enabling of kexec in TDX guests.
>
> By "basic enabling" I mean, kexec in the guests with a single CPU.
> TDX guests use ACPI MADT MPWK to bring up secondary CPUs. The mechanism
> doesn't allow to put a CPU back offline if it has woken up.
>
> We are looking into this, but it might take time.

This is simple enough. But, nobody will _actually_ use this code as-is,
right? What's the point of applying it now?

On Thu, Feb 16, 2023 at 09:50:32AM -0800, Dave Hansen wrote:
> On 2/13/23 15:48, Kirill A. Shutemov wrote:
> > The patch brings basic enabling of kexec in TDX guests.
> >
> > By "basic enabling" I mean, kexec in the guests with a single CPU.
> > TDX guests use ACPI MADT MPWK to bring up secondary CPUs. The mechanism
> > doesn't allow to put a CPU back offline if it has woken up.
> >
> > We are looking into this, but it might take time.
>
> This is simple enough. But, nobody will _actually_ use this code as-is,
> right? What's the point of applying it now?

Why nobody? Single CPU VMs are not that uncommon.

On 2/16/23 10:12, Kirill A. Shutemov wrote:
> On Thu, Feb 16, 2023 at 09:50:32AM -0800, Dave Hansen wrote:
>> On 2/13/23 15:48, Kirill A. Shutemov wrote:
>>> The patch brings basic enabling of kexec in TDX guests.
>>>
>>> By "basic enabling" I mean, kexec in the guests with a single CPU.
>>> TDX guests use ACPI MADT MPWK to bring up secondary CPUs. The mechanism
>>> doesn't allow to put a CPU back offline if it has woken up.
>>>
>>> We are looking into this, but it might take time.
>> This is simple enough. But, nobody will _actually_ use this code as-is,
>> right? What's the point of applying it now?
> Why nobody? Single CPU VMs are not that uncommon.

Here's one data point: the only "General Purpose" ones I see AWS
offering are Haswell era:

	https://aws.amazon.com/ec2/instance-types/

That _might_ be because of concerns about SMT side-channel exposure on
anything newer.

So, we can argue about what "uncommon" means. But, a minority of folks
care about 1-cpu VMs. Also, a separate minority of folks care about
kexec(). I'm worried that the overlap between the two will be an
*OVERWHELMING* minority of folks.

In other words, so few people will use this code that it'll just bitrot.
I'm looking for compelling arguments why mainline should carry this.

On Tue, 2023-02-14 at 02:48 +0300, Kirill A. Shutemov wrote:
> The patch brings basic enabling of kexec in TDX guests.
>
> By "basic enabling" I mean, kexec in the guests with a single CPU.
> TDX guests use ACPI MADT MPWK to bring up secondary CPUs. The mechanism
> doesn't allow to put a CPU back offline if it has woken up.
>
> We are looking into this, but it might take time.

Can't we park the secondary CPUs in a purgatory-like thing of their own
and wake them from there when we want them?

Patches for that were floating around once, although the primary reason
then was latency, and we decided to address that differently by doing
the bringup in parallel instead.

On Wed, Feb 22, 2023 at 10:26:22AM +0000, David Woodhouse wrote:
> On Tue, 2023-02-14 at 02:48 +0300, Kirill A. Shutemov wrote:
> > The patch brings basic enabling of kexec in TDX guests.
> >
> > By "basic enabling" I mean, kexec in the guests with a single CPU.
> > TDX guests use ACPI MADT MPWK to bring up secondary CPUs. The mechanism
> > doesn't allow to put a CPU back offline if it has woken up.
> >
> > We are looking into this, but it might take time.
>
> Can't we park the secondary CPUs in a purgatory-like thing of their own
> and wake them from there when we want them?
>
> Patches for that were floating around once, although the primary reason
> then was latency, and we decided to address that differently by doing
> the bringup in parallel instead.

That's plan B. It is suboptimal. kexec() can happen into something that
is not Linux which will not be able to wake up CPUs.

Ideally, it has to be addressed on BIOS level: it has to provide a way to
offline CPUs, putting it back to pre-wakeup state.

On 2/24/23 06:30, Kirill A. Shutemov wrote:
> Ideally, it has to be addressed on BIOS level: it has to provide a way to
> offline CPUs, putting it back to pre-wakeup state.

Is there anything stopping us from just parking the CPUs in a loop
looking at 'acpi_mp_wake_mailbox_paddr'? Basically park them in a way
which is indistinguishable from what the BIOS did.

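For readers unfamiliar with the mailbox, the parking idea can be pictured with a small userspace sketch. This is not code from the series. The struct layout follows my reading of the ACPI multiprocessor wakeup mailbox (a 4 KiB page with a command, an APIC ID and a wakeup vector at the start) and should be checked against the spec; the names `mp_wake_mailbox` and `parked_cpu_poll` are made up for illustration, and acknowledging a wakeup by resetting the command to Noop is likewise an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the wakeup mailbox page (names are hypothetical). */
struct mp_wake_mailbox {
	uint16_t command;		/* 0 = Noop, 1 = Wakeup */
	uint16_t reserved;
	uint32_t apic_id;		/* target CPU of the command */
	uint64_t wakeup_vector;		/* address the target should jump to */
	uint8_t  reserved_os[2032];
	uint8_t  reserved_firmware[2048];
};
static_assert(sizeof(struct mp_wake_mailbox) == 4096,
	      "mailbox is expected to fill one 4 KiB page");

enum { MP_CMD_NOOP = 0, MP_CMD_WAKEUP = 1 };

/*
 * One polling step of a parked CPU. A real parking loop would call this
 * repeatedly (with a pause in between) and jump to *vector once it
 * returns true.
 */
static bool parked_cpu_poll(volatile struct mp_wake_mailbox *mb,
			    uint32_t my_apic_id, uint64_t *vector)
{
	/* Stay parked unless a Wakeup command addresses this CPU. */
	if (mb->command != MP_CMD_WAKEUP || mb->apic_id != my_apic_id)
		return false;

	*vector = mb->wakeup_vector;
	/* Assumed acknowledgement: hand the mailbox back as Noop. */
	mb->command = MP_CMD_NOOP;
	return true;
}
```

A kernel that parks its CPUs like this only looks the same as firmware to the next kernel if it polls the exact page the ACPI table advertises and reacts to every command the way firmware would.
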
On Fri, Feb 24, 2023 at 07:22:18AM -0800, Dave Hansen wrote:
> On 2/24/23 06:30, Kirill A. Shutemov wrote:
> > Ideally, it has to be addressed on BIOS level: it has to provide a way to
> > offline CPUs, putting it back to pre-wakeup state.
>
> Is there anything stopping us from just parking the CPUs in a loop
> looking at 'acpi_mp_wake_mailbox_paddr'? Basically park them in a way
> which is indistinguishable from what the BIOS did.

+Rafael.

 - Forward compatibility can be an issue. Version 0 of the mailbox
   supports only a single Wakeup command. Future specs may define a new
   command that the kernel implementation doesn't support.

 - BIOS owns the mailbox page and can re-use it for something else after
   the last CPU has woken up. (I know it is very theoretical, but still.)

 - We can patch the ACPI table to point to a mailbox page in
   kernel-allocated memory, but that brings another problem. If the first
   kernel didn't wake up all CPUs for some reason (CONFIG_SMP=n or
   nr_cpus= or something), the second kernel would not be able to wake
   them up either, since they are still looping around the old address.

But ultimately, I think it is clearly missing BIOS functionality and has
to be addressed there. Hacking around it in the kernel will lead to more
problems down the road.

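The forward-compatibility concern in the first bullet can be made concrete with a short sketch. Everything here is illustrative: `mailbox_head` is a hypothetical struct covering only the first 16 bytes of the mailbox, `kernel_park_step` is an invented name, and `CMD_HYPOTHETICAL_FUTURE` is not defined by any spec. The point is only that a parking loop written against version 0 has nothing sensible to do with a command it has never heard of.

```c
#include <assert.h>
#include <stdint.h>

/* Leading fields of the mailbox only; reserved areas are left out. */
struct mailbox_head {
	uint16_t command;
	uint16_t reserved;
	uint32_t apic_id;
	uint64_t wakeup_vector;
};

enum { CMD_NOOP = 0, CMD_WAKEUP = 1 };

/* Hypothetical value standing in for a command a later spec might add. */
#define CMD_HYPOTHETICAL_FUTURE 2

enum park_result { PARK_IDLE, PARK_WOKEN, PARK_UNSUPPORTED };

/* What a kernel-provided parking loop built for version 0 can decide. */
static enum park_result kernel_park_step(const struct mailbox_head *mb,
					 uint32_t my_apic_id)
{
	if (mb->apic_id != my_apic_id)
		return PARK_IDLE;

	switch (mb->command) {
	case CMD_NOOP:
		return PARK_IDLE;
	case CMD_WAKEUP:
		return PARK_WOKEN;
	default:
		/* Newer firmware would handle this; the old kernel cannot. */
		return PARK_UNSUPPORTED;
	}
}
```

A next-stage kernel that expects firmware of a newer mailbox version would hit the `PARK_UNSUPPORTED` case, which is one way of seeing why the parked state would not be indistinguishable from what the BIOS provides.
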