[kernel,v3] x86/compressed/64: reduce #VC nesting for intercepted CPUID for SEV-SNP guest
Message ID | 20231003073123.1763564-1-aik@amd.com |
---|---|
State | New |
Series | [kernel,v3] x86/compressed/64: reduce #VC nesting for intercepted CPUID for SEV-SNP guest |
Commit Message
Alexey Kardashevskiy
Oct. 3, 2023, 7:31 a.m. UTC
For certain intercepts an SNP guest uses the GHCB protocol to talk to
the hypervisor from the #VC handler. The protocol requires a shared page,
so there is one per vCPU. In case an NMI arrives in the middle of #VC
handling, or the NMI handler itself triggers a #VC, there is a second
"backup" GHCB page which stores the content of the first one while
SVM_VMGEXIT_NMI_COMPLETE is sent. The vc_raw_handle_exception() handler
manages the main and backup GHCB pages via __sev_get_ghcb()/__sev_put_ghcb().

This works fine for #VC and occasional NMIs, but not when the #VC handler
itself causes another intercept and thus a nested #VC. If an NMI arrives
during the second #VC, there are no pages left for SVM_VMGEXIT_NMI_COMPLETE.
The problem spot is the #VC CPUID handler, which reads an MSR and thereby
triggers another #VC; if "perf" is running at the same time, the kernel
panics:

Kernel panic - not syncing: Unable to handle #VC exception! GHCB and Backup GHCB are already in use

Add a helper similar to native_read_msr_safe() for making a direct hypercall
in the SEV-ES environment. Use the new helper instead of the raw RDMSR to
avoid the extra #VC event.
Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers")
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
---
Based on:
https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=tip-x86-urgent
the top of which at the time was:
62d5e970d022 "x86/sev: Change npages to unsigned long in snp_accept_memory()"
---
Changes:
v3:
* made it a function mimicking native_read_msr_safe(), which 1) returns the value
  and 2) returns an error
* removed the debug backtraces from the commit log as these were added for debugging
  and never appear with actual kernels
v2:
* de-uglify by defining rdmsr_safe_GHCB()
---
arch/x86/kernel/sev-shared.c | 27 +++++++++++++++++---
1 file changed, 23 insertions(+), 4 deletions(-)
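
For context, the "GHCB and Backup GHCB are already in use" panic comes from the
per-CPU GHCB bookkeeping described above. The following is a simplified sketch of
that logic, modeled on __sev_get_ghcb() in arch/x86/kernel/sev.c; structure and
field names are abbreviated here, it is not the verbatim kernel code:

	/* Sketch: per-vCPU GHCB state, roughly as in struct sev_es_runtime_data */
	struct ghcb_pages {
		struct ghcb ghcb_page;		/* main shared page */
		struct ghcb backup_ghcb;	/* holds saved contents on nesting */
		bool ghcb_active;
		bool backup_ghcb_active;
	};

	static struct ghcb *get_ghcb_sketch(struct ghcb_pages *data)
	{
		if (data->ghcb_active) {
			/*
			 * Nested use: an NMI or a nested #VC hit while the main
			 * page was busy. Save its contents into the backup so
			 * the interrupted user can be restored on put.
			 */
			if (data->backup_ghcb_active)
				panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");

			data->backup_ghcb_active = true;
			data->backup_ghcb = data->ghcb_page;
		} else {
			data->ghcb_active = true;
		}
		return &data->ghcb_page;
	}

With only two levels per vCPU, a #VC nested inside a #VC leaves nothing for an
NMI's SVM_VMGEXIT_NMI_COMPLETE, hence the panic this patch avoids.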
Comments
* Alexey Kardashevskiy <aik@amd.com> wrote:

> For certain intercepts an SNP guest uses the GHCB protocol to talk to
> the hypervisor from the #VC handler. The protocol requires a shared page so
> there is one per vCPU.

s/requires a shared page so there is /requires a shared page, so there is

[...]

> Add a helper similar to native_read_msr_safe() for making a direct hypercall
> in the SEV-ES environment. Use the new helper instead of the raw "rdmsr" to
> avoid the extra #VC event.

s/"rdmsr" /RDMSR

> Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers")
> Signed-off-by: Alexey Kardashevskiy <aik@amd.com>

LGTM.

Reviewed-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo
On 10/3/23 02:31, Alexey Kardashevskiy wrote:

[...]

> +/* Paravirt SEV-ES rdmsr which avoids extra #VC event */
> +static unsigned long long ghcb_prot_read_msr(unsigned int msr, struct ghcb *ghcb,
> +					     struct es_em_ctxt *ctxt, int *err)

Alternatively you could return enum es_result and take xss as a
parameter... six of one, half dozen of another I guess.

> +{
> +	unsigned long long ret = 0;
> +
> +	ghcb_set_rcx(ghcb, msr);
> +
> +	*err = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_MSR, 0, 0);
> +	if (*err == ES_OK)
> +		ret = (ghcb->save.rdx << 32) | ghcb->save.rax;

You should check ghcb_rax_is_valid(ghcb) and ghcb_rdx_is_valid(ghcb)
before using the values.

> +
> +	/* Invalidate qwords for likely another following GHCB call */
> +	vc_ghcb_invalidate(ghcb);

We should probably call this on entry to the function, too, right? Not
sure it really matters though.

Thanks,
Tom
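For reference, here is one shape the helper could take with both of the review
comments above folded in: it returns enum es_result as suggested and refuses
register values the hypervisor did not mark valid. This is a sketch of the
suggested direction only, not the code that was merged:

	/* Paravirt SEV-ES RDMSR which avoids the extra #VC event (sketch) */
	static enum es_result ghcb_prot_read_msr(unsigned int msr, struct ghcb *ghcb,
						 struct es_em_ctxt *ctxt, u64 *value)
	{
		enum es_result ret;

		ghcb_set_rcx(ghcb, msr);

		ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_MSR, 0, 0);
		if (ret != ES_OK)
			return ret;

		/*
		 * Per the GHCB protocol, the hypervisor must mark both result
		 * registers valid before the guest may consume them.
		 */
		if (!(ghcb_rax_is_valid(ghcb) && ghcb_rdx_is_valid(ghcb)))
			return ES_VMM_ERROR;

		*value = (ghcb->save.rdx << 32) | ghcb->save.rax;

		/* Invalidate qwords for a likely following GHCB call */
		vc_ghcb_invalidate(ghcb);

		return ES_OK;
	}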
On 4/10/23 04:21, Tom Lendacky wrote:
> On 10/3/23 02:31, Alexey Kardashevskiy wrote:

[...]

>> +/* Paravirt SEV-ES rdmsr which avoids extra #VC event */
>> +static unsigned long long ghcb_prot_read_msr(unsigned int msr, struct ghcb *ghcb,
>> +					     struct es_em_ctxt *ctxt, int *err)
>
> Alternatively you could return enum es_result and take xss as a
> parameter... six of one, half dozen of another I guess.

How do we decide on this? :)

and yeah, I need to s/int/enum es_result/

>> +	*err = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_MSR, 0, 0);
>> +	if (*err == ES_OK)
>> +		ret = (ghcb->save.rdx << 32) | ghcb->save.rax;
>
> You should check ghcb_rax_is_valid(ghcb) and ghcb_rdx_is_valid(ghcb)
> before using the values.

Huh. v4 is coming then. Although what are the chances of *err == ES_OK
and !ghcb_rax_is_valid() at the same time? What if *err == ES_OK and
ghcb_rdx_is_valid() == true but ghcb_rax_is_valid() == false?

return ((ghcb_rdx_is_valid(ghcb) ? (ghcb->save.rdx << 32) : 0) |
	(ghcb_rax_is_valid(ghcb) ? ghcb->save.rax : 0));

Or I can just drop *err, invalidate the GHCB before sev_es_ghcb_hv_call()
and only rely on (ghcb_rdx_is_valid() && ghcb_rax_is_valid())?

Where should I stop with this? :)

>> +	/* Invalidate qwords for likely another following GHCB call */
>> +	vc_ghcb_invalidate(ghcb);
>
> We should probably call this on entry to the function, too, right? Not
> sure it really matters though.

The SVM_EXIT_MSR handler in SVM/KVM only cares if RCX is valid in
sev_es_validate_vmgexit() and the guest's ghcb_set_rcx() does that.
Nothing in SVM enforces that other (unused) registers are not valid
though. Thanks,
On 10/3/23 18:22, Alexey Kardashevskiy wrote:
> On 4/10/23 04:21, Tom Lendacky wrote:
>> You should check ghcb_rax_is_valid(ghcb) and ghcb_rdx_is_valid(ghcb)
>> before using the values.
>
> Huh. v4 is coming then. Although what are the chances of *err == ES_OK
> and !ghcb_rax_is_valid() at the same time? What if *err == ES_OK and
> ghcb_rdx_is_valid()==true but ghcb_rax_is_valid()==false?
>
> Or I can just drop *err, invalidate ghcb before sev_es_ghcb_hv_call()
> and only rely on (ghcb_rdx_is_valid() && ghcb_rax_is_valid())?
>
> Where should I stop with this? :)

No, you can't drop *err. The GHCB protocol specifically calls out how
errors can be returned and how register state is returned.

In this case, sev_es_ghcb_hv_call() will check for general errors being
returned from the hypervisor, e.g. non-zero SW_EXITINFO1[31:0], and that
is why you need to check *err.

Then you need to validate that the hypervisor set the proper registers,
hence the check for ghcb_rax/rdx_is_valid() (see __sev_cpuid_hv_ghcb()
as an example).

Thanks,
Tom
On 5/10/23 00:53, Tom Lendacky wrote:
> No, you can't drop *err. The GHCB protocol specifically calls out how
> errors can be returned and how register state is returned.
>
> In this case, sev_es_ghcb_hv_call() will check for general errors being
> returned from the hypervisor, e.g. non-zero SW_EXITINFO1[31:0], and that
> is why you need to check *err.
>
> Then you need to validate that the hypervisor set the proper registers,
> hence the check for ghcb_rax/rdx_is_valid() (see __sev_cpuid_hv_ghcb()
> as an example).

After an offline discussion, it turns out this intercepted RDMSR of XSS in
this particular place (postprocessing of CPUID 0xd:0x1 bit 3 == "XSAVES,
XRSTOR, and XSS are supported") in the guest should not have been
intercepted in the first place, as the MSR is virtualized and swapped as
type B, but it is intercepted because that is the default.

This, applied to KVM, fixes the guest crashing problem:

--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4266,6 +4266,11 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	vcpu->arch.xsaves_enabled = guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
 				    boot_cpu_has(X86_FEATURE_XSAVE) &&
 				    boot_cpu_has(X86_FEATURE_XSAVES);
+	if (vcpu->arch.xsaves_enabled)
+		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);

Sooo. I guess we want to fix KVM, but at least for now the guest needs
the fix too, does it not?

And adding Sean in cc.

Thanks,
On 10/5/23 04:36, Alexey Kardashevskiy wrote:
> On 5/10/23 00:53, Tom Lendacky wrote:

[...]

> After an offline discussion, it turns out this intercepted rdmsr of XSS in
> this particular place (postprocessing of CPUID 0xd:0x1 bit3 == "XSAVES,
> XRSTOR, and XSS are supported") in the guest should not have been
> intercepted in the first place as it is virtualized and swapped as typeB,
> but it is intercepted as this is the default.
>
> This applied to KVM fixes the guest crashing problem:
>
> +	if (vcpu->arch.xsaves_enabled)
> +		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);

But this isn't valid for non-ES/SNP guests, so it would need to be part of
the (recently added) sev_es_vcpu_after_set_cpuid(). It would also need to
(re)enable the intercept if xsaves_enabled isn't set.

> Sooo. I guess we want to fix the KVM but at least for now the guest needs
> the fix too, does not it?

Since the SNP hypervisor patches aren't upstream, I don't think we do. If
the SNP patches do the right thing from the start, everything is ok.

There isn't currently support for XSS in the ES #VC path for CPUID, so we
do need to look at including that. But that is also the tricky part for ES
since we don't have a CPUID table. The XSS MSR is valid if CPUID leaf 0xd,
subleaf 1 has bit 3 set, but you need to be able to supply XSS on the CPUID
call... so there is a bit of a chicken-and-egg issue where we might have to
call that CPUID twice the very first time that CPUID request is encountered
so that the proper XSS value can be supplied before the X86_FEATURE_XSAVES
flag is set.

Is XSAVES being advertised for ES guests today? If not, we can add the
interception updates today to KVM and be covered there, too.

Thanks,
Tom
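A sketch of the interception update Tom outlines above. It assumes, as a
hypothetical, that the toggle lives in a sev_es_vcpu_after_set_cpuid() helper
and that set_msr_interception()'s last two arguments mean "allow direct
read/write" (1 = pass through, 0 = intercept), matching its other uses in
svm.c; this is an illustration of the direction discussed, not a committed
KVM change:

	/* Sketch: toggle MSR_IA32_XSS interception both ways for ES/SNP guests */
	static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
	{
		struct kvm_vcpu *vcpu = &svm->vcpu;
		bool xsaves = vcpu->arch.xsaves_enabled;

		/*
		 * Pass XSS through only when XSAVES is exposed to the guest;
		 * (re)enable the intercept otherwise, so guests without
		 * XSAVES still trap on the MSR access.
		 */
		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, xsaves, xsaves);
	}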
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index dcf325b7b022..494d92a71986 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -241,6 +241,25 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
 	return verify_exception_info(ghcb, ctxt);
 }
 
+
+/* Paravirt SEV-ES rdmsr which avoids extra #VC event */
+static unsigned long long ghcb_prot_read_msr(unsigned int msr, struct ghcb *ghcb,
+					     struct es_em_ctxt *ctxt, int *err)
+{
+	unsigned long long ret = 0;
+
+	ghcb_set_rcx(ghcb, msr);
+
+	*err = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_MSR, 0, 0);
+	if (*err == ES_OK)
+		ret = (ghcb->save.rdx << 32) | ghcb->save.rax;
+
+	/* Invalidate qwords for likely another following GHCB call */
+	vc_ghcb_invalidate(ghcb);
+
+	return ret;
+}
+
 static int __sev_cpuid_hv(u32 fn, int reg_idx, u32 *reg)
 {
 	u64 val;
@@ -477,11 +496,11 @@ static int snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
 	if (leaf->subfn == 1) {
 		/* Get XSS value if XSAVES is enabled. */
 		if (leaf->eax & BIT(3)) {
-			unsigned long lo, hi;
+			int err = 0;
 
-			asm volatile("rdmsr" : "=a" (lo), "=d" (hi)
-				     : "c" (MSR_IA32_XSS));
-			xss = (hi << 32) | lo;
+			xss = ghcb_prot_read_msr(MSR_IA32_XSS, ghcb, ctxt, &err);
+			if (err != ES_OK)
+				return -EINVAL;
 		}
 
 		/*