Message ID | 20221201021948.9259-2-aik@amd.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp29534wrr; Wed, 30 Nov 2022 18:22:58 -0800 (PST) X-Google-Smtp-Source: AA0mqf6k7B8WsmoKrVVzvQvdKQnwwt5ckxE0uo/hOxQRxS1kzJ9z7j04gN3FJIZrqucCl0l4ypOj X-Received: by 2002:a17:906:37c7:b0:7c0:e15:9c4 with SMTP id o7-20020a17090637c700b007c00e1509c4mr14086288ejc.435.1669861378294; Wed, 30 Nov 2022 18:22:58 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1669861378; cv=pass; d=google.com; s=arc-20160816; b=hBQIrGng+NrJ9TRPIooIXIWreSGGh1uZ/4cpzskOH0GA+m1aHhbzIWGPttb5kd9Adr NQO6VpGnXGr2jo1NLaJe5NYdbCvU8w7hNeJZw8tPNd+OCMyPIj01FNpMSf8tX7k5KMGX sS5WC/GAEEWRqVS2+EOL4oJtA32qSnRPTIng53VytZxW/suVV34aCSOkEHNPWGLFvc9Q 0u05e3AVavnSyqKjxk9MB7lWjT1woR0v1wjbdJf223UaTteCLAQ+b8xlrf6OYRNn6XJ+ HPNzT40iKtkFaMnjHN85n8isz0jwjOGLuJBMnrC84xX6cj/1D05MPnGjFf3gE9oJlIRh CvAA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=lp76VTxDs0OlO4s1fAPkZZ8difGoTPNp0BwYKAeFZrE=; b=R2rjm5UJUao8MSHpv5EHoafojsIQkzD/qIt8+ENGzK3+ota4GsVhD9/uT16zrwiYrT 6BQ9+SNchAdKOJ8MmG8IK1071QX1M6jYsVWLjT+I/sBLhGP3NL3m72azN2P/aBRIw3Xx 4XJmaHn5Na2hyQqqom3kzdHAd4w9mt3jXi2+Q/ebPAGgdbYfGpesv/3QtfkdlMmr/pGU ZTTXiKoUFFake/TwpFEySlxTftIInGpch015tkj8fflTwwRS33sYN74fi30vUUQiLKWI hxe6X269Sy8ceP13Q3Arfq4KvRQQjID5KtQcr6lvZClhbkAU0xaFAjBuRunRiXrZhuzu EMfw== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@amd.com header.s=selector1 header.b=SvbLjDa9; arc=pass (i=1 spf=pass spfdomain=amd.com dmarc=pass fromdomain=amd.com); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amd.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id xe8-20020a170907318800b007ad8140c60asi2756706ejb.492.2022.11.30.18.22.35; Wed, 30 Nov 2022 18:22:58 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@amd.com header.s=selector1 header.b=SvbLjDa9; arc=pass (i=1 spf=pass spfdomain=amd.com dmarc=pass fromdomain=amd.com); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amd.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229934AbiLACVh (ORCPT <rfc822;heyuhang3455@gmail.com> + 99 others); Wed, 30 Nov 2022 21:21:37 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54450 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230007AbiLACVF (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 30 Nov 2022 21:21:05 -0500 Received: from NAM10-MW2-obe.outbound.protection.outlook.com (mail-mw2nam10on2065.outbound.protection.outlook.com [40.107.94.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BBAE9A13D2; Wed, 30 Nov 2022 18:20:45 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=LJdIK6T6jfLj4dtTMafJ2VJxncCfjVrBmnH+LIqYg48sJbDTExUsARR7+wYKrBh1g8XIeBB3Q5TBBOQzRiMCUfyyPYBwI53PvfztY9xMn3APkcukC+dghBUYbAxSu+zRBh43nb+TqgJ2CON6bY2QfE3zAqQJxodVwhrHC6aIgLEHxRWpK7QeEEMUIJiBg/0Lxmqg8ywdDH6rKwDTst90eVXSeJh1GJ1OXKWtfDO3rHp5bOzTTcPQ4e7jgOZs9YzSaOs82Fm8zsMk/ROgxV73P2ghQLhxhcw9/b7xORp7V/yR3cxvCNJmQ8m40EHzH4QJlmGCh4boziI1RnMy0TVFCw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=lp76VTxDs0OlO4s1fAPkZZ8difGoTPNp0BwYKAeFZrE=; b=V4lVVyt0v9lEg1MMhA9TrkP/IyyAhSK/j8Xl1sOs4qvjVDn+qYk0lwdkMgDRdvsTzWXxY2Sr9rH5u/eoui42LpYGN9ufPFNmI5xuMXJiJjit4a0tOUyNm2Bbir9g92lxW+xLgLxwkMrMeAcrjDVzX5CP05TQy37OugOmnuOgxE7Z/HHe+DisT+2hx6ioqa/TFsUSsewU5irxbX2EI4iYSLfoxcvdF5BSItBdT+bz2m0g5iaoe3LqIww2BkEIyBYdmEg4s/MgWa2wZpXncg1GyoWcYu4GWpNYl0f43Sl5ugsm2yYI5KsVuKb5xbtEYgmsDIUGmt8pll/Ds71pnhhRlw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=lp76VTxDs0OlO4s1fAPkZZ8difGoTPNp0BwYKAeFZrE=; b=SvbLjDa9EMpysxzWlwtDMmWtGd/VJF+a6MC4poBjvQ+aYcc1oDiuX+8xuiPS18a98duXK6KDCeIymV3Qzf5Kc5pWgu+49CaQz/lpruVO52a/Q4rZPxqvFnklhZaNi7qH5U06BhE2BzpPHyNx9RfDa04xK5q8ab09h5cgsjrfbOY= Received: from BN0PR03CA0002.namprd03.prod.outlook.com (2603:10b6:408:e6::7) by CY8PR12MB7610.namprd12.prod.outlook.com (2603:10b6:930:9a::11) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5857.23; Thu, 1 Dec 2022 02:20:44 +0000 Received: from BN8NAM11FT115.eop-nam11.prod.protection.outlook.com (2603:10b6:408:e6:cafe::3) by BN0PR03CA0002.outlook.office365.com (2603:10b6:408:e6::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5880.8 via Frontend Transport; Thu, 1 Dec 2022 02:20:43 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by BN8NAM11FT115.mail.protection.outlook.com (10.13.177.151) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.5857.23 via Frontend Transport; Thu, 1 Dec 2022 02:20:43 +0000 Received: from aiemdeew.1.ozlabs.ru (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.34; Wed, 30 Nov 2022 20:20:33 -0600 From: Alexey Kardashevskiy <aik@amd.com> To: Alexey Kardashevskiy <aik@amd.com> CC: <kvm@vger.kernel.org>, <x86@kernel.org>, <linux-kernel@vger.kernel.org>, Venu Busireddy <venu.busireddy@oracle.com>, Tony Luck <tony.luck@intel.com>, Tom Lendacky <thomas.lendacky@amd.com>, Thomas Gleixner <tglx@linutronix.de>, Sean Christopherson <seanjc@google.com>, Peter Zijlstra <peterz@infradead.org>, Paolo Bonzini <pbonzini@redhat.com>, Michael Sterritt <sterritt@google.com>, Michael Roth <michael.roth@amd.com>, Mario Limonciello <mario.limonciello@amd.com>, Ingo Molnar <mingo@redhat.com>, Heiko Carstens <hca@linux.ibm.com>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, "Dave Hansen" <dave.hansen@linux.intel.com>, Borislav Petkov <bp@alien8.de>, "Andrew Cooper" <andrew.cooper3@citrix.com>, "Jason A. Donenfeld" <Jason@zx2c4.com>, "H. Peter Anvin" <hpa@zytor.com> Subject: [PATCH kernel 1/3] x86/amd/dr_addr_mask: Cache values in percpu variables Date: Thu, 1 Dec 2022 13:19:46 +1100 Message-ID: <20221201021948.9259-2-aik@amd.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20221201021948.9259-1-aik@amd.com> References: <20221201021948.9259-1-aik@amd.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB04.amd.com (10.181.40.145) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BN8NAM11FT115:EE_|CY8PR12MB7610:EE_ X-MS-Office365-Filtering-Correlation-Id: 9d418dcf-13e6-4d0d-99af-08dad342a645 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: JbPmWF9OQ4dUoAh6eLdQ+K9rrflc7V3sN3hR5FFUn6Mkk30Y/4Cuc6noMV9yMqqBEI4OacR4rOQMhmTwmWBfLsQUuF8JDPuJAUxdHo8dWv7b3jWOKJcmOkgCTtb3y0yL7/tyIiw0hS3NrDm/lV05oCHSGXu75ChXS0fdiLPKt9aVFncQYMVQWECj9M9hxibxcggT0BEFirhEGGwc92GIwXzXcpe+NRy0IRBC8jYUmOg6+4U6NQu7TLirIJD4yhkc336//dGU78/M5Qztjoa4JOamIclDJi4VaS2Djh9HzFhVL/pKNWFUmtEQPyyeYwK8mchESUoDB6sFOu8gwcezhEh5ruZfgwWXJU/GB9S27XlTjM0oudZYG9pxNn1fmcA495sL+3SSoDstYKyBjTTOMi2DkDtILF1gL5DHk1UTgtQjNQC6DXY5fxvMsDRj3OYuDUBqkfHxEENm4BdEWCj68dauyQMCmYp4O0N4keXOvZwcYE1u4EWzp7Fm51w+ouvVCV6OazyHvgThO6xgqSG/1LgFx588+2bWKTt8BJDIirn5DqxVcJFvcjgRcYVvV0M0rx/YDq7Gk4SKNEGkcBiepx2kDf2tFHf0IqGKjNe58tDnxEptFkEb9VNCSrP6wsxBsu+d0egCK5Web3zgDcJi14YD4EQHw0BRjdTNUvGQjHgqUCK/NkFoxVcsdy32FnWg43hqFTf3QpuA+jW1xqemxiAREeUh6azOVAgrmu5rDQA= X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230022)(4636009)(39860400002)(376002)(346002)(136003)(396003)(451199015)(40470700004)(36840700001)(46966006)(7049001)(7416002)(5660300002)(26005)(54906003)(8936002)(316002)(6862004)(8676002)(4326008)(36756003)(37006003)(36860700001)(70586007)(41300700001)(83380400001)(40460700003)(40480700001)(81166007)(356005)(47076005)(82740400003)(478600001)(6666004)(2616005)(426003)(70206006)(82310400005)(336012)(186003)(16526019)(1076003)(2906002)(6200100001)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Dec 2022 02:20:43.6759 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 9d418dcf-13e6-4d0d-99af-08dad342a645 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT115.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY8PR12MB7610 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1750976564670500730?= X-GMAIL-MSGID: =?utf-8?q?1750976564670500730?= |
Series |
KVM: SEV: Enable AMD SEV-ES DebugSwap
|
|
Commit Message
Alexey Kardashevskiy
Dec. 1, 2022, 2:19 a.m. UTC
Reading DR[0-3]_ADDR_MASK MSRs takes about 250 cycles which is going to
be noticeable when the AMD KVM SEV-ES's DebugSwap feature is enabled and
KVM needs to store these before switching to a guest; the DebugSwitch
hardware support restores them as type B swap.
This stores MSR values from set_dr_addr_mask() in percpu values and
returns them via new get_dr_addr_mask(). The gain here is about 10x.
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
---
arch/x86/include/asm/debugreg.h | 1 +
arch/x86/kernel/cpu/amd.c | 32 ++++++++++++++++++++
2 files changed, 33 insertions(+)
Comments
On Thu, Dec 01, 2022, Alexey Kardashevskiy wrote: > Reading DR[0-3]_ADDR_MASK MSRs takes about 250 cycles which is going to > be noticeable when the AMD KVM SEV-ES's DebugSwap feature is enabled and > KVM needs to store these before switching to a guest; the DebugSwitch > hardware support restores them as type B swap. > > This stores MSR values from set_dr_addr_mask() in percpu values and > returns them via new get_dr_addr_mask(). The gain here is about 10x. > > Signed-off-by: Alexey Kardashevskiy <aik@amd.com> > --- > arch/x86/include/asm/debugreg.h | 1 + > arch/x86/kernel/cpu/amd.c | 32 ++++++++++++++++++++ > 2 files changed, 33 insertions(+) > > diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h > index cfdf307ddc01..c4324d0205b5 100644 > --- a/arch/x86/include/asm/debugreg.h > +++ b/arch/x86/include/asm/debugreg.h > @@ -127,6 +127,7 @@ static __always_inline void local_db_restore(unsigned long dr7) > > #ifdef CONFIG_CPU_SUP_AMD > extern void set_dr_addr_mask(unsigned long mask, int dr); > +extern unsigned long get_dr_addr_mask(int dr); > #else > static inline void set_dr_addr_mask(unsigned long mask, int dr) { } KVM_AMD doesn't depend on CPU_SUP_AMD, i.e. this needs a stub. Or we need to add a dependency. > diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c > index c75d75b9f11a..ec7efcef4e14 100644 > --- a/arch/x86/kernel/cpu/amd.c > +++ b/arch/x86/kernel/cpu/amd.c > @@ -1158,6 +1158,11 @@ static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum) > return false; > } > > +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr0_addr_mask); > +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr1_addr_mask); > +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr2_addr_mask); > +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr3_addr_mask); > + > void set_dr_addr_mask(unsigned long mask, int dr) > { > if (!boot_cpu_has(X86_FEATURE_BPEXT)) > @@ -1166,17 +1171,44 @@ void set_dr_addr_mask(unsigned long mask, int dr) > switch (dr) { > case 0: > wrmsr(MSR_F16H_DR0_ADDR_MASK, mask, 0); LOL, I'd love to hear how MSR_F16H_DR0_ADDR_MASK ended up with a completely different MSR index. > + per_cpu(dr0_addr_mask, smp_processor_id()) = mask; Use an array to avoid the copy+paste? And if you're going to add a cache, might as well use it to avoid unnecessary writes. > break; > case 1:
On 2/12/22 03:58, Sean Christopherson wrote: > On Thu, Dec 01, 2022, Alexey Kardashevskiy wrote: >> Reading DR[0-3]_ADDR_MASK MSRs takes about 250 cycles which is going to >> be noticeable when the AMD KVM SEV-ES's DebugSwap feature is enabled and >> KVM needs to store these before switching to a guest; the DebugSwitch >> hardware support restores them as type B swap. >> >> This stores MSR values from set_dr_addr_mask() in percpu values and >> returns them via new get_dr_addr_mask(). The gain here is about 10x. >> >> Signed-off-by: Alexey Kardashevskiy <aik@amd.com> >> --- >> arch/x86/include/asm/debugreg.h | 1 + >> arch/x86/kernel/cpu/amd.c | 32 ++++++++++++++++++++ >> 2 files changed, 33 insertions(+) >> >> diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h >> index cfdf307ddc01..c4324d0205b5 100644 >> --- a/arch/x86/include/asm/debugreg.h >> +++ b/arch/x86/include/asm/debugreg.h >> @@ -127,6 +127,7 @@ static __always_inline void local_db_restore(unsigned long dr7) >> >> #ifdef CONFIG_CPU_SUP_AMD >> extern void set_dr_addr_mask(unsigned long mask, int dr); >> +extern unsigned long get_dr_addr_mask(int dr); >> #else >> static inline void set_dr_addr_mask(unsigned long mask, int dr) { } > > KVM_AMD doesn't depend on CPU_SUP_AMD, i.e. this needs a stub. Or we need to add > a dependency. The new symbol is under CONFIG_CPU_SUP_AMD and so is its only user arch/x86/kernel/cpu/amd.c: arch/x86/kernel/cpu/Makefile: obj-$(CONFIG_CPU_SUP_AMD) += amd.o Is this enough dependency or we need something else? (if enough, I'll put it into the next commit log). >> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c >> index c75d75b9f11a..ec7efcef4e14 100644 >> --- a/arch/x86/kernel/cpu/amd.c >> +++ b/arch/x86/kernel/cpu/amd.c >> @@ -1158,6 +1158,11 @@ static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum) >> return false; >> } >> >> +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr0_addr_mask); >> +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr1_addr_mask); >> +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr2_addr_mask); >> +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr3_addr_mask); >> + >> void set_dr_addr_mask(unsigned long mask, int dr) >> { >> if (!boot_cpu_has(X86_FEATURE_BPEXT)) >> @@ -1166,17 +1171,44 @@ void set_dr_addr_mask(unsigned long mask, int dr) >> switch (dr) { >> case 0: >> wrmsr(MSR_F16H_DR0_ADDR_MASK, mask, 0); > > LOL, I'd love to hear how MSR_F16H_DR0_ADDR_MASK ended up with a completely > different MSR index. >> + per_cpu(dr0_addr_mask, smp_processor_id()) = mask; > > Use an array to avoid the copy+paste? And if you're going to add a cache, might > as well use it to avoid unnecessary writes. I'll do this, I did not realize DEFINE_PER_CPU_READ_MOSTLY can do arrays. Thanks, > >> break; >> case 1:
On Tue, Dec 06, 2022, Alexey Kardashevskiy wrote: > > On 2/12/22 03:58, Sean Christopherson wrote: > > On Thu, Dec 01, 2022, Alexey Kardashevskiy wrote: > > > diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h > > > index cfdf307ddc01..c4324d0205b5 100644 > > > --- a/arch/x86/include/asm/debugreg.h > > > +++ b/arch/x86/include/asm/debugreg.h > > > @@ -127,6 +127,7 @@ static __always_inline void local_db_restore(unsigned long dr7) > > > #ifdef CONFIG_CPU_SUP_AMD > > > extern void set_dr_addr_mask(unsigned long mask, int dr); > > > +extern unsigned long get_dr_addr_mask(int dr); > > > #else > > > static inline void set_dr_addr_mask(unsigned long mask, int dr) { } > > > > KVM_AMD doesn't depend on CPU_SUP_AMD, i.e. this needs a stub. Or we need to add > > a dependency. > > The new symbol is under CONFIG_CPU_SUP_AMD and so is its only user > arch/x86/kernel/cpu/amd.c: > > arch/x86/kernel/cpu/Makefile: > obj-$(CONFIG_CPU_SUP_AMD) += amd.o > > > Is this enough dependency or we need something else? (if enough, I'll put it > into the next commit log). No, it's actually the opposite, the issue is precisely that the symbol is buried under CPU_SUP_AMD. KVM_AMD doesn't currently depend on CPU_SUP_AMD, and so the usage in sev_es_prepare_switch_to_guest() fails to compile if CPU_SUP_AMD=n and KVM_AMD={Y,M}. config KVM_AMD tristate "KVM for AMD processors support" depends on KVM I actually just submitted a patch[*] to "fix" that since KVM requires the CPU vendor to be AMD or Hygon at runtime. Although that patch is buried in the middle of a large series, it doesn't have any dependencies. So, if this series is routed through the KVM tree, it should be straightforward to just pull that patch into this series, and whichever series lands first gets the honor of applying that patch. If this series is routed through the tip tree, the best option may be to just add a stub to avoid potential conflicts, and then we can rip the stub out later. [*] https://lore.kernel.org/all/20221201232655.290720-12-seanjc@google.com
On 7/12/22 04:07, Sean Christopherson wrote: > On Tue, Dec 06, 2022, Alexey Kardashevskiy wrote: >> >> On 2/12/22 03:58, Sean Christopherson wrote: >>> On Thu, Dec 01, 2022, Alexey Kardashevskiy wrote: >>>> diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h >>>> index cfdf307ddc01..c4324d0205b5 100644 >>>> --- a/arch/x86/include/asm/debugreg.h >>>> +++ b/arch/x86/include/asm/debugreg.h >>>> @@ -127,6 +127,7 @@ static __always_inline void local_db_restore(unsigned long dr7) >>>> #ifdef CONFIG_CPU_SUP_AMD >>>> extern void set_dr_addr_mask(unsigned long mask, int dr); >>>> +extern unsigned long get_dr_addr_mask(int dr); >>>> #else >>>> static inline void set_dr_addr_mask(unsigned long mask, int dr) { } >>> >>> KVM_AMD doesn't depend on CPU_SUP_AMD, i.e. this needs a stub. Or we need to add >>> a dependency. >> >> The new symbol is under CONFIG_CPU_SUP_AMD and so is its only user >> arch/x86/kernel/cpu/amd.c: >> >> arch/x86/kernel/cpu/Makefile: >> obj-$(CONFIG_CPU_SUP_AMD) += amd.o >> >> >> Is this enough dependency or we need something else? (if enough, I'll put it >> into the next commit log). > > No, it's actually the opposite, the issue is precisely that the symbol is buried > under CPU_SUP_AMD. KVM_AMD doesn't currently depend on CPU_SUP_AMD, and so the > usage in sev_es_prepare_switch_to_guest() fails to compile if CPU_SUP_AMD=n and > KVM_AMD={Y,M}. Ouch, you are one step ahead, 2/3 fails, not this one. My bad. I'll add a stub anyway. btw I want to do "s/int dr/unsigned int dr/" since I am using an array now. Does it have to be patch#1 "fix set_dr_addr_mask" and then patch#2 "add get_dr_addr_mask" or one patch will do? Thanks, > > config KVM_AMD > tristate "KVM for AMD processors support" > depends on KVM > > I actually just submitted a patch[*] to "fix" that since KVM requires the CPU vendor > to be AMD or Hygon at runtime. > Although that patch is buried in the middle of a > large series, it doesn't have any dependencies. So, if this series is routed through > the KVM tree, it should be straightforward to just pull that patch into this series, > and whichever series lands first gets the honor of applying that patch. > > If this series is routed through the tip tree, the best option may be to just add > a stub to avoid potential conflicts, and then we can rip the stub out later. > > [*] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2F20221201232655.290720-12-seanjc%40google.com&data=05%7C01%7CAlexey.Kardashevskiy%40amd.com%7C6ee92cb78f8f446b887908dad7ac6fca%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638059432896741545%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=25f18IS6z9HEf0Qtt%2BFDxZdtbTVPg%2FVulsFGhWLH0Rg%3D&reserved=0
On Thu, Dec 01, 2022 at 01:19:46PM +1100, Alexey Kardashevskiy wrote: > Subject: Re: [PATCH kernel 1/3] x86/amd/dr_addr_mask: Cache values in percpu variables "x86/amd: " is perfectly fine as a prefix. > Reading DR[0-3]_ADDR_MASK MSRs takes about 250 cycles which is going to > be noticeable when the AMD KVM SEV-ES's DebugSwap feature is enabled and which does what? I.e., a sort of lazy DR regs swapping... > KVM needs to store these before switching to a guest; the DebugSwitch > hardware support restores them as type B swap. I know this is all clear to you but you should explain what type B register swap is. > This stores MSR values from set_dr_addr_mask() in percpu values and s/This stores/Store/ From Documentation/process/submitting-patches.rst: "Describe your changes in imperative mood, e.g. "make xyzzy do frotz" instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy to do frotz", as if you are giving orders to the codebase to change its behaviour." Also, do not talk about what your patch does - that should hopefully be visible in the diff itself. Rather, talk about *why* you're doing what you're doing. > returns them via new get_dr_addr_mask(). The gain here is about 10x. > > Signed-off-by: Alexey Kardashevskiy <aik@amd.com> > --- > arch/x86/include/asm/debugreg.h | 1 + > arch/x86/kernel/cpu/amd.c | 32 ++++++++++++++++++++ > 2 files changed, 33 insertions(+) > > diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h > index cfdf307ddc01..c4324d0205b5 100644 > --- a/arch/x86/include/asm/debugreg.h > +++ b/arch/x86/include/asm/debugreg.h > @@ -127,6 +127,7 @@ static __always_inline void local_db_restore(unsigned long dr7) > > #ifdef CONFIG_CPU_SUP_AMD > extern void set_dr_addr_mask(unsigned long mask, int dr); > +extern unsigned long get_dr_addr_mask(int dr); > #else > static inline void set_dr_addr_mask(unsigned long mask, int dr) { } > #endif > diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c > index c75d75b9f11a..ec7efcef4e14 100644 > --- a/arch/x86/kernel/cpu/amd.c > +++ b/arch/x86/kernel/cpu/amd.c > @@ -1158,6 +1158,11 @@ static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum) > return false; > } > > +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr0_addr_mask); > +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr1_addr_mask); > +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr2_addr_mask); > +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr3_addr_mask); This BPEXT thing is AMD-only, right? I guess those should be called amd_drX_addr_mask where X in [0-3]. Yeah yeah, they are used in AMD-only code - svm* - but still. > void set_dr_addr_mask(unsigned long mask, int dr) > { > if (!boot_cpu_has(X86_FEATURE_BPEXT)) > @@ -1166,17 +1171,44 @@ void set_dr_addr_mask(unsigned long mask, int dr) > switch (dr) { > case 0: > wrmsr(MSR_F16H_DR0_ADDR_MASK, mask, 0); > + per_cpu(dr0_addr_mask, smp_processor_id()) = mask; > break; > case 1: > + wrmsr(MSR_F16H_DR1_ADDR_MASK - 1 + dr, mask, 0); > + per_cpu(dr1_addr_mask, smp_processor_id()) = mask; > + break; > case 2: > + wrmsr(MSR_F16H_DR1_ADDR_MASK - 1 + dr, mask, 0); > + per_cpu(dr2_addr_mask, smp_processor_id()) = mask; > + break; > case 3: > wrmsr(MSR_F16H_DR1_ADDR_MASK - 1 + dr, mask, 0); > + per_cpu(dr3_addr_mask, smp_processor_id()) = mask; > break; > default: > break; > } > } > > +unsigned long get_dr_addr_mask(int dr) This function name is too generic for an exported function. amd_get_dr_addr_mask() I'd say. > + if (!boot_cpu_has(X86_FEATURE_BPEXT)) check_for_deprecated_apis: WARNING: arch/x86/kernel/cpu/amd.c:1195: Do not use boot_cpu_has() - use cpu_feature_enabled() instead. You could fix the above one too, while at it. > + return 0; > + > + switch (dr) { > + case 0: > + return per_cpu(dr0_addr_mask, smp_processor_id()); > + case 1: > + return per_cpu(dr1_addr_mask, smp_processor_id()); > + case 2: > + return per_cpu(dr2_addr_mask, smp_processor_id()); > + case 3: > + return per_cpu(dr3_addr_mask, smp_processor_id()); default: WARN_ON_ONCE(1); break; Just in case. And as a matter of fact, make that short and succinct: switch (dr) { case 0: return per_cpu(dr0_addr_mask, smp_processor_id()); case 1: return per_cpu(dr1_addr_mask, smp_processor_id()); case 2: return per_cpu(dr2_addr_mask, smp_processor_id()); case 3: return per_cpu(dr3_addr_mask, smp_processor_id()); default: WARN_ON_ONCE(1); break; } Thx.
On 8/12/22 05:55, Borislav Petkov wrote: > On Thu, Dec 01, 2022 at 01:19:46PM +1100, Alexey Kardashevskiy wrote: >> Subject: Re: [PATCH kernel 1/3] x86/amd/dr_addr_mask: Cache values in percpu variables > > "x86/amd: " is perfectly fine as a prefix. > >> Reading DR[0-3]_ADDR_MASK MSRs takes about 250 cycles which is going to >> be noticeable when the AMD KVM SEV-ES's DebugSwap feature is enabled and > > which does what? I.e., a sort of lazy DR regs swapping... > >> KVM needs to store these before switching to a guest; the DebugSwitch >> hardware support restores them as type B swap. > > I know this is all clear to you but you should explain what type B > register swap is. > >> This stores MSR values from set_dr_addr_mask() in percpu values and > > s/This stores/Store/ > > From Documentation/process/submitting-patches.rst: > > "Describe your changes in imperative mood, e.g. "make xyzzy do frotz" > instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy > to do frotz", as if you are giving orders to the codebase to change > its behaviour." > > Also, do not talk about what your patch does - that should hopefully be > visible in the diff itself. Rather, talk about *why* you're doing what > you're doing. > >> returns them via new get_dr_addr_mask(). The gain here is about 10x. >> >> Signed-off-by: Alexey Kardashevskiy <aik@amd.com> >> --- >> arch/x86/include/asm/debugreg.h | 1 + >> arch/x86/kernel/cpu/amd.c | 32 ++++++++++++++++++++ >> 2 files changed, 33 insertions(+) >> >> diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h >> index cfdf307ddc01..c4324d0205b5 100644 >> --- a/arch/x86/include/asm/debugreg.h >> +++ b/arch/x86/include/asm/debugreg.h >> @@ -127,6 +127,7 @@ static __always_inline void local_db_restore(unsigned long dr7) >> >> #ifdef CONFIG_CPU_SUP_AMD >> extern void set_dr_addr_mask(unsigned long mask, int dr); >> +extern unsigned long get_dr_addr_mask(int dr); >> #else >> static inline void set_dr_addr_mask(unsigned long mask, int dr) { } >> #endif >> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c >> index c75d75b9f11a..ec7efcef4e14 100644 >> --- a/arch/x86/kernel/cpu/amd.c >> +++ b/arch/x86/kernel/cpu/amd.c >> @@ -1158,6 +1158,11 @@ static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum) >> return false; >> } >> >> +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr0_addr_mask); >> +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr1_addr_mask); >> +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr2_addr_mask); >> +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr3_addr_mask); > > This BPEXT thing is AMD-only, right? > > I guess those should be called amd_drX_addr_mask where X in [0-3]. > > Yeah yeah, they are used in AMD-only code - svm* - but still. > >> void set_dr_addr_mask(unsigned long mask, int dr) >> { >> if (!boot_cpu_has(X86_FEATURE_BPEXT)) >> @@ -1166,17 +1171,44 @@ void set_dr_addr_mask(unsigned long mask, int dr) >> switch (dr) { >> case 0: >> wrmsr(MSR_F16H_DR0_ADDR_MASK, mask, 0); >> + per_cpu(dr0_addr_mask, smp_processor_id()) = mask; >> break; >> case 1: >> + wrmsr(MSR_F16H_DR1_ADDR_MASK - 1 + dr, mask, 0); >> + per_cpu(dr1_addr_mask, smp_processor_id()) = mask; >> + break; >> case 2: >> + wrmsr(MSR_F16H_DR1_ADDR_MASK - 1 + dr, mask, 0); >> + per_cpu(dr2_addr_mask, smp_processor_id()) = mask; >> + break; >> case 3: >> wrmsr(MSR_F16H_DR1_ADDR_MASK - 1 + dr, mask, 0); >> + per_cpu(dr3_addr_mask, smp_processor_id()) = mask; >> break; >> default: >> break; >> } >> } >> >> +unsigned long get_dr_addr_mask(int dr) > > This function name is too generic for an exported function. > > amd_get_dr_addr_mask() I'd say. > >> + if (!boot_cpu_has(X86_FEATURE_BPEXT)) > > check_for_deprecated_apis: WARNING: arch/x86/kernel/cpu/amd.c:1195: Do not use boot_cpu_has() - use cpu_feature_enabled() instead. > > You could fix the above one too, while at it. > >> + return 0; >> + >> + switch (dr) { >> + case 0: >> + return per_cpu(dr0_addr_mask, smp_processor_id()); >> + case 1: >> + return per_cpu(dr1_addr_mask, smp_processor_id()); >> + case 2: >> + return per_cpu(dr2_addr_mask, smp_processor_id()); >> + case 3: >> + return per_cpu(dr3_addr_mask, smp_processor_id()); > > default: > WARN_ON_ONCE(1); > break; > > Just in case. > > And as a matter of fact, make that short and succinct: > > switch (dr) { > case 0: return per_cpu(dr0_addr_mask, smp_processor_id()); > case 1: return per_cpu(dr1_addr_mask, smp_processor_id()); > case 2: return per_cpu(dr2_addr_mask, smp_processor_id()); > case 3: return per_cpu(dr3_addr_mask, smp_processor_id()); > default: WARN_ON_ONCE(1); break; > } Not an array, as Sean suggested? Uff... > > Thx. >
On Thu, Dec 08, 2022 at 05:11:26PM +1100, Alexey Kardashevskiy wrote:
> Not an array, as Sean suggested? Uff...
Sure an array if it makes the code even more readable and clean. With an
array it should become even more compact, I'd say.
Thx.
diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h index cfdf307ddc01..c4324d0205b5 100644 --- a/arch/x86/include/asm/debugreg.h +++ b/arch/x86/include/asm/debugreg.h @@ -127,6 +127,7 @@ static __always_inline void local_db_restore(unsigned long dr7) #ifdef CONFIG_CPU_SUP_AMD extern void set_dr_addr_mask(unsigned long mask, int dr); +extern unsigned long get_dr_addr_mask(int dr); #else static inline void set_dr_addr_mask(unsigned long mask, int dr) { } #endif diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index c75d75b9f11a..ec7efcef4e14 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -1158,6 +1158,11 @@ static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum) return false; } +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr0_addr_mask); +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr1_addr_mask); +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr2_addr_mask); +DEFINE_PER_CPU_READ_MOSTLY(unsigned long, dr3_addr_mask); + void set_dr_addr_mask(unsigned long mask, int dr) { if (!boot_cpu_has(X86_FEATURE_BPEXT)) @@ -1166,17 +1171,44 @@ void set_dr_addr_mask(unsigned long mask, int dr) switch (dr) { case 0: wrmsr(MSR_F16H_DR0_ADDR_MASK, mask, 0); + per_cpu(dr0_addr_mask, smp_processor_id()) = mask; break; case 1: + wrmsr(MSR_F16H_DR1_ADDR_MASK - 1 + dr, mask, 0); + per_cpu(dr1_addr_mask, smp_processor_id()) = mask; + break; case 2: + wrmsr(MSR_F16H_DR1_ADDR_MASK - 1 + dr, mask, 0); + per_cpu(dr2_addr_mask, smp_processor_id()) = mask; + break; case 3: wrmsr(MSR_F16H_DR1_ADDR_MASK - 1 + dr, mask, 0); + per_cpu(dr3_addr_mask, smp_processor_id()) = mask; break; default: break; } } +unsigned long get_dr_addr_mask(int dr) +{ + if (!boot_cpu_has(X86_FEATURE_BPEXT)) + return 0; + + switch (dr) { + case 0: + return per_cpu(dr0_addr_mask, smp_processor_id()); + case 1: + return per_cpu(dr1_addr_mask, smp_processor_id()); + case 2: + return per_cpu(dr2_addr_mask, smp_processor_id()); + case 3: + return per_cpu(dr3_addr_mask, smp_processor_id()); + } + return 0; +} +EXPORT_SYMBOL_GPL(get_dr_addr_mask); + u32 amd_get_highest_perf(void) { struct cpuinfo_x86 *c = &boot_cpu_data;