Message ID: 20230530200152.18961-1-jon@nutanix.com
State: New
Headers:
From: Jon Kohler <jon@nutanix.com>
To: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>,
    Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>,
    x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
    "Chang S. Bae" <chang.seok.bae@intel.com>, Kyle Huey <me@kylehuey.com>,
    neelnatu@google.com, Andrew Cooper <andrew.cooper3@citrix.com>,
    Jon Kohler <jon@nutanix.com>, linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org, Sean Christopherson <seanjc@google.com>,
    Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH] x86/fpu/xstate: clear XSAVE features if DISABLED_MASK set
Date: Tue, 30 May 2023 16:01:50 -0400
Message-Id: <20230530200152.18961-1-jon@nutanix.com>
Series: x86/fpu/xstate: clear XSAVE features if DISABLED_MASK set
Commit Message
Jon Kohler
May 30, 2023, 8:01 p.m. UTC
Respect DISABLED_MASK when clearing XSAVE features, such that features
that are disabled do not appear in the xfeatures mask.
This is important for kvm_load_{guest|host}_xsave_state, which look
at host_xcr0 and will do an expensive xsetbv when the guest and host
do not match.
A prime example: if CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is disabled,
the guest OS will not see PKU masked; however, the guest will incur
xsetbv since the host mask will never match the guest's, even though
DISABLED_MASK16 has DISABLE_PKU set.
Signed-off-by: Jon Kohler <jon@nutanix.com>
CC: kvm@vger.kernel.org
CC: Sean Christopherson <seanjc@google.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kernel/fpu/xstate.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--
2.30.1 (Apple Git-130)
Comments
On 5/30/23 13:01, Jon Kohler wrote:
> Respect DISABLED_MASK when clearing XSAVE features, such that features
> that are disabled do not appear in the xfeatures mask.

One sanity check that I'd suggest adopting is "How many other users in
the code do this?"  How many DISABLED_MASK_BIT_SET() users are there?

> This is important for kvm_load_{guest|host}_xsave_state, which look
> at host_xcr0 and will do an expensive xsetbv when the guest and host
> do not match.

Is that the only problem?  kvm_load_guest_xsave_state() seems to have
some #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS code and I can't
imagine that KVM guests can even use PKRU if this code is compiled out.

Also, this will set XFEATURE_PKRU in xcr0 and go to the trouble of
XSAVE/XRSTOR'ing it all over the place even for regular tasks.

> A prime example if CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is disabled,
> the guest OS will not see PKU masked; however, the guest will incur
> xsetbv since the host mask will never match the guest, even though
> DISABLED_MASK16 has DISABLE_PKU set.

OK, so you care because you're seeing KVM go slow.  You tracked it down
to lots of XSETBV's?  You said, "what the heck, why is it doing XSETBV
so much?" and tracked it down to 'max_features' which we use to populate
XCR0?

> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 0bab497c9436..211ef82b53e3 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -798,7 +798,8 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
>  		unsigned short cid = xsave_cpuid_features[i];
>  
>  		/* Careful: X86_FEATURE_FPU is 0! */
> -		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid))
> +		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid) ||
> +		    DISABLED_MASK_BIT_SET(cid))
>  			fpu_kernel_cfg.max_features &= ~BIT_ULL(i);
>  	}

I _think_ I'd rather this just be cpu_feature_enabled(cid) rather than
using DISABLED_MASK_BIT_SET() directly.

But, I guess this probably also isn't a big deal for _most_ people.  Any
sane distro kernel will just set CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
since it's pretty widespread on modern CPUs and works across Intel and
AMD now.
> On May 30, 2023, at 6:22 PM, Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 5/30/23 13:01, Jon Kohler wrote:
>> Respect DISABLED_MASK when clearing XSAVE features, such that features
>> that are disabled do not appear in the xfeatures mask.
>
> One sanity check that I'd suggest adopting is "How many other users in
> the code do this?"  How many DISABLED_MASK_BIT_SET() users are there?

Good tip, thank you. Just cpu_feature_enabled(), though I felt that using
DISABLED_MASK_BIT_SET() really does capture *exactly* what I'm trying to
do here. Happy to take suggestions, perhaps !cpu_feature_enabled(cid)
instead? Or, I did noodle with the idea of making a cpu_feature_disabled()
as an alias for DISABLED_MASK_BIT_SET(), but that felt like bloating the
change for little gain.

>> This is important for kvm_load_{guest|host}_xsave_state, which look
>> at host_xcr0 and will do an expensive xsetbv when the guest and host
>> do not match.
>
> Is that the only problem?  kvm_load_guest_xsave_state() seems to have
> some #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS code and I can't
> imagine that KVM guests can even use PKRU if this code is compiled out.
>
> Also, this will set XFEATURE_PKRU in xcr0 and go to the trouble of
> XSAVE/XRSTOR'ing it all over the place even for regular tasks.

Correct, KVM isn't the only beneficiary here, as you rightly pointed out.
I'm happy to clarify that in the commit msg if you'd like. Also, ack on
the ifdefs; I added those myself way back when, and this change is an
addendum to that one to nuke the xsetbv overhead.

>> A prime example if CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is disabled,
>> the guest OS will not see PKU masked; however, the guest will incur
>> xsetbv since the host mask will never match the guest, even though
>> DISABLED_MASK16 has DISABLE_PKU set.
>
> OK, so you care because you're seeing KVM go slow.  You tracked it down
> to lots of XSETBV's?  You said, "what the heck, why is it doing XSETBV
> so much?" and tracked it down to 'max_features' which we use to populate
> XCR0?

Yes and yes, that is exactly how I arrived here.
kvm_load_{guest|host}_xsave_state is on the critical path for vmexit /
vmentry. This overhead sticks out like a sore thumb when looking at perf
top or visualizing threads with a flamegraph if the guest, for whatever
reason, has a different xcr0 than the host. That is easy to do if guests
have PKU masked out.

>> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
>> index 0bab497c9436..211ef82b53e3 100644
>> --- a/arch/x86/kernel/fpu/xstate.c
>> +++ b/arch/x86/kernel/fpu/xstate.c
>> @@ -798,7 +798,8 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
>>  		unsigned short cid = xsave_cpuid_features[i];
>>  
>>  		/* Careful: X86_FEATURE_FPU is 0! */
>> -		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid))
>> +		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid) ||
>> +		    DISABLED_MASK_BIT_SET(cid))
>>  			fpu_kernel_cfg.max_features &= ~BIT_ULL(i);
>>  	}
>
> I _think_ I'd rather this just be cpu_feature_enabled(cid) rather than
> using DISABLED_MASK_BIT_SET() directly.
>
> But, I guess this probably also isn't a big deal for _most_ people.  Any
> sane distro kernel will just set CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> since it's pretty widespread on modern CPUs and works across Intel and
> AMD now.

Ack, I'm using PKU as the key example here, but looking forward this is
more of a correctness thing than anything else. If, for any reason, any
xsave feature is disabled in the way that PKU is disabled, it will slip
through the cracks. If it would make it cleaner, I'm happy to drop the
example from the commit msg to prevent any confusion that this is PKU
specific in any way. Thoughts?
On Wed, May 31, 2023, Jon Kohler wrote:
> > On May 30, 2023, at 6:22 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 5/30/23 13:01, Jon Kohler wrote:
> > Is that the only problem?  kvm_load_guest_xsave_state() seems to have
> > some #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS code and I can't
> > imagine that KVM guests can even use PKRU if this code is compiled out.

...

> >> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> >> index 0bab497c9436..211ef82b53e3 100644
> >> --- a/arch/x86/kernel/fpu/xstate.c
> >> +++ b/arch/x86/kernel/fpu/xstate.c
> >> @@ -798,7 +798,8 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
> >>  		unsigned short cid = xsave_cpuid_features[i];
> >>  
> >>  		/* Careful: X86_FEATURE_FPU is 0! */
> >> -		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid))
> >> +		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid) ||
> >> +		    DISABLED_MASK_BIT_SET(cid))
> >>  			fpu_kernel_cfg.max_features &= ~BIT_ULL(i);
> >>  	}
> >
> > I _think_ I'd rather this just be cpu_feature_enabled(cid) rather than
> > using DISABLED_MASK_BIT_SET() directly.

+1, xstate.c uses cpu_feature_enabled() all over the place, and IMO
effectively open coding cpu_feature_enabled() yields less intuitive code.

And on the KVM side, we can and should replace the #ifdef with
cpu_feature_enabled() (I'll post a patch), as modern compilers are clever
enough to completely optimize out the code when
CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=n.  At that point, using
cpu_feature_enabled() in both KVM and xstate.c will provide a nice bit of
symmetry.

Caveat #1: cpu_feature_enabled() has a flaw that's relevant to this code:
in the unlikely scenario that the compiler doesn't resolve "cid" to a
compile-time constant value, cpu_feature_enabled() won't query
DISABLED_MASK_BIT_SET().  I don't see any other use of
cpu_feature_enabled() without a hardcoded X86_FEATURE_*, and the below
compiles with my config, so I think/hope we can just require a
compile-time constant when using cpu_feature_enabled().

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index ce0c8f7d3218..886200fbf8d9 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -141,8 +141,11 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
  * supporting a possible guest feature where host support for it
  * is not relevant.
  */
-#define cpu_feature_enabled(bit)	\
-	(__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit))
+#define cpu_feature_enabled(bit)				\
+({								\
+	BUILD_BUG_ON(!__builtin_constant_p(bit));		\
+	DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit);	\
+})
 
 #define boot_cpu_has(bit)	cpu_has(&boot_cpu_data, bit)

Caveat #2: Using cpu_feature_enabled() could subtly break KVM, as KVM
advertises support for features based on boot_cpu_data.  E.g. if a feature
were disabled by Kconfig but present in hardware, KVM would allow the
guest to use the feature without properly context switching the data.
PKU isn't problematic because KVM explicitly gates PKU on
boot_cpu_has(X86_FEATURE_OSPKE), but KVM learned that lesson the hard way
(see commit c469268cd523, "KVM: x86: block guest protection keys unless
the host has them enabled").  Exposing a feature that's disabled in the
host isn't completely absurd, e.g. KVM already effectively does this for
MPX.  The only reason using cpu_feature_enabled() wouldn't be problematic
for MPX is because there's no longer a Kconfig for MPX.

I'm totally ok gating xfeature bits on cpu_feature_enabled(), but there
should be a prep patch for KVM to clear feature bits in kvm_cpu_caps if
the corresponding XCR0/XSS bit is not set in the host.  If KVM ever wants
to expose an xstate feature (other than MPX) that's disabled in the host,
then we can revisit fpu__init_system_xstate().  But we need to ensure the
"failure" mode is that KVM doesn't advertise the feature, as opposed to
advertising a feature without context switching its data.

> > But, I guess this probably also isn't a big deal for _most_ people.  Any
> > sane distro kernel will just set CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> > since it's pretty widespread on modern CPUs and works across Intel and
> > AMD now.
>
> Ack, I'm using PKU as the key example here, but looking forward this is
> more of a correctness thing than anything else. If for any reason, any
> xsave feature is disabled in the way that PKU is disabled, it will slip
> through the cracks.

I'd be careful about billing this as a correctness thing.  See above.
> On May 31, 2023, at 12:30 PM, Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, May 31, 2023, Jon Kohler wrote:
>>
>>> On May 30, 2023, at 6:22 PM, Dave Hansen <dave.hansen@intel.com> wrote:
>>>
>>> On 5/30/23 13:01, Jon Kohler wrote:
>>> Is that the only problem?  kvm_load_guest_xsave_state() seems to have
>>> some #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS code and I can't
>>> imagine that KVM guests can even use PKRU if this code is compiled out.
>
> ...
>
>>>> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
>>>> index 0bab497c9436..211ef82b53e3 100644
>>>> --- a/arch/x86/kernel/fpu/xstate.c
>>>> +++ b/arch/x86/kernel/fpu/xstate.c
>>>> @@ -798,7 +798,8 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
>>>>  		unsigned short cid = xsave_cpuid_features[i];
>>>>  
>>>>  		/* Careful: X86_FEATURE_FPU is 0! */
>>>> -		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid))
>>>> +		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid) ||
>>>> +		    DISABLED_MASK_BIT_SET(cid))
>>>>  			fpu_kernel_cfg.max_features &= ~BIT_ULL(i);
>>>>  	}
>>>
>>> I _think_ I'd rather this just be cpu_feature_enabled(cid) rather than
>>> using DISABLED_MASK_BIT_SET() directly.
>
> +1, xstate.c uses cpu_feature_enabled() all over the place, and IMO
> effectively open coding cpu_feature_enabled() yields less intuitive code.

Ack, thank you for the feedback.

> And on the KVM side, we can and should replace the #ifdef with
> cpu_feature_enabled() (I'll post a patch), as modern compilers are clever
> enough to completely optimize out the code when
> CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=n.  At that point, using
> cpu_feature_enabled() in both KVM and xstate.c will provide a nice bit
> of symmetry.

Ok, thanks for helping to clean that up, I appreciate it.

> Caveat #1: cpu_feature_enabled() has a flaw that's relevant to this code:
> in the unlikely scenario that the compiler doesn't resolve "cid" to a
> compile-time constant value, cpu_feature_enabled() won't query
> DISABLED_MASK_BIT_SET().  I don't see any other use of
> cpu_feature_enabled() without a hardcoded X86_FEATURE_*, and the below
> compiles with my config, so I think/hope we can just require a
> compile-time constant when using cpu_feature_enabled().

Yea, I think that should work. I'll club that into v2 of this patch.

> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index ce0c8f7d3218..886200fbf8d9 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -141,8 +141,11 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
>  * supporting a possible guest feature where host support for it
>  * is not relevant.
>  */
> -#define cpu_feature_enabled(bit)	\
> -	(__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit))
> +#define cpu_feature_enabled(bit)				\
> +({								\
> +	BUILD_BUG_ON(!__builtin_constant_p(bit));		\
> +	DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit);	\
> +})
>
>  #define boot_cpu_has(bit)	cpu_has(&boot_cpu_data, bit)
>
> Caveat #2: Using cpu_feature_enabled() could subtly break KVM, as KVM
> advertises support for features based on boot_cpu_data.  E.g. if a
> feature were disabled by Kconfig but present in hardware, KVM would
> allow the guest to use the feature without properly context switching
> the data.  PKU isn't problematic because KVM explicitly gates PKU on
> boot_cpu_has(X86_FEATURE_OSPKE), but KVM learned that lesson the hard
> way (see commit c469268cd523, "KVM: x86: block guest protection keys
> unless the host has them enabled").  Exposing a feature that's disabled
> in the host isn't completely absurd, e.g. KVM already effectively does
> this for MPX.  The only reason using cpu_feature_enabled() wouldn't be
> problematic for MPX is because there's no longer a Kconfig for MPX.
>
> I'm totally ok gating xfeature bits on cpu_feature_enabled(), but there
> should be a prep patch for KVM to clear feature bits in kvm_cpu_caps if
> the corresponding XCR0/XSS bit is not set in the host.  If KVM ever
> wants to expose an xstate feature (other than MPX) that's disabled in
> the host, then we can revisit fpu__init_system_xstate().  But we need
> to ensure the "failure" mode is that KVM doesn't advertise the feature,
> as opposed to advertising a feature without context switching its data.

Looking into this, trying to understand the comment about a feature being
used without context switching its data.

In __kvm_x86_vendor_init() we're already populating host_xcr0 using
XCR_XFEATURE_ENABLED_MASK, which should be populated on boot by
fpu__init_cpu_xstate(), which happens almost immediately after the code
that I modified in this commit. That then flows into guest_supported_xcr0
(as well as user_xfeatures). guest_supported_xcr0 is then plumbed into
__kvm_set_xcr, which specifically says that we're using that to prevent
the guest from setting bits that we won't save in the first place.

	/*
	 * Do not allow the guest to set bits that we do not support
	 * saving.  However, xcr0 bit 0 is always set, even if the
	 * emulated CPU does not support XSAVE (see kvm_vcpu_reset()).
	 */
	valid_bits = vcpu->arch.guest_supported_xcr0 | XFEATURE_MASK_FP;

Wouldn't this mean that the *guest* xstate initialization would not see a
given feature too and take care of the problem naturally? Or are you
saying you'd want an even more detailed clearing?

>>> But, I guess this probably also isn't a big deal for _most_ people.
>>> Any sane distro kernel will just set
>>> CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS since it's pretty widespread
>>> on modern CPUs and works across Intel and AMD now.
>>
>> Ack, I'm using PKU as the key example here, but looking forward this is
>> more of a correctness thing than anything else. If for any reason, any
>> xsave feature is disabled in the way that PKU is disabled, it will slip
>> through the cracks.
>
> I'd be careful about billing this as a correctness thing.  See above.

Fair enough. I'll see about simplifying the commit msg when we get through
the comments on this thread.
On Wed, May 31, 2023, Jon Kohler wrote: > > > On May 31, 2023, at 12:30 PM, Sean Christopherson <seanjc@google.com> wrote: > > Caveat #1: cpu_feature_enabled() has a flaw that's relevant to this code: in the > > unlikely scenario that the compiler doesn't resolve "cid" to a compile-time > > constant value, cpu_feature_enabled() won't query DISABLED_MASK_BIT_SET(). I don't > > see any other use of cpu_feature_enabled() without a hardcoded X86_FEATURE_*, and > > the below compiles with my config, so I think/hope we can just require a compile-time > > constant when using cpu_feature_enabled(). > > > > Yea I think that should work. I’ll club that into v2 of this patch. I recommend doing it as a separate patch, hardening cpu_feature_enabled() shouldn't have a dependency on tweaking the xfeatures mask. I tested this with an allyesconfig if you want to throw it in as a prep patch. --- From: Sean Christopherson <seanjc@google.com> Date: Wed, 31 May 2023 09:41:12 -0700 Subject: [PATCH] x86/cpufeature: Require compile-time constant in cpu_feature_enabled() Assert that the to-be-checked bit passed to cpu_feature_enabled() is a compile-time constant instead of applying the DISABLED_MASK_BIT_SET() logic if and only if the bit is a constant. Conditioning the check on the bit being constant instead of requiring the bit to be constant could result in compiler specific kernel behavior, e.g. running on hardware that supports a disabled feature would return %false if the compiler resolved the bit to a constant, but %true if not. All current usage of cpu_feature_enabled() specifies a hardcoded X86_FEATURE_* flag, so this *should* be a glorified nop. 
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/cpufeature.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index ce0c8f7d3218..886200fbf8d9 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -141,8 +141,11 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
  * supporting a possible guest feature where host support for it
  * is not relevant.
  */
-#define cpu_feature_enabled(bit)	\
-	(__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit))
+#define cpu_feature_enabled(bit)				\
+({								\
+	BUILD_BUG_ON(!__builtin_constant_p(bit));		\
+	DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit);	\
+})

 #define boot_cpu_has(bit)	cpu_has(&boot_cpu_data, bit)

base-commit: 39428f6ea9eace95011681628717062ff7f5eb5f
> On May 31, 2023, at 4:18 PM, Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, May 31, 2023, Jon Kohler wrote:
>>
>>> On May 31, 2023, at 12:30 PM, Sean Christopherson <seanjc@google.com> wrote:
>>> Caveat #1: cpu_feature_enabled() has a flaw that's relevant to this code: in the
>>> unlikely scenario that the compiler doesn't resolve "cid" to a compile-time
>>> constant value, cpu_feature_enabled() won't query DISABLED_MASK_BIT_SET(). I don't
>>> see any other use of cpu_feature_enabled() without a hardcoded X86_FEATURE_*, and
>>> the below compiles with my config, so I think/hope we can just require a compile-time
>>> constant when using cpu_feature_enabled().
>>>
>>
>> Yea I think that should work. I'll club that into v2 of this patch.
>
> I recommend doing it as a separate patch, hardening cpu_feature_enabled() shouldn't
> have a dependency on tweaking the xfeatures mask. I tested this with an allyesconfig
> if you want to throw it in as a prep patch.

Ack, I'll do that and make this into a small series, thanks for the help!

>
> ---
> From: Sean Christopherson <seanjc@google.com>
> Date: Wed, 31 May 2023 09:41:12 -0700
> Subject: [PATCH] x86/cpufeature: Require compile-time constant in
>  cpu_feature_enabled()
>
> Assert that the to-be-checked bit passed to cpu_feature_enabled() is a
> compile-time constant instead of applying the DISABLED_MASK_BIT_SET()
> logic if and only if the bit is a constant. Conditioning the check on
> the bit being constant instead of requiring the bit to be constant could
> result in compiler specific kernel behavior, e.g. running on hardware that
> supports a disabled feature would return %false if the compiler resolved
> the bit to a constant, but %true if not.
>
> All current usage of cpu_feature_enabled() specifies a hardcoded
> X86_FEATURE_* flag, so this *should* be a glorified nop.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/cpufeature.h | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index ce0c8f7d3218..886200fbf8d9 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -141,8 +141,11 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
>   * supporting a possible guest feature where host support for it
>   * is not relevant.
>   */
> -#define cpu_feature_enabled(bit)	\
> -	(__builtin_constant_p(bit) && DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit))
> +#define cpu_feature_enabled(bit)				\
> +({								\
> +	BUILD_BUG_ON(!__builtin_constant_p(bit));		\
> +	DISABLED_MASK_BIT_SET(bit) ? 0 : static_cpu_has(bit);	\
> +})
>
>  #define boot_cpu_has(bit)	cpu_has(&boot_cpu_data, bit)
>
> base-commit: 39428f6ea9eace95011681628717062ff7f5eb5f
> --
>
>>> I'm totally ok gating xfeature bits on cpu_feature_enabled(), but there should be
>>> a prep patch for KVM to clear features bits in kvm_cpu_caps if the corresponding
>>> XCR0/XSS bit is not set in the host. If KVM ever wants to expose an xstate feature
>>> (other than MPX) that's disabled in the host, then we can revisit
>>> fpu__init_system_xstate(). But we need to ensure the "failure" mode is that
>>> KVM doesn't advertise the feature, as opposed to advertising a feature without
>>> context switching its data.
>>
>> Looking into this, trying to understand the comment about a feature being used
>> without context switching its data.
>>
>> In __kvm_x86_vendor_init() we're already populating host_xcr0 using the
>> XCR_XFEATURE_ENABLED_MASK, which should be populated on boot by
>> fpu__init_cpu_xstate(), which happens almost immediately after the code that I
>> modified in this commit.
>>
>> That then flows into guest_supported_xcr0 (as well as user_xfeatures).
>> guest_supported_xcr0 is then plumbed into __kvm_set_xcr, which specifically says
>> that we're using that to prevent the guest from setting bits that we won't save in
>> the first place.
>>
>> 	/*
>> 	 * Do not allow the guest to set bits that we do not support
>> 	 * saving.  However, xcr0 bit 0 is always set, even if the
>> 	 * emulated CPU does not support XSAVE (see kvm_vcpu_reset()).
>> 	 */
>> 	valid_bits = vcpu->arch.guest_supported_xcr0 | XFEATURE_MASK_FP;
>>
>> Wouldn't this mean that the *guest* xstate initialization would not see a given
>> feature too and take care of the problem naturally?
>>
>> Or are you saying you'd want an even more detailed clearing?
>
> The CPUID bits that enumerate support for a feature are independent from the CPUID
> bits that enumerate what XCR0 bits are supported, i.e. what features can be saved
> and restored via XSAVE/XRSTOR.
>
> KVM does mostly account for host XCR0, but in a very ad hoc way. E.g. MPX is
> handled by manually checking host XCR0.
>
> 	if (kvm_mpx_supported())
> 		kvm_cpu_cap_check_and_set(X86_FEATURE_MPX);
>
> PKU manually checks too, but indirectly by looking at whether or not the kernel
> has enabled CR4.OSPKE.
>
> 	if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
> 		kvm_cpu_cap_clear(X86_FEATURE_PKU);
>
> But unless I'm missing something, the various AVX and AMX bits rely solely on
> boot_cpu_data, i.e. would break if someone added CONFIG_X86_AVX or CONFIG_X86_AMX.

What if we simply moved static unsigned short xsave_cpuid_features[] ... into
xstate.h, which is already included in arch/x86/kvm/cpuid.c, and do something
similar to what I'm proposing in this patch already.

This would future proof such breakages I'd imagine?

void kvm_set_cpu_caps(void)
{
	...
	/*
	 * Clear CPUID for XSAVE features that are disabled.
	 */
	for (i = 0; i < ARRAY_SIZE(xsave_cpuid_features); i++) {
		unsigned short cid = xsave_cpuid_features[i];

		/* Careful: X86_FEATURE_FPU is 0! */
		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid) ||
		    !cpu_feature_enabled(cid))
			kvm_cpu_cap_clear(cid);
	}
	...
}
On Wed, May 31, 2023 at 08:18:34PM +0000, Sean Christopherson wrote:
> Assert that the to-be-checked bit passed to cpu_feature_enabled() is a
> compile-time constant instead of applying the DISABLED_MASK_BIT_SET()
> logic if and only if the bit is a constant. Conditioning the check on
> the bit being constant instead of requiring the bit to be constant could
> result in compiler specific kernel behavior, e.g. running on hardware that
> supports a disabled feature would return %false if the compiler resolved
> the bit to a constant, but %true if not.

Uff, more mirroring CPUID inconsistencies.

So *actually*, we should clear all those build-time disabled bits from
x86_capability so that this doesn't happen.

> All current usage of cpu_feature_enabled() specifies a hardcoded
> X86_FEATURE_* flag, so this *should* be a glorified nop.

Also, pls add here the exact example which prompted this as I'm sure we will
forget why this was done.

Thx.
On Wed, May 31, 2023, Borislav Petkov wrote:
> On Wed, May 31, 2023 at 08:18:34PM +0000, Sean Christopherson wrote:
> > Assert that the to-be-checked bit passed to cpu_feature_enabled() is a
> > compile-time constant instead of applying the DISABLED_MASK_BIT_SET()
> > logic if and only if the bit is a constant. Conditioning the check on
> > the bit being constant instead of requiring the bit to be constant could
> > result in compiler specific kernel behavior, e.g. running on hardware that
> > supports a disabled feature would return %false if the compiler resolved
> > the bit to a constant, but %true if not.
>
> Uff, more mirroring CPUID inconsistencies.
>
> So *actually*, we should clear all those build-time disabled bits from
> x86_capability so that this doesn't happen.

Heh, I almost suggested that, but there is a non-zero amount of code that wants
to ignore the disabled bits and query the "raw" CPUID information. In quotes
because the kernel still massages x86_capability. Changing that behavior will
require auditing a lot of code, because in most cases any breakage will be
mostly silent, e.g. loss of features/performance and not explosions.

E.g. KVM emulates UMIP when it's not supported in hardware, and so advertises
UMIP support irrespective of hardware/host support. But emulating UMIP is
imperfect and suboptimal (requires intercepting L*DT instructions), so KVM
intercepts L*DT instructions iff UMIP is not supported in hardware, as detected
by boot_cpu_has(X86_FEATURE_UMIP).

The comment for cpu_feature_enabled() even calls out this type of use case:

	Use the cpu_has() family if you want true runtime testing of CPU
	features, like in hypervisor code where you are supporting a possible
	guest feature where host support for it is not relevant.

That said, the behavior of cpu_has() is wildly inconsistent, e.g. LA57 is
indirectly cleared in x86_capability if it's a disabled bit because of this
code in early_identify_cpu().
	if (!pgtable_l5_enabled())
		setup_clear_cpu_cap(X86_FEATURE_LA57);

KVM works around that by manually doing CPUID to query hardware directly:

	/* Set LA57 based on hardware capability. */
	if (cpuid_ecx(7) & F(LA57))
		kvm_cpu_cap_set(X86_FEATURE_LA57);

So yeah, I 100% agree the current state is messy and would love to have
cpu_feature_enabled() be a pure optimization with respect to boot_cpu_has(),
but it's not as trivial as it looks.
> On May 31, 2023, at 5:09 PM, Jon Kohler <jon@nutanix.com> wrote:
>
>> The CPUID bits that enumerate support for a feature are independent from the CPUID
>> bits that enumerate what XCR0 bits are supported, i.e. what features can be saved
>> and restored via XSAVE/XRSTOR.
>>
>> KVM does mostly account for host XCR0, but in a very ad hoc way. E.g. MPX is
>> handled by manually checking host XCR0.
>>
>> 	if (kvm_mpx_supported())
>> 		kvm_cpu_cap_check_and_set(X86_FEATURE_MPX);
>>
>> PKU manually checks too, but indirectly by looking at whether or not the kernel
>> has enabled CR4.OSPKE.
>>
>> 	if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
>> 		kvm_cpu_cap_clear(X86_FEATURE_PKU);
>>
>> But unless I'm missing something, the various AVX and AMX bits rely solely on
>> boot_cpu_data, i.e. would break if someone added CONFIG_X86_AVX or CONFIG_X86_AMX.
>
> What if we simply moved static unsigned short xsave_cpuid_features[] ... into
> xstate.h, which is already included in arch/x86/kvm/cpuid.c, and do something
> similar to what I'm proposing in this patch already.
>
> This would future proof such breakages I'd imagine?
>
> void kvm_set_cpu_caps(void)
> {
> 	...
> 	/*
> 	 * Clear CPUID for XSAVE features that are disabled.
> 	 */
> 	for (i = 0; i < ARRAY_SIZE(xsave_cpuid_features); i++) {
> 		unsigned short cid = xsave_cpuid_features[i];
>
> 		/* Careful: X86_FEATURE_FPU is 0! */
> 		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid) ||
> 		    !cpu_feature_enabled(cid))
> 			kvm_cpu_cap_clear(cid);
> 	}
> 	...
> }

Sean - following up on this rough idea code above, wanted to validate that this
was the direction you were thinking of having kvm_set_cpu_caps() clear caps
when a particular xsave feature was disabled?
On Mon, Jun 05, 2023, Jon Kohler wrote:
> > On May 31, 2023, at 5:09 PM, Jon Kohler <jon@nutanix.com> wrote:
> >> The CPUID bits that enumerate support for a feature are independent from the CPUID
> >> bits that enumerate what XCR0 bits are supported, i.e. what features can be saved
> >> and restored via XSAVE/XRSTOR.
> >>
> >> KVM does mostly account for host XCR0, but in a very ad hoc way. E.g. MPX is
> >> handled by manually checking host XCR0.
> >>
> >> 	if (kvm_mpx_supported())
> >> 		kvm_cpu_cap_check_and_set(X86_FEATURE_MPX);
> >>
> >> PKU manually checks too, but indirectly by looking at whether or not the kernel
> >> has enabled CR4.OSPKE.
> >>
> >> 	if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
> >> 		kvm_cpu_cap_clear(X86_FEATURE_PKU);
> >>
> >> But unless I'm missing something, the various AVX and AMX bits rely solely on
> >> boot_cpu_data, i.e. would break if someone added CONFIG_X86_AVX or CONFIG_X86_AMX.
> >
> > What if we simply moved static unsigned short xsave_cpuid_features[] ... into
> > xstate.h, which is already included in arch/x86/kvm/cpuid.c, and do
> > something similar to what I'm proposing in this patch already.
> >
> > This would future proof such breakages I'd imagine?
> >
> > void kvm_set_cpu_caps(void)
> > {
> > 	...
> > 	/*
> > 	 * Clear CPUID for XSAVE features that are disabled.
> > 	 */
> > 	for (i = 0; i < ARRAY_SIZE(xsave_cpuid_features); i++) {
> > 		unsigned short cid = xsave_cpuid_features[i];
> >
> > 		/* Careful: X86_FEATURE_FPU is 0! */
> > 		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid) ||
> > 		    !cpu_feature_enabled(cid))
> > 			kvm_cpu_cap_clear(cid);
> > 	}
> > 	...
> > }
>
> Sean - following up on this rough idea code above, wanted to validate that
> this was the direction you were thinking of having kvm_set_cpu_caps() clear
> caps when a particular xsave feature was disabled?

Ya, more or less. But for KVM, that should be kvm_cpu_cap_has(), not
boot_cpu_has().
And then I think KVM could actually WARN on a feature being disabled, i.e. put
up a tripwire to detect if things change in the future and the kernel lets the
user disable a feature that KVM wants to expose to a guest.

Side topic, I find the "cid" nomenclature super confusing, and the established
name in KVM is x86_feature.

Something like this?

	for (i = 0; i < ARRAY_SIZE(xsave_cpuid_features); i++) {
		unsigned int x86_feature = xsave_cpuid_features[i];

		if (i != XFEATURE_FP && !x86_feature)
			continue;

		if (!kvm_cpu_cap_has(x86_feature))
			continue;

		if (WARN_ON_ONCE(!cpu_feature_enabled(x86_feature)))
			kvm_cpu_cap_clear(x86_feature);
	}
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 0bab497c9436..211ef82b53e3 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -798,7 +798,8 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 		unsigned short cid = xsave_cpuid_features[i];
 
 		/* Careful: X86_FEATURE_FPU is 0! */
-		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid))
+		if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid) ||
+		    DISABLED_MASK_BIT_SET(cid))
 			fpu_kernel_cfg.max_features &= ~BIT_ULL(i);
 	}