From patchwork Mon Feb 19 07:47:07 2024
X-Patchwork-Submitter: "Yang, Weijiang"
X-Patchwork-Id: 202930
From: Yang Weijiang
To: seanjc@google.com, pbonzini@redhat.com, dave.hansen@intel.com,
 x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, chao.gao@intel.com, rick.p.edgecombe@intel.com,
 mlevitsk@redhat.com, john.allen@amd.com, weijiang.yang@intel.com
Subject: [PATCH v10 01/27] x86/fpu/xstate: Always preserve non-user
 xfeatures/flags in __state_perm
Date: Sun, 18 Feb 2024 23:47:07 -0800
Message-ID: <20240219074733.122080-2-weijiang.yang@intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20240219074733.122080-1-weijiang.yang@intel.com>
References: <20240219074733.122080-1-weijiang.yang@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sean Christopherson

When granting userspace or a KVM guest access to an xfeature, preserve the
entity's existing supervisor and software-defined permissions as tracked
by __state_perm, i.e. use __state_perm to track *all* permissions even
though all supported supervisor xfeatures are granted to all FPUs and
FPU_GUEST_PERM_LOCKED disallows changing permissions.

Effectively clobbering supervisor permissions results in inconsistent
behavior, as xstate_get_group_perm() will report supervisor features for
processes that do NOT request access to dynamic user xfeatures, whereas any
and all supervisor features will be absent from the set of permissions for
any process that is granted access to one or more dynamic xfeatures (which
right now means AMX).

The inconsistency isn't problematic because fpu_xstate_prctl() already
strips out everything except user xfeatures:

	case ARCH_GET_XCOMP_PERM:
		/*
		 * Lockless snapshot as it can also change right after the
		 * dropping the lock.
		 */
		permitted = xstate_get_host_group_perm();
		permitted &= XFEATURE_MASK_USER_SUPPORTED;
		return put_user(permitted, uptr);

	case ARCH_GET_XCOMP_GUEST_PERM:
		permitted = xstate_get_guest_group_perm();
		permitted &= XFEATURE_MASK_USER_SUPPORTED;
		return put_user(permitted, uptr);

and similarly KVM doesn't apply the __state_perm to supervisor states
(kvm_get_filtered_xcr0() incorporates xstate_get_guest_group_perm()):

	case 0xd: {
		u64 permitted_xcr0 = kvm_get_filtered_xcr0();
		u64 permitted_xss = kvm_caps.supported_xss;

But if KVM in particular were to ever change, dropping supervisor
permissions would result in subtle bugs in KVM's reporting of supported
CPUID settings. And the above behavior also means that having supervisor
xfeatures in __state_perm is correctly handled by all users.

Dropping supervisor permissions also creates another landmine for KVM. If
more dynamic user xfeatures are ever added, requesting access to multiple
xfeatures in separate ARCH_REQ_XCOMP_GUEST_PERM calls will result in the
second invocation of __xstate_request_perm() computing the wrong ksize, as
the mask passed to xstate_calculate_size() would not contain *any*
supervisor features.

Commit 781c64bfcb73 ("x86/fpu/xstate: Handle supervisor states in XSTATE
permissions") fudged around the size issue for userspace FPUs, but for
reasons unknown skipped guest FPUs.
Lack of a fix for KVM "works" only because KVM doesn't yet support
virtualizing features that have supervisor xfeatures, i.e. as of today, KVM
guest FPUs will never need the relevant xfeatures.

Simply extending the hack-a-fix for guests would temporarily solve the
ksize issue, but wouldn't address the inconsistency issue and would leave
another lurking pitfall for KVM. KVM support for virtualizing CET will
likely add CET_KERNEL as a guest-only xfeature, i.e. CET_KERNEL will not be
set in xfeatures_mask_supervisor() and would again be dropped when granting
access to dynamic xfeatures.

Note, the existing clobbering behavior is rather subtle. The @permitted
parameter to __xstate_request_perm() comes from:

	permitted = xstate_get_group_perm(guest);

which is either fpu->guest_perm.__state_perm or fpu->perm.__state_perm,
where __state_perm is initialized to:

	fpu->perm.__state_perm = fpu_kernel_cfg.default_features;

and copied to the guest side of things:

	/* Same defaults for guests */
	fpu->guest_perm = fpu->perm;

fpu_kernel_cfg.default_features contains everything except the dynamic
xfeatures, i.e. everything except XFEATURE_MASK_XTILE_DATA:

	fpu_kernel_cfg.default_features = fpu_kernel_cfg.max_features;
	fpu_kernel_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;

When __xstate_request_perm() restricts the local "mask" variable to compute
the user state size:

	mask &= XFEATURE_MASK_USER_SUPPORTED;
	usize = xstate_calculate_size(mask, false);

it subtly overwrites the target __state_perm with "mask" containing only
user xfeatures:

	perm = guest ? &fpu->guest_perm : &fpu->perm;
	/* Pairs with the READ_ONCE() in xstate_get_group_perm() */
	WRITE_ONCE(perm->__state_perm, mask);

Cc: Maxim Levitsky
Cc: Weijiang Yang
Cc: Dave Hansen
Cc: Paolo Bonzini
Cc: Peter Zijlstra
Cc: Chao Gao
Cc: Rick Edgecombe
Cc: John Allen
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/all/ZTqgzZl-reO1m01I@google.com
Signed-off-by: Sean Christopherson
Signed-off-by: Yang Weijiang
Reviewed-by: Maxim Levitsky
Reviewed-by: Rick Edgecombe
---
 arch/x86/kernel/fpu/xstate.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 117e74c44e75..07911532b108 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1601,16 +1601,20 @@ static int __xstate_request_perm(u64 permitted, u64 requested, bool guest)
 	if ((permitted & requested) == requested)
 		return 0;
 
-	/* Calculate the resulting kernel state size */
+	/*
+	 * Calculate the resulting kernel state size. Note, @permitted also
+	 * contains supervisor xfeatures even though supervisor are always
+	 * permitted for kernel and guest FPUs, and never permitted for user
+	 * FPUs.
+	 */
 	mask = permitted | requested;
-	/* Take supervisor states into account on the host */
-	if (!guest)
-		mask |= xfeatures_mask_supervisor();
 	ksize = xstate_calculate_size(mask, compacted);
 
-	/* Calculate the resulting user state size */
-	mask &= XFEATURE_MASK_USER_SUPPORTED;
-	usize = xstate_calculate_size(mask, false);
+	/*
+	 * Calculate the resulting user state size. Take care not to clobber
+	 * the supervisor xfeatures in the new mask!
+	 */
+	usize = xstate_calculate_size(mask & XFEATURE_MASK_USER_SUPPORTED, false);
 
 	if (!guest) {
 		ret = validate_sigaltstack(usize);
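Not part of the patch: the clobbering described in the changelog can be
modelled in a few lines of standalone userspace C. This is only a sketch
under stated assumptions. The DEMO_* bit values and the helpers
request_perm_old(), request_perm_new() and demo_check() are hypothetical
names made up for illustration; they are not the kernel's masks or
functions, and the size calculation is left out entirely. The only thing
modelled is which value ends up being stored as the new permission mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout for illustration only; not the kernel's masks. */
#define DEMO_USER_LEGACY	(1ULL << 0)	/* stands in for default user xfeatures */
#define DEMO_USER_DYNAMIC	(1ULL << 1)	/* stands in for a dynamic user xfeature */
#define DEMO_SUPERVISOR		(1ULL << 2)	/* stands in for a supervisor xfeature */
#define DEMO_USER_SUPPORTED	(DEMO_USER_LEGACY | DEMO_USER_DYNAMIC)

/*
 * Model of the old flow: "mask" is narrowed in place to compute the user
 * size, and that same narrowed value is what gets stored afterwards.
 */
uint64_t request_perm_old(uint64_t permitted, uint64_t requested)
{
	uint64_t mask = permitted | requested;

	mask &= DEMO_USER_SUPPORTED;	/* supervisor bits are lost here */
	return mask;			/* value that would be stored */
}

/*
 * Model of the new flow: the user-only view is a temporary, so the stored
 * mask still carries the supervisor bits.
 */
uint64_t request_perm_new(uint64_t permitted, uint64_t requested)
{
	uint64_t mask = permitted | requested;
	uint64_t user_mask = mask & DEMO_USER_SUPPORTED;

	(void)user_mask;		/* would feed the user size calculation */
	return mask;			/* value that would be stored */
}

/* Run both flows on the same inputs and check the supervisor bit. */
void demo_check(void)
{
	uint64_t permitted = DEMO_USER_LEGACY | DEMO_SUPERVISOR;

	assert(!(request_perm_old(permitted, DEMO_USER_DYNAMIC) & DEMO_SUPERVISOR));
	assert(request_perm_new(permitted, DEMO_USER_DYNAMIC) & DEMO_SUPERVISOR);
}
```

Calling demo_check() from a main() built with assertions enabled should
return silently: the old flow drops DEMO_SUPERVISOR from the stored mask,
the new flow keeps it.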