Message ID | 20221107063807.81774-2-khuey@kylehuey.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp1875718wru; Sun, 6 Nov 2022 22:48:17 -0800 (PST) X-Google-Smtp-Source: AMsMyM51C/dVkTFjZsdNUDNagTsWwrmdKHfcVd9/iJxlO3aG1XdCVD5M0alp8wzm4oBTmff5QwdV X-Received: by 2002:a17:907:80d:b0:73d:1e3f:3d83 with SMTP id wv13-20020a170907080d00b0073d1e3f3d83mr44891420ejb.372.1667803697465; Sun, 06 Nov 2022 22:48:17 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1667803697; cv=none; d=google.com; s=arc-20160816; b=yJhZJrrEZlPQrsSMpDWZbEDtz5Vwrn0PPktpd/0dsQIqizobnh9+7ojWNePyJlIU2e BUtyqkq0d3rrsL2F7rCFL0vfs1SE/xIL+I2JR40Fvm+QI9PQ4sPoaEavLQpUxynxRVPf Ea5nkU9TQ9fNihiynIdndFyE7MZiKpblDfzszd5+3nqsGtsvCDplRonrzU96cSeF6aZ9 V2O8yhU9cMH6G/o2e85rE4wwbI3XHOlX6/vahkfCb+R3x7VYr1lrZ2DNsZhTrcXn13e1 6ZsKHqNwOdkAqhbWalDFvA9QGLJ/K65oe4N2eEjXTk4kARKzyZFlPdLZ2w1gInx8bFEb 28Cg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=9mtjXR87V5zH/fMM47wiRGD726rwhb3iIYRWAJHDu0g=; b=mK8KUnaBPxpGK/8M1Nofk5oC+ozmSbTzV/q54qUGfesyukaJDPxq5mO6jqK43J5zaP 1QCacCJXhdfF02HJH+OOUzUGbiqxrYrCdrllObZrg39xgsSkAeG3ElJ7M/gbBOfQaHr9 DRDd4AOMUZk1/By8tKqUdUM9Keau/1wDvacghf0uGMgSOJ3bFKp6iMB9c8IEq2ws5TNh GwLqBH0NrxPVjhAjrHerWBMGtxN67Xg+ZMY7KRZrK6e5QW3xba08Y6ZiCA+ZQCtPS+rR x26KGuRN+yX8KSqt+lPnoGxowPq8viqDzfja7VRvWgTi3pyNrjVqLIwPanv+BZVcSML4 cKpg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kylehuey.com header.s=google header.b=ZFGmnygH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ne28-20020a1709077b9c00b007a9c3366e9asi9264840ejc.716.2022.11.06.22.47.53; Sun, 06 Nov 2022 22:48:17 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kylehuey.com header.s=google header.b=ZFGmnygH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230430AbiKGGi3 (ORCPT <rfc822;hjfbswb@gmail.com> + 99 others); Mon, 7 Nov 2022 01:38:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48006 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231138AbiKGGi1 (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 7 Nov 2022 01:38:27 -0500 Received: from mail-pf1-x435.google.com (mail-pf1-x435.google.com [IPv6:2607:f8b0:4864:20::435]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D67BAE82 for <linux-kernel@vger.kernel.org>; Sun, 6 Nov 2022 22:38:25 -0800 (PST) Received: by mail-pf1-x435.google.com with SMTP id q9so9719953pfg.5 for <linux-kernel@vger.kernel.org>; Sun, 06 Nov 2022 22:38:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kylehuey.com; s=google; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=9mtjXR87V5zH/fMM47wiRGD726rwhb3iIYRWAJHDu0g=; b=ZFGmnygH/bA8GD6CH49bLrGItI53z8fuVmwSh+8/Ffdw4H02GBBtMU2UWE4yr071W3 m/l98oanMYhVNLk2rL8MpMLIJijsm21roy/Xr/gwwvr+I2Mm0wT2pTx+tTE7TzPJtgS6 MS0JlUoRIFzaAZu5m9awQZbjGmWqWznH4vDvQ2Hbp6IHrI7cO7NlMCXbkSm3bStboLAC hPwYEhIBiCtnYh7VMmupBVz5Fxvp+yRzDiIqBh28FH48sBSN/rUob3pI0aOkt2HMtfji 2MRuokyaj4fB07zimGANpFM9SSGvd1XjWMp5SGMHzaHupNZO8Lg3asi63TYs8GBB39qZ rWEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=9mtjXR87V5zH/fMM47wiRGD726rwhb3iIYRWAJHDu0g=; b=D9s4xxLzUaDFPT+IZPlDdcB5E+VoQBuJzD4mgJm7YKm+OXY0l6KbsALQWANUaysWVb pJlVGNjKs8v1ouy+FEuVq2q9+tof4oORFgocimzvvXuZFUE4WT3k5UOClBPyxxmvlJ2s nZhUjsAR3UTKHEYVBi6tPkpqsX/Rz1MsYUJjLXBU/wnfg8KAnypZtDkKUQ8lGA+F0PMu JkrTWPLbuV1ddufxr2NqUsR1ERX4FD0utXqMq9477fy8rIxh+AjqfwVYddHpvL4WgwUw 4W3SrdR7wXQR9RJEbQYuldUGTGRYXKitDg1ev6+XWp+5Iy2xktmTsAPGE1pBWkifuedM SFUw== X-Gm-Message-State: ACrzQf2eLS5eDSavHEhZ6xKbRX/3iluIcsfRqS2bhiA+ivV4fwV57F7V A6YnmaeXfxp5gwKdBH/y730GCA== X-Received: by 2002:a05:6a00:1822:b0:56b:f29d:cca1 with SMTP id y34-20020a056a00182200b0056bf29dcca1mr49595196pfa.65.1667803105304; Sun, 06 Nov 2022 22:38:25 -0800 (PST) Received: from minbar.home.kylehuey.com (c-71-198-251-229.hsd1.ca.comcast.net. [71.198.251.229]) by smtp.gmail.com with ESMTPSA id b20-20020a63d814000000b004468cb97c01sm3453803pgh.56.2022.11.06.22.38.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 06 Nov 2022 22:38:24 -0800 (PST) From: Kyle Huey <me@kylehuey.com> X-Google-Original-From: Kyle Huey <khuey@kylehuey.com> To: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com>, Thomas Gleixner <tglx@linutronix.de>, Borislav Petkov <bp@alien8.de>, Ingo Molnar <mingo@redhat.com>, x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>, Paolo Bonzini <pbonzini@redhat.com>, Andy Lutomirski <luto@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Sean Christopherson <seanjc@google.com>, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, Robert O'Callahan <robert@ocallahan.org>, David Manouchehri <david.manouchehri@riseup.net>, Kyle Huey <me@kylehuey.com>, Borislav Petkov <bp@suse.de>, stable@vger.kernel.org Subject: [RESEND PATCH v6 1/2] x86/fpu: Allow PKRU to be (once again) written by ptrace. Date: Sun, 6 Nov 2022 22:38:07 -0800 Message-Id: <20221107063807.81774-2-khuey@kylehuey.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20221107063807.81774-1-khuey@kylehuey.com> References: <20221107063807.81774-1-khuey@kylehuey.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748818930015184267?= X-GMAIL-MSGID: =?utf-8?q?1748818930015184267?= |
Series |
[RESEND,v6,1/2] x86/fpu: Allow PKRU to be (once again) written by ptrace.
|
|
Commit Message
Kyle Huey
Nov. 7, 2022, 6:38 a.m. UTC
From: Kyle Huey <me@kylehuey.com> When management of the PKRU register was moved away from XSTATE, emulation of PKRU's existence in XSTATE was added for reading PKRU through ptrace, but not for writing PKRU through ptrace. This can be seen by running gdb and executing `p $pkru`, `set $pkru = 42`, and `p $pkru`. On affected kernels (5.14+) the write to the PKRU register (which gdb performs through ptrace) is ignored. There are three APIs that write PKRU: sigreturn, PTRACE_SETREGSET with NT_X86_XSTATE, and KVM_SET_XSAVE. sigreturn still uses XRSTOR to write to PKRU. KVM_SET_XSAVE has its own special handling to make PKRU writes take effect (in fpu_copy_uabi_to_guest_fpstate). Push that down into copy_uabi_to_xstate and have PTRACE_SETREGSET with NT_X86_XSTATE pass in a pointer to the appropriate PKRU slot. copy_sigframe_from_user_to_xstate depends on copy_uabi_to_xstate populating the PKRU field in the task's XSTATE so that __fpu_restore_sig can do a XRSTOR from it, so continue doing that. This also adds code to initialize the PKRU value to the hardware init value (namely 0) if the PKRU bit is not set in the XSTATE header provided to ptrace, to match XRSTOR. Changelog since v5: - Avoids a second copy from the uabi buffer as suggested. - Preserves old KVM_SET_XSAVE behavior where leaving the PKRU bit in the XSTATE header results in PKRU remaining unchanged instead of reinitializing it. - Fixed up patch metadata as requested. Changelog since v4: - Selftest additionally checks PKRU readbacks through ptrace. - Selftest flips all PKRU bits (except the default key). Changelog since v3: - The v3 patch is now part 1 of 2. - Adds a selftest in part 2 of 2. Changelog since v2: - Removed now unused variables in fpu_copy_uabi_to_guest_fpstate Changelog since v1: - Handles the error case of copy_to_buffer(). Fixes: e84ba47e313d ("x86/fpu: Hook up PKRU into ptrace()") Signed-off-by: Kyle Huey <me@kylehuey.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org # 5.14+ --- arch/x86/kernel/fpu/core.c | 20 +++++++++----------- arch/x86/kernel/fpu/regset.c | 2 +- arch/x86/kernel/fpu/signal.c | 2 +- arch/x86/kernel/fpu/xstate.c | 25 ++++++++++++++++++++----- arch/x86/kernel/fpu/xstate.h | 4 ++-- 5 files changed, 33 insertions(+), 20 deletions(-)
Comments
Kyle, sorry this took so long to get to. I do really appreciate the fix and the selftest. This seems like something we want to get merged sooner rather than later, so please bear with me. On 11/6/22 22:38, Kyle Huey wrote: > There are three APIs that write PKRU: sigreturn, PTRACE_SETREGSET with > NT_X86_XSTATE, and KVM_SET_XSAVE. sigreturn still uses XRSTOR to write to > PKRU. KVM_SET_XSAVE has its own special handling to make PKRU writes take > effect (in fpu_copy_uabi_to_guest_fpstate). Push that down into > copy_uabi_to_xstate and have PTRACE_SETREGSET with NT_X86_XSTATE pass in > a pointer to the appropriate PKRU slot. copy_sigframe_from_user_to_xstate > depends on copy_uabi_to_xstate populating the PKRU field in the task's > XSTATE so that __fpu_restore_sig can do a XRSTOR from it, so continue doing > that. Please always write functions() with parenthesis to make it clear what you're talking about. Also, there are much better ways to format this paragraph. I probably would have said: There are three APIs that write PKRU: 1. sigreturn 2. PTRACE_SETREGSET with NT_X86_XSTATE 3. KVM_SET_XSAVE Then broken it up into three follow-on paragraphs. I actually kinda had to do this to even make sense of what you were trying to say above. It would also be nice to have a clear problem statement paired with the mention of these three ABIs. #1 and #3 work OK, right? It's #2 that's broken? > sigreturn still uses XRSTOR to write to PKRU. ... which means? That sigreturn is fine and does not need to be touched in this patch? > KVM_SET_XSAVE has its own special handling to make PKRU writes take > effect (in fpu_copy_uabi_to_guest_fpstate()). Push that down into > copy_uabi_to_xstate() and have PTRACE_SETREGSET with NT_X86_XSTATE pass in > a pointer to the appropriate PKRU slot. So this is the bugfix? KVM already does it right, and we just need to make ptrace() share the KVM code? > copy_sigframe_from_user_to_xstate() depends on copy_uabi_to_xstate() > populating the PKRU field in the task's XSTATE so that > __fpu_restore_sig() can do a XRSTOR from it, so continue doing that. I'm not quite sure what this chunk of the changelog is trying to tell me. Isn't this the sigreturn path? Why did the paragraph above go from talking about sigreturn to KVM then back to sigreturn? > This also adds code to initialize the PKRU value to the hardware init value > (namely 0) if the PKRU bit is not set in the XSTATE header provided to > ptrace, to match XRSTOR. The implication here is that we would like the sigreturn ABI and the ptrace ABI to behave in a similar fashion, right? At a high level, this patch does a *LOT*. Generally, it's nice when bugfixes can be encapsulted in one patch, but I think there's too much going on here for one patch. > diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c > index 3b28c5b25e12..c273669e8a00 100644 > --- a/arch/x86/kernel/fpu/core.c > +++ b/arch/x86/kernel/fpu/core.c > @@ -391,8 +391,6 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf, > { > struct fpstate *kstate = gfpu->fpstate; > const union fpregs_state *ustate = buf; > - struct pkru_state *xpkru; > - int ret; > > if (!cpu_feature_enabled(X86_FEATURE_XSAVE)) { > if (ustate->xsave.header.xfeatures & ~XFEATURE_MASK_FPSSE) > @@ -406,16 +404,16 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf, > if (ustate->xsave.header.xfeatures & ~xcr0) > return -EINVAL; > > - ret = copy_uabi_from_kernel_to_xstate(kstate, ustate); > - if (ret) > - return ret; > + /* > + * Nullify @vpkru to preserve its current value if PKRU's bit isn't set > + * in the header. KVM's odd ABI is to leave PKRU untouched in this > + * case (all other components are eventually re-initialized). > + * (Not clear that this is actually necessary for compat). > + */ > + if (!(ustate->xsave.header.xfeatures & XFEATURE_MASK_PKRU)) > + vpkru = NULL; I'm not a big fan of hunks that are part of bugfixes where it is not clear that the hunk is necessary. > - /* Retrieve PKRU if not in init state */ > - if (kstate->regs.xsave.header.xfeatures & XFEATURE_MASK_PKRU) { > - xpkru = get_xsave_addr(&kstate->regs.xsave, XFEATURE_PKRU); > - *vpkru = xpkru->pkru; > - } > - return 0; > + return copy_uabi_from_kernel_to_xstate(kstate, ustate, vpkru); > } > EXPORT_SYMBOL_GPL(fpu_copy_uabi_to_guest_fpstate); > #endif /* CONFIG_KVM */ > diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c > index 75ffaef8c299..6d056b68f4ed 100644 > --- a/arch/x86/kernel/fpu/regset.c > +++ b/arch/x86/kernel/fpu/regset.c > @@ -167,7 +167,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset, > } > > fpu_force_restore(fpu); > - ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf); > + ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf, &target->thread.pkru); I actually hadn't dug into the KVM code around this before. It seems like the PKRU pointer (&target->thread.pkru) here can also be &vcpu->arch.pkru if it comes from the KVM side. I was missing why PKRU is so special here. I think in *both* cases, we're copying a potential PKRU value from userspace. But, the fpstate target is a useless place to write PKRU because the kernel doesn't update from there. So, the copy-in function (copy_uabi_from_kernel_to_xstate()) needs a place to stash PKRU where the kernel will see it. The place that the kernel wants to stash it is either the task PKRU field or the KVM vcpu field. That's what the pointer provides. Also, is this getting a wee bit over 80 columns? > out: > vfree(tmpbuf); > diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c > index 91d4b6de58ab..558076dbde5b 100644 > --- a/arch/x86/kernel/fpu/signal.c > +++ b/arch/x86/kernel/fpu/signal.c > @@ -396,7 +396,7 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx, > > fpregs = &fpu->fpstate->regs; > if (use_xsave() && !fx_only) { > - if (copy_sigframe_from_user_to_xstate(fpu->fpstate, buf_fx)) > + if (copy_sigframe_from_user_to_xstate(tsk, buf_fx)) > return false; This is also changing copy_sigframe_from_user_to_xstate() to take a 'task_struct' instead of an 'fpstate'. Why? That function just turns it right back into an fpstate with: tsk->thread.fpu.fpstate. > } else { > if (__copy_from_user(&fpregs->fxsave, buf_fx, > diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c > index c8340156bfd2..8f14981a3936 100644 > --- a/arch/x86/kernel/fpu/xstate.c > +++ b/arch/x86/kernel/fpu/xstate.c > @@ -1197,7 +1197,7 @@ static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size, > > > static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf, > - const void __user *ubuf) > + const void __user *ubuf, u32 *pkru) I think this function deserves a little comment about what it expects from 'pkru' here. Maybe: /* * The kernel will not update the actual PKRU register from the PKRU * space in @fpstate. Allow callers to pass in an alternate destination * for PKRU. This is currently either the pkru field from the * task_struct or vcpu. */ > { > struct xregs_state *xsave = &fpstate->regs.xsave; > unsigned int offset, size; > @@ -1246,6 +1246,21 @@ static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf, > } > } > > + /* > + * Update the user protection key storage. Allow KVM to > + * pass in a NULL pkru pointer if the mask bit is unset > + * for its legacy ABI behavior. > + */ If I read this in 5 years, do I know what the "KVM legacy ABI behavior" is? > + if (pkru) > + *pkru = 0; > + > + if (hdr.xfeatures & XFEATURE_MASK_PKRU) { > + struct pkru_state *xpkru; > + > + xpkru = __raw_xsave_addr(xsave, XFEATURE_PKRU); > + *pkru = xpkru->pkru; > + } This is a bit wonky. The code kinda pretends that XFEATURE_MASK_PKRU and 'pkru' are independent. But, it's actually impossible to have a pkru==NULL and have XFEATURE_MASK_PKRU set. The code would oops in that case. Would something like this be more clear? if (hdr.xfeatures & XFEATURE_MASK_PKRU) { struct pkru_state *xpkru; xpkru = __raw_xsave_addr(xsave, XFEATURE_PKRU); *pkru = xpkru->pkru; } else { /* * KVM may pass a NULL 'pkru' to indicate * that it does not need PKRU updated. */ if (pkru) *pkru = 0; }
On Tue, Nov 8, 2022 at 10:23 AM Dave Hansen <dave.hansen@intel.com> wrote: > > Kyle, sorry this took so long to get to. I do really appreciate the fix > and the selftest. This seems like something we want to get merged > sooner rather than later, so please bear with me. > > On 11/6/22 22:38, Kyle Huey wrote: > > There are three APIs that write PKRU: sigreturn, PTRACE_SETREGSET with > > NT_X86_XSTATE, and KVM_SET_XSAVE. sigreturn still uses XRSTOR to write to > > PKRU. KVM_SET_XSAVE has its own special handling to make PKRU writes take > > effect (in fpu_copy_uabi_to_guest_fpstate). Push that down into > > copy_uabi_to_xstate and have PTRACE_SETREGSET with NT_X86_XSTATE pass in > > a pointer to the appropriate PKRU slot. copy_sigframe_from_user_to_xstate > > depends on copy_uabi_to_xstate populating the PKRU field in the task's > > XSTATE so that __fpu_restore_sig can do a XRSTOR from it, so continue doing > > that. > > Please always write functions() with parenthesis to make it clear what > you're talking about. Also, there are much better ways to format this > paragraph. I probably would have said: > > There are three APIs that write PKRU: > 1. sigreturn > 2. PTRACE_SETREGSET with NT_X86_XSTATE > 3. KVM_SET_XSAVE > > Then broken it up into three follow-on paragraphs. I actually kinda had > to do this to even make sense of what you were trying to say above. It > would also be nice to have a clear problem statement paired with the > mention of these three ABIs. > > #1 and #3 work OK, right? It's #2 that's broken? Well this depends on one defines work. If you use my definition of "behaves equivalently to a hardware XRSTOR instruction" then #1 works, #2 is totally broken, and #3 is subtly broken. But #3 has been the way that it is for a long time and the KVM maintainers don't want to change it, whereas #2 was broken recently (in the commit this fixes). > > sigreturn still uses XRSTOR to write to PKRU. > > ... which means? That sigreturn is fine and does not need to be touched > in this patch? Yes (modulo refactoring). > > KVM_SET_XSAVE has its own special handling to make PKRU writes take > > effect (in fpu_copy_uabi_to_guest_fpstate()). Push that down into > > copy_uabi_to_xstate() and have PTRACE_SETREGSET with NT_X86_XSTATE pass in > > a pointer to the appropriate PKRU slot. > > So this is the bugfix? KVM already does it right, and we just need to > make ptrace() share the KVM code? Largely but not entirely, because KVM's behavior is subtly different from XRSTOR's. KVM doesn't reinitialize PKRU to the hardware init value if the PKRU bit is not set in the xfeatures mask, whereas XRSTOR does (and thus ptrace previously did). > > copy_sigframe_from_user_to_xstate() depends on copy_uabi_to_xstate() > > populating the PKRU field in the task's XSTATE so that > > __fpu_restore_sig() can do a XRSTOR from it, so continue doing that. > > I'm not quite sure what this chunk of the changelog is trying to tell > me. Isn't this the sigreturn path? Why did the paragraph above go from > talking about sigreturn to KVM then back to sigreturn? It's telling you that nothing really changed there. I can drop that. > > This also adds code to initialize the PKRU value to the hardware init value > > (namely 0) if the PKRU bit is not set in the XSTATE header provided to > > ptrace, to match XRSTOR. > > The implication here is that we would like the sigreturn ABI and the > ptrace ABI to behave in a similar fashion, right? I don't personally care about sigreturn that much but I would like the ptrace ABI to behave like XRSTOR (which it did before 5.14), and the sigreturn ABI behaves like XRSTOR because it *is* XRSTOR (both before and after 5.14), so the ptrace ABI behaving like the sigreturn ABI arises transitively. > At a high level, this patch does a *LOT*. Generally, it's nice when > bugfixes can be encapsulted in one patch, but I think there's too much > going on here for one patch. Ok. How about I break the first part into two pieces, one that changes the signatures of copy_uabi_from_kernel_to_xstate() and copy_sigframe_from_user_to_xstate(), and one that moves the relevant KVM code from fpu_copy_uabi_to_guest_fpstate() to copy_uabi_to_xstate() and deals with the edge case behavior of the mask? > > diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c > > index 3b28c5b25e12..c273669e8a00 100644 > > --- a/arch/x86/kernel/fpu/core.c > > +++ b/arch/x86/kernel/fpu/core.c > > @@ -391,8 +391,6 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf, > > { > > struct fpstate *kstate = gfpu->fpstate; > > const union fpregs_state *ustate = buf; > > - struct pkru_state *xpkru; > > - int ret; > > > > if (!cpu_feature_enabled(X86_FEATURE_XSAVE)) { > > if (ustate->xsave.header.xfeatures & ~XFEATURE_MASK_FPSSE) > > @@ -406,16 +404,16 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf, > > if (ustate->xsave.header.xfeatures & ~xcr0) > > return -EINVAL; > > > > - ret = copy_uabi_from_kernel_to_xstate(kstate, ustate); > > - if (ret) > > - return ret; > > + /* > > + * Nullify @vpkru to preserve its current value if PKRU's bit isn't set > > + * in the header. KVM's odd ABI is to leave PKRU untouched in this > > + * case (all other components are eventually re-initialized). > > + * (Not clear that this is actually necessary for compat). > > + */ > > + if (!(ustate->xsave.header.xfeatures & XFEATURE_MASK_PKRU)) > > + vpkru = NULL; > > I'm not a big fan of hunks that are part of bugfixes where it is not > clear that the hunk is necessary. This is necessary to avoid changing KVM's behavior at the same time that we change ptrace, since KVM doesn't want the same behavior as ptrace. > > - /* Retrieve PKRU if not in init state */ > > - if (kstate->regs.xsave.header.xfeatures & XFEATURE_MASK_PKRU) { > > - xpkru = get_xsave_addr(&kstate->regs.xsave, XFEATURE_PKRU); > > - *vpkru = xpkru->pkru; > > - } > > - return 0; > > + return copy_uabi_from_kernel_to_xstate(kstate, ustate, vpkru); > > } > > EXPORT_SYMBOL_GPL(fpu_copy_uabi_to_guest_fpstate); > > #endif /* CONFIG_KVM */ > > diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c > > index 75ffaef8c299..6d056b68f4ed 100644 > > --- a/arch/x86/kernel/fpu/regset.c > > +++ b/arch/x86/kernel/fpu/regset.c > > @@ -167,7 +167,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset, > > } > > > > fpu_force_restore(fpu); > > - ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf); > > + ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf, &target->thread.pkru); > > I actually hadn't dug into the KVM code around this before. It seems > like the PKRU pointer (&target->thread.pkru) here can also be > &vcpu->arch.pkru if it comes from the KVM side. Right. Where we need to put PKRU to get the kernel to swap it in varies depending on whether this is a normal task or a VM. > I was missing why PKRU is so special here. I think in *both* cases, > we're copying a potential PKRU value from userspace. But, the fpstate > target is a useless place to write PKRU because the kernel doesn't > update from there. Right. > So, the copy-in function > (copy_uabi_from_kernel_to_xstate()) needs a place to stash PKRU where > the kernel will see it. The place that the kernel wants to stash it is > either the task PKRU field or the KVM vcpu field. That's what the > pointer provides. Right. > Also, is this getting a wee bit over 80 columns? Probably. > > out: > > vfree(tmpbuf); > > diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c > > index 91d4b6de58ab..558076dbde5b 100644 > > --- a/arch/x86/kernel/fpu/signal.c > > +++ b/arch/x86/kernel/fpu/signal.c > > @@ -396,7 +396,7 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx, > > > > fpregs = &fpu->fpstate->regs; > > if (use_xsave() && !fx_only) { > > - if (copy_sigframe_from_user_to_xstate(fpu->fpstate, buf_fx)) > > + if (copy_sigframe_from_user_to_xstate(tsk, buf_fx)) > > return false; > > This is also changing copy_sigframe_from_user_to_xstate() to take a > 'task_struct' instead of an 'fpstate'. Why? That function just turns > it right back into an fpstate with: tsk->thread.fpu.fpstate. So that we can also access tsk->thread.pkru at the same time. > > } else { > > if (__copy_from_user(&fpregs->fxsave, buf_fx, > > diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c > > index c8340156bfd2..8f14981a3936 100644 > > --- a/arch/x86/kernel/fpu/xstate.c > > +++ b/arch/x86/kernel/fpu/xstate.c > > @@ -1197,7 +1197,7 @@ static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size, > > > > > > static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf, > > - const void __user *ubuf) > > + const void __user *ubuf, u32 *pkru) > > I think this function deserves a little comment about what it expects > from 'pkru' here. Maybe: Ok. > /* > * The kernel will not update the actual PKRU register from the PKRU > * space in @fpstate. Allow callers to pass in an alternate destination > * for PKRU. This is currently either the pkru field from the > * task_struct or vcpu. > */ Or NULL, but yeah, sure. > > { > > struct xregs_state *xsave = &fpstate->regs.xsave; > > unsigned int offset, size; > > @@ -1246,6 +1246,21 @@ static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf, > > } > > } > > > > + /* > > + * Update the user protection key storage. Allow KVM to > > + * pass in a NULL pkru pointer if the mask bit is unset > > + * for its legacy ABI behavior. > > + */ > > If I read this in 5 years, do I know what the "KVM legacy ABI behavior" is? Probably not. I'll be more specific. > > + if (pkru) > > + *pkru = 0; > > + > > + if (hdr.xfeatures & XFEATURE_MASK_PKRU) { > > + struct pkru_state *xpkru; > > + > > + xpkru = __raw_xsave_addr(xsave, XFEATURE_PKRU); > > + *pkru = xpkru->pkru; > > + } > > This is a bit wonky. The code kinda pretends that XFEATURE_MASK_PKRU > and 'pkru' are independent. But, it's actually impossible to have a > pkru==NULL and have XFEATURE_MASK_PKRU set. The code would oops in that > case. Right. > Would something like this be more clear? > > if (hdr.xfeatures & XFEATURE_MASK_PKRU) { > struct pkru_state *xpkru; > > xpkru = __raw_xsave_addr(xsave, XFEATURE_PKRU); > *pkru = xpkru->pkru; > } else { > /* > * KVM may pass a NULL 'pkru' to indicate > * that it does not need PKRU updated. > */ > if (pkru) > *pkru = 0; > } Yeah, Sean Christopherson suggested this (with the else and if collapsed into a single level) when I submitted this previously. Thanks, - Kyle
On 11/10/22 16:03, Kyle Huey wrote: > On Tue, Nov 8, 2022 at 10:23 AM Dave Hansen <dave.hansen@intel.com> wrote: ... >> At a high level, this patch does a *LOT*. Generally, it's nice when >> bugfixes can be encapsulted in one patch, but I think there's too much >> going on here for one patch. > > Ok. How about I break the first part into two pieces, one that changes the > signatures of copy_uabi_from_kernel_to_xstate() and > copy_sigframe_from_user_to_xstate(), and one that moves the relevant > KVM code from fpu_copy_uabi_to_guest_fpstate() to copy_uabi_to_xstate() > and deals with the edge case behavior of the mask? Sounds like a good start. My gut says there's another patch or two that could be broken out, but that sounds like a reasonable next step. >>> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c >>> index 3b28c5b25e12..c273669e8a00 100644 >>> --- a/arch/x86/kernel/fpu/core.c >>> +++ b/arch/x86/kernel/fpu/core.c >>> @@ -391,8 +391,6 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf, >>> { >>> struct fpstate *kstate = gfpu->fpstate; >>> const union fpregs_state *ustate = buf; >>> - struct pkru_state *xpkru; >>> - int ret; >>> >>> if (!cpu_feature_enabled(X86_FEATURE_XSAVE)) { >>> if (ustate->xsave.header.xfeatures & ~XFEATURE_MASK_FPSSE) >>> @@ -406,16 +404,16 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf, >>> if (ustate->xsave.header.xfeatures & ~xcr0) >>> return -EINVAL; >>> >>> - ret = copy_uabi_from_kernel_to_xstate(kstate, ustate); >>> - if (ret) >>> - return ret; >>> + /* >>> + * Nullify @vpkru to preserve its current value if PKRU's bit isn't set >>> + * in the header. KVM's odd ABI is to leave PKRU untouched in this >>> + * case (all other components are eventually re-initialized). >>> + * (Not clear that this is actually necessary for compat). >>> + */ >>> + if (!(ustate->xsave.header.xfeatures & XFEATURE_MASK_PKRU)) >>> + vpkru = NULL; >> >> I'm not a big fan of hunks that are part of bugfixes where it is not >> clear that the hunk is necessary. > > This is necessary to avoid changing KVM's behavior at the same time > that we change > ptrace, since KVM doesn't want the same behavior as ptrace. Your "This is necessary" doesn't really match with "Not clear that this is actually necessary" from the comment, right? Rather than claim whether it is necessary or not, maybe just say why it's there: it's there to preserve wonky KVM behavior. BTW, I'd love to know if KVM *REALLY* depends on this. It'd be nice to kill if not. >> Would something like this be more clear? >> >> if (hdr.xfeatures & XFEATURE_MASK_PKRU) { >> struct pkru_state *xpkru; >> >> xpkru = __raw_xsave_addr(xsave, XFEATURE_PKRU); >> *pkru = xpkru->pkru; >> } else { >> /* >> * KVM may pass a NULL 'pkru' to indicate >> * that it does not need PKRU updated. >> */ >> if (pkru) >> *pkru = 0; >> } > > Yeah, Sean Christopherson suggested this (with the else and if > collapsed into a single level) when I submitted this previously. I generally agree with Sean, but he's also been guilty of an atrocity or two over the years. :) While I generally like low levels of indentation I also think my version is much more clear in this case.
On Thu, Nov 10, 2022 at 5:38 PM Dave Hansen <dave.hansen@intel.com> wrote: > > On 11/10/22 16:03, Kyle Huey wrote: > > On Tue, Nov 8, 2022 at 10:23 AM Dave Hansen <dave.hansen@intel.com> wrote: > ... > >> At a high level, this patch does a *LOT*. Generally, it's nice when > >> bugfixes can be encapsulted in one patch, but I think there's too much > >> going on here for one patch. > > > > Ok. How about I break the first part into two pieces, one that changes the > > signatures of copy_uabi_from_kernel_to_xstate() and > > copy_sigframe_from_user_to_xstate(), and one that moves the relevant > > KVM code from fpu_copy_uabi_to_guest_fpstate() to copy_uabi_to_xstate() > > and deals with the edge case behavior of the mask? > > Sounds like a good start. My gut says there's another patch or two that > could be broken out, but that sounds like a reasonable next step. > > >>> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c > >>> index 3b28c5b25e12..c273669e8a00 100644 > >>> --- a/arch/x86/kernel/fpu/core.c > >>> +++ b/arch/x86/kernel/fpu/core.c > >>> @@ -391,8 +391,6 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf, > >>> { > >>> struct fpstate *kstate = gfpu->fpstate; > >>> const union fpregs_state *ustate = buf; > >>> - struct pkru_state *xpkru; > >>> - int ret; > >>> > >>> if (!cpu_feature_enabled(X86_FEATURE_XSAVE)) { > >>> if (ustate->xsave.header.xfeatures & ~XFEATURE_MASK_FPSSE) > >>> @@ -406,16 +404,16 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf, > >>> if (ustate->xsave.header.xfeatures & ~xcr0) > >>> return -EINVAL; > >>> > >>> - ret = copy_uabi_from_kernel_to_xstate(kstate, ustate); > >>> - if (ret) > >>> - return ret; > >>> + /* > >>> + * Nullify @vpkru to preserve its current value if PKRU's bit isn't set > >>> + * in the header. KVM's odd ABI is to leave PKRU untouched in this > >>> + * case (all other components are eventually re-initialized). > >>> + * (Not clear that this is actually necessary for compat). > >>> + */ > >>> + if (!(ustate->xsave.header.xfeatures & XFEATURE_MASK_PKRU)) > >>> + vpkru = NULL; > >> > >> I'm not a big fan of hunks that are part of bugfixes where it is not > >> clear that the hunk is necessary. > > > > This is necessary to avoid changing KVM's behavior at the same time > > that we change > > ptrace, since KVM doesn't want the same behavior as ptrace. > > Your "This is necessary" doesn't really match with "Not clear that this > is actually necessary" from the comment, right? > > Rather than claim whether it is necessary or not, maybe just say why > it's there: it's there to preserve wonky KVM behavior. > > BTW, I'd love to know if KVM *REALLY* depends on this. It'd be nice to > kill if not. qemu didn't appear to (it treats the KVM_GET_XSAVE2/KVM_SET_XSAVE buffers as opaque blobs afaict) but it's of course not the only KVM application out there. > >> Would something like this be more clear? > >> > >> if (hdr.xfeatures & XFEATURE_MASK_PKRU) { > >> struct pkru_state *xpkru; > >> > >> xpkru = __raw_xsave_addr(xsave, XFEATURE_PKRU); > >> *pkru = xpkru->pkru; > >> } else { > >> /* > >> * KVM may pass a NULL 'pkru' to indicate > >> * that it does not need PKRU updated. > >> */ > >> if (pkru) > >> *pkru = 0; > >> } > > > > Yeah, Sean Christopherson suggested this (with the else and if > > collapsed into a single level) when I submitted this previously. > > I generally agree with Sean, but he's also been guilty of an atrocity or > two over the years. :) While I generally like low levels of > indentation I also think my version is much more clear in this case. > - Kyle
On Thu, Nov 10, 2022, Dave Hansen wrote: > On 11/10/22 16:03, Kyle Huey wrote: > > On Tue, Nov 8, 2022 at 10:23 AM Dave Hansen <dave.hansen@intel.com> wrote: > BTW, I'd love to know if KVM *REALLY* depends on this. Unlikely, but nearly impossible to know for sure. Copy+pasting my response[1] to an earlier version. : Hrm, the current behavior has been KVM ABI for a very long time. : : It's definitely odd because all other components will be initialized due to their : bits being cleared in the header during kvm_load_guest_fpu(), and it probably : wouldn't cause problems in practice as most VMMs likely do "all or nothing" loads. : But, in theory, userspace could save/restore a subset of guest XSTATE and rely on : the kernel not overwriting guest PKRU when its bit is cleared in the header. : : All that said, I don't see any reason to force KVM to change at this time, it's : trivial enough to handle KVM's oddities while providing sane behavior for others. : Nullify the pointer in the guest path and then update copy_uabi_to_xstate() to : play nice with a NULL pointer, e.g. : : /* : * Nullify @vpkru to preserve its current value if PKRU's bit isn't set : * in the header. KVM's odd ABI is to leave PKRU untouched in this : * case (all other components are eventually re-initialized). : */ : if (!(kstate->regs.xsave.header.xfeatures & XFEATURE_MASK_PKRU)) : vpkru = NULL; : : return copy_uabi_from_kernel_to_xstate(kstate, ustate, vpkru); > It'd be nice to kill if not. I don't disagree, my hesitation is purely that doing so might subtly break userspace. That said, I'm 99.9% certain no traditional VMM, e.g. QEMU, is relying on this behavior, as doing KVM_SET_XSAVE with anything except the guest's xfeatures mask would corrupt guest XSAVE state for everything except PKRU. I.e. for all intents and purposes, a traditional VMM must do KVM_GET_SAVE => KVM_SET_XSAVE without touching the xfeatures mask. And for non-traditional usage of KVM, I would be quite surprised if any of those use cases utilize PKRU in the guest, let alone play games with KVM_{G,S}SET_XSAVE. So, I'm not completely opposed to "fixing" KVM's ABI, but it should be done as a separate patch that is tagged "KVM: x86:" and clearly states that it's changing KVM's ABI in a way that could theoretically break userspace. > >> Would something like this be more clear? > >> > >> if (hdr.xfeatures & XFEATURE_MASK_PKRU) { > >> struct pkru_state *xpkru; > >> > >> xpkru = __raw_xsave_addr(xsave, XFEATURE_PKRU); > >> *pkru = xpkru->pkru; > >> } else { > >> /* > >> * KVM may pass a NULL 'pkru' to indicate > >> * that it does not need PKRU updated. > >> */ > >> if (pkru) > >> *pkru = 0; > >> } > > > > Yeah, Sean Christopherson suggested this (with the else and if > > collapsed into a single level) when I submitted this previously. > > I generally agree with Sean, but he's also been guilty of an atrocity or > two over the years. :) Heh, just one or two? I'll call that a win. > While I generally like low levels of indentation I also think my version is > much more clear in this case. I've no objection to a standalone if. My suggestion[2] was in response to code that zeroed @pkru before the XFEATURE_MASK_PKRU check. if (pkru) *pkru = 0; if (hdr.xfeatures & XFEATURE_MASK_PKRU) { struct pkru_state *xpkru; xpkru = __raw_xsave_addr(xsave, XFEATURE_PKRU); *pkru = xpkru->pkru; } [1] https://lore.kernel.org/all/Yv6szXuKGv75wWmm@google.com [2] https://lore.kernel.org/all/YxDP6jie4cwzZIHp@google.com
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 3b28c5b25e12..c273669e8a00 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -391,8 +391,6 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf, { struct fpstate *kstate = gfpu->fpstate; const union fpregs_state *ustate = buf; - struct pkru_state *xpkru; - int ret; if (!cpu_feature_enabled(X86_FEATURE_XSAVE)) { if (ustate->xsave.header.xfeatures & ~XFEATURE_MASK_FPSSE) @@ -406,16 +404,16 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf, if (ustate->xsave.header.xfeatures & ~xcr0) return -EINVAL; - ret = copy_uabi_from_kernel_to_xstate(kstate, ustate); - if (ret) - return ret; + /* + * Nullify @vpkru to preserve its current value if PKRU's bit isn't set + * in the header. KVM's odd ABI is to leave PKRU untouched in this + * case (all other components are eventually re-initialized). + * (Not clear that this is actually necessary for compat). + */ + if (!(ustate->xsave.header.xfeatures & XFEATURE_MASK_PKRU)) + vpkru = NULL; - /* Retrieve PKRU if not in init state */ - if (kstate->regs.xsave.header.xfeatures & XFEATURE_MASK_PKRU) { - xpkru = get_xsave_addr(&kstate->regs.xsave, XFEATURE_PKRU); - *vpkru = xpkru->pkru; - } - return 0; + return copy_uabi_from_kernel_to_xstate(kstate, ustate, vpkru); } EXPORT_SYMBOL_GPL(fpu_copy_uabi_to_guest_fpstate); #endif /* CONFIG_KVM */ diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c index 75ffaef8c299..6d056b68f4ed 100644 --- a/arch/x86/kernel/fpu/regset.c +++ b/arch/x86/kernel/fpu/regset.c @@ -167,7 +167,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset, } fpu_force_restore(fpu); - ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf); + ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf, &target->thread.pkru); out: vfree(tmpbuf); diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c index 91d4b6de58ab..558076dbde5b 100644 --- a/arch/x86/kernel/fpu/signal.c +++ b/arch/x86/kernel/fpu/signal.c @@ -396,7 +396,7 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx, fpregs = &fpu->fpstate->regs; if (use_xsave() && !fx_only) { - if (copy_sigframe_from_user_to_xstate(fpu->fpstate, buf_fx)) + if (copy_sigframe_from_user_to_xstate(tsk, buf_fx)) return false; } else { if (__copy_from_user(&fpregs->fxsave, buf_fx, diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index c8340156bfd2..8f14981a3936 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -1197,7 +1197,7 @@ static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size, static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf, - const void __user *ubuf) + const void __user *ubuf, u32 *pkru) { struct xregs_state *xsave = &fpstate->regs.xsave; unsigned int offset, size; @@ -1246,6 +1246,21 @@ static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf, } } + /* + * Update the user protection key storage. Allow KVM to + * pass in a NULL pkru pointer if the mask bit is unset + * for its legacy ABI behavior. + */ + if (pkru) + *pkru = 0; + + if (hdr.xfeatures & XFEATURE_MASK_PKRU) { + struct pkru_state *xpkru; + + xpkru = __raw_xsave_addr(xsave, XFEATURE_PKRU); + *pkru = xpkru->pkru; + } + /* * The state that came in from userspace was user-state only. * Mask all the user states out of 'xfeatures': @@ -1264,9 +1279,9 @@ static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf, * Convert from a ptrace standard-format kernel buffer to kernel XSAVE[S] * format and copy to the target thread. Used by ptrace and KVM. */ -int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf) +int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u32 *pkru) { - return copy_uabi_to_xstate(fpstate, kbuf, NULL); + return copy_uabi_to_xstate(fpstate, kbuf, NULL, pkru); } /* @@ -1274,10 +1289,10 @@ int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf) * XSAVE[S] format and copy to the target thread. This is called from the * sigreturn() and rt_sigreturn() system calls. */ -int copy_sigframe_from_user_to_xstate(struct fpstate *fpstate, +int copy_sigframe_from_user_to_xstate(struct task_struct *tsk, const void __user *ubuf) { - return copy_uabi_to_xstate(fpstate, NULL, ubuf); + return copy_uabi_to_xstate(tsk->thread.fpu.fpstate, NULL, ubuf, &tsk->thread.pkru); } static bool validate_independent_components(u64 mask) diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h index 5ad47031383b..a4ecb04d8d64 100644 --- a/arch/x86/kernel/fpu/xstate.h +++ b/arch/x86/kernel/fpu/xstate.h @@ -46,8 +46,8 @@ extern void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate, u32 pkru_val, enum xstate_copy_mode copy_mode); extern void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk, enum xstate_copy_mode mode); -extern int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf); -extern int copy_sigframe_from_user_to_xstate(struct fpstate *fpstate, const void __user *ubuf); +extern int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u32 *pkru); +extern int copy_sigframe_from_user_to_xstate(struct task_struct *tsk, const void __user *ubuf); extern void fpu__init_cpu_xstate(void);