[1/1] x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly

Message ID 20221018221349.4196-2-chang.seok.bae@intel.com
State New
Series x86/fpu: Follow up on the init_fpstate fix

Commit Message

Chang S. Bae Oct. 18, 2022, 10:13 p.m. UTC
When an extended state component is present in fpstate but in its init
state, __copy_xstate_to_uabi_buf() copies it from init_fpstate via
copy_feature().

However, dynamic states are not present in init_fpstate, so accessing
init_fpstate for them dereferences a NULL pointer:

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 ...
 RIP: 0010:memcpy_erms+0x6/0x10
  ? __copy_xstate_to_uabi_buf+0x381/0x870
  fpu_copy_guest_fpstate_to_uabi+0x28/0x80
  kvm_arch_vcpu_ioctl+0x14c/0x1460 [kvm]
  ? __this_cpu_preempt_check+0x13/0x20
  ? vmx_vcpu_put+0x2e/0x260 [kvm_intel]
  kvm_vcpu_ioctl+0xea/0x6b0 [kvm]
  ? kvm_vcpu_ioctl+0xea/0x6b0 [kvm]
  ? __fget_light+0xd4/0x130
  __x64_sys_ioctl+0xe3/0x910
  ? debug_smp_processor_id+0x17/0x20
  ? fpregs_assert_state_consistent+0x27/0x50
  do_syscall_64+0x3f/0x90
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Instead of referencing init_fpstate, simply zero out the userspace buffer
for any state component that is in its init state, since every extended
state component has an all-zeros init state.

Fixes: 2308ee57d93d ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Reported-by: Yuan Yao <yuan.yao@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Yuan Yao <yuan.yao@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/lkml/BYAPR11MB3717EDEF2351C958F2C86EED95259@BYAPR11MB3717.namprd11.prod.outlook.com/
---
 arch/x86/kernel/fpu/xstate.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
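
As a rough illustration of the per-component decision this patch makes, here is a minimal userspace sketch. `struct buf`, `buf_zero()`, `buf_write()` and `copy_component()` are hypothetical stand-ins, not kernel code; the real `struct membuf` helpers silently truncate to the remaining space, whereas this model simply asserts that the buffer is large enough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's struct membuf. */
struct buf {
	unsigned char *p;	/* next byte to fill */
	size_t left;		/* bytes remaining */
};

/* Fill the next n bytes with zeros and advance. */
static void buf_zero(struct buf *b, size_t n)
{
	assert(n <= b->left);
	memset(b->p, 0, n);
	b->p += n;
	b->left -= n;
}

/* Copy n bytes from src and advance. */
static void buf_write(struct buf *b, const void *src, size_t n)
{
	assert(n <= b->left);
	memcpy(b->p, src, n);
	b->p += n;
	b->left -= n;
}

/*
 * Emit component i. A clear bit in xfeatures means the component is in
 * its init state, which is all zeros, so the output is zeroed and
 * 'saved' is never read. Only a set bit leads to a copy from 'saved'.
 */
void copy_component(struct buf *to, uint64_t xfeatures, unsigned int i,
		    const void *saved, size_t size)
{
	if (!(xfeatures & (UINT64_C(1) << i)))
		buf_zero(to, size);
	else
		buf_write(to, saved, size);
}
```

The point of the sketch is that no init-state image is consulted at all, so a component missing from that image cannot cause a bad access.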
  

Comments

Dave Hansen Oct. 20, 2022, 4:57 p.m. UTC | #1
On 10/18/22 15:13, Chang S. Bae wrote:
> @@ -1141,10 +1141,14 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
>  			 */
>  			pkru.pkru = pkru_val;
>  			membuf_write(&to, &pkru, sizeof(pkru));
> +		} else if (!(header.xfeatures & BIT_ULL(i))) {
> +			/*
> +			 * Every extended state component has an all zeros
> +			 * init state.
> +			 */
> +			membuf_zero(&to, xstate_sizes[i]);
>  		} else {
> -			copy_feature(header.xfeatures & BIT_ULL(i), &to,
> -				     __raw_xsave_addr(xsave, i),
> -				     __raw_xsave_addr(xinit, i),
> +			membuf_write(&to, __raw_xsave_addr(xsave, i),
>  				     xstate_sizes[i]);
>  		}

Just to add a bit more context, this is inside this loop:

        mask = fpstate->user_xfeatures;
        for_each_extended_xfeature(i, mask) {
                if (zerofrom < xstate_offsets[i])
                        membuf_zero(&to, xstate_offsets[i] - zerofrom);
		...
	}
        if (to.left)
                membuf_zero(&to, to.left);

In other words, the loop and the surrounding code already know how to
membuf_zero() any gaps in the middle or the end of the user buffer.
Would it be simpler to just adjust the 'mask' over which the loop iterates?

I think that would end up being something like:

	 mask = fpstate->user_xfeatures &
		(xsave->xfeatures | xinit->xfeatures);

Logically, that makes sense too.  We're copying out of either 'xsave' or
'xinit'.  If a feature isn't in either one of those we can't do the
copy_feature() on it.
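
The mask idea above can be modeled in userspace roughly as follows. This is only a sketch under simplifying assumptions: the four-component layout in `offs[]`/`sizes[]`, the flat source image, and the `copy_masked()` name are all made up for illustration, and the real loop deals with the legacy area, PKRU, and compacted source layouts that are ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NFEAT 4

/* Hypothetical layout: offset and size of each component in the buffer. */
static const size_t offs[NFEAT]  = { 0, 4, 8, 12 };
static const size_t sizes[NFEAT] = { 4, 4, 4, 4 };

/*
 * Copy only the components selected by 'mask'. Any region belonging to
 * a component that is not in 'mask' is covered by the gap zeroing:
 * either before the next selected component or after the last one.
 */
void copy_masked(unsigned char *dst, size_t len,
		 const unsigned char *src, uint64_t mask)
{
	size_t zerofrom = 0;
	unsigned int i;

	assert(len >= offs[NFEAT - 1] + sizes[NFEAT - 1]);

	for (i = 0; i < NFEAT; i++) {
		if (!(mask & (UINT64_C(1) << i)))
			continue;
		/* Zero the gap left by skipped components. */
		if (zerofrom < offs[i])
			memset(dst + zerofrom, 0, offs[i] - zerofrom);
		memcpy(dst + offs[i], src + offs[i], sizes[i]);
		zerofrom = offs[i] + sizes[i];
	}
	/* Zero whatever remains at the end of the buffer. */
	if (zerofrom < len)
		memset(dst + zerofrom, 0, len - zerofrom);
}
```

With a mask that selects components 0 and 2, the regions of components 1 and 3 come out as zeros without any per-component special case.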
  
Chang S. Bae Oct. 20, 2022, 6:52 p.m. UTC | #2
On 10/20/2022 9:57 AM, Dave Hansen wrote:
> On 10/18/22 15:13, Chang S. Bae wrote:
>> @@ -1141,10 +1141,14 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
>>   			 */
>>   			pkru.pkru = pkru_val;
>>   			membuf_write(&to, &pkru, sizeof(pkru));
>> +		} else if (!(header.xfeatures & BIT_ULL(i))) {
>> +			/*
>> +			 * Every extended state component has an all zeros
>> +			 * init state.
>> +			 */
>> +			membuf_zero(&to, xstate_sizes[i]);
>>   		} else {
>> -			copy_feature(header.xfeatures & BIT_ULL(i), &to,
>> -				     __raw_xsave_addr(xsave, i),
>> -				     __raw_xsave_addr(xinit, i),
>> +			membuf_write(&to, __raw_xsave_addr(xsave, i),
>>   				     xstate_sizes[i]);
>>   		}
> 
> Just to add a bit more context, this is inside this loop:
> 
>          mask = fpstate->user_xfeatures;
>          for_each_extended_xfeature(i, mask) {
>                  if (zerofrom < xstate_offsets[i])
>                          membuf_zero(&to, xstate_offsets[i] - zerofrom);
> 		...
> 	}
>          if (to.left)
>                  membuf_zero(&to, to.left);
> 
> In other words, the loop and the surrounding code already know how to
> membuf_zero() any gaps in the middle or the end of the user buffer.
> Would it be simpler to just adjust the 'mask' over which the loop iterates?

Yeah, right!

> I think that would end up being something like:
> 
> 	 mask = fpstate->user_xfeatures &
> 		(xsave->xfeatures | xinit->xfeatures);
> 
> Logically, that makes sense too.  We're copying out of either 'xsave' or
> 'xinit'.  If a feature isn't in either one of those we can't do the
> copy_feature() on it.

Yes, it does. One tricky part, though, is that xinit->xstate_bv is zero,
so xinit->xcomp_bv appears to be the relevant field instead. Also, we
want this for dynamic features, which rely on XSAVES. The change could
then be something like this:

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index e77cabfa802f..3f3286d7e1a8 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1125,6 +1125,15 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
          */
         mask = fpstate->user_xfeatures;

+       /*
+        * Dynamic features are not present in init_fpstate since they have
+        * an all zeros init state. When they are in init state, instead of
+        * retrieving them from init_fpstate, remove those from 'mask' to
+        * zero the user buffer.
+        */
+       if (fpu_state_size_dynamic())
+               mask &= (header.xfeatures | xinit->header.xcomp_bv);

Thanks,
Chang
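
To make the effect of the proposed mask adjustment concrete, here is a small standalone model of the bit arithmetic. `narrow_mask()` and its parameters are hypothetical names for this sketch; `dynamic` stands in for the result of fpu_state_size_dynamic(), and the bit positions used in the usage note (2 for AVX state, 18 for AMX tile data, 63 for the compacted-format flag in XCOMP_BV) are included only as illustrative inputs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the proposed adjustment: start from the user-visible feature
 * set and, when dynamic features are in play, keep only features that
 * are either live in the saved state (xfeatures) or have an image in
 * the init state (init_xcomp_bv). Everything else is dropped from the
 * mask, so the surrounding gap zeroing fills its region with zeros.
 */
uint64_t narrow_mask(uint64_t user_xfeatures, uint64_t xfeatures,
		     uint64_t init_xcomp_bv, bool dynamic)
{
	uint64_t mask = user_xfeatures;

	if (dynamic)
		mask &= xfeatures | init_xcomp_bv;
	return mask;
}
```

For example, with user features {2, 18}, an empty `xfeatures`, and an init `xcomp_bv` holding bits 63 and 2, the result keeps only bit 2. Bit 63 falls away because it is not part of the user feature set, and bit 18 falls away because it is neither live nor present in the init image.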
  

Patch

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index e77cabfa802f..efa9e3a269fc 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1141,10 +1141,14 @@  void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
 			 */
 			pkru.pkru = pkru_val;
 			membuf_write(&to, &pkru, sizeof(pkru));
+		} else if (!(header.xfeatures & BIT_ULL(i))) {
+			/*
+			 * Every extended state component has an all zeros
+			 * init state.
+			 */
+			membuf_zero(&to, xstate_sizes[i]);
 		} else {
-			copy_feature(header.xfeatures & BIT_ULL(i), &to,
-				     __raw_xsave_addr(xsave, i),
-				     __raw_xsave_addr(xinit, i),
+			membuf_write(&to, __raw_xsave_addr(xsave, i),
 				     xstate_sizes[i]);
 		}
 		/*