[v2,3/5] mm: Make PR_MDWE_REFUSE_EXEC_GAIN an unsigned long

Message ID 20230517150321.2890206-4-revest@chromium.org
State New
Headers
Series MDWE without inheritance |

Commit Message

Florent Revest May 17, 2023, 3:03 p.m. UTC
  Alexey pointed out that defining a prctl flag as an int is a footgun
because, under some circumstances, when used as a flag to prctl, it can
be casted to long with garbage upper bits which would result in
unexpected behaviors.

This patch changes the constant to a UL to eliminate these
possibilities.

Signed-off-by: Florent Revest <revest@chromium.org>
Suggested-by: Alexey Izbyshev <izbyshev@ispras.ru>
---
 include/uapi/linux/prctl.h       | 2 +-
 tools/include/uapi/linux/prctl.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
  

Comments

David Hildenbrand May 22, 2023, 8:55 a.m. UTC | #1
On 17.05.23 17:03, Florent Revest wrote:
> Alexey pointed out that defining a prctl flag as an int is a footgun
> because, under some circumstances, when used as a flag to prctl, it can
> be casted to long with garbage upper bits which would result in
> unexpected behaviors.
> 
> This patch changes the constant to a UL to eliminate these
> possibilities.
> 
> Signed-off-by: Florent Revest <revest@chromium.org>
> Suggested-by: Alexey Izbyshev <izbyshev@ispras.ru>
> ---
>   include/uapi/linux/prctl.h       | 2 +-
>   tools/include/uapi/linux/prctl.h | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index f23d9a16507f..6e9af6cbc950 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -283,7 +283,7 @@ struct prctl_mm_map {
>   
>   /* Memory deny write / execute */
>   #define PR_SET_MDWE			65
> -# define PR_MDWE_REFUSE_EXEC_GAIN	1
> +# define PR_MDWE_REFUSE_EXEC_GAIN	(1UL << 0)
>   
>   #define PR_GET_MDWE			66
>   
> diff --git a/tools/include/uapi/linux/prctl.h b/tools/include/uapi/linux/prctl.h
> index 759b3f53e53f..6e6563e97fef 100644
> --- a/tools/include/uapi/linux/prctl.h
> +++ b/tools/include/uapi/linux/prctl.h
> @@ -283,7 +283,7 @@ struct prctl_mm_map {
>   
>   /* Memory deny write / execute */
>   #define PR_SET_MDWE			65
> -# define PR_MDWE_REFUSE_EXEC_GAIN	1
> +# define PR_MDWE_REFUSE_EXEC_GAIN	(1UL << 0)
>   
>   #define PR_GET_MDWE			66
>   

Both are changing existing uapi, so you'll already have existing user 
space using the old values, that your kernel code has to deal with no?
  
Alexey Izbyshev May 22, 2023, 10:35 a.m. UTC | #2
On 2023-05-22 11:55, David Hildenbrand wrote:
> On 17.05.23 17:03, Florent Revest wrote:
>> Alexey pointed out that defining a prctl flag as an int is a footgun
>> because, under some circumstances, when used as a flag to prctl, it 
>> can
>> be casted to long with garbage upper bits which would result in
>> unexpected behaviors.
>> 
>> This patch changes the constant to a UL to eliminate these
>> possibilities.
>> 
>> Signed-off-by: Florent Revest <revest@chromium.org>
>> Suggested-by: Alexey Izbyshev <izbyshev@ispras.ru>
>> ---
>>   include/uapi/linux/prctl.h       | 2 +-
>>   tools/include/uapi/linux/prctl.h | 2 +-
>>   2 files changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
>> index f23d9a16507f..6e9af6cbc950 100644
>> --- a/include/uapi/linux/prctl.h
>> +++ b/include/uapi/linux/prctl.h
>> @@ -283,7 +283,7 @@ struct prctl_mm_map {
>>     /* Memory deny write / execute */
>>   #define PR_SET_MDWE			65
>> -# define PR_MDWE_REFUSE_EXEC_GAIN	1
>> +# define PR_MDWE_REFUSE_EXEC_GAIN	(1UL << 0)
>>     #define PR_GET_MDWE			66
>>   diff --git a/tools/include/uapi/linux/prctl.h 
>> b/tools/include/uapi/linux/prctl.h
>> index 759b3f53e53f..6e6563e97fef 100644
>> --- a/tools/include/uapi/linux/prctl.h
>> +++ b/tools/include/uapi/linux/prctl.h
>> @@ -283,7 +283,7 @@ struct prctl_mm_map {
>>     /* Memory deny write / execute */
>>   #define PR_SET_MDWE			65
>> -# define PR_MDWE_REFUSE_EXEC_GAIN	1
>> +# define PR_MDWE_REFUSE_EXEC_GAIN	(1UL << 0)
>>     #define PR_GET_MDWE			66
>> 
> 
> Both are changing existing uapi, so you'll already have existing user
> space using the old values, that your kernel code has to deal with no?

I'm the one who suggested this change, so I feel the need to clarify.

For any existing 64-bit user space code using the kernel and the uapi 
headers before this patch and doing the wrong prctl(PR_SET_MDWE, 
PR_MDWE_REFUSE_EXEC_GAIN) call instead of the correct prctl(PR_SET_MDWE, 
(unsigned long)PR_MDWE_REFUSE_EXEC_GAIN), there are two possibilities 
when prctl() implementation extracts the second argument via va_arg(op, 
unsigned long):

* It gets lucky, and the upper 32 bits of the argument are zero. The 
call does what is expected by the user.

* The upper 32 bits are non-zero junk. The flags argument is rejected by 
the kernel, and the call fails with EINVAL (unexpectedly for the user).

This change is intended to affect only the second case, and only after 
the program is recompiled with the new uapi headers. The currently 
wrong, but naturally-looking prctl(PR_SET_MDWE, 
PR_MDWE_REFUSE_EXEC_GAIN) call becomes correct.

The kernel ABI is unaffected by this change, since it has been defined 
in terms of unsigned long from the start.

Thanks,
Alexey
  
David Hildenbrand May 22, 2023, 4:22 p.m. UTC | #3
On 22.05.23 12:35, Alexey Izbyshev wrote:
> On 2023-05-22 11:55, David Hildenbrand wrote:
>> On 17.05.23 17:03, Florent Revest wrote:
>>> Alexey pointed out that defining a prctl flag as an int is a footgun
>>> because, under some circumstances, when used as a flag to prctl, it
>>> can
>>> be casted to long with garbage upper bits which would result in
>>> unexpected behaviors.
>>>
>>> This patch changes the constant to a UL to eliminate these
>>> possibilities.
>>>
>>> Signed-off-by: Florent Revest <revest@chromium.org>
>>> Suggested-by: Alexey Izbyshev <izbyshev@ispras.ru>
>>> ---
>>>    include/uapi/linux/prctl.h       | 2 +-
>>>    tools/include/uapi/linux/prctl.h | 2 +-
>>>    2 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
>>> index f23d9a16507f..6e9af6cbc950 100644
>>> --- a/include/uapi/linux/prctl.h
>>> +++ b/include/uapi/linux/prctl.h
>>> @@ -283,7 +283,7 @@ struct prctl_mm_map {
>>>      /* Memory deny write / execute */
>>>    #define PR_SET_MDWE			65
>>> -# define PR_MDWE_REFUSE_EXEC_GAIN	1
>>> +# define PR_MDWE_REFUSE_EXEC_GAIN	(1UL << 0)
>>>      #define PR_GET_MDWE			66
>>>    diff --git a/tools/include/uapi/linux/prctl.h
>>> b/tools/include/uapi/linux/prctl.h
>>> index 759b3f53e53f..6e6563e97fef 100644
>>> --- a/tools/include/uapi/linux/prctl.h
>>> +++ b/tools/include/uapi/linux/prctl.h
>>> @@ -283,7 +283,7 @@ struct prctl_mm_map {
>>>      /* Memory deny write / execute */
>>>    #define PR_SET_MDWE			65
>>> -# define PR_MDWE_REFUSE_EXEC_GAIN	1
>>> +# define PR_MDWE_REFUSE_EXEC_GAIN	(1UL << 0)
>>>      #define PR_GET_MDWE			66
>>>
>>
>> Both are changing existing uapi, so you'll already have existing user
>> space using the old values, that your kernel code has to deal with no?
> 
> I'm the one who suggested this change, so I feel the need to clarify.
> 
> For any existing 64-bit user space code using the kernel and the uapi
> headers before this patch and doing the wrong prctl(PR_SET_MDWE,
> PR_MDWE_REFUSE_EXEC_GAIN) call instead of the correct prctl(PR_SET_MDWE,
> (unsigned long)PR_MDWE_REFUSE_EXEC_GAIN), there are two possibilities
> when prctl() implementation extracts the second argument via va_arg(op,
> unsigned long):
> 
> * It gets lucky, and the upper 32 bits of the argument are zero. The
> call does what is expected by the user.
> 
> * The upper 32 bits are non-zero junk. The flags argument is rejected by
> the kernel, and the call fails with EINVAL (unexpectedly for the user).
> 
> This change is intended to affect only the second case, and only after
> the program is recompiled with the new uapi headers. The currently
> wrong, but naturally-looking prctl(PR_SET_MDWE,
> PR_MDWE_REFUSE_EXEC_GAIN) call becomes correct.
> 
> The kernel ABI is unaffected by this change, since it has been defined
> in terms of unsigned long from the start.

The thing I'm concerned about is the following: old user space (that 
would fail) on new kernel where we define some upper 32bit to actually 
have a meaning (where it would succeed with wrong semantics).

IOW, can we ever really "use" these upper 32bit, or should we instead 
only consume the lower 32bit in the kernel and effectively ignore the 
upper 32bit?

I guess the feature is not that old, so having many existing user space 
applications is unlikely.

Which raises the question if we want to tag this here with a "Fixes" and 
eventually cc stable (hmm ...)?
  
Alexey Izbyshev May 22, 2023, 6:58 p.m. UTC | #4
On 2023-05-22 19:22, David Hildenbrand wrote:
> On 22.05.23 12:35, Alexey Izbyshev wrote:
>> On 2023-05-22 11:55, David Hildenbrand wrote:
>>> On 17.05.23 17:03, Florent Revest wrote:
>>>> Alexey pointed out that defining a prctl flag as an int is a footgun
>>>> because, under some circumstances, when used as a flag to prctl, it
>>>> can
>>>> be casted to long with garbage upper bits which would result in
>>>> unexpected behaviors.
>>>> 
>>>> This patch changes the constant to a UL to eliminate these
>>>> possibilities.
>>>> 
>>>> Signed-off-by: Florent Revest <revest@chromium.org>
>>>> Suggested-by: Alexey Izbyshev <izbyshev@ispras.ru>
>>>> ---
>>>>    include/uapi/linux/prctl.h       | 2 +-
>>>>    tools/include/uapi/linux/prctl.h | 2 +-
>>>>    2 files changed, 2 insertions(+), 2 deletions(-)
>>>> 
>>>> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
>>>> index f23d9a16507f..6e9af6cbc950 100644
>>>> --- a/include/uapi/linux/prctl.h
>>>> +++ b/include/uapi/linux/prctl.h
>>>> @@ -283,7 +283,7 @@ struct prctl_mm_map {
>>>>      /* Memory deny write / execute */
>>>>    #define PR_SET_MDWE			65
>>>> -# define PR_MDWE_REFUSE_EXEC_GAIN	1
>>>> +# define PR_MDWE_REFUSE_EXEC_GAIN	(1UL << 0)
>>>>      #define PR_GET_MDWE			66
>>>>    diff --git a/tools/include/uapi/linux/prctl.h
>>>> b/tools/include/uapi/linux/prctl.h
>>>> index 759b3f53e53f..6e6563e97fef 100644
>>>> --- a/tools/include/uapi/linux/prctl.h
>>>> +++ b/tools/include/uapi/linux/prctl.h
>>>> @@ -283,7 +283,7 @@ struct prctl_mm_map {
>>>>      /* Memory deny write / execute */
>>>>    #define PR_SET_MDWE			65
>>>> -# define PR_MDWE_REFUSE_EXEC_GAIN	1
>>>> +# define PR_MDWE_REFUSE_EXEC_GAIN	(1UL << 0)
>>>>      #define PR_GET_MDWE			66
>>>> 
>>> 
>>> Both are changing existing uapi, so you'll already have existing user
>>> space using the old values, that your kernel code has to deal with 
>>> no?
>> 
>> I'm the one who suggested this change, so I feel the need to clarify.
>> 
>> For any existing 64-bit user space code using the kernel and the uapi
>> headers before this patch and doing the wrong prctl(PR_SET_MDWE,
>> PR_MDWE_REFUSE_EXEC_GAIN) call instead of the correct 
>> prctl(PR_SET_MDWE,
>> (unsigned long)PR_MDWE_REFUSE_EXEC_GAIN), there are two possibilities
>> when prctl() implementation extracts the second argument via 
>> va_arg(op,
>> unsigned long):
>> 
>> * It gets lucky, and the upper 32 bits of the argument are zero. The
>> call does what is expected by the user.
>> 
>> * The upper 32 bits are non-zero junk. The flags argument is rejected 
>> by
>> the kernel, and the call fails with EINVAL (unexpectedly for the 
>> user).
>> 
>> This change is intended to affect only the second case, and only after
>> the program is recompiled with the new uapi headers. The currently
>> wrong, but naturally-looking prctl(PR_SET_MDWE,
>> PR_MDWE_REFUSE_EXEC_GAIN) call becomes correct.
>> 
>> The kernel ABI is unaffected by this change, since it has been defined
>> in terms of unsigned long from the start.
> 
> The thing I'm concerned about is the following: old user space (that
> would fail) on new kernel where we define some upper 32bit to actually
> have a meaning (where it would succeed with wrong semantics).
> 
> IOW, can we ever really "use" these upper 32bit, or should we instead
> only consume the lower 32bit in the kernel and effectively ignore the
> upper 32bit?
> 
I see, thanks. But I think this question is mostly independent from this 
patch. The patch removes a footgun, but it doesn't change the flags 
check in the kernel, and nothing stops the user from doing

int flags = PR_MDWE_REFUSE_EXEC_GAIN;
prctl(PR_SET_MDWE, flags);

So we have to decide whether to ignore the upper 32 bits or not even if 
this patch is not applied (actually *had to* when PR_SET_MDWE op was 
being added).

Possible arguments for ignoring them:
* Upper 32 bits can't be passed on 32-bit targets via the current 
prctl() interface, so a change that adds meaning to them would have to 
be both 64-bit-specific and unable to use another prctl() argument 
instead. That seems unlikely.

* It's not hard to accidentally pass int to prctl() even after this 
patch, so making technically wrong user code work as intended could be a 
good thing.

* A similar footgun exists for ILP32 ABIs (e.g. x32) on a lower level: 
while prctl(PR_SET_MDWE, 1) is fine there because long is 32-bit, the 
syscall interface is still 64-bit, so e.g. syscall(SYS_prctl, 
PR_SET_MDWE, -1) could, depending on syscall() implementation, 
sign-extend -1 to 64 bits and pass 64 set bits instead of 32 to the 
kernel.

Possible arguments for checking them:
* Code like "prctl(PR_SET_MDWE, 1)" is UB on 64-bit platforms. If the 
compiler notices that (e.g. if somebody ever manages to build a program 
and a libc together with LTO), it's allowed to make things much worse 
than just passing junk. Allowing the user to detect at least some of 
such calls now by checking for junk could be better.

* I have the impression that the kernel security community prefers 
strict argument validation.

* PR_SET_MDWE is a new op added in 6.3, so we don't have lots of legacy 
code that is known to pass junk in the upper 32 bits and must be kept 
working (i.e. failing) in the same way in a potential future kernel that 
assigns meaning to those bits.

My preference would be to keep checking the upper 32 bits. Florent, what 
do you think?

> I guess the feature is not that old, so having many existing user
> space applications is unlikely.
> 
> Which raises the question if we want to tag this here with a "Fixes"
> and eventually cc stable (hmm ...)?

Yes, IMO the faster we propagate this change, the better.

Thanks,
Alexey
  
David Hildenbrand May 23, 2023, 9:12 a.m. UTC | #5
On 22.05.23 20:58, Alexey Izbyshev wrote:
> On 2023-05-22 19:22, David Hildenbrand wrote:
>> On 22.05.23 12:35, Alexey Izbyshev wrote:
>>> On 2023-05-22 11:55, David Hildenbrand wrote:
>>>> On 17.05.23 17:03, Florent Revest wrote:
>>>>> Alexey pointed out that defining a prctl flag as an int is a footgun
>>>>> because, under some circumstances, when used as a flag to prctl, it
>>>>> can
>>>>> be casted to long with garbage upper bits which would result in
>>>>> unexpected behaviors.
>>>>>
>>>>> This patch changes the constant to a UL to eliminate these
>>>>> possibilities.
>>>>>
>>>>> Signed-off-by: Florent Revest <revest@chromium.org>
>>>>> Suggested-by: Alexey Izbyshev <izbyshev@ispras.ru>
>>>>> ---
>>>>>     include/uapi/linux/prctl.h       | 2 +-
>>>>>     tools/include/uapi/linux/prctl.h | 2 +-
>>>>>     2 files changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
>>>>> index f23d9a16507f..6e9af6cbc950 100644
>>>>> --- a/include/uapi/linux/prctl.h
>>>>> +++ b/include/uapi/linux/prctl.h
>>>>> @@ -283,7 +283,7 @@ struct prctl_mm_map {
>>>>>       /* Memory deny write / execute */
>>>>>     #define PR_SET_MDWE			65
>>>>> -# define PR_MDWE_REFUSE_EXEC_GAIN	1
>>>>> +# define PR_MDWE_REFUSE_EXEC_GAIN	(1UL << 0)
>>>>>       #define PR_GET_MDWE			66
>>>>>     diff --git a/tools/include/uapi/linux/prctl.h
>>>>> b/tools/include/uapi/linux/prctl.h
>>>>> index 759b3f53e53f..6e6563e97fef 100644
>>>>> --- a/tools/include/uapi/linux/prctl.h
>>>>> +++ b/tools/include/uapi/linux/prctl.h
>>>>> @@ -283,7 +283,7 @@ struct prctl_mm_map {
>>>>>       /* Memory deny write / execute */
>>>>>     #define PR_SET_MDWE			65
>>>>> -# define PR_MDWE_REFUSE_EXEC_GAIN	1
>>>>> +# define PR_MDWE_REFUSE_EXEC_GAIN	(1UL << 0)
>>>>>       #define PR_GET_MDWE			66
>>>>>
>>>>
>>>> Both are changing existing uapi, so you'll already have existing user
>>>> space using the old values, that your kernel code has to deal with
>>>> no?
>>>
>>> I'm the one who suggested this change, so I feel the need to clarify.
>>>
>>> For any existing 64-bit user space code using the kernel and the uapi
>>> headers before this patch and doing the wrong prctl(PR_SET_MDWE,
>>> PR_MDWE_REFUSE_EXEC_GAIN) call instead of the correct
>>> prctl(PR_SET_MDWE,
>>> (unsigned long)PR_MDWE_REFUSE_EXEC_GAIN), there are two possibilities
>>> when prctl() implementation extracts the second argument via
>>> va_arg(op,
>>> unsigned long):
>>>
>>> * It gets lucky, and the upper 32 bits of the argument are zero. The
>>> call does what is expected by the user.
>>>
>>> * The upper 32 bits are non-zero junk. The flags argument is rejected
>>> by
>>> the kernel, and the call fails with EINVAL (unexpectedly for the
>>> user).
>>>
>>> This change is intended to affect only the second case, and only after
>>> the program is recompiled with the new uapi headers. The currently
>>> wrong, but naturally-looking prctl(PR_SET_MDWE,
>>> PR_MDWE_REFUSE_EXEC_GAIN) call becomes correct.
>>>
>>> The kernel ABI is unaffected by this change, since it has been defined
>>> in terms of unsigned long from the start.
>>
>> The thing I'm concerned about is the following: old user space (that
>> would fail) on new kernel where we define some upper 32bit to actually
>> have a meaning (where it would succeed with wrong semantics).
>>
>> IOW, can we ever really "use" these upper 32bit, or should we instead
>> only consume the lower 32bit in the kernel and effectively ignore the
>> upper 32bit?
>>
> I see, thanks. But I think this question is mostly independent from this
> patch. The patch removes a footgun, but it doesn't change the flags
> check in the kernel, and nothing stops the user from doing
> 
> int flags = PR_MDWE_REFUSE_EXEC_GAIN;
> prctl(PR_SET_MDWE, flags);
> 
> So we have to decide whether to ignore the upper 32 bits or not even if
> this patch is not applied (actually *had to* when PR_SET_MDWE op was
> being added).

Well, an alternative to this patch would be to say "well, for this prctl 
we ignore any upper 32bit. Then, this change would not be needed. Yes, I 
also don't like that :)

Bu unrelated, I looked at some other random prctl.

#define SUID_DUMP_USER           1

And in kernel/sys.c:

	case PR_SET_DUMPABLE:
		if (arg2 != SUID_DUMP_DISABLE && arg2 != SUID_DUMP_USER)
	...

Wouldn't that also suffer from the same issue, or how is this different?

Also, how is passing "0"s to e.g., PR_GET_THP_DISABLE reliable? We need 
arg2 -> arg5 to be 0. But wouldn't the following also just pass a 0 "int" ?

prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0)


I'm easily confused by such (va_args) things, so sorry for the dummy 
questions.
  
Alexey Izbyshev May 23, 2023, 10:53 a.m. UTC | #6
On 2023-05-23 12:12, David Hildenbrand wrote:
> On 22.05.23 20:58, Alexey Izbyshev wrote:
>> On 2023-05-22 19:22, David Hildenbrand wrote:
>>> On 22.05.23 12:35, Alexey Izbyshev wrote:
>>>> On 2023-05-22 11:55, David Hildenbrand wrote:
>>>>> On 17.05.23 17:03, Florent Revest wrote:
>>>>>> Alexey pointed out that defining a prctl flag as an int is a 
>>>>>> footgun
>>>>>> because, under some circumstances, when used as a flag to prctl, 
>>>>>> it
>>>>>> can
>>>>>> be casted to long with garbage upper bits which would result in
>>>>>> unexpected behaviors.
>>>>>> 
>>>>>> This patch changes the constant to a UL to eliminate these
>>>>>> possibilities.
>>>>>> 
>>>>>> Signed-off-by: Florent Revest <revest@chromium.org>
>>>>>> Suggested-by: Alexey Izbyshev <izbyshev@ispras.ru>
>>>>>> ---
>>>>>>     include/uapi/linux/prctl.h       | 2 +-
>>>>>>     tools/include/uapi/linux/prctl.h | 2 +-
>>>>>>     2 files changed, 2 insertions(+), 2 deletions(-)
>>>>>> 
>>>>>> diff --git a/include/uapi/linux/prctl.h 
>>>>>> b/include/uapi/linux/prctl.h
>>>>>> index f23d9a16507f..6e9af6cbc950 100644
>>>>>> --- a/include/uapi/linux/prctl.h
>>>>>> +++ b/include/uapi/linux/prctl.h
>>>>>> @@ -283,7 +283,7 @@ struct prctl_mm_map {
>>>>>>       /* Memory deny write / execute */
>>>>>>     #define PR_SET_MDWE			65
>>>>>> -# define PR_MDWE_REFUSE_EXEC_GAIN	1
>>>>>> +# define PR_MDWE_REFUSE_EXEC_GAIN	(1UL << 0)
>>>>>>       #define PR_GET_MDWE			66
>>>>>>     diff --git a/tools/include/uapi/linux/prctl.h
>>>>>> b/tools/include/uapi/linux/prctl.h
>>>>>> index 759b3f53e53f..6e6563e97fef 100644
>>>>>> --- a/tools/include/uapi/linux/prctl.h
>>>>>> +++ b/tools/include/uapi/linux/prctl.h
>>>>>> @@ -283,7 +283,7 @@ struct prctl_mm_map {
>>>>>>       /* Memory deny write / execute */
>>>>>>     #define PR_SET_MDWE			65
>>>>>> -# define PR_MDWE_REFUSE_EXEC_GAIN	1
>>>>>> +# define PR_MDWE_REFUSE_EXEC_GAIN	(1UL << 0)
>>>>>>       #define PR_GET_MDWE			66
>>>>>> 
>>>>> 
>>>>> Both are changing existing uapi, so you'll already have existing 
>>>>> user
>>>>> space using the old values, that your kernel code has to deal with
>>>>> no?
>>>> 
>>>> I'm the one who suggested this change, so I feel the need to 
>>>> clarify.
>>>> 
>>>> For any existing 64-bit user space code using the kernel and the 
>>>> uapi
>>>> headers before this patch and doing the wrong prctl(PR_SET_MDWE,
>>>> PR_MDWE_REFUSE_EXEC_GAIN) call instead of the correct
>>>> prctl(PR_SET_MDWE,
>>>> (unsigned long)PR_MDWE_REFUSE_EXEC_GAIN), there are two 
>>>> possibilities
>>>> when prctl() implementation extracts the second argument via
>>>> va_arg(op,
>>>> unsigned long):
>>>> 
>>>> * It gets lucky, and the upper 32 bits of the argument are zero. The
>>>> call does what is expected by the user.
>>>> 
>>>> * The upper 32 bits are non-zero junk. The flags argument is 
>>>> rejected
>>>> by
>>>> the kernel, and the call fails with EINVAL (unexpectedly for the
>>>> user).
>>>> 
>>>> This change is intended to affect only the second case, and only 
>>>> after
>>>> the program is recompiled with the new uapi headers. The currently
>>>> wrong, but naturally-looking prctl(PR_SET_MDWE,
>>>> PR_MDWE_REFUSE_EXEC_GAIN) call becomes correct.
>>>> 
>>>> The kernel ABI is unaffected by this change, since it has been 
>>>> defined
>>>> in terms of unsigned long from the start.
>>> 
>>> The thing I'm concerned about is the following: old user space (that
>>> would fail) on new kernel where we define some upper 32bit to 
>>> actually
>>> have a meaning (where it would succeed with wrong semantics).
>>> 
>>> IOW, can we ever really "use" these upper 32bit, or should we instead
>>> only consume the lower 32bit in the kernel and effectively ignore the
>>> upper 32bit?
>>> 
>> I see, thanks. But I think this question is mostly independent from 
>> this
>> patch. The patch removes a footgun, but it doesn't change the flags
>> check in the kernel, and nothing stops the user from doing
>> 
>> int flags = PR_MDWE_REFUSE_EXEC_GAIN;
>> prctl(PR_SET_MDWE, flags);
>> 
>> So we have to decide whether to ignore the upper 32 bits or not even 
>> if
>> this patch is not applied (actually *had to* when PR_SET_MDWE op was
>> being added).
> 
> Well, an alternative to this patch would be to say "well, for this
> prctl we ignore any upper 32bit. Then, this change would not be
> needed. Yes, I also don't like that :)
> 
> Bu unrelated, I looked at some other random prctl.
> 
> #define SUID_DUMP_USER           1
> 
> And in kernel/sys.c:
> 
> 	case PR_SET_DUMPABLE:
> 		if (arg2 != SUID_DUMP_DISABLE && arg2 != SUID_DUMP_USER)
> 	...
> 
> Wouldn't that also suffer from the same issue, or how is this 
> different?
> 
Yes, it is the same issue, so e.g. prctl(PR_SET_DUMPABLE, 
SUID_DUMP_DISABLE ) may wrongly fail with EINVAL on 64-bit targets.

> Also, how is passing "0"s to e.g., PR_GET_THP_DISABLE reliable? We
> need arg2 -> arg5 to be 0. But wouldn't the following also just pass a
> 0 "int" ?
> 
> prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0)
> 
Yes, this is not reliable on 64-bit targets too. The simplest fix is to 
use "0L", as done in MDWE self-tests (but many other tests get this 
wrong).

Florent also expressed surprise[1] that we don't see a lot of failures 
due to such issues, and I tried to provide some reasons. To elaborate on 
the x86-64 thing, for prctl(PR_SET_DUMPABLE, 0) the compiler will likely 
generate "xorl %esi, %esi" to pass zero, but this instruction will also 
clear the upper 32 bits of %rsi, so the problem is masked (and I believe 
CPU vendors are motivated to do such zeroing to reduce false 
dependencies). But this zeroing is not required by the ABI, so in a more 
complex situation junk might get through.

Real-world examples of very similar breakage in variadic functions 
involving NULL sentinels are mentioned in [2] (the musl bug report is 
[3]). In short, musl defined NULL as plain 0 for C++, so when people do 
e.g. execl("/bin/true", "true", NULL), junk might prevent detection of 
the sentinel in execl() impl. (Though if the sentinel is passed via 
stack because there are a lot of preceding arguments, the breakage 
becomes more apparent because auto-zeroing of registers doesn't come 
into play anymore.)

> 
> I'm easily confused by such (va_args) things, so sorry for the dummy 
> questions.

This stuff *is* confusing, and note that Linux man pages don't even tell 
that prctl() is actually declared as a variadic function (and for 
ptrace() this is mentioned only in the notes, but not in its signature).

Thanks,
Alexey

[1] 
https://lore.kernel.org/lkml/3a38319a3b241e578729ffa5484ad24b@ispras.ru
[2] https://ewontfix.com/11
[3] https://www.openwall.com/lists/musl/2013/01/09/1
  
Catalin Marinas May 23, 2023, 1:07 p.m. UTC | #7
On Tue, May 23, 2023 at 11:12:37AM +0200, David Hildenbrand wrote:
> Also, how is passing "0"s to e.g., PR_GET_THP_DISABLE reliable? We need arg2
> -> arg5 to be 0. But wouldn't the following also just pass a 0 "int" ?
> 
> prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0)
> 
> I'm easily confused by such (va_args) things, so sorry for the dummy
> questions.

Isn't the prctl() prototype in the user headers defined with the first
argument as int while the rest as unsigned long? At least from the man
page:

int prctl(int option, unsigned long arg2, unsigned long arg3,
	  unsigned long arg4, unsigned long arg5);

So there are no va_args tricks (which confuse me as well).

Any int passed to arg[2-5] would be converted by the compiler to an
unsigned long before being passed to the kernel. So I think the change
in this patch is harmless as the conversion is happening anyway.

(well, unless I completely missed what the problem is)
  
Alexey Izbyshev May 23, 2023, 1:25 p.m. UTC | #8
On 2023-05-23 16:07, Catalin Marinas wrote:
> On Tue, May 23, 2023 at 11:12:37AM +0200, David Hildenbrand wrote:
>> Also, how is passing "0"s to e.g., PR_GET_THP_DISABLE reliable? We 
>> need arg2
>> -> arg5 to be 0. But wouldn't the following also just pass a 0 "int" ?
>> 
>> prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0)
>> 
>> I'm easily confused by such (va_args) things, so sorry for the dummy
>> questions.
> 
> Isn't the prctl() prototype in the user headers defined with the first
> argument as int while the rest as unsigned long? At least from the man
> page:
> 
> int prctl(int option, unsigned long arg2, unsigned long arg3,
> 	  unsigned long arg4, unsigned long arg5);
> 
> So there are no va_args tricks (which confuse me as well).
> 
I have explicitly mentioned the problem with man pages in my response to 
David[1]. Quoting myself:

> This stuff *is* confusing, and note that Linux man pages don't even 
> tell
that prctl() is actually declared as a variadic function (and for
ptrace() this is mentioned only in the notes, but not in its signature).

The reality:

* glibc: 
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/sys/prctl.h;h=821aeefc1339b35210e8918ecfe9833ed2792626;hb=glibc-2.37#l42

* musl: 
https://git.musl-libc.org/cgit/musl/tree/include/sys/prctl.h?h=v1.2.4#n180

Though there is a test in the kernel that does define its own prototype, 
avoiding the issue: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/sched/cs_prctl_test.c?h=v6.3#n77

Thanks,
Alexey

[1] 
https://lore.kernel.org/lkml/7c572622c0d8e283fc880fe3f4ffac27@ispras.ru//lkml/7c572622c0d8e283fc880fe3f4ffac27@ispras.ru

> Any int passed to arg[2-5] would be converted by the compiler to an
> unsigned long before being passed to the kernel. So I think the change
> in this patch is harmless as the conversion is happening anyway.
> 
> (well, unless I completely missed what the problem is)
  
Catalin Marinas May 23, 2023, 2:09 p.m. UTC | #9
On Tue, May 23, 2023 at 04:25:45PM +0300, Alexey Izbyshev wrote:
> On 2023-05-23 16:07, Catalin Marinas wrote:
> > On Tue, May 23, 2023 at 11:12:37AM +0200, David Hildenbrand wrote:
> > > Also, how is passing "0"s to e.g., PR_GET_THP_DISABLE reliable? We
> > > need arg2
> > > -> arg5 to be 0. But wouldn't the following also just pass a 0 "int" ?
> > > 
> > > prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0)
> > > 
> > > I'm easily confused by such (va_args) things, so sorry for the dummy
> > > questions.
> > 
> > Isn't the prctl() prototype in the user headers defined with the first
> > argument as int while the rest as unsigned long? At least from the man
> > page:
> > 
> > int prctl(int option, unsigned long arg2, unsigned long arg3,
> > 	  unsigned long arg4, unsigned long arg5);
> > 
> > So there are no va_args tricks (which confuse me as well).
> > 
> I have explicitly mentioned the problem with man pages in my response to
> David[1]. Quoting myself:
> 
> > This stuff *is* confusing, and note that Linux man pages don't even tell
> that prctl() is actually declared as a variadic function (and for
> ptrace() this is mentioned only in the notes, but not in its signature).

Ah, thanks for the clarification (I somehow missed your reply).

> The reality:
> 
> * glibc: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/sys/prctl.h;h=821aeefc1339b35210e8918ecfe9833ed2792626;hb=glibc-2.37#l42
> 
> * musl:
> https://git.musl-libc.org/cgit/musl/tree/include/sys/prctl.h?h=v1.2.4#n180
> 
> Though there is a test in the kernel that does define its own prototype,
> avoiding the issue: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/sched/cs_prctl_test.c?h=v6.3#n77

At least for glibc, it seems that there is a conversion to unsigned
long:

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/prctl.c#l28

unsigned long int arg2 = va_arg (arg, unsigned long int);

(does va_arg expand to an actual cast?)

If the libc passes a 32-bit to a kernel ABI that expects 64-bit, I think
it's a user-space bug and not a kernel ABI issue.
  
David Hildenbrand May 23, 2023, 2:10 p.m. UTC | #10
>> Wouldn't that also suffer from the same issue, or how is this
>> different?
>>
> Yes, it is the same issue, so e.g. prctl(PR_SET_DUMPABLE,
> SUID_DUMP_DISABLE ) may wrongly fail with EINVAL on 64-bit targets.
> 
>> Also, how is passing "0"s to e.g., PR_GET_THP_DISABLE reliable? We
>> need arg2 -> arg5 to be 0. But wouldn't the following also just pass a
>> 0 "int" ?
>>
>> prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0)
>>
> Yes, this is not reliable on 64-bit targets too. The simplest fix is to
> use "0L", as done in MDWE self-tests (but many other tests get this
> wrong).

Oh, it's even worse than I thought, then. :)

Even in our selftest most of
	$ git grep prctl tools/testing/selftests/ | grep "0"

gets it wrong.

> 
> Florent also expressed surprise[1] that we don't see a lot of failures
> due to such issues, and I tried to provide some reasons. To elaborate on

Yes, I'm also surprised!

> the x86-64 thing, for prctl(PR_SET_DUMPABLE, 0) the compiler will likely
> generate "xorl %esi, %esi" to pass zero, but this instruction will also
> clear the upper 32 bits of %rsi, so the problem is masked (and I believe
> CPU vendors are motivated to do such zeroing to reduce false
> dependencies). But this zeroing is not required by the ABI, so in a more
> complex situation junk might get through.

:/

> 
> Real-world examples of very similar breakage in variadic functions
> involving NULL sentinels are mentioned in [2] (the musl bug report is
> [3]). In short, musl defined NULL as plain 0 for C++, so when people do
> e.g. execl("/bin/true", "true", NULL), junk might prevent detection of
> the sentinel in execl() impl. (Though if the sentinel is passed via
> stack because there are a lot of preceding arguments, the breakage
> becomes more apparent because auto-zeroing of registers doesn't come
> into play anymore.)

Yes, I heard about the "fun" with NULL already. Thanks for the musl 
pointer. And thanks for the confirmation/explanation.

> 
>>
>> I'm easily confused by such (va_args) things, so sorry for the dummy
>> questions.
> 
> This stuff *is* confusing, and note that Linux man pages don't even tell
> that prctl() is actually declared as a variadic function (and for
> ptrace() this is mentioned only in the notes, but not in its signature).

Agreed, that's easy to miss (and probably many people missed it).


Anyhow, for this patch as is (although it feels like drops in the ocean 
after our discussion)

Reviewed-by: David Hildenbrand <david@redhat.com>
  
Catalin Marinas May 23, 2023, 2:11 p.m. UTC | #11
On Wed, May 17, 2023 at 05:03:19PM +0200, Florent Revest wrote:
> Alexey pointed out that defining a prctl flag as an int is a footgun
> because, under some circumstances, when used as a flag to prctl, it can
> be casted to long with garbage upper bits which would result in
> unexpected behaviors.
> 
> This patch changes the constant to a UL to eliminate these
> possibilities.
> 
> Signed-off-by: Florent Revest <revest@chromium.org>
> Suggested-by: Alexey Izbyshev <izbyshev@ispras.ru>

FWIW, I'm fine with this patch and I don't think it introduces an ABI
change.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
  
Alexey Izbyshev May 23, 2023, 2:46 p.m. UTC | #12
On 2023-05-23 17:09, Catalin Marinas wrote:
> On Tue, May 23, 2023 at 04:25:45PM +0300, Alexey Izbyshev wrote:
>> On 2023-05-23 16:07, Catalin Marinas wrote:
>> > On Tue, May 23, 2023 at 11:12:37AM +0200, David Hildenbrand wrote:
>> > > Also, how is passing "0"s to e.g., PR_GET_THP_DISABLE reliable? We
>> > > need arg2
>> > > -> arg5 to be 0. But wouldn't the following also just pass a 0 "int" ?
>> > >
>> > > prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0)
>> > >
>> > > I'm easily confused by such (va_args) things, so sorry for the dummy
>> > > questions.
>> >
>> > Isn't the prctl() prototype in the user headers defined with the first
>> > argument as int while the rest as unsigned long? At least from the man
>> > page:
>> >
>> > int prctl(int option, unsigned long arg2, unsigned long arg3,
>> > 	  unsigned long arg4, unsigned long arg5);
>> >
>> > So there are no va_args tricks (which confuse me as well).
>> >
>> I have explicitly mentioned the problem with man pages in my response 
>> to
>> David[1]. Quoting myself:
>> 
>> > This stuff *is* confusing, and note that Linux man pages don't even tell
>> that prctl() is actually declared as a variadic function (and for
>> ptrace() this is mentioned only in the notes, but not in its 
>> signature).
> 
> Ah, thanks for the clarification (I somehow missed your reply).
> 
>> The reality:
>> 
>> * glibc: 
>> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/sys/prctl.h;h=821aeefc1339b35210e8918ecfe9833ed2792626;hb=glibc-2.37#l42
>> 
>> * musl:
>> https://git.musl-libc.org/cgit/musl/tree/include/sys/prctl.h?h=v1.2.4#n180
>> 
>> Though there is a test in the kernel that does define its own 
>> prototype,
>> avoiding the issue: 
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/sched/cs_prctl_test.c?h=v6.3#n77
> 
> At least for glibc, it seems that there is a conversion to unsigned
> long:
> 
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/prctl.c#l28
> 
> unsigned long int arg2 = va_arg (arg, unsigned long int);
> 
> (does va_arg expand to an actual cast?)
> 
No, this not a conversion or a cast in the sense that I think you mean 
it. What happens in the situation discussed in this thread is the 
following (assuming the argument is passed via a register, which is 
typical for initial variadic arguments on 64-bit targets):

* User calls prctl(op, 0) on a 64-bit target.
* The second argument is an int.
* The compiler generates code to pass an int (32 bits) via a 64-bit 
register. The compiler is NOT required to clear the upper 32 bits of the 
register, so they might contain arbitrary junk in a general case.
* The prctl() implementation calls va_arg(arg, unsigned long) (as in 
your quote).
* The compiler extracts the full 64-bit value of the same register 
(which in our case might contain junk in the upper 32 bits).
* This extracted 64-bit value is then passed to the system call.

So...

> If the libc passes a 32-bit to a kernel ABI that expects 64-bit, I 
> think
> it's a user-space bug and not a kernel ABI issue.

... the problem happens not at the user/kernel boundary, but in prctl() 
call/implementation in user space. But yes, it's still a user-space bug 
and not a kernel ABI issue. The David's question, as I understand it, 
was whether we want to keep such buggy code that happens to pass junk 
failing with EINVAL in future kernels or not. If we do want to keep it 
failing, we can never assign any meaning to the upper 32 bits of the 
second prctl() argument for PR_SET_MDWE op.

Thanks,
Alexey
  
Szabolcs Nagy May 23, 2023, 3:01 p.m. UTC | #13
The 05/23/2023 15:09, Catalin Marinas wrote:
> At least for glibc, it seems that there is a conversion to unsigned
> long:
>
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/prctl.c#l28
>
> unsigned long int arg2 = va_arg (arg, unsigned long int);
>
> (does va_arg expand to an actual cast?)

this is not a conversion.

at this point the argument is already corrupt since
an int was passed instead of unsigned long, usually
it's just wrong top 32bit, but in theory arg passing
for int and long could be completely different.

>
> If the libc passes a 32-bit to a kernel ABI that expects 64-bit, I think
> it's a user-space bug and not a kernel ABI issue.

this is point 3 in an earlier mail i wrote about varargs
(we ran into vararg issues during morello porting but it
affects all 64bit targets):

https://lkml.org/lkml/2022/10/31/386
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
  
Florent Revest May 26, 2023, 7:02 p.m. UTC | #14
On Mon, May 22, 2023 at 8:58 PM Alexey Izbyshev <izbyshev@ispras.ru> wrote:
>
> My preference would be to keep checking the upper 32 bits. Florent, what
> do you think?

I don't mind. I can do this as part of v3. :)

> > Which raises the question if we want to tag this here with a "Fixes"
> > and eventually cc stable (hmm ...)?
>
> Yes, IMO the faster we propagate this change, the better.

Okay, will do
  
Florent Revest May 26, 2023, 7:04 p.m. UTC | #15
On Tue, May 23, 2023 at 4:10 PM David Hildenbrand <david@redhat.com> wrote:
>
> >> I'm easily confused by such (va_args) things, so sorry for the dummy
> >> questions.
> >
> > This stuff *is* confusing, and note that Linux man pages don't even tell
> > that prctl() is actually declared as a variadic function (and for
> > ptrace() this is mentioned only in the notes, but not in its signature).
>
> Agreed, that's easy to miss (and probably many people missed it).
>
>
> Anyhow, for this patch as is (although it feels like drops in the ocean
> after our discussion)
>
> Reviewed-by: David Hildenbrand <david@redhat.com>

Thanks everyone for the review and the exchange ! :)
  

Patch

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index f23d9a16507f..6e9af6cbc950 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -283,7 +283,7 @@  struct prctl_mm_map {
 
 /* Memory deny write / execute */
 #define PR_SET_MDWE			65
-# define PR_MDWE_REFUSE_EXEC_GAIN	1
+# define PR_MDWE_REFUSE_EXEC_GAIN	(1UL << 0)
 
 #define PR_GET_MDWE			66
 
diff --git a/tools/include/uapi/linux/prctl.h b/tools/include/uapi/linux/prctl.h
index 759b3f53e53f..6e6563e97fef 100644
--- a/tools/include/uapi/linux/prctl.h
+++ b/tools/include/uapi/linux/prctl.h
@@ -283,7 +283,7 @@  struct prctl_mm_map {
 
 /* Memory deny write / execute */
 #define PR_SET_MDWE			65
-# define PR_MDWE_REFUSE_EXEC_GAIN	1
+# define PR_MDWE_REFUSE_EXEC_GAIN	(1UL << 0)
 
 #define PR_GET_MDWE			66