execve.2: execve also returns E2BIG if a string is too long

Message ID 20231010234153.021826b1@imladris.surriel.com
State New
Headers
Series execve.2: execve also returns E2BIG if a string is too long |

Commit Message

Rik van Riel Oct. 11, 2023, 3:41 a.m. UTC
  Document that if a command line or environment string is too long (> MAX_ARG_STRLEN), execve will also return E2BIG.

Signed-off-by: Rik van Riel <riel@surriel.com>
---
 man2/execve.2 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Alejandro Colomar Oct. 11, 2023, 10:41 a.m. UTC | #1
Hi Rik,

On Tue, Oct 10, 2023 at 11:41:53PM -0400, Rik van Riel wrote:
> Document that if a command line or environment string is too long (> MAX_ARG_STRLEN), execve will also return E2BIG.

That's already implied by the current text:

       E2BIG  The total number of bytes in the environment (envp) and argument
              list (argv) is too large.

That means that

size_t  bytes;

bytes = 0;
for (char *e = envp; e != NULL; e++)
	bytes += strlen(e) + 1;  // I have doubts about the +1
for (char *a = argv; a != NULL; a++)
	bytes += strlen(a) + 1;  // Same doubts

if (bytes > MAX_ARG_STRLEN)  // Maybe >= ?
	return -E2BIG;

> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
>  man2/execve.2 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/man2/execve.2 b/man2/execve.2
> index 0d9582492ad1..c1a359d01872 100644
> --- a/man2/execve.2
> +++ b/man2/execve.2
> @@ -449,7 +449,7 @@ The total number of bytes in the environment
>  .RI ( envp )
>  and argument list
>  .RI ( argv )
> -is too large.
> +is too large, or an argument or environment string is too long.

Please use semantic newlines:

$ MANWIDTH=72 man man-pages | sed -n '/Use semantic newlines/,/^$/p'
   Use semantic newlines
       In the source of a manual page, new sentences should be started
       on  new  lines,  long  sentences  should be split into lines at
       clause breaks (commas, semicolons, colons, and so on), and long
       clauses should be split at phrase boundaries.  This convention,
       sometimes known as "semantic newlines", makes it easier to  see
       the  effect of patches, which often operate at the level of in‐
       dividual sentences, clauses, or phrases.


Thanks,
Alex

>  .TP
>  .B EACCES
>  Search permission is denied on a component of the path prefix of
> -- 
> 2.41.0
> 
>
  
Rik van Riel Oct. 11, 2023, 1:21 p.m. UTC | #2
On Wed, 2023-10-11 at 12:41 +0200, Alejandro Colomar wrote:
> Hi Rik,
> 
> On Tue, Oct 10, 2023 at 11:41:53PM -0400, Rik van Riel wrote:
> > Document that if a command line or environment string is too long
> > (> MAX_ARG_STRLEN), execve will also return E2BIG.
> 
> That's already implied by the current text:
> 
>        E2BIG  The total number of bytes in the environment (envp) and
> argument
>               list (argv) is too large.
> 
> That means that
> 
> size_t  bytes;
> 
> bytes = 0;
> for (char *e = envp; e != NULL; e++)
>         bytes += strlen(e) + 1;  // I have doubts about the +1
> for (char *a = argv; a != NULL; a++)
>         bytes += strlen(a) + 1;  // Same doubts
> 
> if (bytes > MAX_ARG_STRLEN)  // Maybe >= ?
>         return -E2BIG;

The code in fs/exec.c enforces MAX_ARG_STRLEN against
each individual string, not against the total.

If any string, either argument or environment, is larger
than 32 * PAGE_SIZE, the kernel will return -E2BIG.

do_execveat_common() has this code, which uses copy_strings
to copy both the strings from the environment, and from
the command line arguments:

        retval = copy_strings(bprm->envc, envp, bprm);
        if (retval < 0)
                goto out_free;

        retval = copy_strings(bprm->argc, argv, bprm);
        if (retval < 0)
                goto out_free;

Inside copy_strings() we have this code:


        while (argc-- > 0) {
...
                len = strnlen_user(str, MAX_ARG_STRLEN);
                if (!len)
                        goto out;

                ret = -E2BIG;
                if (!valid_arg_len(bprm, len))
                        goto out;

The valid_arg_len() function does not need explanation:

static bool valid_arg_len(struct linux_binprm *bprm, long len)
{
        return len <= MAX_ARG_STRLEN;
}


The current man page wording is very clear about the total
length being enforced, but IMHO not as clear about the limit
that gets enforced on each individual string.

The total length limit of environment & commandline arguments
is enforced by bprm_stack_limits(), and is checked against
either 1/4 of the maximum stack size, or 3/4 of _STK_LIM, whichever
is smaller. The MAX_ARG_STRLEN value does not come into play when
enforcing the total.
  
Matthew House Oct. 11, 2023, 1:44 p.m. UTC | #3
On Wed, Oct 11, 2023 at 9:21 AM Rik van Riel <riel@surriel.com> wrote:
> On Wed, 2023-10-11 at 12:41 +0200, Alejandro Colomar wrote:
> > Hi Rik,
> >
> > On Tue, Oct 10, 2023 at 11:41:53PM -0400, Rik van Riel wrote:
> > > Document that if a command line or environment string is too long
> > > (> MAX_ARG_STRLEN), execve will also return E2BIG.
> >
> > That's already implied by the current text:
> >
> >        E2BIG  The total number of bytes in the environment (envp) and
> > argument
> >               list (argv) is too large.
> >
> > That means that
> >
> > size_t  bytes;
> >
> > bytes = 0;
> > for (char *e = envp; e != NULL; e++)
> >         bytes += strlen(e) + 1;  // I have doubts about the +1
> > for (char *a = argv; a != NULL; a++)
> >         bytes += strlen(a) + 1;  // Same doubts
> >
> > if (bytes > MAX_ARG_STRLEN)  // Maybe >= ?
> >         return -E2BIG;
>
> The code in fs/exec.c enforces MAX_ARG_STRLEN against
> each individual string, not against the total.
>
> If any string, either argument or environment, is larger
> than 32 * PAGE_SIZE, the kernel will return -E2BIG.
>
> do_execveat_common() has this code, which uses copy_strings
> to copy both the strings from the environment, and from
> the command line arguments:
>
>         retval = copy_strings(bprm->envc, envp, bprm);
>         if (retval < 0)
>                 goto out_free;
>
>         retval = copy_strings(bprm->argc, argv, bprm);
>         if (retval < 0)
>                 goto out_free;
>
> Inside copy_strings() we have this code:
>
>
>         while (argc-- > 0) {
> ...
>                 len = strnlen_user(str, MAX_ARG_STRLEN);
>                 if (!len)
>                         goto out;
>
>                 ret = -E2BIG;
>                 if (!valid_arg_len(bprm, len))
>                         goto out;
>
> The valid_arg_len() function does not need explanation:
>
> static bool valid_arg_len(struct linux_binprm *bprm, long len)
> {
>         return len <= MAX_ARG_STRLEN;
> }
>
>
> The current man page wording is very clear about the total
> length being enforced, but IMHO not as clear about the limit
> that gets enforced on each individual string.
>
> The total length limit of environment & commandline arguments
> is enforced by bprm_stack_limits(), and is checked against
> either 1/4 of the maximum stack size, or 3/4 of _STK_LIM, whichever
> is smaller. The MAX_ARG_STRLEN value does not come into play when
> enforcing the total.

To expand on this, there are basically two separate byte limits in
fs/exec.c, one for each individual argv/envp string, and another for all
strings and all pointers to them as a whole. To put the whole thing in
pseudocode, the checks work effectively like this, assuming I haven't made
any errors:

int argc, envc;
unsigned long bytes, limit;

/* assume that argv has already been adjusted to add an empty argv[0] */
argc = 0, envc = 0, bytes = 0;
for (char **a = argv; *a != NULL; a++, argc++) {
    if (strlen(*a) >= MAX_ARG_STRLEN)
        return -E2BIG;
    bytes += strlen(*a) + 1;
}
for (char **e = envp; *e != NULL; e++, envc++) {
    if (strlen(*e) >= MAX_ARG_STRLEN)
        return -E2BIG;
    bytes += strlen(*e) + 1;
}

if (argc > MAX_ARG_STRINGS || envc > MAX_ARG_STRINGS)
    return -E2BIG;
bytes += (argc + envc) * sizeof(void *);

limit = max(min(_STK_LIM / 4 * 3, rlim_stack.rlim_cur / 4), ARG_MAX);
if (bytes > limit)
    return -E2BIG;

Thank you,
Matthew House
  
Matthew House Oct. 11, 2023, 1:52 p.m. UTC | #4
On Tue, Oct 10, 2023 at 11:51 PM Rik van Riel <riel@surriel.com> wrote:
> Document that if a command line or environment string is too long (> MAX_ARG_STRLEN), execve will also return E2BIG.
>
> Signed-off-by: Rik van Riel <riel@surriel.com>

It might be worth mentioning that strlen(pathname) must also be strictly
less than MAX_ARG_STRLEN, it being subject to the same restrictions as
each of the argv/envp strings.

Thank you,
Matthew House
  
Alejandro Colomar Oct. 11, 2023, 2:42 p.m. UTC | #5
Hi Rik,

On Wed, Oct 11, 2023 at 09:21:28AM -0400, Rik van Riel wrote:
> On Wed, 2023-10-11 at 12:41 +0200, Alejandro Colomar wrote:
> > Hi Rik,
> > 
> > On Tue, Oct 10, 2023 at 11:41:53PM -0400, Rik van Riel wrote:
> > > Document that if a command line or environment string is too long
> > > (> MAX_ARG_STRLEN), execve will also return E2BIG.
> > 
> > That's already implied by the current text:
> > 
> >        E2BIG  The total number of bytes in the environment (envp) and
> > argument
> >               list (argv) is too large.
> > 
> > That means that
> > 
> > size_t  bytes;
> > 
> > bytes = 0;
> > for (char *e = envp; e != NULL; e++)
> >         bytes += strlen(e) + 1;  // I have doubts about the +1
> > for (char *a = argv; a != NULL; a++)
> >         bytes += strlen(a) + 1;  // Same doubts
> > 
> > if (bytes > MAX_ARG_STRLEN)  // Maybe >= ?
> >         return -E2BIG;
> 
> The code in fs/exec.c enforces MAX_ARG_STRLEN against
> each individual string, not against the total.
> 
> If any string, either argument or environment, is larger
> than 32 * PAGE_SIZE, the kernel will return -E2BIG.
> 
> do_execveat_common() has this code, which uses copy_strings
> to copy both the strings from the environment, and from
> the command line arguments:
> 
>         retval = copy_strings(bprm->envc, envp, bprm);
>         if (retval < 0)
>                 goto out_free;
> 
>         retval = copy_strings(bprm->argc, argv, bprm);
>         if (retval < 0)
>                 goto out_free;
> 
> Inside copy_strings() we have this code:
> 
> 
>         while (argc-- > 0) {
> ...
>                 len = strnlen_user(str, MAX_ARG_STRLEN);
>                 if (!len)
>                         goto out;
> 
>                 ret = -E2BIG;
>                 if (!valid_arg_len(bprm, len))
>                         goto out;
> 
> The valid_arg_len() function does not need explanation:
> 
> static bool valid_arg_len(struct linux_binprm *bprm, long len)
> {
>         return len <= MAX_ARG_STRLEN;
> }
> 
> 
> The current man page wording is very clear about the total
> length being enforced, but IMHO not as clear about the limit
> that gets enforced on each individual string.
> 
> The total length limit of environment & commandline arguments
> is enforced by bprm_stack_limits(), and is checked against
> either 1/4 of the maximum stack size, or 3/4 of _STK_LIM, whichever
> is smaller. The MAX_ARG_STRLEN value does not come into play when
> enforcing the total.

Ahh, so the limit for each string is different than the limit for the
total.  That makes sense.  Sorry for the noise.

Cheers,
Alex

> 
> -- 
> All Rights Reversed.
  
Alejandro Colomar Oct. 11, 2023, 2:44 p.m. UTC | #6
Hi Matthew,

On Wed, Oct 11, 2023 at 09:44:29AM -0400, Matthew House wrote:
> 
> To expand on this, there are basically two separate byte limits in
> fs/exec.c, one for each individual argv/envp string, and another for all
> strings and all pointers to them as a whole. To put the whole thing in
> pseudocode, the checks work effectively like this, assuming I haven't made
> any errors:
> 
> int argc, envc;
> unsigned long bytes, limit;
> 
> /* assume that argv has already been adjusted to add an empty argv[0] */
> argc = 0, envc = 0, bytes = 0;
> for (char **a = argv; *a != NULL; a++, argc++) {
>     if (strlen(*a) >= MAX_ARG_STRLEN)
>         return -E2BIG;
>     bytes += strlen(*a) + 1;
> }
> for (char **e = envp; *e != NULL; e++, envc++) {
>     if (strlen(*e) >= MAX_ARG_STRLEN)
>         return -E2BIG;
>     bytes += strlen(*e) + 1;
> }
> 
> if (argc > MAX_ARG_STRINGS || envc > MAX_ARG_STRINGS)
>     return -E2BIG;
> bytes += (argc + envc) * sizeof(void *);
> 
> limit = max(min(_STK_LIM / 4 * 3, rlim_stack.rlim_cur / 4), ARG_MAX);
> if (bytes > limit)
>     return -E2BIG;
> 
> Thank you,
> Matthew House

Thanks!

This thing would be useful in the commit message.  An example program
demonstrating it would be even better.

Cheers,
Alex
  
Alejandro Colomar Oct. 11, 2023, 2:47 p.m. UTC | #7
On Wed, Oct 11, 2023 at 09:44:29AM -0400, Matthew House wrote:
> On Wed, Oct 11, 2023 at 9:21 AM Rik van Riel <riel@surriel.com> wrote:
> > On Wed, 2023-10-11 at 12:41 +0200, Alejandro Colomar wrote:
> > > Hi Rik,
> > >
> > > On Tue, Oct 10, 2023 at 11:41:53PM -0400, Rik van Riel wrote:
> > > > Document that if a command line or environment string is too long
> > > > (> MAX_ARG_STRLEN), execve will also return E2BIG.
> > >
> > > That's already implied by the current text:
> > >
> > >        E2BIG  The total number of bytes in the environment (envp) and
> > > argument
> > >               list (argv) is too large.
> > >
> > > That means that
> > >
> > > size_t  bytes;
> > >
> > > bytes = 0;
> > > for (char *e = envp; e != NULL; e++)
> > >         bytes += strlen(e) + 1;  // I have doubts about the +1
> > > for (char *a = argv; a != NULL; a++)
> > >         bytes += strlen(a) + 1;  // Same doubts
> > >
> > > if (bytes > MAX_ARG_STRLEN)  // Maybe >= ?
> > >         return -E2BIG;
> >
> > The code in fs/exec.c enforces MAX_ARG_STRLEN against
> > each individual string, not against the total.
> >
> > If any string, either argument or environment, is larger
> > than 32 * PAGE_SIZE, the kernel will return -E2BIG.
> >
> > do_execveat_common() has this code, which uses copy_strings
> > to copy both the strings from the environment, and from
> > the command line arguments:
> >
> >         retval = copy_strings(bprm->envc, envp, bprm);
> >         if (retval < 0)
> >                 goto out_free;
> >
> >         retval = copy_strings(bprm->argc, argv, bprm);
> >         if (retval < 0)
> >                 goto out_free;
> >
> > Inside copy_strings() we have this code:
> >
> >
> >         while (argc-- > 0) {
> > ...
> >                 len = strnlen_user(str, MAX_ARG_STRLEN);
> >                 if (!len)
> >                         goto out;
> >
> >                 ret = -E2BIG;
> >                 if (!valid_arg_len(bprm, len))
> >                         goto out;
> >
> > The valid_arg_len() function does not need explanation:
> >
> > static bool valid_arg_len(struct linux_binprm *bprm, long len)
> > {
> >         return len <= MAX_ARG_STRLEN;
> > }
> >
> >
> > The current man page wording is very clear about the total
> > length being enforced, but IMHO not as clear about the limit
> > that gets enforced on each individual string.
> >
> > The total length limit of environment & commandline arguments
> > is enforced by bprm_stack_limits(), and is checked against
> > either 1/4 of the maximum stack size, or 3/4 of _STK_LIM, whichever
> > is smaller. The MAX_ARG_STRLEN value does not come into play when
> > enforcing the total.
> 
> To expand on this, there are basically two separate byte limits in
> fs/exec.c, one for each individual argv/envp string, and another for all
> strings and all pointers to them as a whole. To put the whole thing in
> pseudocode, the checks work effectively like this, assuming I haven't made
> any errors:
> 
> int argc, envc;
> unsigned long bytes, limit;
> 
> /* assume that argv has already been adjusted to add an empty argv[0] */
> argc = 0, envc = 0, bytes = 0;
> for (char **a = argv; *a != NULL; a++, argc++) {
>     if (strlen(*a) >= MAX_ARG_STRLEN)

Are you sure this is >= and not > ?

>         return -E2BIG;
>     bytes += strlen(*a) + 1;
> }
> for (char **e = envp; *e != NULL; e++, envc++) {
>     if (strlen(*e) >= MAX_ARG_STRLEN)
>         return -E2BIG;
>     bytes += strlen(*e) + 1;
> }
> 
> if (argc > MAX_ARG_STRINGS || envc > MAX_ARG_STRINGS)
>     return -E2BIG;
> bytes += (argc + envc) * sizeof(void *);
> 
> limit = max(min(_STK_LIM / 4 * 3, rlim_stack.rlim_cur / 4), ARG_MAX);
> if (bytes > limit)
>     return -E2BIG;
> 
> Thank you,
> Matthew House
  
Matthew House Oct. 11, 2023, 3:11 p.m. UTC | #8
On Wed, Oct 11, 2023 at 10:47 AM Alejandro Colomar <alx@kernel.org> wrote:
> On Wed, Oct 11, 2023 at 09:44:29AM -0400, Matthew House wrote:
> > To expand on this, there are basically two separate byte limits in
> > fs/exec.c, one for each individual argv/envp string, and another for all
> > strings and all pointers to them as a whole. To put the whole thing in
> > pseudocode, the checks work effectively like this, assuming I haven't made
> > any errors:
> >
> > int argc, envc;
> > unsigned long bytes, limit;
> >
> > /* assume that argv has already been adjusted to add an empty argv[0] */
> > argc = 0, envc = 0, bytes = 0;
> > for (char **a = argv; *a != NULL; a++, argc++) {
> >     if (strlen(*a) >= MAX_ARG_STRLEN)
>
> Are you sure this is >= and not > ?

Yes. In general, the kernel's string limits tend to include the trailing
null byte. There are two places where this limit is enforced, one for the
pathname (or full pathname for execveat) and the other for the argv/envp
strings. The pathname is handled by copy_string_kernel():

	int len = strnlen(arg, MAX_ARG_STRLEN) + 1 /* terminating NUL */;

	if (len == 0)
		return -EFAULT;
	if (!valid_arg_len(bprm, len))
		return -E2BIG;

where valid_arg_len(bprm, len) is just (len <= MAX_ARG_STRLEN). Here,
strnlen() has the same behavior as the ordinary libc strnlen(3),
effectively returning min(strlen(arg), MAX_ARG_STRLEN). Thus, the check
succeeds iff strlen(arg) + 1 <= MAX_ARG_STRLEN, or equivalently, iff
strlen(arg) < MAX_ARG_STRLEN.

Next, each of the environment and argument strings is handled by
copy_strings():

		len = strnlen_user(str, MAX_ARG_STRLEN);
		if (!len)
			goto out;

		ret = -E2BIG;
		if (!valid_arg_len(bprm, len))
			goto out;

The strnlen_user() function, per its documentation, is explicitly inclusive
of the trailing null byte:

 * Returns the size of the string INCLUDING the terminating NUL.
 * If the string is too long, returns a number larger than @count. User
 * has to check the return value against "> count".
 * On exception (or invalid count), returns 0.

Thus, the check succeeds iff the size including the null byte is
<= MAX_ARG_STRLEN, i.e., iff strlen(arg) + 1 <= MAX_ARG_STRLEN, or
strlen(arg) < MAX_ARG_STRLEN.

Matthew House
  
Alejandro Colomar Oct. 11, 2023, 3:50 p.m. UTC | #9
Hi Matthew,

On Wed, Oct 11, 2023 at 11:11:24AM -0400, Matthew House wrote:
> On Wed, Oct 11, 2023 at 10:47 AM Alejandro Colomar <alx@kernel.org> wrote:
> > On Wed, Oct 11, 2023 at 09:44:29AM -0400, Matthew House wrote:
> > > To expand on this, there are basically two separate byte limits in
> > > fs/exec.c, one for each individual argv/envp string, and another for all
> > > strings and all pointers to them as a whole. To put the whole thing in
> > > pseudocode, the checks work effectively like this, assuming I haven't made
> > > any errors:
> > >
> > > int argc, envc;
> > > unsigned long bytes, limit;
> > >
> > > /* assume that argv has already been adjusted to add an empty argv[0] */
> > > argc = 0, envc = 0, bytes = 0;
> > > for (char **a = argv; *a != NULL; a++, argc++) {
> > >     if (strlen(*a) >= MAX_ARG_STRLEN)
> >
> > Are you sure this is >= and not > ?
> 
> Yes. In general, the kernel's string limits tend to include the trailing
> null byte. There are two places where this limit is enforced, one for the
> pathname (or full pathname for execveat) and the other for the argv/envp
> strings. The pathname is handled by copy_string_kernel():
> 
> 	int len = strnlen(arg, MAX_ARG_STRLEN) + 1 /* terminating NUL */;
> 
> 	if (len == 0)
> 		return -EFAULT;
> 	if (!valid_arg_len(bprm, len))
> 		return -E2BIG;
> 
> where valid_arg_len(bprm, len) is just (len <= MAX_ARG_STRLEN). Here,
> strnlen() has the same behavior as the ordinary libc strnlen(3),
> effectively returning min(strlen(arg), MAX_ARG_STRLEN). Thus, the check
> succeeds iff strlen(arg) + 1 <= MAX_ARG_STRLEN, or equivalently, iff
> strlen(arg) < MAX_ARG_STRLEN.
> 
> Next, each of the environment and argument strings is handled by
> copy_strings():
> 
> 		len = strnlen_user(str, MAX_ARG_STRLEN);
> 		if (!len)
> 			goto out;
> 
> 		ret = -E2BIG;
> 		if (!valid_arg_len(bprm, len))
> 			goto out;
> 
> The strnlen_user() function, per its documentation, is explicitly inclusive
> of the trailing null byte:
> 
>  * Returns the size of the string INCLUDING the terminating NUL.
>  * If the string is too long, returns a number larger than @count. User
>  * has to check the return value against "> count".
>  * On exception (or invalid count), returns 0.
> 
> Thus, the check succeeds iff the size including the null byte is
> <= MAX_ARG_STRLEN, i.e., iff strlen(arg) + 1 <= MAX_ARG_STRLEN, or
> strlen(arg) < MAX_ARG_STRLEN.

Thanks!  It's a bit confusing to see the terms 'len' and '_STRLEN'
meaning length+1, but it makes sense now.

Cheers,
Alex

> 
> Matthew House
  

Patch

diff --git a/man2/execve.2 b/man2/execve.2
index 0d9582492ad1..c1a359d01872 100644
--- a/man2/execve.2
+++ b/man2/execve.2
@@ -449,7 +449,7 @@  The total number of bytes in the environment
 .RI ( envp )
 and argument list
 .RI ( argv )
-is too large.
+is too large, or an argument or environment string is too long.
 .TP
 .B EACCES
 Search permission is denied on a component of the path prefix of