sysinfo: Saturate 16-bit procs rather than wrapping

Message ID e32ea9a03d0797ce2b8e7a82ed59c0dad9431f2b.1680407255.git.josh@joshtriplett.org
State New
Headers
Series sysinfo: Saturate 16-bit procs rather than wrapping |

Commit Message

Josh Triplett April 2, 2023, 3:57 a.m. UTC
  struct sysinfo has a 16-bit field for the number of processes. Current
systems can easily exceed this. Rather than wrapping around, saturate
the value at U16_MAX. This is still incorrect, but more likely to
help the user know what's going on; a caller can then (for instance)
parse the full value out of /proc/loadavg.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
---

Not sure what tree changes to kernel/sys.c should flow through. Andrew,
could you take this through your tree (assuming you agree with it), or
suggest what tree it should go through instead?
  

Comments

Eric W. Biederman April 5, 2023, 10:27 p.m. UTC | #1
Josh Triplett <josh@joshtriplett.org> writes:

> struct sysinfo has a 16-bit field for the number of processes. Current
> systems can easily exceed this. Rather than wrapping around, saturate
> the value at U16_MAX. This is still incorrect, but more likely to
> help the user know what's going on; a caller can then (for instance)
> parse the full value out of /proc/loadavg.
>
> Signed-off-by: Josh Triplett <josh@joshtriplett.org>
> ---
>
> Not sure what tree changes to kernel/sys.c should flow through. Andrew,
> could you take this through your tree (assuming you agree with it), or
> suggest what tree it should go through instead?


Mind if I ask what the motivation for this is?

I looked at debian code search and there are a lot of uses of the
sysinfo system call.  Most of the uses were for load average or memory
occupancy.  The only use of procs that I could find was in samba.  I did
not trace the code far enough but it clearly had an embedded assumption
that 16 bits was enough to report the number of processes on a linux
system.

I looked at glibc and if I read things correctly the sysinfo system
call is just a pass through to the kernel.


I looked because just saturating the 16bit field feels like a hack
that will continue to encourage buggy programs to stay buggy.

If there is real value in sysinfo returning a this information someone
could go through the work and update the kernel to return the high
bits of the process count in info->pad that is immediately after
info->procs, and then update the apps or libc to find those high bits.

Otherwise I think it makes most sense to encourage programs to
use /proc/loadavg, where this information has always been returned
correctly as it is a text file.  We could do it like:

	/*
	 * Reliably fail when there are more than 64k processes.
         * Userspace should use /proc/loadavg instead.
         */
	info->procs = (nr_threads <= U16_MAX) ? nr_threads : 0;

If saturating does make sense can we please have a comment documenting
why saturating and encouraging confused userspace programs to stay
confused makes sense?


Eric


> diff --git a/kernel/sys.c b/kernel/sys.c
> index 495cd87d9bf4..ba05fca26927 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2699,7 +2699,7 @@ static int do_sysinfo(struct sysinfo *info)
>  
>  	get_avenrun(info->loads, 0, SI_LOAD_SHIFT - FSHIFT);
>  
> -	info->procs = nr_threads;
> +	info->procs = min_t(typeof(nr_threads), nr_threads, U16_MAX);
>  
>  	si_meminfo(info);
>  	si_swapinfo(info);
  
Josh Triplett April 6, 2023, 1:03 a.m. UTC | #2
On Wed, Apr 05, 2023 at 05:27:12PM -0500, Eric W. Biederman wrote:
> Josh Triplett <josh@joshtriplett.org> writes:
> 
> > struct sysinfo has a 16-bit field for the number of processes. Current
> > systems can easily exceed this. Rather than wrapping around, saturate
> > the value at U16_MAX. This is still incorrect, but more likely to
> > help the user know what's going on; a caller can then (for instance)
> > parse the full value out of /proc/loadavg.
> >
> > Signed-off-by: Josh Triplett <josh@joshtriplett.org>
> > ---
> >
> > Not sure what tree changes to kernel/sys.c should flow through. Andrew,
> > could you take this through your tree (assuming you agree with it), or
> > suggest what tree it should go through instead?
> 
> 
> Mind if I ask what the motivation for this is?
[...]
> I looked because just saturating the 16bit field feels like a hack
> that will continue to encourage buggy programs to stay buggy.

On the contrary, it seemed to me like the best way to make issues more
obvious. Wrapping around and showing (say) 12 processes because the
results are mod 65536 won't necessarily immediately stand out in stats,
the way that saturating at 65535 does.

That said, I like the idea of doing 0 even more:

> If there is real value in sysinfo returning a this information someone
> could go through the work and update the kernel to return the high
> bits of the process count in info->pad that is immediately after
> info->procs, and then update the apps or libc to find those high bits.

I'd love to do that; I just thought that'd be less likely to be
accepted. But yes, I think that'd be even better.

That said, I think we need to return *all* the bits in the later
padding, rather than just the high bits, so that we can reliably detect
if the bits were provided. We can return 0 in the existing field, and
then return the process count in the padding, and if the padding is 0
then it wasn't provided.

(If we returned the high bits in the later padding, it'd be hard to tell
if low=42 and high=0 was 42 processes or 42-mod-65536 processes on an
old kernel. If we return the full bits in the later padding, we can
reliably tell the difference between those.)

- Josh Triplett
  

Patch

diff --git a/kernel/sys.c b/kernel/sys.c
index 495cd87d9bf4..ba05fca26927 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2699,7 +2699,7 @@  static int do_sysinfo(struct sysinfo *info)
 
 	get_avenrun(info->loads, 0, SI_LOAD_SHIFT - FSHIFT);
 
-	info->procs = nr_threads;
+	info->procs = min_t(typeof(nr_threads), nr_threads, U16_MAX);
 
 	si_meminfo(info);
 	si_swapinfo(info);