[v4] sched/numa: Fix divide by zero for sysctl_numa_balancing_scan_size.

Message ID 20230406152633.3136708-1-chris.hyser@oracle.com
State New

Commit Message

Chris Hyser April 6, 2023, 3:26 p.m. UTC
Commit 6419265899d9 ("sched/fair: Fix division by zero
sysctl_numa_balancing_scan_size") prevented a divide by zero by using
sysctl mechanisms to return EINVAL for a sysctl_numa_balancing_scan_size
value of zero. When moved from a sysctl to a debugfs file, this checking
was lost.

This patch puts zero checking back in place.

Cc: stable@vger.kernel.org
Fixes: 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to debugfs")
Tested-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
---
 kernel/sched/debug.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)
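
For readers without the scheduler source at hand, here is a minimal userspace sketch of why a zero scan size is fatal. It is not the kernel code: scan_size_mb, MAX_SCAN_WINDOW_MB and scan_windows() are illustrative stand-ins, and the constant's value is an assumption. It only models a tunable used as a divisor, with the guard the commit message asks for.

```c
#include <stdio.h>

/* Illustrative stand-ins; the names and the constant are assumptions. */
#define MAX_SCAN_WINDOW_MB 2560u

/* Models the tunable that the debugfs file exposes. */
static unsigned int scan_size_mb = 256;

/*
 * Number of scan windows for the current scan size.
 * Returns 0 for a zero scan size instead of dividing by it.
 */
static unsigned int scan_windows(void)
{
	unsigned int size = scan_size_mb;

	if (size == 0)
		return 0;	/* unguarded, MAX_SCAN_WINDOW_MB / 0 is undefined and traps on x86 */
	if (size >= MAX_SCAN_WINDOW_MB)
		return 1;
	return MAX_SCAN_WINDOW_MB / size;
}
```

In the kernel the equivalent guard sits at the write side rather than the read side, so the divisor can never become zero in the first place.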
  

Comments

Peter Zijlstra April 29, 2023, 2:56 p.m. UTC | #1
On Thu, Apr 06, 2023 at 11:26:33AM -0400, chris hyser wrote:
> Commit 6419265899d9 ("sched/fair: Fix division by zero
> sysctl_numa_balancing_scan_size") prevented a divide by zero by using
> sysctl mechanisms to return EINVAL for a sysctl_numa_balancing_scan_size
> value of zero. When moved from a sysctl to a debugfs file, this checking
> was lost.
> 
> This patch puts zero checking back in place.
> 
> Cc: stable@vger.kernel.org
> Fixes: 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to debugfs")
> Tested-by: Chen Yu <yu.c.chen@intel.com>
> Signed-off-by: Chris Hyser <chris.hyser@oracle.com>

I suppose.. but is it really worth the hassle? I mean, this is debug
stuff, just don't write 0 in then?

If we do find we want this (why?!) then should we not invest in a better
debugfs_create_u32_minmax() or something so that we don't get to add 40+
lines for everything we want to add limits on?
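
To make the suggestion concrete, the following is a userspace sketch of the bounds check such a helper could wrap. No debugfs_create_u32_minmax() exists in the tree under discussion; struct u32_minmax and u32_minmax_store() are hypothetical names, and strtoul() stands in for the kernel's string parsing. The error codes are a design assumption: -EINVAL for malformed or out-of-bounds input (matching the patch's zero check) and -ERANGE for values that do not fit in 32 bits.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical descriptor: a u32 plus the range that writes must stay within. */
struct u32_minmax {
	unsigned int *val;
	unsigned int min;
	unsigned int max;
};

/*
 * Parse a decimal string and store it only if it lies in [min, max].
 * Returns 0 on success, -EINVAL on malformed or out-of-bounds input,
 * -ERANGE when the number does not fit in an unsigned int.
 */
static int u32_minmax_store(const struct u32_minmax *d, const char *buf)
{
	char *end;
	unsigned long v;

	/* strtoul() would otherwise accept a leading sign or whitespace. */
	if (*buf < '0' || *buf > '9')
		return -EINVAL;

	errno = 0;
	v = strtoul(buf, &end, 10);
	if (*end == '\n')
		end++;		/* tolerate the newline that echo appends */
	if (*end != '\0')
		return -EINVAL;
	if (errno == ERANGE || v > UINT_MAX)
		return -ERANGE;
	if (v < d->min || v > d->max)
		return -EINVAL;

	*d->val = (unsigned int)v;
	return 0;
}
```

With a descriptor of { &scan_size, 1, UINT_MAX }, a write of "0" is rejected and the stored value is left untouched, which is the whole behavior the 40+ line patch adds for one file.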

> ---
>  kernel/sched/debug.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index 1637b65ba07a..cc6a0172a598 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -278,6 +278,48 @@ static const struct file_operations sched_dynamic_fops = {
>  
>  #endif /* CONFIG_PREEMPT_DYNAMIC */
>  
> +#ifdef CONFIG_NUMA_BALANCING
> +
> +static ssize_t sched_numa_scan_write(struct file *filp, const char __user *ubuf,
> +				     size_t cnt, loff_t *ppos)
> +{
> +	int err;
> +	unsigned int scan_size;
> +
> +	err = kstrtouint_from_user(ubuf, cnt, 10, &scan_size);
> +	if (err)
> +		return err;
> +
> +	if (!scan_size)
> +		return -EINVAL;
> +
> +	sysctl_numa_balancing_scan_size = scan_size;
> +
> +	*ppos += cnt;
> +	return cnt;
> +}
> +
> +static int sched_numa_scan_show(struct seq_file *m, void *v)
> +{
> +	seq_printf(m, "%u\n", sysctl_numa_balancing_scan_size);
> +	return 0;
> +}
> +
> +static int sched_numa_scan_open(struct inode *inode, struct file *filp)
> +{
> +	return single_open(filp, sched_numa_scan_show, NULL);
> +}
> +
> +static const struct file_operations sched_numa_scan_fops = {
> +	.open		= sched_numa_scan_open,
> +	.write		= sched_numa_scan_write,
> +	.read		= seq_read,
> +	.llseek		= seq_lseek,
> +	.release	= single_release,
> +};
> +
> +#endif /* CONFIG_NUMA_BALANCING */
> +
>  __read_mostly bool sched_debug_verbose;
>  
>  static const struct seq_operations sched_debug_sops;
> @@ -332,7 +374,7 @@ static __init int sched_init_debug(void)
>  	debugfs_create_u32("scan_delay_ms", 0644, numa, &sysctl_numa_balancing_scan_delay);
>  	debugfs_create_u32("scan_period_min_ms", 0644, numa, &sysctl_numa_balancing_scan_period_min);
>  	debugfs_create_u32("scan_period_max_ms", 0644, numa, &sysctl_numa_balancing_scan_period_max);
> -	debugfs_create_u32("scan_size_mb", 0644, numa, &sysctl_numa_balancing_scan_size);
> +	debugfs_create_file("scan_size_mb", 0644, numa, NULL, &sched_numa_scan_fops);
>  	debugfs_create_u32("hot_threshold_ms", 0644, numa, &sysctl_numa_balancing_hot_threshold);
>  #endif
>  
> -- 
> 2.31.1
>
  
Chris Hyser May 1, 2023, 4:21 p.m. UTC | #2
On 4/29/23 10:56, Peter Zijlstra wrote:
> On Thu, Apr 06, 2023 at 11:26:33AM -0400, chris hyser wrote:
>> Commit 6419265899d9 ("sched/fair: Fix division by zero
>> sysctl_numa_balancing_scan_size") prevented a divide by zero by using
>> sysctl mechanisms to return EINVAL for a sysctl_numa_balancing_scan_size
>> value of zero. When moved from a sysctl to a debugfs file, this checking
>> was lost.
>>
>> This patch puts zero checking back in place.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to debugfs")
>> Tested-by: Chen Yu <yu.c.chen@intel.com>
>> Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
> 
> I suppose.. but is it really worth the hassle? I mean, this is debug
> stuff, just don't write 0 in then?

My understanding of the history is that this was always debug, someone 
found the divide by zero and a convenient sysctl mechanism was used to 
fix it. It also cleaned up a little compiler weirdness. I did not see
any justifications in the discussion of the inclusion of that patch 
other than showing the nasty stack trace you get when the machine dies 
after writing a zero. It is a major inconvenience, completely 
preventable and technically a regression, but as you point out the new 
fix is a lot more code.

In terms of actually wanting to fix this, I'm a bit confused. It clearly 
was worth fixing the first time around (it has your sign-off), and the 
only thing that has changed is that that fix no longer works.

> 
> If we do find we want this (why?!) then should we not invest in a better
> debugfs_create_u32_minmax() or something so that we don't get to add 40+
> lines for everything we want to add limits on?

I will look at a way to greatly simplify the bounds checking here as you 
suggest.


-chrish
  
Peter Zijlstra May 1, 2023, 6:55 p.m. UTC | #3
On Mon, May 01, 2023 at 12:21:17PM -0400, chris hyser wrote:
> In terms of actually wanting to fix this, I'm a bit confused. It clearly was
> worth fixing the first time around (it has your sign-off), and the only
> thing that has changed is that that fix no longer works.

Well, the amount of effort to fix it has dramatically increased, 40+
extra lines vs 2 extra lines.

> > If we do find we want this (why?!) then should we not invest in a better
> > debugfs_create_u32_minmax() or something so that we don't get to add 40+
> > lines for everything we want to add limits on?
> 
> I will look at a way to greatly simplify the bounds checking here as you
> suggest.

Thanks, that might make it all a lot nicer indeed.
  

Patch

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 1637b65ba07a..cc6a0172a598 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -278,6 +278,48 @@  static const struct file_operations sched_dynamic_fops = {
 
 #endif /* CONFIG_PREEMPT_DYNAMIC */
 
+#ifdef CONFIG_NUMA_BALANCING
+
+static ssize_t sched_numa_scan_write(struct file *filp, const char __user *ubuf,
+				     size_t cnt, loff_t *ppos)
+{
+	int err;
+	unsigned int scan_size;
+
+	err = kstrtouint_from_user(ubuf, cnt, 10, &scan_size);
+	if (err)
+		return err;
+
+	if (!scan_size)
+		return -EINVAL;
+
+	sysctl_numa_balancing_scan_size = scan_size;
+
+	*ppos += cnt;
+	return cnt;
+}
+
+static int sched_numa_scan_show(struct seq_file *m, void *v)
+{
+	seq_printf(m, "%u\n", sysctl_numa_balancing_scan_size);
+	return 0;
+}
+
+static int sched_numa_scan_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, sched_numa_scan_show, NULL);
+}
+
+static const struct file_operations sched_numa_scan_fops = {
+	.open		= sched_numa_scan_open,
+	.write		= sched_numa_scan_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+#endif /* CONFIG_NUMA_BALANCING */
+
 __read_mostly bool sched_debug_verbose;
 
 static const struct seq_operations sched_debug_sops;
@@ -332,7 +374,7 @@  static __init int sched_init_debug(void)
 	debugfs_create_u32("scan_delay_ms", 0644, numa, &sysctl_numa_balancing_scan_delay);
 	debugfs_create_u32("scan_period_min_ms", 0644, numa, &sysctl_numa_balancing_scan_period_min);
 	debugfs_create_u32("scan_period_max_ms", 0644, numa, &sysctl_numa_balancing_scan_period_max);
-	debugfs_create_u32("scan_size_mb", 0644, numa, &sysctl_numa_balancing_scan_size);
+	debugfs_create_file("scan_size_mb", 0644, numa, NULL, &sched_numa_scan_fops);
 	debugfs_create_u32("hot_threshold_ms", 0644, numa, &sysctl_numa_balancing_hot_threshold);
 #endif
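
One way to exercise the new write handler from userspace is sketched below. write_value() is a hypothetical helper, not part of the patch; it issues a single write(2) and reports the result as a negative errno value.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a string to an existing file with a single write(2) call.
 * Returns 0 on success or a negative errno value.
 */
static int write_value(const char *path, const char *val)
{
	size_t len = strlen(val);
	ssize_t n;
	int fd, err = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, val, len);
	if (n < 0)
		err = -errno;		/* e.g. -EINVAL from the handler's zero check */
	else if ((size_t)n != len)
		err = -EIO;		/* short write, treated as a failure here */

	close(fd);
	return err;
}
```

Assuming debugfs is mounted at /sys/kernel/debug and the file lives under sched/numa_balancing/ (the directory is not shown in this diff), calling write_value() on .../scan_size_mb with "0\n" as root should return -EINVAL on a patched kernel, while a non-zero value such as "256\n" should return 0.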