[v1] sched/fair: fix inconsistency in update_task_scan_period

Message ID 20230425100204.2032009-1-zhaomzhao@126.com
State New
Series [v1] sched/fair: fix inconsistency in update_task_scan_period

Commit Message

Zhao Mengmeng April 25, 2023, 10:02 a.m. UTC
  From: Zhao Mengmeng <zhaomengmeng@kylinos.cn>

When calculating the numa_scan_period diff, the code and
the comment are inconsistent. The comment says the shared
faults ratio is used, but the code uses the private faults
ratio. This patch fixes it.

Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
---
 kernel/sched/fair.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
  

Comments

Peter Zijlstra April 25, 2023, 10:24 a.m. UTC | #1
On Tue, Apr 25, 2023 at 06:02:04AM -0400, zhaomzhao@126.com wrote:
> From: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
> 
> When calculating the numa_scan_period diff, the code and
> the comment are inconsistent. The comment says the shared
> faults ratio is used, but the code uses the private faults
> ratio. This patch fixes it.

So for some reason you think the comment is correct? You also don't
discuss the performance changes caused by changing the code.
  
Zhao Mengmeng April 26, 2023, 1:46 a.m. UTC | #2
On 2023/4/25 18:24, Peter Zijlstra wrote:
> On Tue, Apr 25, 2023 at 06:02:04AM -0400, zhaomzhao@126.com wrote:
>> From: Zhao Mengmeng <zhaomengmeng@kylinos.cn>
>>
>> When calculating the numa_scan_period diff, the code and
>> the comment are inconsistent. The comment says the shared
>> faults ratio is used, but the code uses the private faults
>> ratio. This patch fixes it.
> 
> So for some reason you think the comment is correct? You also don't
> discuss the performance changes caused by changing the code.

Sorry for the repeated mail and thanks for your comments.

First, I'd like to make some additions. When I was studying the 
numa-balancing code and found this inconsistency, I felt the same as 
you: the comment is outdated and does not match the code, so I 
intended to change the comment. But after some digging through the 
git history, I changed my mind.

The function update_task_scan_period() was added in commit 
04bb2f9475054 ("sched/numa: Adjust scan rate in task_numa_placement"), 
which set up the basis. Though the detailed logic inside the function 
has changed a lot since then, its outward effect has stayed the same:
"""
Increase the scan period (slow down scanning) if the majority of our 
memory is
already on our local node, or if the majority of the page accesses are 
shared
with other processes
"""
Later, in commit 37ec97deb3a8c ("sched/numa: Slow down scan rate if 
shared faults dominate"), the author states:
"""
    However, with the current code, all a high ratio of shared accesses
    does is slow down the rate at which scanning is made faster.

    This patch changes things so either lots of shared accesses or
    lots of local accesses will slow down scanning, and numa scanning
    is sped up only when there are lots of private faults on remote
    memory pages.
"""
But the actual code logic does not reflect the intention stated in 
that commit log, and that mismatch has survived into the latest kernel 
code. Based on this, I changed the code and made this patch. That is 
*the reason I think the comment is correct*.
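
To make the mismatch concrete, here is a minimal userspace sketch of
the decision logic (not the kernel code itself; the fault counts are
made-up numbers, and NUMA_PERIOD_SLOTS/NUMA_PERIOD_THRESHOLD mirror
the 10 and 7 defined in kernel/sched/fair.c):

#include <stdio.h>

#define NUMA_PERIOD_SLOTS     10
#define NUMA_PERIOD_THRESHOLD 7

int main(void)
{
	/* Hypothetical fault counts: accesses are mostly shared and remote. */
	unsigned long local = 2, remote = 8, private = 2, shared = 8;

	int lr_ratio = (local * NUMA_PERIOD_SLOTS) / (local + remote);     /* 2 */
	int ps_ratio = (private * NUMA_PERIOD_SLOTS) / (private + shared); /* 2 */
	int sp_ratio = (shared * NUMA_PERIOD_SLOTS) / (private + shared);  /* 8 */

	/*
	 * Old code: neither ps_ratio nor lr_ratio reaches the threshold,
	 * so the final else branch speeds scanning up, even though the
	 * comments promise that shared-dominated faults slow it down.
	 */
	if (ps_ratio >= NUMA_PERIOD_THRESHOLD || lr_ratio >= NUMA_PERIOD_THRESHOLD)
		printf("old code: slow down scanning\n");
	else
		printf("old code: speed up scanning\n");

	/*
	 * Patched code: sp_ratio crosses the threshold, so scanning is
	 * slowed down, matching the comments and the 37ec97deb3a8c log.
	 */
	if (lr_ratio >= NUMA_PERIOD_THRESHOLD || sp_ratio >= NUMA_PERIOD_THRESHOLD)
		printf("new code: slow down scanning\n");
	else
		printf("new code: speed up scanning\n");

	return 0;
}

With these numbers the old code prints "speed up scanning" while the
patched code prints "slow down scanning", which is exactly the
behavioral difference the patch introduces.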

Second, the performance. I'm sorry that this is a shortcoming of my 
work. Which benchmark should I use, e.g. autonuma-benchmark? I lack a 
physical server with multiple NUMA nodes for testing; is a result from 
a virtual machine enough? Besides, this patch is only meant to restore 
the initial design, not to optimize, so is a performance result 
necessary?

Best regards.
  

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 373ff5f55884..73cc83128072 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2362,7 +2362,7 @@  static void update_task_scan_period(struct task_struct *p,
 			unsigned long shared, unsigned long private)
 {
 	unsigned int period_slot;
-	int lr_ratio, ps_ratio;
+	int lr_ratio, sp_ratio;
 	int diff;
 
 	unsigned long remote = p->numa_faults_locality[0];
@@ -2393,24 +2393,24 @@  static void update_task_scan_period(struct task_struct *p,
 	 */
 	period_slot = DIV_ROUND_UP(p->numa_scan_period, NUMA_PERIOD_SLOTS);
 	lr_ratio = (local * NUMA_PERIOD_SLOTS) / (local + remote);
-	ps_ratio = (private * NUMA_PERIOD_SLOTS) / (private + shared);
+	sp_ratio = (shared * NUMA_PERIOD_SLOTS) / (private + shared);
 
-	if (ps_ratio >= NUMA_PERIOD_THRESHOLD) {
+	if (lr_ratio >= NUMA_PERIOD_THRESHOLD) {
 		/*
 		 * Most memory accesses are local. There is no need to
 		 * do fast NUMA scanning, since memory is already local.
 		 */
-		int slot = ps_ratio - NUMA_PERIOD_THRESHOLD;
+		int slot = lr_ratio - NUMA_PERIOD_THRESHOLD;
 		if (!slot)
 			slot = 1;
 		diff = slot * period_slot;
-	} else if (lr_ratio >= NUMA_PERIOD_THRESHOLD) {
+	} else if (sp_ratio >= NUMA_PERIOD_THRESHOLD) {
 		/*
 		 * Most memory accesses are shared with other tasks.
 		 * There is no point in continuing fast NUMA scanning,
 		 * since other tasks may just move the memory elsewhere.
 		 */
-		int slot = lr_ratio - NUMA_PERIOD_THRESHOLD;
+		int slot = sp_ratio - NUMA_PERIOD_THRESHOLD;
 		if (!slot)
 			slot = 1;
 		diff = slot * period_slot;
@@ -2420,7 +2420,7 @@  static void update_task_scan_period(struct task_struct *p,
 		 * yet they are not on the local NUMA node. Speed up
 		 * NUMA scanning to get the memory moved over.
 		 */
-		int ratio = max(lr_ratio, ps_ratio);
+		int ratio = max(lr_ratio, sp_ratio);
 		diff = -(NUMA_PERIOD_THRESHOLD - ratio) * period_slot;
 	}
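
As a quick sanity check with made-up numbers: for a task with
numa_scan_period = 1000, period_slot = DIV_ROUND_UP(1000, 10) = 100.
With the shared-dominated fault counts from the sketch above
(lr_ratio = 2, sp_ratio = 8, and ps_ratio = 2 in the old code), the
patched code takes the second branch: slot = 8 - 7 = 1, so diff = +100
and scanning slows down. The old code fell through to the final branch
instead: diff = -(7 - max(2, 2)) * 100 = -500, speeding scanning up.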