[v3,00/10] sched/fair: Avoid unnecessary migrations within SMT domains

Message ID 20230207045838.11243-1-ricardo.neri-calderon@linux.intel.com
Headers
Series sched/fair: Avoid unnecessary migrations within SMT domains |

Message

Ricardo Neri Feb. 7, 2023, 4:58 a.m. UTC
  Hi,

This is v3 of this series. Previous versions can be found here [1] and
here [2]. To avoid duplication, I do not include the cover letter of the
original submission. You can read it in [1].

Changes since v2:

Vincent correctly indicated that I was abusing asym_packing to force load
balances unrelated to CPU priority. The underlying issue is that the
scheduler cannot not handle load balances between SMT and non-SMT cores
correctly. I added several prework patches to fix it... and I removed the
abuse of asym_packing.

Dietmar helped me to realize that there is a better way to check the idle
state of SMT cores. Now I give the task to the scheduler instead of
architecture-specific overrides. I unconditionally obey CPU priorities
at the SMT level. This keeps Power7 happy. At upper levels (i.e., when
balancing load between cores) the scheduler also considers the idle state
of the core in addition to CPU priority. This satisfies x86.

Ionela spotted a violation of the scheduler topology sanity checks. We did
not find a check that suits both Power7 and x86. For now, I removed the
NEEDS_CHILD flag of SD_ASYM_PACKING.

Hopefully, these patches are in sufficiently good shape to be merged.

Thank you for your feedback and I look forward to getting more of it!

New patches 2, 3, 4, 5, 6, 7, 8
Updated patches: 1
Unchanged patches: 9, 10

BR,
Ricardo

[1]. https://lore.kernel.org/lkml/20220825225529.26465-1-ricardo.neri-calderon@linux.intel.com/
[2]. https://lore.kernel.org/lkml/20221122203532.15013-1-ricardo.neri-calderon@linux.intel.com/


Ricardo Neri (10):
  sched/fair: Generalize asym_packing logic for SMT cores
  sched/fair: Move is_core_idle() out of CONFIG_NUMA
  sched/fair: Only do asym_packing load balancing from fully idle SMT
    cores
  sched/fair: Let low-priority cores help high-priority busy SMT cores
  sched/fair: Keep a fully_busy SMT sched group as busiest
  sched/fair: Use the prefer_sibling flag of the current sched domain
  sched/fair: Do not even the number of busy CPUs via asym_packing
  sched/topology: Remove SHARED_CHILD from ASYM_PACKING
  x86/sched: Remove SD_ASYM_PACKING from the SMT domain flags
  x86/sched/itmt: Give all SMT siblings of a core the same priority

 arch/x86/kernel/itmt.c         |  23 +----
 arch/x86/kernel/smpboot.c      |   2 +-
 include/linux/sched/sd_flags.h |   5 +-
 kernel/sched/fair.c            | 175 +++++++++++++++++----------------
 4 files changed, 99 insertions(+), 106 deletions(-)
  

Comments

Zhang, Rui Feb. 9, 2023, 8:07 a.m. UTC | #1
Hi, All,

On Mon, 2023-02-06 at 20:58 -0800, Ricardo Neri wrote:
> Hi,
> 
> This is v3 of this series. Previous versions can be found here [1]
> and
> here [2]. To avoid duplication, I do not include the cover letter of
> the
> original submission. You can read it in [1].

I happened to run into a similar issue when testing another patch
series which allows idle injections for partial cpus instead of all
cpus. 
https://lore.kernel.org/all/a68a6f8c76cb719cd4865bd6aa726306772d4ee3.camel@intel.com/

On an ADL-P NUC system with 4 Pcores (cpu0-cpu7), and 8 Ecores (cpu8-
cpu15), the problem can be reproduced by
1. start 16 stress threads
2. force idle injection to all Ecore cpus
3. stop idle injection after 10 seconds
After step 3, all the Pcore cpus are 100% busy, and all the Ecore cpus
are almost 100% idle. This situation lasts for a long time, till I kill
all the stress threads after 20 seconds.

After sync with Chen Yu, I also tried
	stress -c 16 &
	chrt -r 70 taskset -c 8-15 stress -c 8 -t 10
instead of idle injection, and the problem is also 100% reproducible.

And note that, the problem can be reproduced w/ and w/o ITMT enabled,
by poking /proc/sys/kernel/sched_itmt_enabled

With this whole patch series applied, I can confirm the problem is gone
both w/ and w/o ITMT enabled. So

Tested-by: Zhang Rui <rui.zhang@intel.com>

thanks,
rui

> 
> Changes since v2:
> 
> Vincent correctly indicated that I was abusing asym_packing to force
> load
> balances unrelated to CPU priority. The underlying issue is that the
> scheduler cannot not handle load balances between SMT and non-SMT
> cores
> correctly. I added several prework patches to fix it... and I removed
> the
> abuse of asym_packing.
> 
> Dietmar helped me to realize that there is a better way to check the
> idle
> state of SMT cores. Now I give the task to the scheduler instead of
> architecture-specific overrides. I unconditionally obey CPU
> priorities
> at the SMT level. This keeps Power7 happy. At upper levels (i.e.,
> when
> balancing load between cores) the scheduler also considers the idle
> state
> of the core in addition to CPU priority. This satisfies x86.
> 
> Ionela spotted a violation of the scheduler topology sanity checks.
> We did
> not find a check that suits both Power7 and x86. For now, I removed
> the
> NEEDS_CHILD flag of SD_ASYM_PACKING.
> 
> Hopefully, these patches are in sufficiently good shape to be merged.
> 
> Thank you for your feedback and I look forward to getting more of it!
> 
> New patches 2, 3, 4, 5, 6, 7, 8
> Updated patches: 1
> Unchanged patches: 9, 10
> 
> BR,
> Ricardo
> 
> [1]. 
> https://lore.kernel.org/lkml/20220825225529.26465-1-ricardo.neri-calderon@linux.intel.com/
> [2]. 
> https://lore.kernel.org/lkml/20221122203532.15013-1-ricardo.neri-calderon@linux.intel.com/
> 
> 
> Ricardo Neri (10):
>   sched/fair: Generalize asym_packing logic for SMT cores
>   sched/fair: Move is_core_idle() out of CONFIG_NUMA
>   sched/fair: Only do asym_packing load balancing from fully idle SMT
>     cores
>   sched/fair: Let low-priority cores help high-priority busy SMT
> cores
>   sched/fair: Keep a fully_busy SMT sched group as busiest
>   sched/fair: Use the prefer_sibling flag of the current sched domain
>   sched/fair: Do not even the number of busy CPUs via asym_packing
>   sched/topology: Remove SHARED_CHILD from ASYM_PACKING
>   x86/sched: Remove SD_ASYM_PACKING from the SMT domain flags
>   x86/sched/itmt: Give all SMT siblings of a core the same priority
> 
>  arch/x86/kernel/itmt.c         |  23 +----
>  arch/x86/kernel/smpboot.c      |   2 +-
>  include/linux/sched/sd_flags.h |   5 +-
>  kernel/sched/fair.c            | 175 +++++++++++++++++------------
> ----
>  4 files changed, 99 insertions(+), 106 deletions(-)
>