[v6,03/11] cpu/hotplug: Add dynamic parallel bringup states before CPUHP_BRINGUP_CPU
Commit Message
From: David Woodhouse <dwmw@amazon.co.uk>
If the platform registers these states, bring all CPUs to each registered
state in turn, before the final bringup to CPUHP_BRINGUP_CPU. This allows
the architecture to parallelise the slow asynchronous tasks like sending
INIT/SIPI and waiting for the AP to come to life.
There is a subtlety here: even with an empty CPUHP_BP_PARALLEL_DYN step,
this means that *all* CPUs are brought through the prepare states and to
CPUHP_BP_PREPARE_DYN before any of them are taken to CPUHP_BRINGUP_CPU
and then are allowed to run for themselves to CPUHP_ONLINE.
So any combination of prepare/start calls which depend on A-B ordering
for each CPU in turn, such as the X2APIC code which used to allocate a
cluster mask 'just in case' and store it in a global variable in the
prep stage, then potentially consume that preallocated structure from
the AP and set the global pointer to NULL to be reallocated in
CPUHP_X2APIC_PREPARE for the next CPU... would explode horribly.
We believe that X2APIC was the only such case, for x86. But this is why
it remains an architecture opt-in. For now.
Note that the new parallel stages do *not* yet bring each AP to the
CPUHP_BRINGUP_CPU state. The final loop in bringup_nonboot_cpus() is
untouched, bringing each AP in turn from the final PARALLEL_DYN state
(or all the way from CPUHP_OFFLINE) to CPUHP_BRINGUP_CPU and then
waiting for that AP to do its own processing and reach CPUHP_ONLINE
before releasing the next. Parallelising that part by bringing them all
to CPUHP_BRINGUP_CPU and then waiting for them all is an exercise for
the future.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
include/linux/cpuhotplug.h | 2 ++
kernel/cpu.c | 27 +++++++++++++++++++++++++--
2 files changed, 27 insertions(+), 2 deletions(-)
Comments
On Thu, Feb 02 2023 at 21:56, Usama Arif wrote:
> So any combination of prepare/start calls which depend on A-B ordering
> for each CPU in turn, such as the X2APIC code which used to allocate a
> cluster mask 'just in case' and store it in a global variable in the
> prep stage, then potentially consume that preallocated structure from
> the AP and set the global pointer to NULL to be reallocated in
> CPUHP_X2APIC_PREPARE for the next CPU... would explode horribly.
>
> We believe that X2APIC was the only such case, for x86. But this is why
> it remains an architecture opt-in. For now.
We believe is not really a convincing technical argument.
On x86 the X2APIC case was established to be the only one which relied
on the full CPU hotplug serialization. It's unclear whether this
affects any other architecture, so this optimization is opt-in.
Hmm?
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 6c0a92ca6bb5..5a8f1a93b57c 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1505,6 +1505,24 @@ int bringup_hibernate_cpu(unsigned int sleep_cpu)
> void bringup_nonboot_cpus(unsigned int setup_max_cpus)
> {
> unsigned int cpu;
> + int n = setup_max_cpus - num_online_cpus();
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#variable-declarations
Aside of that I detest the mixture between unsigned and signed here.
> +
> + /* ∀ parallel pre-bringup state, bring N CPUs to it */
> + if (n > 0) {
> + enum cpuhp_state st = CPUHP_BP_PARALLEL_DYN;
> +
> + while (st <= CPUHP_BP_PARALLEL_DYN_END &&
> + cpuhp_hp_states[st].name) {
while (st <= CPUHP_BP_PARALLEL_DYN_END && cpuhp_hp_states[st].name) {
80 character limit has been updated quite some time ago
Thanks,
tglx
@@ -133,6 +133,8 @@ enum cpuhp_state {
CPUHP_MIPS_SOC_PREPARE,
CPUHP_BP_PREPARE_DYN,
CPUHP_BP_PREPARE_DYN_END = CPUHP_BP_PREPARE_DYN + 20,
+ CPUHP_BP_PARALLEL_DYN,
+ CPUHP_BP_PARALLEL_DYN_END = CPUHP_BP_PARALLEL_DYN + 4,
CPUHP_BRINGUP_CPU,
/*
@@ -1505,6 +1505,24 @@ int bringup_hibernate_cpu(unsigned int sleep_cpu)
void bringup_nonboot_cpus(unsigned int setup_max_cpus)
{
unsigned int cpu;
+ int n = setup_max_cpus - num_online_cpus();
+
+ /* ∀ parallel pre-bringup state, bring N CPUs to it */
+ if (n > 0) {
+ enum cpuhp_state st = CPUHP_BP_PARALLEL_DYN;
+
+ while (st <= CPUHP_BP_PARALLEL_DYN_END &&
+ cpuhp_hp_states[st].name) {
+ int i = n;
+
+ for_each_present_cpu(cpu) {
+ cpu_up(cpu, st);
+ if (!--i)
+ break;
+ }
+ st++;
+ }
+ }
for_each_present_cpu(cpu) {
if (num_online_cpus() >= setup_max_cpus)
@@ -1882,6 +1900,10 @@ static int cpuhp_reserve_state(enum cpuhp_state state)
step = cpuhp_hp_states + CPUHP_BP_PREPARE_DYN;
end = CPUHP_BP_PREPARE_DYN_END;
break;
+ case CPUHP_BP_PARALLEL_DYN:
+ step = cpuhp_hp_states + CPUHP_BP_PARALLEL_DYN;
+ end = CPUHP_BP_PARALLEL_DYN_END;
+ break;
default:
return -EINVAL;
}
@@ -1906,14 +1928,15 @@ static int cpuhp_store_callbacks(enum cpuhp_state state, const char *name,
/*
* If name is NULL, then the state gets removed.
*
- * CPUHP_AP_ONLINE_DYN and CPUHP_BP_PREPARE_DYN are handed out on
+ * CPUHP_AP_ONLINE_DYN and CPUHP_BP_P*_DYN are handed out on
* the first allocation from these dynamic ranges, so the removal
* would trigger a new allocation and clear the wrong (already
* empty) state, leaving the callbacks of the to be cleared state
* dangling, which causes wreckage on the next hotplug operation.
*/
if (name && (state == CPUHP_AP_ONLINE_DYN ||
- state == CPUHP_BP_PREPARE_DYN)) {
+ state == CPUHP_BP_PREPARE_DYN ||
+ state == CPUHP_BP_PARALLEL_DYN)) {
ret = cpuhp_reserve_state(state);
if (ret < 0)
return ret;