[v2] kernel/sched: Modify initial boot task idle setup

Message ID 20230915174444.2835306-1-Liam.Howlett@oracle.com
State New
Series [v2] kernel/sched: Modify initial boot task idle setup

Commit Message

Liam R. Howlett Sept. 15, 2023, 5:44 p.m. UTC
Initial booting is setting the task flag to idle (PF_IDLE) by the call
path sched_init() -> init_idle().  Having the task idle and calling
call_rcu() in kernel/rcu/tiny.c means that TIF_NEED_RESCHED will be
set.  Subsequent calls to any cond_resched() will enable IRQs,
potentially earlier than the IRQ setup has completed.  Recent changes
have caused just this scenario and IRQs have been enabled early.

This causes a warning later in start_kernel() as interrupts are enabled
before they are fully set up.

Fix this issue by setting the PF_IDLE flag later in the boot sequence.

Although the boot task was marked as idle since (at least) d80e4fda576d,
I am not sure that it is wrong to do so.  The forced context-switch on
idle task was introduced in the tiny_rcu update, so I'm going to claim
this fixes 5f6130fa52ee.

Link: https://lore.kernel.org/linux-mm/87v8cv22jh.fsf@mail.lhotse/
Link: https://lore.kernel.org/linux-mm/CAMuHMdWpvpWoDa=Ox-do92czYRvkok6_x6pYUH+ZouMcJbXy+Q@mail.gmail.com/
Fixes: 5f6130fa52ee ("tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task")
Cc: stable@vger.kernel.org
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
---

v1: https://lore.kernel.org/linux-mm/20230913005647.1534747-1-Liam.Howlett@oracle.com/

 kernel/sched/core.c | 2 +-
 kernel/sched/idle.c | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)
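
The failure sequence described in the commit message can be modelled in
a few lines of userspace C. This is only an illustrative sketch, not
kernel code: every sim_* name and SIM_PF_IDLE are hypothetical stand-ins
invented for the example, and the model assumes that rescheduling from
cond_resched() returns with interrupts enabled, as the commit message
states.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's PF_IDLE task flag. */
#define SIM_PF_IDLE 0x1u

struct sim_task {
	unsigned int flags;
	bool need_resched;	/* models TIF_NEED_RESCHED */
};

struct sim_cpu {
	bool irqs_enabled;
};

/* Models tiny RCU's call_rcu(): an idle caller gets a forced reschedule. */
static void sim_call_rcu(struct sim_task *t)
{
	if (t->flags & SIM_PF_IDLE)
		t->need_resched = true;
}

/* Models cond_resched(): rescheduling leaves interrupts enabled. */
static void sim_cond_resched(struct sim_task *t, struct sim_cpu *cpu)
{
	if (t->need_resched) {
		t->need_resched = false;
		cpu->irqs_enabled = true;
	}
}

/*
 * Walks a simplified boot sequence and reports whether interrupts were
 * already on at the point where start_kernel() expects them to be off.
 * The argument selects the old behaviour (PF_IDLE set in init_idle())
 * or the patched behaviour (PF_IDLE set in cpu_startup_entry()).
 */
static bool sim_irqs_enabled_early(bool idle_set_in_init_idle)
{
	struct sim_task boot = { .flags = 0, .need_resched = false };
	struct sim_cpu cpu = { .irqs_enabled = false };
	bool early;

	if (idle_set_in_init_idle)
		boot.flags |= SIM_PF_IDLE;	/* sched_init() -> init_idle() */

	sim_call_rcu(&boot);			/* early boot code calls call_rcu() */
	sim_cond_resched(&boot, &cpu);		/* ... and later cond_resched() */

	early = cpu.irqs_enabled;		/* the check that triggers the warning */

	boot.flags |= SIM_PF_IDLE;		/* cpu_startup_entry(), after the patch */
	return early;
}
```

In this model sim_irqs_enabled_early(true) reports the early enable that
produces the warning, while sim_irqs_enabled_early(false) does not,
because the boot task is not yet marked idle when call_rcu() runs.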
  

Comments

Peter Zijlstra Sept. 19, 2023, 8:49 a.m. UTC | #1
On Fri, Sep 15, 2023 at 01:44:44PM -0400, Liam R. Howlett wrote:
> Initial booting is setting the task flag to idle (PF_IDLE) by the call
> path sched_init() -> init_idle().  Having the task idle and calling
> call_rcu() in kernel/rcu/tiny.c means that TIF_NEED_RESCHED will be
> set.  Subsequent calls to any cond_resched() will enable IRQs,
> potentially earlier than the IRQ setup has completed.  Recent changes
> have caused just this scenario and IRQs have been enabled early.
> 
> This causes a warning later in start_kernel() as interrupts are enabled
> before they are fully set up.
> 
> Fix this issue by setting the PF_IDLE flag later in the boot sequence.
> 
> Although the boot task was marked as idle since (at least) d80e4fda576d,
> I am not sure that it is wrong to do so.  The forced context-switch on
> idle task was introduced in the tiny_rcu update, so I'm going to claim
> this fixes 5f6130fa52ee.
> 
> Link: https://lore.kernel.org/linux-mm/87v8cv22jh.fsf@mail.lhotse/
> Link: https://lore.kernel.org/linux-mm/CAMuHMdWpvpWoDa=Ox-do92czYRvkok6_x6pYUH+ZouMcJbXy+Q@mail.gmail.com/
> Fixes: 5f6130fa52ee ("tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task")
> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---

Thanks! I've queued this up for sched/urgent but will let the robots at
it before I push it out to -tip.
  
Geert Uytterhoeven Sept. 19, 2023, 10:11 a.m. UTC | #2
Hi Liam,

On Fri, Sep 15, 2023 at 7:45 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> Initial booting is setting the task flag to idle (PF_IDLE) by the call
> path sched_init() -> init_idle().  Having the task idle and calling
> call_rcu() in kernel/rcu/tiny.c means that TIF_NEED_RESCHED will be
> set.  Subsequent calls to any cond_resched() will enable IRQs,
> potentially earlier than the IRQ setup has completed.  Recent changes
> have caused just this scenario and IRQs have been enabled early.
>
> This causes a warning later in start_kernel() as interrupts are enabled
> before they are fully set up.
>
> Fix this issue by setting the PF_IDLE flag later in the boot sequence.
>
> Although the boot task was marked as idle since (at least) d80e4fda576d,
> I am not sure that it is wrong to do so.  The forced context-switch on
> idle task was introduced in the tiny_rcu update, so I'm going to claim
> this fixes 5f6130fa52ee.
>
> Link: https://lore.kernel.org/linux-mm/87v8cv22jh.fsf@mail.lhotse/
> Link: https://lore.kernel.org/linux-mm/CAMuHMdWpvpWoDa=Ox-do92czYRvkok6_x6pYUH+ZouMcJbXy+Q@mail.gmail.com/
> Fixes: 5f6130fa52ee ("tiny_rcu: Directly force QS when call_rcu_[bh|sched]() on idle_task")
> Cc: stable@vger.kernel.org
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Andreas Schwab <schwab@linux-m68k.org>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Peng Zhang <zhangpeng.00@bytedance.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>

Thanks for your patch!

This fixes the

      WARNING: CPU: 0 PID: 0 at init/main.c:992 start_kernel+0x2f0/0x480

I was seeing during boot on Renesas RZ/A1 and RZ/A2 since commit
cfeb6ae8bcb96ccf ("maple_tree: disable mas_wr_append() when other
readers are possible") in v6.5.

And unlike v1, this does not cause lots of new warnings on e.g. R-Car M2-W.

Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert
  

Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c52c2eba7c73..e8f73ff12126 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9271,7 +9271,7 @@  void __init init_idle(struct task_struct *idle, int cpu)
 	 * PF_KTHREAD should already be set at this point; regardless, make it
 	 * look like a proper per-CPU kthread.
 	 */
-	idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY;
+	idle->flags |= PF_KTHREAD | PF_NO_SETAFFINITY;
 	kthread_set_per_cpu(idle, cpu);
 
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 342f58a329f5..5007b25c5bc6 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -373,6 +373,7 @@  EXPORT_SYMBOL_GPL(play_idle_precise);
 
 void cpu_startup_entry(enum cpuhp_state state)
 {
+	current->flags |= PF_IDLE;
 	arch_cpu_idle_prepare();
 	cpuhp_online_idle(state);
 	while (1)