[v2,2/3] livepatch,sched: Add livepatch task switching to cond_resched()

Message ID 9f09bff809fc026618108e8bbaac67ef2f8e6d3d.1676672328.git.jpoimboe@kernel.org
State New
Series livepatch,sched: Add livepatch task switching to cond_resched()

Commit Message

Josh Poimboeuf Feb. 17, 2023, 10:22 p.m. UTC
  There have been reports [1][2] of live patches failing to complete
within a reasonable amount of time due to CPU-bound kthreads.

Fix it by patching tasks in cond_resched().

There are four different flavors of cond_resched(), depending on the
kernel configuration.  Hook into all of them.
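
For reference, a condensed sketch of how each flavor ends up hooked
(distilled from the diff below; simplified, not the exact hunks):

	/* 1) PREEMPT_DYNAMIC + HAVE_PREEMPT_DYNAMIC_CALL: retarget the static call */
	static int klp_cond_resched(void)
	{
		__klp_sched_try_switch();
		return __cond_resched();
	}
	/* sched_dynamic_klp_enable() points the cond_resched static call here */

	/* 2) PREEMPT_DYNAMIC + HAVE_PREEMPT_DYNAMIC_KEY */
	int dynamic_cond_resched(void)
	{
		klp_sched_try_switch();
		if (!static_branch_unlikely(&sk_dynamic_cond_resched))
			return 0;
		return __cond_resched();
	}

	/* 3) no PREEMPT_DYNAMIC, !CONFIG_PREEMPTION */
	static inline int _cond_resched(void)
	{
		klp_sched_try_switch();
		return __cond_resched();
	}

	/* 4) no PREEMPT_DYNAMIC, CONFIG_PREEMPTION: otherwise a no-op */
	static inline int _cond_resched(void)
	{
		klp_sched_try_switch();
		return 0;
	}

where klp_sched_try_switch() is a static-key-guarded wrapper around
__klp_sched_try_switch(), enabled only while a transition is in progress.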

A more elegant solution might be to use a preempt notifier.  However,
non-ORC unwinders can't unwind a preempted task reliably.

[1] https://lore.kernel.org/lkml/20220507174628.2086373-1-song@kernel.org/
[2] https://lkml.kernel.org/lkml/20230120-vhost-klp-switching-v1-0-7c2b65519c43@kernel.org

Tested-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
 include/linux/livepatch.h       |  1 +
 include/linux/livepatch_sched.h | 29 ++++++++++++
 include/linux/sched.h           | 20 ++++++---
 kernel/livepatch/transition.c   | 78 ++++++++++++++++++++++++++++-----
 kernel/sched/core.c             | 64 +++++++++++++++++++++++----
 5 files changed, 168 insertions(+), 24 deletions(-)
 create mode 100644 include/linux/livepatch_sched.h
  

Comments

Petr Mladek Feb. 22, 2023, 2 p.m. UTC | #1
On Fri 2023-02-17 14:22:55, Josh Poimboeuf wrote:
> There have been reports [1][2] of live patches failing to complete
> within a reasonable amount of time due to CPU-bound kthreads.
> 
> Fix it by patching tasks in cond_resched().
> 
> There are four different flavors of cond_resched(), depending on the
> kernel configuration.  Hook into all of them.
> 
> A more elegant solution might be to use a preempt notifier.  However,
> non-ORC unwinders can't unwind a preempted task reliably.
> 
> [1] https://lore.kernel.org/lkml/20220507174628.2086373-1-song@kernel.org/
> [2] https://lkml.kernel.org/lkml/20230120-vhost-klp-switching-v1-0-7c2b65519c43@kernel.org
> 
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -588,14 +641,10 @@ void klp_reverse_transition(void)
>  		 klp_target_state == KLP_PATCHED ? "patching to unpatching" :
>  						   "unpatching to patching");
>  
> -	klp_transition_patch->enabled = !klp_transition_patch->enabled;
> -
> -	klp_target_state = !klp_target_state;
> -
>  	/*
>  	 * Clear all TIF_PATCH_PENDING flags to prevent races caused by
> -	 * klp_update_patch_state() running in parallel with
> -	 * klp_start_transition().
> +	 * klp_update_patch_state() or __klp_sched_try_switch() running in
> +	 * parallel with the reverse transition.
>  	 */
>  	read_lock(&tasklist_lock);
>  	for_each_process_thread(g, task)
> @@ -605,9 +654,16 @@ void klp_reverse_transition(void)
>  	for_each_possible_cpu(cpu)
>  		clear_tsk_thread_flag(idle_task(cpu), TIF_PATCH_PENDING);
>  
> -	/* Let any remaining calls to klp_update_patch_state() complete */
> +	/*
> +	 * Make sure all existing invocations of klp_update_patch_state() and
> +	 * __klp_sched_try_switch() see the cleared TIF_PATCH_PENDING before
> +	 * starting the reverse transition.
> +	 */
>  	klp_synchronize_transition();
>  
> +	/* All patching has stopped, now start the reverse transition. */
> +	klp_transition_patch->enabled = !klp_transition_patch->enabled;
> +	klp_target_state = !klp_target_state;

I have double checked the synchronization and we need here:

	/*
	 * Make sure klp_update_patch_state() and __klp_sched_try_switch()
	 * see the updated klp_target_state before TIF_PATCH_PENDING
	 * is set again in klp_start_transition().
	 */
	smp_wmb();

The same is achieved by smp_wmb() in klp_init_transition().

Note that the extra barrier was missing here because klp_target_state
was set before klp_synchronize_transition(). It was fine because
klp_update_patch_state() was called on locations where a transition
in any direction was always safe.

Just for record. We need to modify @klp_target_state after
klp_synchronize_transition() now. The value is used by
__klp_sched_try_switch() to decide when the transition
is safe. It defines what functions must not be on the stack.
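
For illustration, the pairing being discussed is the classic
publish-the-data-then-set-the-flag pattern. Below is a minimal userspace
C11 analogy (hypothetical names, not kernel code): target_state stands in
for klp_target_state, patch_pending for TIF_PATCH_PENDING, the release
fence for smp_wmb() and the acquire fence for smp_rmb().

	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdio.h>

	static atomic_int target_state;   /* analog of klp_target_state */
	static atomic_int patch_pending;  /* analog of TIF_PATCH_PENDING */

	/* analog of klp_reverse_transition(): publish the state, then the flag */
	static void *reverse_transition(void *arg)
	{
		atomic_store_explicit(&target_state, 1, memory_order_relaxed);
		atomic_thread_fence(memory_order_release);	/* smp_wmb() analog */
		atomic_store_explicit(&patch_pending, 1, memory_order_relaxed);
		return NULL;
	}

	/* analog of __klp_sched_try_switch(): check the flag, then read the state */
	static void *sched_try_switch(void *arg)
	{
		if (!atomic_load_explicit(&patch_pending, memory_order_relaxed))
			return NULL;
		atomic_thread_fence(memory_order_acquire);	/* smp_rmb() analog */
		printf("switching against target state %d\n",
		       atomic_load_explicit(&target_state, memory_order_relaxed));
		return NULL;
	}

	int main(void)
	{
		pthread_t w, r;

		pthread_create(&w, NULL, reverse_transition, NULL);
		pthread_create(&r, NULL, sched_try_switch, NULL);
		pthread_join(w, NULL);
		pthread_join(r, NULL);
		return 0;
	}

Without the two fences, the reader could observe patch_pending set while
still seeing the old target_state, which is exactly the stale read this
thread is trying to rule out. (Compile with e.g. gcc -pthread.)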

I am sorry that I missed this when reviewing v1. I think I needed
to look at the new code with fresh eyes.

>  	klp_start_transition();
>  }

I do not see any other problem. With the above barrier added,
feel free to use:

Reviewed-by: Petr Mladek <pmladek@suse.com>

It is for the livepatching part. I also checked the scheduler
code and it looked fine, but I would not put my hand in the fire
for it.

Best Regards,
Petr
  
Josh Poimboeuf Feb. 24, 2023, 1:34 a.m. UTC | #2
On Wed, Feb 22, 2023 at 03:00:33PM +0100, Petr Mladek wrote:
> > +	/* All patching has stopped, now start the reverse transition. */
> > +	klp_transition_patch->enabled = !klp_transition_patch->enabled;
> > +	klp_target_state = !klp_target_state;
> 
> I have double checked the synchronization and we need here:
> 
> 	/*
> 	 * Make sure klp_update_patch_state() and __klp_sched_try_switch()
> 	 * see the updated klp_target_state before TIF_PATCH_PENDING
> 	 * is set again in klp_start_transition().
> 	 */
> 	smp_wmb();
> 
> The same is achieved by smp_wmb() in klp_init_transition().
> 
> Note that the extra barrier was missing here because klp_target_state
> was set before klp_synchronize_transition(). It was fine because
> klp_update_patch_state() was called on locations where a transition
> in any direction was always safe.
> 
> Just for record. We need to modify @klp_target_state after
> klp_synchronize_transition() now. The value is used by
> __klp_sched_try_switch() to decide when the transition
> is safe. It defines what functions must not be on the stack.

Yes, makes sense.  And we need a corresponding smp_rmb() in
__klp_sched_try_switch() before the call to klp_try_switch_task(),
right?

Something like this on top?  Also updated a few more comments.

diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
index 201f0c0482fb..3f79265dd6e5 100644
--- a/kernel/livepatch/core.c
+++ b/kernel/livepatch/core.c
@@ -33,6 +33,7 @@
  *
  * - klp_ftrace_handler()
  * - klp_update_patch_state()
+ * - __klp_sched_try_switch()
  */
 DEFINE_MUTEX(klp_mutex);
 
diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index b9e006632124..218ef4a5d575 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -192,8 +192,8 @@ void klp_update_patch_state(struct task_struct *task)
 	 * barrier (smp_rmb) for two cases:
 	 *
 	 * 1) Enforce the order of the TIF_PATCH_PENDING read and the
-	 *    klp_target_state read.  The corresponding write barrier is in
-	 *    klp_init_transition().
+	 *    klp_target_state read.  The corresponding write barriers are in
+	 *    klp_init_transition() and klp_reverse_transition().
 	 *
 	 * 2) Enforce the order of the TIF_PATCH_PENDING read and a future read
 	 *    of func->transition, if klp_ftrace_handler() is called later on
@@ -381,6 +381,14 @@ void __klp_sched_try_switch(void)
 	if (unlikely(!klp_patch_pending(current)))
 		goto out;
 
+	/*
+	 * Enforce the order of the TIF_PATCH_PENDING read above and the
+	 * klp_target_state read in klp_try_switch_task().  The corresponding
+	 * write barriers are in klp_init_transition() and
+	 * klp_reverse_transition().
+	 */
+	smp_rmb();
+
 	klp_try_switch_task(current);
 
 out:
@@ -604,8 +612,9 @@ void klp_init_transition(struct klp_patch *patch, int state)
 	 * see a func in transition with a task->patch_state of KLP_UNDEFINED.
 	 *
 	 * Also enforce the order of the klp_target_state write and future
-	 * TIF_PATCH_PENDING writes to ensure klp_update_patch_state() doesn't
-	 * set a task->patch_state to KLP_UNDEFINED.
+	 * TIF_PATCH_PENDING writes to ensure klp_update_patch_state() and
+	 * __klp_sched_try_switch() don't set a task->patch_state to
+	 * KLP_UNDEFINED.
 	 */
 	smp_wmb();
 
@@ -661,9 +670,19 @@ void klp_reverse_transition(void)
 	 */
 	klp_synchronize_transition();
 
-	/* All patching has stopped, now start the reverse transition. */
+	/* All patching has stopped, now start the reverse transition: */
+
 	klp_transition_patch->enabled = !klp_transition_patch->enabled;
 	klp_target_state = !klp_target_state;
+
+	/*
+	 * Enforce the order of the klp_target_state write and the
+	 * TIF_PATCH_PENDING writes in klp_start_transition() to ensure
+	 * klp_update_patch_state() and __klp_sched_try_switch() don't set
+	 * task->patch_state to the wrong value.
+	 */
+	smp_wmb();
+
 	klp_start_transition();
 }
  
Petr Mladek Feb. 24, 2023, 4:01 p.m. UTC | #3
On Thu 2023-02-23 17:34:02, Josh Poimboeuf wrote:
> On Wed, Feb 22, 2023 at 03:00:33PM +0100, Petr Mladek wrote:
> > > +	/* All patching has stopped, now start the reverse transition. */
> > > +	klp_transition_patch->enabled = !klp_transition_patch->enabled;
> > > +	klp_target_state = !klp_target_state;
> > 
> > I have double checked the synchronization and we need here:
> > 
> > 	/*
> > 	 * Make sure klp_update_patch_state() and __klp_sched_try_switch()
> > 	 * see the updated klp_target_state before TIF_PATCH_PENDING
> > 	 * is set again in klp_start_transition().
> > 	 */
> > 	smp_wmb();
> > 
> > The same is achieved by smp_wmb() in klp_init_transition().
> > 
> > Note that the extra barrier was missing here because klp_target_state
> > was set before klp_synchronize_transition(). It was fine because
> > klp_update_patch_state() was called on locations where a transition
> > in any direction was always safe.
> > 
> > Just for record. We need to modify @klp_target_state after
> > klp_synchronize_transition() now. The value is used by
> > __klp_sched_try_switch() to decide when the transition
> > is safe. It defines what functions must not be on the stack.
> 
> Yes, makes sense.  And we need a corresponding smp_rmb() in
> __klp_sched_try_switch() before the call to klp_try_switch_task(),
> right?

Yes. Great catch!

I feel ashamed that I missed this counterpart when I realized that
the write barrier was missing.

> Something like this on top?  Also updated a few more comments.
>
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index b9e006632124..218ef4a5d575 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -192,8 +192,8 @@ void klp_update_patch_state(struct task_struct *task)
>  	 * barrier (smp_rmb) for two cases:
>  	 *
>  	 * 1) Enforce the order of the TIF_PATCH_PENDING read and the
> -	 *    klp_target_state read.  The corresponding write barrier is in
> -	 *    klp_init_transition().
> +	 *    klp_target_state read.  The corresponding write barriers are in
> +	 *    klp_init_transition() and klp_reverse_transition().
>  	 *
>  	 * 2) Enforce the order of the TIF_PATCH_PENDING read and a future read
>  	 *    of func->transition, if klp_ftrace_handler() is called later on
> @@ -381,6 +381,14 @@ void __klp_sched_try_switch(void)
>  	if (unlikely(!klp_patch_pending(current)))
>  		goto out;
>  
> +	/*
> +	 * Enforce the order of the TIF_PATCH_PENDING read above and the
> +	 * klp_target_state read in klp_try_switch_task().  The corresponding
> +	 * write barriers are in klp_init_transition() and
> +	 * klp_reverse_transition().
> +	 */
> +	smp_rmb();

This barrier has basically the same purpose as the implicit read
barrier in klp_update_patch_state().

The comment in klp_update_patch_state() says that the read barrier
actually has two purposes. The 1st one is easy. It is the one
described above.

It took me quite some time to understand the 2nd purpose again.
The original comment was:

	 * 2) Enforce the order of the TIF_PATCH_PENDING read and a future read
	 *    of func->transition, if klp_ftrace_handler() is called later on
	 *    the same CPU.  See __klp_disable_patch().

I think that a better description would be:

	 * 2) Make sure that this CPU sees func->transition enabled when
	 *    it sees the TIF_PATCH_PENDING enabled. This is important when
	 *    the current task is transitioning itself and then calls
	 *    klp_ftrace_handler() later. It ensures that the ftrace handler
	 *    would check the state change that we did here.
	 *    The corresponding write barrier is in __klp_enable_patch()
	 *    and __klp_disable_patch().

Note that the previous comment wasn't correct. IMHO, the related write
barrier is needed in both __klp_enable_patch() and __klp_disable_patch().

> +
>  	klp_try_switch_task(current);
>  
>  out:
> @@ -604,8 +612,9 @@ void klp_init_transition(struct klp_patch *patch, int state)
>  	 * see a func in transition with a task->patch_state of KLP_UNDEFINED.
>  	 *
>  	 * Also enforce the order of the klp_target_state write and future
> -	 * TIF_PATCH_PENDING writes to ensure klp_update_patch_state() doesn't
> -	 * set a task->patch_state to KLP_UNDEFINED.
> +	 * TIF_PATCH_PENDING writes to ensure klp_update_patch_state() and
> +	 * __klp_sched_try_switch() don't set a task->patch_state to
> +	 * KLP_UNDEFINED.
>  	 */
>  	smp_wmb();
>  
> @@ -661,9 +670,19 @@ void klp_reverse_transition(void)
>  	 */
>  	klp_synchronize_transition();
>  
> -	/* All patching has stopped, now start the reverse transition. */
> +	/* All patching has stopped, now start the reverse transition: */
> +

Is the extra empty line intended?

>  	klp_transition_patch->enabled = !klp_transition_patch->enabled;
>  	klp_target_state = !klp_target_state;
> +
> +	/*
> +	 * Enforce the order of the klp_target_state write and the
> +	 * TIF_PATCH_PENDING writes in klp_start_transition() to ensure
> +	 * klp_update_patch_state() and __klp_sched_try_switch() don't set
> +	 * task->patch_state to the wrong value.
> +	 */
> +	smp_wmb();
> +
>  	klp_start_transition();
>  }

This made me revisit all the barriers in the livepatch code.
The good thing is that it seems that all the barriers are correct,
including the new ones proposed in this patchset.

But some comments are a bit misleading. I would like to update
them a bit. I have started working on it but it goes slowly.
I often get lost...

I am not sure about the ordering. I do not want to block this patchset
by the clean up of the comments. The currently proposed ones are
good enough. Feel free to send v3.

Or would you prefer to wait for my clean up of the comments?

Best Regards,
Petr
  
Josh Poimboeuf Feb. 24, 2023, 4:42 p.m. UTC | #4
On Fri, Feb 24, 2023 at 05:01:22PM +0100, Petr Mladek wrote:
> > @@ -381,6 +381,14 @@ void __klp_sched_try_switch(void)
> >  	if (unlikely(!klp_patch_pending(current)))
> >  		goto out;
> >  
> > +	/*
> > +	 * Enforce the order of the TIF_PATCH_PENDING read above and the
> > +	 * klp_target_state read in klp_try_switch_task().  The corresponding
> > +	 * write barriers are in klp_init_transition() and
> > +	 * klp_reverse_transition().
> > +	 */
> > +	smp_rmb();
> 
> This barrier has basically the same purpose as the implicit read
> barrier in klp_update_patch_state().
> 
> The comment in klp_update_patch_state() says that the read barrier
> actually has two purposes. The 1st one is easy. It is the one
> described above.
> 
> It took me quite some time to understand the 2nd purpose again.
> The original comment was:
> 
> 	 * 2) Enforce the order of the TIF_PATCH_PENDING read and a future read
> 	 *    of func->transition, if klp_ftrace_handler() is called later on
> 	 *    the same CPU.  See __klp_disable_patch().
> 
> I think that a better description would be:
> 
> 	 * 2) Make sure that this CPU sees func->transition enabled when
> 	 *    it sees the TIF_PATCH_PENDING enabled. This is important when
> 	 *    the current task is transitioning itself and then calls
> 	 *    klp_ftrace_handler() later. It ensures that the ftrace handler
> 	 *    would check the state change that we did here.
> 	 *    The corresponding write barrier is in __klp_enable_patch()
> 	 *    and __klp_disable_patch().
> 
> Note that the previous comment wasn't correct. IMHO, the related write
> barrier is needed in both __klp_enable_patch() and __klp_disable_patch().

That 2nd comment also confused me.  Yours is definitely better!

> > @@ -661,9 +670,19 @@ void klp_reverse_transition(void)
> >  	 */
> >  	klp_synchronize_transition();
> >  
> > -	/* All patching has stopped, now start the reverse transition. */
> > +	/* All patching has stopped, now start the reverse transition: */
> > +
> 
> Is the extra empty line intended?

Due to the additional comment and whitespace added below, I added
whitespace here to try to imply that the comment applies not only to
the following two lines but also to the code after it.  I'm open to
suggestions :-)

> >  	klp_transition_patch->enabled = !klp_transition_patch->enabled;
> >  	klp_target_state = !klp_target_state;
> > +
> > +	/*
> > +	 * Enforce the order of the klp_target_state write and the
> > +	 * TIF_PATCH_PENDING writes in klp_start_transition() to ensure
> > +	 * klp_update_patch_state() and __klp_sched_try_switch() don't set
> > +	 * task->patch_state to the wrong value.
> > +	 */
> > +	smp_wmb();
> > +
> >  	klp_start_transition();
> >  }
> 
> This made me revisit all the barriers in the livepatch code.
> The good thing is that it seems that all the barriers are correct,
> including the new ones proposed in this patchset.

That's good news :-)

> But some comments are a bit misleading. I would like to update
> them a bit. I have started working on it but it goes slowly.
> I often get lost...
> 
> I am not sure about the ordering. I do not want to block this patchset
> by the clean up of the comments. The currently proposed ones are
> good enough. Feel free to send v3.
> 
> Or would you prefer to wait for my clean up of the comments?

Sounds good, I'll send v3 soon and you can base your updates on top.
  

Patch

diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
index 293e29960c6e..9b9b38e89563 100644
--- a/include/linux/livepatch.h
+++ b/include/linux/livepatch.h
@@ -13,6 +13,7 @@ 
 #include <linux/ftrace.h>
 #include <linux/completion.h>
 #include <linux/list.h>
+#include <linux/livepatch_sched.h>
 
 #if IS_ENABLED(CONFIG_LIVEPATCH)
 
diff --git a/include/linux/livepatch_sched.h b/include/linux/livepatch_sched.h
new file mode 100644
index 000000000000..013794fb5da0
--- /dev/null
+++ b/include/linux/livepatch_sched.h
@@ -0,0 +1,29 @@ 
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _LINUX_LIVEPATCH_SCHED_H_
+#define _LINUX_LIVEPATCH_SCHED_H_
+
+#include <linux/jump_label.h>
+#include <linux/static_call_types.h>
+
+#ifdef CONFIG_LIVEPATCH
+
+void __klp_sched_try_switch(void);
+
+#if !defined(CONFIG_PREEMPT_DYNAMIC) || !defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
+
+DECLARE_STATIC_KEY_FALSE(klp_sched_try_switch_key);
+
+static __always_inline void klp_sched_try_switch(void)
+{
+	if (static_branch_unlikely(&klp_sched_try_switch_key))
+		__klp_sched_try_switch();
+}
+
+#endif /* !CONFIG_PREEMPT_DYNAMIC || !CONFIG_HAVE_PREEMPT_DYNAMIC_CALL */
+
+#else /* !CONFIG_LIVEPATCH */
+static inline void klp_sched_try_switch(void) {}
+static inline void __klp_sched_try_switch(void) {}
+#endif /* CONFIG_LIVEPATCH */
+
+#endif /* _LINUX_LIVEPATCH_SCHED_H_ */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 853d08f7562b..bd1e6f02facb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -36,6 +36,7 @@ 
 #include <linux/seqlock.h>
 #include <linux/kcsan.h>
 #include <linux/rv.h>
+#include <linux/livepatch_sched.h>
 #include <asm/kmap_size.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
@@ -2064,6 +2065,9 @@  extern int __cond_resched(void);
 
 #if defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
 
+void sched_dynamic_klp_enable(void);
+void sched_dynamic_klp_disable(void);
+
 DECLARE_STATIC_CALL(cond_resched, __cond_resched);
 
 static __always_inline int _cond_resched(void)
@@ -2072,6 +2076,7 @@  static __always_inline int _cond_resched(void)
 }
 
 #elif defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
+
 extern int dynamic_cond_resched(void);
 
 static __always_inline int _cond_resched(void)
@@ -2079,20 +2084,25 @@  static __always_inline int _cond_resched(void)
 	return dynamic_cond_resched();
 }
 
-#else
+#else /* !CONFIG_PREEMPTION */
 
 static inline int _cond_resched(void)
 {
+	klp_sched_try_switch();
 	return __cond_resched();
 }
 
-#endif /* CONFIG_PREEMPT_DYNAMIC */
+#endif /* PREEMPT_DYNAMIC && CONFIG_HAVE_PREEMPT_DYNAMIC_CALL */
 
-#else
+#else /* CONFIG_PREEMPTION && !CONFIG_PREEMPT_DYNAMIC */
 
-static inline int _cond_resched(void) { return 0; }
+static inline int _cond_resched(void)
+{
+	klp_sched_try_switch();
+	return 0;
+}
 
-#endif /* !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC) */
+#endif /* !CONFIG_PREEMPTION || CONFIG_PREEMPT_DYNAMIC */
 
 #define cond_resched() ({			\
 	__might_resched(__FILE__, __LINE__, 0);	\
diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 4d1f443778f7..b9e006632124 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -9,6 +9,7 @@ 
 
 #include <linux/cpu.h>
 #include <linux/stacktrace.h>
+#include <linux/static_call.h>
 #include "core.h"
 #include "patch.h"
 #include "transition.h"
@@ -24,6 +25,25 @@  static int klp_target_state = KLP_UNDEFINED;
 
 static unsigned int klp_signals_cnt;
 
+/*
+ * When a livepatch is in progress, enable klp stack checking in
+ * cond_resched().  This helps CPU-bound kthreads get patched.
+ */
+#if defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
+
+#define klp_cond_resched_enable() sched_dynamic_klp_enable()
+#define klp_cond_resched_disable() sched_dynamic_klp_disable()
+
+#else /* !CONFIG_PREEMPT_DYNAMIC || !CONFIG_HAVE_PREEMPT_DYNAMIC_CALL */
+
+DEFINE_STATIC_KEY_FALSE(klp_sched_try_switch_key);
+EXPORT_SYMBOL(klp_sched_try_switch_key);
+
+#define klp_cond_resched_enable() static_branch_enable(&klp_sched_try_switch_key)
+#define klp_cond_resched_disable() static_branch_disable(&klp_sched_try_switch_key)
+
+#endif /* CONFIG_PREEMPT_DYNAMIC && CONFIG_HAVE_PREEMPT_DYNAMIC_CALL */
+
 /*
  * This work can be performed periodically to finish patching or unpatching any
  * "straggler" tasks which failed to transition in the first attempt.
@@ -338,6 +358,36 @@  static bool klp_try_switch_task(struct task_struct *task)
 	return !ret;
 }
 
+void __klp_sched_try_switch(void)
+{
+	if (likely(!klp_patch_pending(current)))
+		return;
+
+	/*
+	 * This function is called from cond_resched() which is called in many
+	 * places throughout the kernel.  Using the klp_mutex here might
+	 * deadlock.
+	 *
+	 * Instead, disable preemption to prevent racing with other callers of
+	 * klp_try_switch_task().  Thanks to task_call_func() they won't be
+	 * able to switch this task while it's running.
+	 */
+	preempt_disable();
+
+	/*
+	 * Make sure current didn't get patched between the above check and
+	 * preempt_disable().
+	 */
+	if (unlikely(!klp_patch_pending(current)))
+		goto out;
+
+	klp_try_switch_task(current);
+
+out:
+	preempt_enable();
+}
+EXPORT_SYMBOL(__klp_sched_try_switch);
+
 /*
  * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
  * Kthreads with TIF_PATCH_PENDING set are woken up.
@@ -444,7 +494,8 @@  void klp_try_complete_transition(void)
 		return;
 	}
 
-	/* we're done, now cleanup the data structures */
+	/* Done!  Now cleanup the data structures. */
+	klp_cond_resched_disable();
 	patch = klp_transition_patch;
 	klp_complete_transition();
 
@@ -496,6 +547,8 @@  void klp_start_transition(void)
 			set_tsk_thread_flag(task, TIF_PATCH_PENDING);
 	}
 
+	klp_cond_resched_enable();
+
 	klp_signals_cnt = 0;
 }
 
@@ -588,14 +641,10 @@  void klp_reverse_transition(void)
 		 klp_target_state == KLP_PATCHED ? "patching to unpatching" :
 						   "unpatching to patching");
 
-	klp_transition_patch->enabled = !klp_transition_patch->enabled;
-
-	klp_target_state = !klp_target_state;
-
 	/*
 	 * Clear all TIF_PATCH_PENDING flags to prevent races caused by
-	 * klp_update_patch_state() running in parallel with
-	 * klp_start_transition().
+	 * klp_update_patch_state() or __klp_sched_try_switch() running in
+	 * parallel with the reverse transition.
 	 */
 	read_lock(&tasklist_lock);
 	for_each_process_thread(g, task)
@@ -605,9 +654,16 @@  void klp_reverse_transition(void)
 	for_each_possible_cpu(cpu)
 		clear_tsk_thread_flag(idle_task(cpu), TIF_PATCH_PENDING);
 
-	/* Let any remaining calls to klp_update_patch_state() complete */
+	/*
+	 * Make sure all existing invocations of klp_update_patch_state() and
+	 * __klp_sched_try_switch() see the cleared TIF_PATCH_PENDING before
+	 * starting the reverse transition.
+	 */
 	klp_synchronize_transition();
 
+	/* All patching has stopped, now start the reverse transition. */
+	klp_transition_patch->enabled = !klp_transition_patch->enabled;
+	klp_target_state = !klp_target_state;
 	klp_start_transition();
 }
 
@@ -621,9 +677,9 @@  void klp_copy_process(struct task_struct *child)
 	 * the task flag up to date with the parent here.
 	 *
 	 * The operation is serialized against all klp_*_transition()
-	 * operations by the tasklist_lock. The only exception is
-	 * klp_update_patch_state(current), but we cannot race with
-	 * that because we are current.
+	 * operations by the tasklist_lock. The only exceptions are
+	 * klp_update_patch_state(current) and __klp_sched_try_switch(), but we
+	 * cannot race with them because we are current.
 	 */
 	if (test_tsk_thread_flag(current, TIF_PATCH_PENDING))
 		set_tsk_thread_flag(child, TIF_PATCH_PENDING);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e838feb6adc5..895d2a1fdcb3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8487,6 +8487,7 @@  EXPORT_STATIC_CALL_TRAMP(might_resched);
 static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched);
 int __sched dynamic_cond_resched(void)
 {
+	klp_sched_try_switch();
 	if (!static_branch_unlikely(&sk_dynamic_cond_resched))
 		return 0;
 	return __cond_resched();
@@ -8635,13 +8636,17 @@  int sched_dynamic_mode(const char *str)
 #error "Unsupported PREEMPT_DYNAMIC mechanism"
 #endif
 
-void sched_dynamic_update(int mode)
+DEFINE_MUTEX(sched_dynamic_mutex);
+static bool klp_override;
+
+static void __sched_dynamic_update(int mode)
 {
 	/*
 	 * Avoid {NONE,VOLUNTARY} -> FULL transitions from ever ending up in
 	 * the ZERO state, which is invalid.
 	 */
-	preempt_dynamic_enable(cond_resched);
+	if (!klp_override)
+		preempt_dynamic_enable(cond_resched);
 	preempt_dynamic_enable(might_resched);
 	preempt_dynamic_enable(preempt_schedule);
 	preempt_dynamic_enable(preempt_schedule_notrace);
@@ -8649,36 +8654,79 @@  void sched_dynamic_update(int mode)
 
 	switch (mode) {
 	case preempt_dynamic_none:
-		preempt_dynamic_enable(cond_resched);
+		if (!klp_override)
+			preempt_dynamic_enable(cond_resched);
 		preempt_dynamic_disable(might_resched);
 		preempt_dynamic_disable(preempt_schedule);
 		preempt_dynamic_disable(preempt_schedule_notrace);
 		preempt_dynamic_disable(irqentry_exit_cond_resched);
-		pr_info("Dynamic Preempt: none\n");
+		if (mode != preempt_dynamic_mode)
+			pr_info("Dynamic Preempt: none\n");
 		break;
 
 	case preempt_dynamic_voluntary:
-		preempt_dynamic_enable(cond_resched);
+		if (!klp_override)
+			preempt_dynamic_enable(cond_resched);
 		preempt_dynamic_enable(might_resched);
 		preempt_dynamic_disable(preempt_schedule);
 		preempt_dynamic_disable(preempt_schedule_notrace);
 		preempt_dynamic_disable(irqentry_exit_cond_resched);
-		pr_info("Dynamic Preempt: voluntary\n");
+		if (mode != preempt_dynamic_mode)
+			pr_info("Dynamic Preempt: voluntary\n");
 		break;
 
 	case preempt_dynamic_full:
-		preempt_dynamic_disable(cond_resched);
+		if (!klp_override)
+			preempt_dynamic_disable(cond_resched);
 		preempt_dynamic_disable(might_resched);
 		preempt_dynamic_enable(preempt_schedule);
 		preempt_dynamic_enable(preempt_schedule_notrace);
 		preempt_dynamic_enable(irqentry_exit_cond_resched);
-		pr_info("Dynamic Preempt: full\n");
+		if (mode != preempt_dynamic_mode)
+			pr_info("Dynamic Preempt: full\n");
 		break;
 	}
 
 	preempt_dynamic_mode = mode;
 }
 
+void sched_dynamic_update(int mode)
+{
+	mutex_lock(&sched_dynamic_mutex);
+	__sched_dynamic_update(mode);
+	mutex_unlock(&sched_dynamic_mutex);
+}
+
+#ifdef CONFIG_HAVE_PREEMPT_DYNAMIC_CALL
+
+static int klp_cond_resched(void)
+{
+	__klp_sched_try_switch();
+	return __cond_resched();
+}
+
+void sched_dynamic_klp_enable(void)
+{
+	mutex_lock(&sched_dynamic_mutex);
+
+	klp_override = true;
+	static_call_update(cond_resched, klp_cond_resched);
+
+	mutex_unlock(&sched_dynamic_mutex);
+}
+
+void sched_dynamic_klp_disable(void)
+{
+	mutex_lock(&sched_dynamic_mutex);
+
+	klp_override = false;
+	__sched_dynamic_update(preempt_dynamic_mode);
+
+	mutex_unlock(&sched_dynamic_mutex);
+}
+
+#endif /* CONFIG_HAVE_PREEMPT_DYNAMIC_CALL */
+
 static int __init setup_preempt_mode(char *str)
 {
 	int mode = sched_dynamic_mode(str);