[v2] stop_machine: Avoid potential non-atomic read of multi_stop_data::state

Message ID tencent_4CD220721A6C0B39670D5D52AAE4BD2A8F0A@qq.com

Commit Message

Rong Tao Oct. 19, 2023, 12:11 a.m. UTC
  From: Rong Tao <rongtao@cestc.cn>

In commit b1fc58333575 ("stop_machine: Avoid potential race behaviour"),
the racy accesses to multi_stop_data::state in both multi_cpu_stop() and
set_state() were fixed. Pass curstate as a parameter to ack_state() to
avoid the remaining non-atomic read of multi_stop_data::state there.

And replace smp_wmb()+WRITE_ONCE() with smp_store_release().

Signed-off-by: Rong Tao <rongtao@cestc.cn>
---
v1: stop_machine: Avoid potential race behaviour of multi_stop_data::state
    https://lore.kernel.org/lkml/tencent_705C16DF25978ACAEBD1E83E228881901006@qq.com/
---
 kernel/stop_machine.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
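
For readers unfamiliar with the ordering pattern being adopted here, the
following is a minimal userspace analogue of the set_state() /
multi_cpu_stop() handoff, written with C11 atomics and pthreads rather than
the kernel primitives; the names and values are illustrative only.

/*
 * Userspace analogue of the set_state() / multi_cpu_stop() handoff,
 * for illustration only (the kernel uses atomic_set() and
 * smp_store_release()). Build with: cc -pthread example.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int thread_ack;
static atomic_int state;

static void *publisher(void *arg)
{
        /* Reset the ack counter first (atomic_set() in the kernel). */
        atomic_store_explicit(&thread_ack, 4, memory_order_relaxed);
        /* Analogue of smp_store_release(&msdata->state, newstate). */
        atomic_store_explicit(&state, 1, memory_order_release);
        return arg;
}

static void *consumer(void *arg)
{
        /* Spin until the new state is visible (the READ_ONCE() loop in the kernel). */
        while (atomic_load_explicit(&state, memory_order_acquire) != 1)
                ;
        /* The release/acquire pairing guarantees the counter reset is seen. */
        printf("thread_ack = %d\n",
               atomic_load_explicit(&thread_ack, memory_order_relaxed));
        return arg;
}

int main(void)
{
        pthread_t p, c;

        pthread_create(&c, NULL, consumer, NULL);
        pthread_create(&p, NULL, publisher, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
}

The release store plays the role of smp_store_release(): any thread that
observes the new state is also guaranteed to observe the preceding reset of
the ack counter.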
  

Comments

Mark Rutland Oct. 20, 2023, 10:23 a.m. UTC | #1
On Thu, Oct 19, 2023 at 08:11:23AM +0800, Rong Tao wrote:
> From: Rong Tao <rongtao@cestc.cn>
> 
> In commit b1fc58333575 ("stop_machine: Avoid potential race behaviour"),
> the racy accesses to multi_stop_data::state in both multi_cpu_stop() and
> set_state() were fixed. Pass curstate as a parameter to ack_state() to
> avoid the remaining non-atomic read of multi_stop_data::state there.

Can we please describe this better? This is *not* a fix, it is a cleanup.

As I covered in:

  https://lore.kernel.org/lkml/ZS5g6I-UtUnihToH@FVFF77S0Q05N/

... there are no concurrent writers, and so the value of multi_stop_data::state
cannot change, and a non-atomic read is fine.

The actual change looks good to me as it makes it easier to see that there's no
race.

> And replace smp_wmb()+WRITE_ONCE() with smp_store_release().

This is also fine, but feels like a logically separate change.

Mark.

> 
> Signed-off-by: Rong Tao <rongtao@cestc.cn>
> ---
> v1: stop_machine: Avoid potential race behaviour of multi_stop_data::state
>     https://lore.kernel.org/lkml/tencent_705C16DF25978ACAEBD1E83E228881901006@qq.com/
> ---
>  kernel/stop_machine.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index cedb17ba158a..35a122ce2cbd 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -183,15 +183,15 @@ static void set_state(struct multi_stop_data *msdata,
>  {
>  	/* Reset ack counter. */
>  	atomic_set(&msdata->thread_ack, msdata->num_threads);
> -	smp_wmb();
> -	WRITE_ONCE(msdata->state, newstate);
> +	smp_store_release(&msdata->state, newstate);
>  }
>  
>  /* Last one to ack a state moves to the next state. */
> -static void ack_state(struct multi_stop_data *msdata)
> +static void ack_state(struct multi_stop_data *msdata,
> +		      enum multi_stop_state curstate)
>  {
>  	if (atomic_dec_and_test(&msdata->thread_ack))
> -		set_state(msdata, msdata->state + 1);
> +		set_state(msdata, curstate + 1);
>  }
>  
>  notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
> @@ -242,7 +242,7 @@ static int multi_cpu_stop(void *data)
>  			default:
>  				break;
>  			}
> -			ack_state(msdata);
> +			ack_state(msdata, curstate);
>  		} else if (curstate > MULTI_STOP_PREPARE) {
>  			/*
>  			 * At this stage all other CPUs we depend on must spin
> -- 
> 2.42.0
>
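
As an aside for readers following the thread, the point Mark makes can be
read directly off the patched ack_state(); the comment below is an
annotation added here for illustration and is not part of the patch:

static void ack_state(struct multi_stop_data *msdata,
                      enum multi_stop_state curstate)
{
        /*
         * Only the CPU whose decrement brings thread_ack to zero calls
         * set_state(); until that happens nobody writes msdata->state,
         * so the value read in multi_cpu_stop() (curstate) cannot have
         * changed. Passing it in simply makes the absence of a race
         * explicit.
         */
        if (atomic_dec_and_test(&msdata->thread_ack))
                set_state(msdata, curstate + 1);
}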
  
Peter Zijlstra Oct. 20, 2023, 10:48 a.m. UTC | #2
On Fri, Oct 20, 2023 at 11:23:37AM +0100, Mark Rutland wrote:

> > And replace smp_wmb()+WRITE_ONCE() with smp_store_release().
> 
> This is also fine, but feels like a logically separate change.

Agreed, and please take this opportunity to write a comment with the
barrier.
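
For illustration, one possible shape of such a comment in set_state()
(hypothetical wording, not a posted revision):

static void set_state(struct multi_stop_data *msdata,
                      enum multi_stop_state newstate)
{
        /* Reset ack counter. */
        atomic_set(&msdata->thread_ack, msdata->num_threads);
        /*
         * Publish the reset ack counter before the new state, so that a
         * CPU which observes newstate in multi_cpu_stop() and acks it
         * decrements the freshly reset counter rather than a stale one.
         */
        smp_store_release(&msdata->state, newstate);
}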
  

Patch

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index cedb17ba158a..35a122ce2cbd 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -183,15 +183,15 @@ static void set_state(struct multi_stop_data *msdata,
 {
 	/* Reset ack counter. */
 	atomic_set(&msdata->thread_ack, msdata->num_threads);
-	smp_wmb();
-	WRITE_ONCE(msdata->state, newstate);
+	smp_store_release(&msdata->state, newstate);
 }
 
 /* Last one to ack a state moves to the next state. */
-static void ack_state(struct multi_stop_data *msdata)
+static void ack_state(struct multi_stop_data *msdata,
+		      enum multi_stop_state curstate)
 {
 	if (atomic_dec_and_test(&msdata->thread_ack))
-		set_state(msdata, msdata->state + 1);
+		set_state(msdata, curstate + 1);
 }
 
 notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
@@ -242,7 +242,7 @@ static int multi_cpu_stop(void *data)
 			default:
 				break;
 			}
-			ack_state(msdata);
+			ack_state(msdata, curstate);
 		} else if (curstate > MULTI_STOP_PREPARE) {
 			/*
 			 * At this stage all other CPUs we depend on must spin