[v2] tools: memory-model: Make plain accesses carry dependencies

Message ID 20221202125100.30146-1-jonas.oberhauser@huaweicloud.com
State New
Series [v2] tools: memory-model: Make plain accesses carry dependencies

Commit Message

Jonas Oberhauser Dec. 2, 2022, 12:51 p.m. UTC
  From: Jonas Oberhauser <jonas.oberhauser@huawei.com>

As reported by Viktor, plain accesses in LKMM are weaker than
accesses to registers: the latter carry dependencies but the former
do not. This is exemplified in the following snippet:

  int r = READ_ONCE(*x);
  WRITE_ONCE(*y, r);

Here a data dependency links the READ_ONCE() to the WRITE_ONCE(),
preserving their order, because the model treats r as a register.
If r is turned into a memory location accessed by plain accesses,
however, the link is broken and the order between READ_ONCE() and
WRITE_ONCE() is no longer preserved.
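
For example, with *z standing in for such a plain memory location
(a minimal sketch):

  *z = READ_ONCE(*x);
  WRITE_ONCE(*y, *z);

Here the current model no longer sees any dependency between the
READ_ONCE() and the WRITE_ONCE().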

This is too conservative, since any optimizations on plain
accesses that might break dependencies are also possible on
registers; it also contradicts the intuitive notion of "dependency"
as the data stored by the WRITE_ONCE() does depend on the data read
by the READ_ONCE(), independently of whether r is a register or a
memory location.

This is resolved by redefining all dependencies to include
dependencies carried by memory accesses; a dependency is said to be
carried by memory accesses (in the model: carry-dep) from one load
to another load if the initial load is followed by an arbitrarily
long sequence alternating between stores and loads of the same
thread, where the data of each store depends on the previous load,
and is read by the next load.

Any dependency linking the final load in the sequence to another
access also links the initial load in the sequence to that access.
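
For example, a dependency can be carried through two plain,
thread-local locations (*z1 and *z2 here are illustrative names):

  int r1 = READ_ONCE(*x);  /* initial load */
  *z1 = r1;                /* plain store, data-dependent on r1 */
  int r2 = *z1;            /* plain load reading that store */
  *z2 = r2;
  int r3 = *z2;            /* final load in the sequence */
  WRITE_ONCE(*y, r3);

The data dependency from the final load to the WRITE_ONCE() now also
links the initial READ_ONCE() to the WRITE_ONCE().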

Reported-by: Viktor Vafeiadis <viktor@mpi-sws.org>
Signed-off-by: Jonas Oberhauser <jonas.oberhauser@huawei.com>
Reviewed-by: Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
---
 .../Documentation/explanation.txt             |  9 +++++-
 tools/memory-model/linux-kernel.bell          |  6 ++++
 .../litmus-tests/dep+plain.litmus             | 31 +++++++++++++++++++
 3 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 tools/memory-model/litmus-tests/dep+plain.litmus
  

Comments

Akira Yokosawa Dec. 2, 2022, 2:44 p.m. UTC | #1
[Dropping most CCs]

Hi Jonas,

A comment on the way this patch was sent.

On Fri,  2 Dec 2022 13:51:00 +0100, Jonas Oberhauser wrote:
> From: Jonas Oberhauser <jonas.oberhauser@huawei.com>

The sender of this patch reads
     "Jonas Oberhauser <jonas.oberhauser@huaweicloud.com>".

Is there any technical reason you've sent this patch in this way?

        Thanks, Akira

[...]
  
Boqun Feng Dec. 2, 2022, 6:49 p.m. UTC | #2
On Fri, Dec 02, 2022 at 01:51:00PM +0100, Jonas Oberhauser wrote:
> From: Jonas Oberhauser <jonas.oberhauser@huawei.com>
> 
> As reported by Viktor, plain accesses in LKMM are weaker than
> accesses to registers: the latter carry dependencies but the former
> do not. This is exemplified in the following snippet:
> 
>   int r = READ_ONCE(*x);
>   WRITE_ONCE(*y, r);
> 
> Here a data dependency links the READ_ONCE() to the WRITE_ONCE(),
> preserving their order, because the model treats r as a register.
> If r is turned into a memory location accessed by plain accesses,
> however, the link is broken and the order between READ_ONCE() and
> WRITE_ONCE() is no longer preserved.
> 
> This is too conservative, since any optimizations on plain
> accesses that might break dependencies are also possible on
> registers; it also contradicts the intuitive notion of "dependency"
> as the data stored by the WRITE_ONCE() does depend on the data read
> by the READ_ONCE(), independently of whether r is a register or a
> memory location.
> 
> This is resolved by redefining all dependencies to include
> dependencies carried by memory accesses; a dependency is said to be
> carried by memory accesses (in the model: carry-dep) from one load
> to another load if the initial load is followed by an arbitrarily
> long sequence alternating between stores and loads of the same
> thread, where the data of each store depends on the previous load,
> and is read by the next load.
> 
> Any dependency linking the final load in the sequence to another
> access also links the initial load in the sequence to that access.
> 
> Reported-by: Viktor Vafeiadis <viktor@mpi-sws.org>
> Signed-off-by: Jonas Oberhauser <jonas.oberhauser@huawei.com>
> Reviewed-by: Reviewed-by: Alan Stern <stern@rowland.harvard.edu>

s/Reviewed-by: Reviewed-by:/Reviewed-by:^2 to save some space ? ;-)

Joke aside, I wonder if this patch is a first step towards solving the
OOTA problem you reported at OSS:

	https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf

?

/me catching up slowly on that topic, so I'm curious. If so, maybe it's
better to put the link in the commit log, I think.

Regards,
Boqun

> ---
>  .../Documentation/explanation.txt             |  9 +++++-
>  tools/memory-model/linux-kernel.bell          |  6 ++++
>  .../litmus-tests/dep+plain.litmus             | 31 +++++++++++++++++++
>  3 files changed, 45 insertions(+), 1 deletion(-)
>  create mode 100644 tools/memory-model/litmus-tests/dep+plain.litmus
> 
> diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt
> index e901b47236c3..8e7085238470 100644
> --- a/tools/memory-model/Documentation/explanation.txt
> +++ b/tools/memory-model/Documentation/explanation.txt
> @@ -2575,7 +2575,7 @@ smp_store_release() -- which is basically how the Linux kernel treats
>  them.
>  
>  Although we said that plain accesses are not linked by the ppo
> -relation, they do contribute to it indirectly.  Namely, when there is
> +relation, they do contribute to it indirectly.  Firstly, when there is
>  an address dependency from a marked load R to a plain store W,
>  followed by smp_wmb() and then a marked store W', the LKMM creates a
>  ppo link from R to W'.  The reasoning behind this is perhaps a little
> @@ -2584,6 +2584,13 @@ for this source code in which W' could execute before R.  Just as with
>  pre-bounding by address dependencies, it is possible for the compiler
>  to undermine this relation if sufficient care is not taken.
>  
> +Secondly, plain accesses can carry dependencies: If a data dependency
> +links a marked load R to a store W, and the store is read by a load R'
> +from the same thread, then the data loaded by R' depends on the data
> +loaded originally by R. Thus, if R' is linked to any access X by a
> +dependency, R is also linked to access X by the same dependency, even
> +if W or R' (or both!) are plain.
> +
>  There are a few oddball fences which need special treatment:
>  smp_mb__before_atomic(), smp_mb__after_atomic(), and
>  smp_mb__after_spinlock().  The LKMM uses fence events with special
> diff --git a/tools/memory-model/linux-kernel.bell b/tools/memory-model/linux-kernel.bell
> index 65c32ca9d5ea..5f0b98c1ab81 100644
> --- a/tools/memory-model/linux-kernel.bell
> +++ b/tools/memory-model/linux-kernel.bell
> @@ -76,3 +76,9 @@ flag ~empty different-values(srcu-rscs) as srcu-bad-nesting
>  let Marked = (~M) | IW | Once | Release | Acquire | domain(rmw) | range(rmw) |
>  		LKR | LKW | UL | LF | RL | RU
>  let Plain = M \ Marked
> +
> +(* Redefine dependencies to include those carried through plain accesses *)
> +let carry-dep = (data ; rfi)*
> +let addr = carry-dep ; addr
> +let ctrl = carry-dep ; ctrl
> +let data = carry-dep ; data
> diff --git a/tools/memory-model/litmus-tests/dep+plain.litmus b/tools/memory-model/litmus-tests/dep+plain.litmus
> new file mode 100644
> index 000000000000..ebf84daa9a59
> --- /dev/null
> +++ b/tools/memory-model/litmus-tests/dep+plain.litmus
> @@ -0,0 +1,31 @@
> +C dep+plain
> +
> +(*
> + * Result: Never
> + *
> + * This litmus test demonstrates that in LKMM, plain accesses
> + * carry dependencies much like accesses to registers:
> + * The data stored to *z1 and *z2 by P0() originates from P0()'s
> + * READ_ONCE(), and therefore using that data to compute the
> + * conditional of P0()'s if-statement creates a control dependency
> + * from that READ_ONCE() to P0()'s WRITE_ONCE().
> + *)
> +
> +{}
> +
> +P0(int *x, int *y, int *z1, int *z2)
> +{
> +	int a = READ_ONCE(*x);
> +	*z1 = a;
> +	*z2 = *z1;
> +	if (*z2 == 1)
> +		WRITE_ONCE(*y, 1);
> +}
> +
> +P1(int *x, int *y)
> +{
> +	int r = smp_load_acquire(y);
> +	smp_store_release(x, r);
> +}
> +
> +exists (x=1 /\ y=1)
> -- 
> 2.17.1
>
  
Paul E. McKenney Dec. 3, 2022, 7:02 p.m. UTC | #3
On Sat, Dec 03, 2022 at 11:58:36AM +0000, Jonas Oberhauser wrote:
> 
> 
> -----Original Message-----
> From: Boqun Feng [mailto:boqun.feng@gmail.com] 
> Sent: Friday, December 2, 2022 7:50 PM
> 
> > > Reviewed-by: Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
> 
> > s/Reviewed-by: Reviewed-by:/Reviewed-by:^2 to save some space ? ;-)
> 
> 
> Oh, I didn't know I'm allowed to compress things like that! Can I use ² as well to save another character?

Heh!  I might miss that, and who knows?  The bots might think that "²"
was the first letter of your name.  ;-)

> > I wonder is this patch a first step to solve the OOTA problem you reported in OSS:
> >	https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf
> > If so maybe it's better to put the link in the commit log I think.
> 
> It's not directly related to that specific problem, it does solve some other OOTA issues though.
> If you think we should link to the talk, there's also a video with slightly more updated slides from the actual talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ
> do you think I should link to both then?

It is not hard for me to add that in if people believe that it should be
included.  But default is lazy in this case.  ;-)

							Thanx, Paul
  
Boqun Feng Dec. 3, 2022, 8:14 p.m. UTC | #4
On Sat, Dec 03, 2022 at 11:02:26AM -0800, Paul E. McKenney wrote:
> On Sat, Dec 03, 2022 at 11:58:36AM +0000, Jonas Oberhauser wrote:
> > 
> > 
> > -----Original Message-----
> > From: Boqun Feng [mailto:boqun.feng@gmail.com] 
> > Sent: Friday, December 2, 2022 7:50 PM
> > 
> > > > Reviewed-by: Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
> > 
> > > s/Reviewed-by: Reviewed-by:/Reviewed-by:^2 to save some space ? ;-)
> > 
> > 
> > Oh, I didn't know I'm allowed to compress things like that! Can I use ² as well to save another character?
> 
> Heh!  I might miss that, and who knows?  The bots might think that "²"
> was the first letter of your name.  ;-)
> 
> > > I wonder is this patch a first step to solve the OOTA problem you reported in OSS:
> > >	https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf
> > > If so maybe it's better to put the link in the commit log I think.
> > 
> > It's not directly related to that specific problem, it does solve some other OOTA issues though.
> > If you think we should link to the talk, there's also a video with slightly more updated slides from the actual talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ
> > do you think I should link to both then?
> 
> It is not hard for me to add that in if people believe that it should be
> included.  But default is lazy in this case.  ;-)
> 

I brought this up because, as we recently experienced in RCU code, we
need answers to "why did we do this?" for our future selves ;-)

I agree with Alan that this seems like a good idea, but having some big
picture of why we are doing this may be better.

Regards,
Boqun

> 							Thanx, Paul
  
Alan Stern Dec. 3, 2022, 8:34 p.m. UTC | #5
On Sat, Dec 03, 2022 at 11:02:26AM -0800, Paul E. McKenney wrote:
> On Sat, Dec 03, 2022 at 11:58:36AM +0000, Jonas Oberhauser wrote:
> > 
> > 
> > -----Original Message-----
> > From: Boqun Feng [mailto:boqun.feng@gmail.com] 
> > Sent: Friday, December 2, 2022 7:50 PM
> 
> > > I wonder is this patch a first step to solve the OOTA problem you reported in OSS:
> > >	https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf
> > > If so maybe it's better to put the link in the commit log I think.
> > 
> > It's not directly related to that specific problem, it does solve some other OOTA issues though.
> > If you think we should link to the talk, there's also a video with slightly more updated slides from the actual talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ
> > do you think I should link to both then?
> 
> It is not hard for me to add that in if people believe that it should be
> included.  But default is lazy in this case.  ;-)

I don't think there's any need to mention that video in the commit log.  
It's an introductory talk, and it's pretty safe to assume that anyone 
reading the commit because they are interested in the LKMM in great 
detail is already beyond the introductory level.

On the other hand, it wouldn't hurt to include a Link: tag to an 
appropriate message in this email thread.  (I leave it up to Paul to 
decide which message is most "appropriate" -- there may not be a good 
candidate, because a lot of the messages were not CC'ed to LKML.)

Alan
  
Paul E. McKenney Dec. 3, 2022, 8:40 p.m. UTC | #6
On Sat, Dec 03, 2022 at 12:14:29PM -0800, Boqun Feng wrote:
> On Sat, Dec 03, 2022 at 11:02:26AM -0800, Paul E. McKenney wrote:
> > On Sat, Dec 03, 2022 at 11:58:36AM +0000, Jonas Oberhauser wrote:
> > > 
> > > 
> > > -----Original Message-----
> > > From: Boqun Feng [mailto:boqun.feng@gmail.com] 
> > > Sent: Friday, December 2, 2022 7:50 PM
> > > 
> > > > > Reviewed-by: Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
> > > 
> > > > s/Reviewed-by: Reviewed-by:/Reviewed-by:^2 to save some space ? ;-)
> > > 
> > > 
> > > Oh, I didn't know I'm allowed to compress things like that! Can I use ² as well to save another character?
> > 
> > Heh!  I might miss that, and who knows?  The bots might think that "²"
> > was the first letter of your name.  ;-)
> > 
> > > > I wonder is this patch a first step to solve the OOTA problem you reported in OSS:
> > > >	https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf
> > > > If so maybe it's better to put the link in the commit log I think.
> > > 
> > > It's not directly related to that specific problem, it does solve some other OOTA issues though.
> > > If you think we should link to the talk, there's also a video with slightly more updated slides from the actual talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ
> > > do you think I should link to both then?
> > 
> > It is not hard for me to add that in if people believe that it should be
> > included.  But default is lazy in this case.  ;-)
> > 
> 
> I brought this up because, as we recently experience in RCU code, we
> need answers of "why we did this?" to the future us ;-)
> 
> I agree with Alan, this seems like a good idea, but having some big
> picture of why we do this may be better.

Fair enough!

How about something like this as a new paragraph at the end of
the commit log?

						Thanx, Paul

------------------------------------------------------------------------

For more on LKMM and dependencies, please see this Open Source Summit
talk:  https://www.youtube.com/watch?v=iFDKhIxKhoQ
  
Paul E. McKenney Dec. 3, 2022, 8:44 p.m. UTC | #7
On Sat, Dec 03, 2022 at 03:34:20PM -0500, stern@rowland.harvard.edu wrote:
> On Sat, Dec 03, 2022 at 11:02:26AM -0800, Paul E. McKenney wrote:
> > On Sat, Dec 03, 2022 at 11:58:36AM +0000, Jonas Oberhauser wrote:
> > > 
> > > 
> > > -----Original Message-----
> > > From: Boqun Feng [mailto:boqun.feng@gmail.com] 
> > > Sent: Friday, December 2, 2022 7:50 PM
> > 
> > > > I wonder is this patch a first step to solve the OOTA problem you reported in OSS:
> > > >	https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf
> > > > If so maybe it's better to put the link in the commit log I think.
> > > 
> > > It's not directly related to that specific problem, it does solve some other OOTA issues though.
> > > If you think we should link to the talk, there's also a video with slightly more updated slides from the actual talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ
> > > do you think I should link to both then?
> > 
> > It is not hard for me to add that in if people believe that it should be
> > included.  But default is lazy in this case.  ;-)
> 
> I don't think there's any need to mention that video in the commit log.  
> It's an introductory talk, and it's pretty safe to assume that anyone 
> reading the commit because they are interested in the LKMM in great 
> detail is already beyond the introductory level.
> 
> On the other hand, it wouldn't hurt to include a Link: tag to an 
> appropriate message in this email thread.  (I leave it up to Paul to 
> decide which message is most "appropriate" -- there may not be a good 
> candidate, because a lot of the messages were not CC'ed to LKML.)

For this approach, I would add this:

Link: https://lore.kernel.org/all/4262e55407294a5989e766bc4dc48293@huawei.com/

I could of course do both the extra paragraph -and- the Link:.  ;-)

Thoughts?

							Thanx, Paul
  
Boqun Feng Dec. 3, 2022, 8:52 p.m. UTC | #8
On Sat, Dec 03, 2022 at 12:44:05PM -0800, Paul E. McKenney wrote:
> On Sat, Dec 03, 2022 at 03:34:20PM -0500, stern@rowland.harvard.edu wrote:
> > On Sat, Dec 03, 2022 at 11:02:26AM -0800, Paul E. McKenney wrote:
> > > On Sat, Dec 03, 2022 at 11:58:36AM +0000, Jonas Oberhauser wrote:
> > > > 
> > > > 
> > > > -----Original Message-----
> > > > From: Boqun Feng [mailto:boqun.feng@gmail.com] 
> > > > Sent: Friday, December 2, 2022 7:50 PM
> > > 
> > > > > I wonder is this patch a first step to solve the OOTA problem you reported in OSS:
> > > > >	https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf
> > > > > If so maybe it's better to put the link in the commit log I think.
> > > > 
> > > > It's not directly related to that specific problem, it does solve some other OOTA issues though.
> > > > If you think we should link to the talk, there's also a video with slightly more updated slides from the actual talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ
> > > > do you think I should link to both then?
> > > 
> > > It is not hard for me to add that in if people believe that it should be
> > > included.  But default is lazy in this case.  ;-)
> > 
> > I don't think there's any need to mention that video in the commit log.  
> > It's an introductory talk, and it's pretty safe to assume that anyone 
> > reading the commit because they are interested in the LKMM in great 
> > detail is already beyond the introductory level.
> > 
> > On the other hand, it wouldn't hurt to include a Link: tag to an 
> > appropriate message in this email thread.  (I leave it up to Paul to 
> > decide which message is most "appropriate" -- there may not be a good 
> > candidate, because a lot of the messages were not CC'ed to LKML.)
> 
> For this approach, I would add this:
> 
> Link: https://lore.kernel.org/all/4262e55407294a5989e766bc4dc48293@huawei.com/
> 
> I could of course do both the extra paragraph -and- the Link:.  ;-)
> 
> Thoughts?
> 

I think only having Link: is fine ;-) And I agree with Alan, no need to
mention that video.

Thank you!

Regards,
Boqun

> 							Thanx, Paul
  
Paul E. McKenney Dec. 3, 2022, 9:16 p.m. UTC | #9
On Sat, Dec 03, 2022 at 12:52:02PM -0800, Boqun Feng wrote:
> On Sat, Dec 03, 2022 at 12:44:05PM -0800, Paul E. McKenney wrote:
> > On Sat, Dec 03, 2022 at 03:34:20PM -0500, stern@rowland.harvard.edu wrote:
> > > On Sat, Dec 03, 2022 at 11:02:26AM -0800, Paul E. McKenney wrote:
> > > > On Sat, Dec 03, 2022 at 11:58:36AM +0000, Jonas Oberhauser wrote:
> > > > > 
> > > > > 
> > > > > -----Original Message-----
> > > > > From: Boqun Feng [mailto:boqun.feng@gmail.com] 
> > > > > Sent: Friday, December 2, 2022 7:50 PM
> > > > 
> > > > > > I wonder is this patch a first step to solve the OOTA problem you reported in OSS:
> > > > > >	https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf
> > > > > > If so maybe it's better to put the link in the commit log I think.
> > > > > 
> > > > > It's not directly related to that specific problem, it does solve some other OOTA issues though.
> > > > > If you think we should link to the talk, there's also a video with slightly more updated slides from the actual talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ
> > > > > do you think I should link to both then?
> > > > 
> > > > It is not hard for me to add that in if people believe that it should be
> > > > included.  But default is lazy in this case.  ;-)
> > > 
> > > I don't think there's any need to mention that video in the commit log.  
> > > It's an introductory talk, and it's pretty safe to assume that anyone 
> > > reading the commit because they are interested in the LKMM in great 
> > > detail is already beyond the introductory level.
> > > 
> > > On the other hand, it wouldn't hurt to include a Link: tag to an 
> > > appropriate message in this email thread.  (I leave it up to Paul to 
> > > decide which message is most "appropriate" -- there may not be a good 
> > > candidate, because a lot of the messages were not CC'ed to LKML.)
> > 
> > For this approach, I would add this:
> > 
> > Link: https://lore.kernel.org/all/4262e55407294a5989e766bc4dc48293@huawei.com/
> > 
> > I could of course do both the extra paragraph -and- the Link:.  ;-)
> > 
> > Thoughts?
> > 
> 
> I think only having Link: is fine ;-) And I agree with Alan, no need to
> mention that video.

Very good, I will add the Link: on the next rebase.

And I will even refrain from adding the URL of the infamous "why not
both" video to this email.  ;-)

							Thanx, Paul
  
Alan Stern Dec. 3, 2022, 9:32 p.m. UTC | #10
On Sat, Dec 03, 2022 at 12:44:05PM -0800, Paul E. McKenney wrote:
> On Sat, Dec 03, 2022 at 03:34:20PM -0500, stern@rowland.harvard.edu wrote:
> > On Sat, Dec 03, 2022 at 11:02:26AM -0800, Paul E. McKenney wrote:
> > > On Sat, Dec 03, 2022 at 11:58:36AM +0000, Jonas Oberhauser wrote:
> > > > 
> > > > 
> > > > -----Original Message-----
> > > > From: Boqun Feng [mailto:boqun.feng@gmail.com] 
> > > > Sent: Friday, December 2, 2022 7:50 PM
> > > 
> > > > > I wonder is this patch a first step to solve the OOTA problem you reported in OSS:
> > > > >	https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf
> > > > > If so maybe it's better to put the link in the commit log I think.
> > > > 
> > > > It's not directly related to that specific problem, it does solve some other OOTA issues though.
> > > > If you think we should link to the talk, there's also a video with slightly more updated slides from the actual talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ
> > > > do you think I should link to both then?
> > > 
> > > It is not hard for me to add that in if people believe that it should be
> > > included.  But default is lazy in this case.  ;-)
> > 
> > I don't think there's any need to mention that video in the commit log.  
> > It's an introductory talk, and it's pretty safe to assume that anyone 
> > reading the commit because they are interested in the LKMM in great 
> > detail is already beyond the introductory level.
> > 
> > On the other hand, it wouldn't hurt to include a Link: tag to an 
> > appropriate message in this email thread.  (I leave it up to Paul to 
> > decide which message is most "appropriate" -- there may not be a good 
> > candidate, because a lot of the messages were not CC'ed to LKML.)
> 
> For this approach, I would add this:
> 
> Link: https://lore.kernel.org/all/4262e55407294a5989e766bc4dc48293@huawei.com/

There's no point including that link; it merely points to messages 
containing or commenting on early versions of the commit.  It adds very 
little information not already present in the commit itself.  (Have you 
read any of Linus's criticisms of the Link: tags that people tend to 
include in patches they send him?  It's the same principle.)

I was thinking of the discussion which led up to the commit being 
written, where Jonas first brought up the idea that plain accesses 
should be able to carry dependencies just like accesses to registers.  
That's the sort of thing which would give readers some context and 
understanding of the reasoning behind the commit.  They were part of the 
thread with the subject "RE: Interesting LKMM litmus test".

But I can't find those messages on lore.kernel.org (which isn't 
surprising, as they weren't CC'ed to any mailing lists).

> I could of course do both the extra paragraph -and- the Link:.  ;-)
> 
> Thoughts?

My advice: Omit them both.

Alan
  
Paul E. McKenney Dec. 3, 2022, 11:11 p.m. UTC | #11
On Sat, Dec 03, 2022 at 04:32:19PM -0500, stern@rowland.harvard.edu wrote:
> On Sat, Dec 03, 2022 at 12:44:05PM -0800, Paul E. McKenney wrote:
> > On Sat, Dec 03, 2022 at 03:34:20PM -0500, stern@rowland.harvard.edu wrote:
> > > On Sat, Dec 03, 2022 at 11:02:26AM -0800, Paul E. McKenney wrote:
> > > > On Sat, Dec 03, 2022 at 11:58:36AM +0000, Jonas Oberhauser wrote:
> > > > > 
> > > > > 
> > > > > -----Original Message-----
> > > > > From: Boqun Feng [mailto:boqun.feng@gmail.com] 
> > > > > Sent: Friday, December 2, 2022 7:50 PM
> > > > 
> > > > > > I wonder is this patch a first step to solve the OOTA problem you reported in OSS:
> > > > > >	https://static.sched.com/hosted_files/osseu2022/e1/oss-eu22-jonas.pdf
> > > > > > If so maybe it's better to put the link in the commit log I think.
> > > > > 
> > > > > It's not directly related to that specific problem, it does solve some other OOTA issues though.
> > > > > If you think we should link to the talk, there's also a video with slightly more updated slides from the actual talk: https://www.youtube.com/watch?v=iFDKhIxKhoQ
> > > > > do you think I should link to both then?
> > > > 
> > > > It is not hard for me to add that in if people believe that it should be
> > > > included.  But default is lazy in this case.  ;-)
> > > 
> > > I don't think there's any need to mention that video in the commit log.  
> > > It's an introductory talk, and it's pretty safe to assume that anyone 
> > > reading the commit because they are interested in the LKMM in great 
> > > detail is already beyond the introductory level.
> > > 
> > > On the other hand, it wouldn't hurt to include a Link: tag to an 
> > > appropriate message in this email thread.  (I leave it up to Paul to 
> > > decide which message is most "appropriate" -- there may not be a good 
> > > candidate, because a lot of the messages were not CC'ed to LKML.)
> > 
> > For this approach, I would add this:
> > 
> > Link: https://lore.kernel.org/all/4262e55407294a5989e766bc4dc48293@huawei.com/
> 
> There's no point including that link; it merely points to messages 
> containing or commenting on early versions of the commit.  It adds very 
> little information not already present in the commit itself.  (Have you 
> read any of Linus's criticisms of the Link: tags that people tend to 
> include in patches they send him?  It's the same principle.)
> 
> I was thinking of the discussion which led up to the commit being 
> written, where Jonas first brought up the idea that plain accesses 
> should be able to carry dependencies just like accesses to registers.  
> That's the sort of thing which would give readers some context and 
> understanding of the reasoning behind the commit.  They were part of the 
> thread with the subject "RE: Interesting LKMM litmus test".
> 
> But I can't find those messages on lore.kernel.org (which isn't 
> surprising, as they weren't CC'ed to any mailing lists).

I am OK with them being made public, maybe in a Google document or
some such.

We would of course also need the consent of everyone else on that thread.

> > I could of course do both the extra paragraph -and- the Link:.  ;-)
> > 
> > Thoughts?
> 
> My advice: Omit them both.

It would be good to reference something or another.  ;-)

							Thanx, Paul
  
Boqun Feng Dec. 4, 2022, 8:33 a.m. UTC | #12
On Sun, Dec 04, 2022 at 12:15:27AM +0000, Jonas Oberhauser wrote:
> 
> 
> -----Original Message-----
> From: Paul E. McKenney [mailto:paulmck@kernel.org] 
> Sent: Sunday, December 4, 2022 12:11 AM
> To: stern@rowland.harvard.edu
> > On Sat, Dec 03, 2022 at 04:32:19PM -0500, stern@rowland.harvard.edu wrote:
> > > My advice: Omit them both.
> > It would be good to reference something or another.  ;-)
> 
> I also prefer to not refer to that presentation. 
> If there is a feeling that more context is needed, I would first
> prefer to enhance the commit message itself in some way. (Personally I
> don't feel that this is needed, and the imho the issue stands by
> itself even without reference to OOTA, which could be resolved fully
> independently e.g. by Viktor's suggestion to just axiomatically forbid
> OOTA --- the issue addressed by this patch would still exist). If

The reason I'm giving you a hard time is that I haven't seen real-world
code that needs this fix; maybe there is one and I'm just not aware of
it. Your litmus test explains the problem very well, but it would be
better if there were real-world code expecting this ordering.

I'm not saying real-world code is essential for memory-model changes,
but without it, I guess the rationale of this patch is "plain accesses
shouldn't be weaker than registers" or "this (plain accesses don't
provide dependencies) is too conservative", and these don't seem very
strong without a bigger motivation behind them.

Also, I'm under the impression that people love to sprinkle
READ_ONCE()/WRITE_ONCE() around when they have some ordering issue (in
the real world or with LKMM). Although I don't like this, you cannot
blame people who just want more guarantees to make their code work ;-(
This is another reason I'd like to see strong reasoning behind this
change.

Besides, could you also explain a little bit why only "data;rfi" can be
"carry-dep" while "ctrl;rfi" and "addr;rfi" cannot? I think it's because
there are special cases where compilers can figure out a condition being
true or an address being constant and thereby break the dependency? But
maybe I'm wrong or missing something. Thank you!

(Please don't be mad at me, sometimes I'm just slow to understand things
;-))

Regards,
Boqun

> that's not satisfactory, I would also consent to publishing the
> e-mails from the thread starting where I relayed Viktor's observation
> of the relaxed accesses, but I don't recall it saying anything
> substantially beyond the current commit log + the documentation
> included in the patch.
> 
> best wishes, jonas
  
Alan Stern Dec. 5, 2022, 4:18 p.m. UTC | #13
On Mon, Dec 05, 2022 at 01:42:46PM +0000, Jonas Oberhauser wrote:
> > Besides, could you also explain a little bit why only "data;rfi" can be "carry-dep" but "ctrl;rfi" and "addr;rfi" cannot? I think it's because there are special cases when compilers can figure out a condition being true or an address being constant therefore break the dependency
> 
> Oh, good question. A bit hard for me to write down the answer clearly 
> (which some people will claim that I don't understand it well myself, 
> but I beg to differ :) :( :) ).
> 
> In a nutshell, it's because x ->data [Plain] ->rfi y ->... z fulfils 
> the same role as storing something in a register and then using it in 
> a subsequent computation; x ->ctrl y ->... z (and ->addr) don't. So 
> it's not due to smart compilers, just the fact that the other two 
> cases seem unrelated to the problem being solved, and including them 
> might introduce some unsoundness (not that I have checked if they do).

More can be said here.  Consider the following simple example:

	void P0(int *x, int *y)
	{
		int r1, r2;
		int a[10];

		r1 = READ_ONCE(*x);
		a[r1] = 1;
		r2 = a[r1];
		WRITE_ONCE(*y, r2);
	}

There is an address dependency from the READ_ONCE to the plain store in 
a[r1].  Then there is an rfi and a data dependency to the WRITE_ONCE.

But in this example, the WRITE_ONCE is _not_ ordered after the 
READ_ONCE, even though they are linked by (addr ; rfi ; data).  The 
compiler knows that the value of r1 does not change between the two 
plain accesses, so it knows that it can optimize the code to be:

	r1 = READ_ONCE(*x);
	r2 = 1;
	WRITE_ONCE(*y, r2);
	a[r1] = r2;

And then the CPU can execute the WRITE_ONCE before the READ_ONCE.  This 
shows that (addr ; rfi) must not be included in the carry-dep relation.

You may be able to come up with a similar argument for (ctrl ; rfi), 
although it might not be quite as clear.
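
Perhaps something along these lines (just a sketch, not checked as
carefully):

	void P0(int *x, int *y, int *a)
	{
		int r1, r2;

		r1 = READ_ONCE(*x);
		if (r1)
			*a = 1;		/* ctrl from the READ_ONCE */
		else
			*a = 1;
		r2 = *a;		/* rfi from one of the plain stores */
		WRITE_ONCE(*y, r2);	/* data from the plain load */
	}

Since both branches store the same constant, the compiler might hoist
the store out of the conditional and propagate the constant into r2,
leaving no dependency at all between the READ_ONCE and the WRITE_ONCE;
the CPU would then be free to execute the WRITE_ONCE first.  If that
reasoning holds, (ctrl ; rfi) would not belong in carry-dep either.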

Alan
  
Boqun Feng Dec. 5, 2022, 7:49 p.m. UTC | #14
On Mon, Dec 05, 2022 at 01:42:46PM +0000, Jonas Oberhauser wrote:
> > Without it, I guess the rationale of this patch is "plain accesses
> > shouldn't be weaker than registers" or "This (plain accesses don't
> > provide dependencies) is too conservative", and these don't seem
> > very strong without a bigger motivation behind it.
> 
> Good points, thanks.
> 
> 
> -----Original Message-----
> From: Boqun Feng [mailto:boqun.feng@gmail.com] 
> Sent: Sunday, December 4, 2022 9:33 AM
> >
> >  a real world code usage that needs this fix
> 
> Note that in the standard C world, 'int a' (and even 'register int a')
> declares an object, i.e., a memory location that is accessed with
> plain accesses. In my opinion, that 'int a' is a "register" in herd7
> is an implementation artifact that hides the current weakness
> addressed by this patch. So if you would actually analyze pretty much
> any code that looks like
> 
>    int a = <some atomic load>
>    if (a == ...) { <some atomic store> }
> 
> then under the standard interpretation of that code, a is a memory
> location, and there is (in current LKMM) no control dependency, and
> the code would be considered broken by LKMM.
> 
> Or in another view, whether something that is (provably) never
> accessed externally is to be considered a 'register' or a 'memory
> location' should be an implementation detail of the compiler etc.
> (with the possibility to freely do reg2mem/mem2reg), and not have any
> effect on the ordering guarantees. Notably most tools (GenMC for sure
> and I think also Dat3M) use such compiler transformations (especially
> mem2reg) because they live under the assumption that it doesn't change
> anything, but of course with current LKMM that renders all of these
> tools unsound; if you would carefully avoid to do any such
> transformations, then these tools would also report bugs in a lot more
> linux code.
> 

Thanks! This is exactly what I wanted to see ;-) Now I understand more
about the rationale of this change:

First, Linux kernel developers expect and use dependency ordering.

Second, herd has the concept of "registers", which carry dependencies,
because herd also models asm code where registers do exist.

Third, from the viewpoint of the C standard or other models, LKMM has an
inconsistency: both "int a" and "*z" are memory locations, yet LKMM
currently respects dependencies carried by the former but not those
carried by the latter.

Finally, to close the gap between LKMM+herd and other model tools and
the C standard, all dependencies carried by memory locations should be
enforced in LKMM.
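
Concretely, the asymmetry in the current model looks something like this
(a quick sketch, with *z a plain memory location):

	int a = READ_ONCE(*x);	// "register": dependency respected
	WRITE_ONCE(*y, a);

	*z = READ_ONCE(*x);	// plain location: dependency currently dropped
	WRITE_ONCE(*y, *z);

From the C standard's point of view, both a and *z are just memory
locations, so arguably the two cases should behave the same.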

I must say I didn't get the "Third" part from your patch at first; now,
re-reading your patch, I think you did mention it to some degree. I
guess the reason I didn't get it at first is because my mind is herded ;-)

How about adding a comment before "carry-dep", for example:

(* Redefine dependencies to include those carried through plain
accesses, since herd by default only generates dependencies via
"registers", while LKMM enforces dependencies carried by "memory
locations" in the sense of the C standard *)

I guess Alan and Paul can do a better job on the wording; that just
expresses my idea. Maybe instead of saying "Make plain accesses carry
dependencies", we say "Make memory locations carry dependencies" or
"Extend dependency carrying from registers to memory locations".

Thoughts?

> 
> 
> 
> > Also I'm in the impression that people love to put
> > READ_ONCE()/WRITE_ONCE() when they have some ordering issues (in
> > real world or with LKMM). 
> 
> Note that this would require to do 
> 
>   int a; WRITE_ONCE(&a, <some_atomic_load>); if (READ_ONCE(&a) == ...)
>   { <some_atomic_store>; }
> 
> to get a control dependency, and imho violate the point of relaxed
> marked accesses -- to point out where races are, but no (additional)
> ordering is required.
> 
> > Although I don't like this
> 
> I like this and I have forwarded Paul's email to some friends who like

"So you are these guys" ;-) I guess I shouldn't say "I don't like this",
I'm OK with either way since I also understand the benefit you mentioned
below.

> not to put relaxed marked/atomic accesses in their code "because it
> makes the code less readable". I completely disagree -- marking racy
> accesses makes the code more readable because it lets us know when
> something is subject to concurrent access, and modifying the logic may
> break code of other threads.
> 
> What I don't like is relying on dependencies in the first place, since
> they are hard to maintain -- some innocuous optimizations that people
> might want to do (like adding some read in front of a bunch of writes
> to see if they're really necessary) bypass the ordering imposed by the
> dependency. So I generally advise to use acq loads instead, unless
> there is a measurable performance impact. But that's a completely
> separate discussion...
> 

True. I actually agree with the "acq loads first, unless performance
benefit" approach.

[I will reply the ctrl/addr in a separate mail]

Regards,
Boqun
  
Boqun Feng Dec. 6, 2022, 8:46 p.m. UTC | #15
On Mon, Dec 05, 2022 at 11:18:13AM -0500, stern@rowland.harvard.edu wrote:
> On Mon, Dec 05, 2022 at 01:42:46PM +0000, Jonas Oberhauser wrote:
> > > Besides, could you also explain a little bit why only "data;rfi" can be "carry-dep" but "ctrl;rfi" and "addr;rfi" cannot? I think it's because there are special cases when compilers can figure out a condition being true or an address being constant therefore break the dependency
> > 
> > Oh, good question. A bit hard for me to write down the answer clearly 
> > (which some people will claim that I don't understand it well myself, 
> > but I beg to differ :) :( :) ).

Nah, I think your answer is clear to me ;-)

> > 
> > In a nutshell, it's because x ->data [Plain] ->rfi y ->... z fulfils 
> > the same role as storing something in a register and then using it in 
> > a subsequent computation; x ->ctrl y ->... z (and ->addr) don't. So 
> > it's not due to smart compilers, just the fact that the other two 
> > cases seem unrelated to the problem being solved, and including them 
> > might introduce some unsoundness (not that I have checked if they do).

So it's about whether a value can flow from x to y, right? In that
case registers and memory cells should be treated the same by
compilers, and therefore we can extend the dependencies.
> 
> More can be said here.  Consider the following simple example:
> 
> 	void P0(int *x, int *y)
> 	{
> 		int r1, r2;
> 		int a[10];
> 
> 		r1 = READ_ONCE(*x);
> 		a[r1] = 1;
> 		r2 = a[r1];
> 		WRITE_ONCE(*y, r2);
> 	}
> 
> There is an address dependency from the READ_ONCE to the plain store in 
> a[r1].  Then there is an rfi and a data dependency to the WRITE_ONCE.
> 
> But in this example, the WRITE_ONCE is _not_ ordered after the 
> READ_ONCE, even though they are linked by (addr ; rfi ; data).  The 
> compiler knows that the value of r1 does not change between the two 
> plain accesses, so it knows that it can optimize the code to be:
> 
> 	r1 = READ_ONCE(*x);
> 	r2 = 1;
> 	WRITE_ONCE(*y, r2);
> 	a[r1] = r2;
> 
> And then the CPU can execute the WRITE_ONCE before the READ_ONCE.  This 
> shows that (addr ; rfi) must not be included in the carry-deps relation.
> 
> You may be able to come up with a similar argument for (ctrl ; rfi), 
> although it might not be quite as clear.
> 

Thank you, Alan! One question though: can a "smart" compiler optimize
away the dependency in the case below, with the same logic?

	void P0(int *x, int *y, int *a)
	{
		int r1, r2;

		r1 = READ_ONCE(*x); // A

		*a = r1 & 0xffff; // B

		r2 = *a & 0xffff0000; // C

		WRITE_ONCE(*y, r2); // D

	}

I think we have A ->data B ->rfi C ->data D; however, a "smart"
compiler can figure out that r2 is actually zero, right? And the code
gets optimized to:

	r1 = READ_ONCE(*x);
	r2 = 0;
	WRITE_ONCE(*y, r2);
	*a = r1 & 0xffff;

and break the dependency.

I know that our memory model is actually unaware of the difference
between syntactic dependencies and semantic dependencies, so one may
argue that in the (data; rfi) example above the compiler optimization is
outside the scope of LKMM, but wouldn't the same reasoning apply to your
(addr; rfi) example? The WRITE_ONCE() _syntactically_ depends on the
load of a[r1], so even if a "smart" compiler can figure out the value,
LKMM won't take that into consideration.

Am I missing something subtle here?

Regards,
Boqun

> Alan
  
Boqun Feng Dec. 6, 2022, 8:52 p.m. UTC | #16
On Tue, Dec 06, 2022 at 12:46:58PM -0800, Boqun Feng wrote:
> On Mon, Dec 05, 2022 at 11:18:13AM -0500, stern@rowland.harvard.edu wrote:
> > On Mon, Dec 05, 2022 at 01:42:46PM +0000, Jonas Oberhauser wrote:
> > > > Besides, could you also explain a little bit why only "data;rfi" can be "carry-dep" but "ctrl;rfi" and "addr;rfi" cannot? I think it's because there are special cases when compilers can figure out a condition being true or an address being constant therefore break the dependency
> > > 
> > > Oh, good question. A bit hard for me to write down the answer clearly 
> > > (which some people will claim that I don't understand it well myself, 
> > > but I beg to differ :) :( :) ).
> 
> Nah, I think your answer is clear to me ;-)
> 
> > > 
> > > In a nutshell, it's because x ->data [Plain] ->rfi y ->... z fulfils 
> > > the same role as storing something in a register and then using it in 
> > > a subsequent computation; x ->ctrl y ->... z (and ->addr) don't. So 
> > > it's not due to smart compilers, just the fact that the other two 
> > > cases seem unrelated to the problem being solved, and including them 
> > > might introduce some unsoundness (not that I have checked if they do).
> 
> So it's about whether a value can have a dataflow from x to y, right? In
> that case registers and memory cells should be treated the same by
> compilers, therefore we can extend the dependencies.
> > 
> > More can be said here.  Consider the following simple example:
> > 
> > 	void P0(int *x, int *y)
> > 	{
> > 		int r1, r2;
> > 		int a[10];
> > 
> > 		r1 = READ_ONCE(*x);
> > 		a[r1] = 1;
> > 		r2 = a[r1];
> > 		WRITE_ONCE(*y, r2);
> > 	}
> > 
> > There is an address dependency from the READ_ONCE to the plain store in 
> > a[r1].  Then there is an rfi and a data dependency to the WRITE_ONCE.
> > 
> > But in this example, the WRITE_ONCE is _not_ ordered after the 
> > READ_ONCE, even though they are linked by (addr ; rfi ; data).  The 
> > compiler knows that the value of r1 does not change between the two 
> > plain accesses, so it knows that it can optimize the code to be:
> > 
> > 	r1 = READ_ONCE(*x);
> > 	r2 = 1;
> > 	WRITE_ONCE(*y, r2);
> > 	a[r1] = r2;
> > 
> > And then the CPU can execute the WRITE_ONCE before the READ_ONCE.  This 
> > shows that (addr ; rfi) must not be included in the carry-deps relation.
> > 
> > You may be able to come up with a similar argument for (ctrl ; rfi), 
> > although it might not be quite as clear.
> > 
> 
> Thank you, Alan! One question though, can a "smart" compiler optimize
> out the case below, with the same logic?
> 
> 	void P0(int *x, int *y, int *a)
> 	{
> 		int r1, r2;
> 
> 		r1 = READ_ONCE(*x); // A
> 
> 		*a = r1 & 0xffff; // B
> 
> 		r2 = *a & 0xffff0000; // C
> 
> 		WRITE_ONCE(*y, r2); // D
> 
> 	}
> 
> I think we have A ->data B ->rfi C ->data D, however a "smart" compiler
> can figure out that r2 is actually zero, right? And the code get
> optimized to:
> 
> 	r1 = READ_ONCE(*x);
> 	r2 = 0;
> 	WRITE_ONCE(*y, r2);
> 	*a = r1 & 0xffff;
> 
> and break the dependency.
> 
> I know that our memory model is actually unware of the differences of
> syntatics dependencies vs semantics syntatics, so one may argue that in
> the (data; rfi) example above the compiler optimization is outside the
> scope of LKMM, but won't the same reasoning apply to the (addr; rfi)
> example from you? The WRITE_ONCE() _syntatically_ depends on load of
> a[r1], therefore even a "smart" compiler can figure out the value, LKMM

I guess it should be that r2 (i.e. the load of a[r1]) _syntactically_
depends on the value of r1.

Regards,
Boqun

> won't take that into consideration.
> 
> Am I missing something subtle here?
> 
> Regards,
> Boqun
> 
> > Alan
  
Boqun Feng Dec. 7, 2022, 1:43 a.m. UTC | #17
On Mon, Dec 05, 2022 at 01:42:46PM +0000, Jonas Oberhauser wrote:
[...]
> > Besides, could you also explain a little bit why only "data;rfi" can
> > be "carry-dep" but "ctrl;rfi" and "addr;rfi" cannot? I think it's
> > because there are special cases when compilers can figure out a
> > condition being true or an address being constant therefore break
> > the dependency
> 
> Oh, good question. A bit hard for me to write down the answer clearly
> (which some people will claim that I don't understand it well myself,
> but I beg to differ :) :( :) ).
> 
> In a nutshell, it's because  x ->data  [Plain]  ->rfi  y ->... z
> fulfils the same role as storing something in a register and then
> using it in a subsequent computation; x ->ctrl y ->... z (and ->addr)
> don't. So it's not due to smart compilers, just the fact that the
> other two cases seem unrelated to the problem being solved, and
> including them might introduce some unsoundness (not that I have
> checked if they do).
> 
> Mathematically I imagine the computation graph between register
> accesses/computations r_1,...,r_n and memory accesses m_1,...m_k that
> has an unlabeled edge  m_i -> r_j when m_i loads some data that is an
> input to the access/computation r_j, similarly it has an unlabeled
> edge r_i -> r_j  when the result of r_i is used as an input of r_j,
> and finally r_i ->data/ctrl/addr m_j when the value computed by r_j is
> the address/data or affects the ctrl of m_j.  So for example, if 'int
> a' were to declare a register, then
>     int a = READ_ONCE(&x);
>     if (a == 5) { WRITE_ONCE(&y,5); }
> would have something like (*) 4 events  (*= since I'm avoiding fully
> formalizing the graph model here I don't really define to which extent
> one splits up register reads, computations, etc., so I'll do it
> somewhat arbitrarily)
> m_1 = READ_ONCE(&x)
> r_1 = store the result of m_1 in a
> r_2 = compare the content of a to 5
> m_2 = WRITE_ONCE(&y, 5)
> 
> with the edges m_1 -> r_1 -> r_2 ->ctrl m_2
> 
> Then in the LKMM graph, we would have m_i ->ctrl m_j (or data or addr)
> iff there is a path m_i -> r_x1 -> ... -> r_xl ->ctrl m_j (or data or
> addr).
> So in the example above, m_1 ->ctrl m_2.
> 
> Now what we would do is look at what happens if the compiler/a
> tool/me/whatever interprets 'int a' as a memory location. Instead of
> r_1 and r_2 from above, we would have something like
> 
> m_1 = READ_ONCE(&x)
> r_3 = the result of m_1 (stored in a CPU register)
> m_3 = a := the r_3 result
> m_4 = load from a
> r_4 = the result of m_4 (stored in a CPU register)
> m_2 = WRITE_ONCE(&y, 5)
> 
> with m_1 -> r_3 ->data -> m_3  ->rfi  m_4 -> r_4 ->ctrl m_2
> and hence
> m_1 ->data m_3 ->rfi m_4 ->ctrl m_2
> 
> What the patch does is make sure that even under this interpretation,
> we still have m_1 ->ctrl m_2
> Note that this ->data ->rfi applies in every case where something is
> considered a register is turned into a memory location, because those
> cases always introduce a store with a fixed address (that memory
> location) where the data is the current content of the register, which
> is then read (internally) at the next time that data is picked up. A
> store to register never becomes a ctrl or addr edge, hence they are
> not included in the patch.
> 

Let me see if I can describe your approach in a more formal way (look at
me, pretending to know formal methods ;-))

Basically, what you are saying is that for every valid litmus test with
dependencies (carried by local variables, i.e. registers), there exists
a way to rewrite the litmus test by treating all (or any number of)
local variables as memory locations.

Let's say the set of litmus tests to start with is L and the set of
rewritten ones is L'. Here are the rewrite rules I came up with:

1)	for every local variable named <n> on processor <P>, add an extra
	memory location (maybe named <n>_<P>) as a parameter of the
	processor function, for example

		P0(int *x, int *y) {
			int a ...;
		}

	becomes
		P0(int *x, int *y, int *a_P0) {
		}

2)	for each <n> in 1), for each assignment of <n>, say:

		int <n> = <expr>; // H

		or 

		<n> = <expr>; // H
	
	rewrite it to
	
		int computer_register_<seq> = <expr>; // A
		*<n>_<P> = computer_register_<seq>; // B
	
	where <seq> is a self-increasing counter that increases at every
	step of the rewrite.

	This step introduces an extra edge A ->data B.

3)	for each <n> in 1), for each read of <n>, say:

		Expr; // T
	
	rewrite it to

		int computer_register_<seq> = *<n>_<P>; // C
		Expr[<n>/computer_register_<seq>]; // D

	where <seq> is a self-increasing counter that increases at every
	step of the rewrite.

	"M[x/y]" means changing the expression by replace all appearance
	of variable x into y, e.g. (n1 = n + 1)[n/y] is n1 = y + 1.

	Note that for every rewrite 3), there must exist a corresponding
	rewrite 2) such that C reads the value stored by the B of that
	rewrite 2). This is because the litmus tests in set L are valid:
	all variables must be assigned a value before being used.

	This step introduces an extra edge B ->rfi C, and for every
	dependency H -> T in the old litmus test, the same dependency is
	preserved between C -> D in the new litmus test.

4)	for each <n> and <P> in 1), for each expression in the exists
	clause, say:

		Expr
	
	rewrite it to

		Expr[<P>:<n>/<n>_<P>]

For example, by these rules, the following litmus test:

	P0(int *x, int *y) {
		int r = READ_ONCE(*x);
		WRITE_ONCE(*y, r);
	}
	exists(0:r = 1)

is rewritten into

	P0(int *x, int *y, int *r_P0) {
		int computer_register_0 = READ_ONCE(*x); // by 2)
		*r_P0 = computer_register_0;             // by 2)
		int computer_register_1 = *r_P0;         // by 3)
		WRITE_ONCE(*y, computer_register_1);     // by 3)
	}
	exists(r_P0 = 1)                                 // by 4)


It's obvious that the rewritten litmus tests are valid, and for every
dependency
	
	H ->dep T in litmus tests of set L

there must exist a

	A ->data ->rfi ->dep D in L'

("dep" is not the same as in linux-kernel.cat).

And note that since L' is a set of valid litmus tests, we can do
another whole rewrite of L' and get L'', which will contain
->data->rfi->data->rfi->dep, and this covers the (data;rfi)* ;-)


Now the hard part: the rewrite is only one-directional, i.e. L => L',
or more precisely the transitive closure of the rewrite is
one-directional. We can only prove that if there exists a valid litmus
test with a dependency ->dep, we can construct a number of litmus tests
with (->data ->rfi)*->dep.

But to accept the new rules in your patch, we need the opposite
direction, that is, if there exists a (->data ->rfi)* ->dep, we can
somehow find a litmus test that preserves everything else but reduces

	(->data->rfi)*->dep
into

	->dep.

(I'm not sure the following is sound, since some more work is needed)

The first observation is that we can ignore ->data [Marked] ->rfi,
because if we can reduce

	(->data [Plain] ->rfi)* ->data [Marked] ->rfi -> (->data [Plain] ->rfi)* ->dep

to
	
	->data [Marked] ->rfi ->dep

we get "hb", which already has the ordering we want.

Now we can focus on ->data [Plain] ->rfi ->dep, e.g.

	x = READ_ONCE(..); // A
	*<N> = <expr>; // B, contains x
	r = Expr; // C, Expr contains *<N>
	WRITE_ONCE(.., r); // D

We need to assume that the litmus tests are data-race free, since we
only care about valid litmus tests, then it's OK. The rewritten looks
like the following:

	int computer_register_<seq>;
	x = READ_ONCE(..); // A
	computer_register = <expr>; // B, contains x
	// ^^^ basically reverse rewrite of the 2) above	

	r = Expr[*<N>/computer_register_<seq>]; // C
	// ^^^ basically reverse rewrite of the 3) above	

	*<N> = computer_register_<seq>; // R
	// ^^^ need this to keep the global memory effect

	WRITE_ONCE(.., r); // D
	
We need the data-race-free assumption to replace a memory cell with a
local variable.

By this rewrite, we reduce
	A ->data B ->rfi C ->dep D
into
	A ->dep D.

A few details are missing here; for example, we may have multiple Cs
and need to rewrite all of them, and we need normalization work like
converting litmus tests into canonical forms (e.g. splitting a += into
temporary variables and assignments, as in the sketch below). Also we
need to prove that no extra edges (or at least no edges we care about)
are added. But these look obvious to me, and that's enough formal
digging for me today ;-)
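
For example, a quick sketch of such a canonicalization:

	*x += 1;

becomes

	int tmp = *x;
	*x = tmp + 1;

so that the load and the store of *x show up as separate events, with
the temporary carrying the value between them.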

As a result, the rewrite operation on L is proved to be isomorphic! ;-)
In the sense that we think reducing to (->data [Marked] ->rfi)* ->dep
is good enough ;-)

Now for every litmus test with

	(->data ->rfi)* ->dep

we can rewrite it into another litmus test preserving all other edges
except replacing the above with

	(->data [Marked] ->rfi)* ->dep

and the rewrite only involves replacing non-racy accesses with local
variables, and in the sense of the C standard and maybe other model
tools these litmus tests are equivalent.

> Please let me know if this is convincing and clear. If so we could
> link to your e-mail and my response to provide context, without having
> to put a huge explanation into the commit message. If that's not
> desirable I can also think about how to compress this into a shorter
> explanation.
> 

My intuition now is the rewrite explanation above, which is good enough
for me but maybe not for other audiences. Thank you for continuing to
explain this to me ;-)


Regards,
Boqun

> Best wishes,
> Jonas
> 
> PS:
> 
> > sometimes I'm just slow to understand things ;-)
> 
> Well, welcome in the club :D
  
Boqun Feng Dec. 8, 2022, 9:06 p.m. UTC | #18
On Tue, Dec 06, 2022 at 05:43:49PM -0800, Boqun Feng wrote:
> On Mon, Dec 05, 2022 at 01:42:46PM +0000, Jonas Oberhauser wrote:
> [...]
> > > Besides, could you also explain a little bit why only "data;rfi" can
> > > be "carry-dep" but "ctrl;rfi" and "addr;rfi" cannot? I think it's
> > > because there are special cases when compilers can figure out a
> > > condition being true or an address being constant therefore break
> > > the dependency
> > 
> > Oh, good question. A bit hard for me to write down the answer clearly
> > (which some people will claim that I don't understand it well myself,
> > but I beg to differ :) :( :) ).
> > 
> > In a nutshell, it's because  x ->data  [Plain]  ->rfi  y ->... z
> > fulfils the same role as storing something in a register and then
> > using it in a subsequent computation; x ->ctrl y ->... z (and ->addr)
> > don't. So it's not due to smart compilers, just the fact that the
> > other two cases seem unrelated to the problem being solved, and
> > including them might introduce some unsoundness (not that I have
> > checked if they do).
> > 
> > Mathematically I imagine the computation graph between register
> > accesses/computations r_1,...,r_n and memory accesses m_1,...m_k that
> > has an unlabeled edge  m_i -> r_j when m_i loads some data that is an
> > input to the access/computation r_j, similarly it has an unlabeled
> > edge r_i -> r_j  when the result of r_i is used as an input of r_j,
> > and finally r_i ->data/ctrl/addr m_j when the value computed by r_j is
> > the address/data or affects the ctrl of m_j.  So for example, if 'int
> > a' were to declare a register, then
> >     int a = READ_ONCE(&x);
> >     if (a == 5) { WRITE_ONCE(&y,5); }
> > would have something like (*) 4 events  (*= since I'm avoiding fully
> > formalizing the graph model here I don't really define to which extent
> > one splits up register reads, computations, etc., so I'll do it
> > somewhat arbitrarily)
> > m_1 = READ_ONCE(&x)
> > r_1 = store the result of m_1 in a
> > r_2 = compare the content of a to 5
> > m_2 = WRITE_ONCE(&y, 5)
> > 
> > with the edges m_1 -> r_1 -> r_2 ->ctrl m_2
> > 
> > Then in the LKMM graph, we would have m_i ->ctrl m_j (or data or addr)
> > iff there is a path m_i -> r_x1 -> ... -> r_xl ->ctrl m_j (or data or
> > addr).
> > So in the example above, m_1 ->ctrl m_2.
> > 
> > Now what we would do is look at what happens if the compiler/a
> > tool/me/whatever interprets 'int a' as a memory location. Instead of
> > r_1 and r_2 from above, we would have something like
> > 
> > m_1 = READ_ONCE(&x)
> > r_3 = the result of m_1 (stored in a CPU register)
> > m_3 = a := the r_3 result
> > m_4 = load from a
> > r_4 = the result of m_4 (stored in a CPU register)
> > m_2 = WRITE_ONCE(&y, 5)
> > 
> > with m_1 -> r_3 ->data -> m_3  ->rfi  m_4 -> r_4 ->ctrl m_2
> > and hence
> > m_1 ->data m_3 ->rfi m_4 ->ctrl m_2
> > 
> > What the patch does is make sure that even under this interpretation,
> > we still have m_1 ->ctrl m_2
> > Note that this ->data ->rfi applies in every case where something is
> > considered a register is turned into a memory location, because those
> > cases always introduce a store with a fixed address (that memory
> > location) where the data is the current content of the register, which
> > is then read (internally) at the next time that data is picked up. A
> > store to register never becomes a ctrl or addr edge, hence they are
> > not included in the patch.
> > 
> 
> Let me see if I can describe your approach in a more formal way (look at
> me, pretending to know formal methods ;-))
> 
> Bascially, what you are saying is that for every valid litmus test with
> dependencies (carried by local variables i.e. registers), there exists a
> way to rewrite the litmus test by treating all (or any number of) local
> variables as memory locations.
> 
> Lets say the set of the litmus tests to start with is L and the set of
> the rewritten ones is L'is. Here is a rewrite rule I come up with:
> 
> 1)	for every local named <n> on processor <P>, add an extra memory
> 	location (maybe named <n>_<P>) in the processor function, for
> 	example
> 
> 		P0(int *x, int *y) {
> 			int a ...;
> 		}
> 
> 	became
> 		P0(int *x, int *y, int *a_P0) {
> 		}
> 
> 2)	for each <n> in 1), for each assignment of <n>, say:
> 
> 		int <n> = <expr>; // H
> 
> 		or 
> 
> 		<n> = <expr>; // H
> 	
> 	rewrite it to
> 	
> 		int computer_register_<seq> = <expr>; // A
> 		*<n>_<P> = computer_register_<seq>; // B
> 	
> 	where <seq> is a self increasing counter that increases every
> 	step or the rewrite.
> 
> 	This step introduces an extra edge A ->data B.
> 
> 3)	for each <n> in 1), for each the read of <n>, say:
> 
> 		Expr; // T
> 	
> 	rewrite it to
> 
> 		int computer_register_<seq> = *<n>_<P>; // C
> 		Expr[<n>/computer_register_<seq>]; // D
> 
> 	where <seq> is a self increasing counter that increases every
> 	step or the rewrite.
> 
> 	"M[x/y]" means changing the expression by replace all appearance
> 	of variable x into y, e.g. (n1 = n + 1)[n/y] is n1 = y + 1.
> 
> 	Note that for every rewrite 3), there must exists a
> 	corresponding rewrite 2), that C reads the value stored by the
> 	B of the rewrite 2). This is because the litmus tests in set L
> 	are valid, all variables must be assigned a value before used.
> 
> 	This step introduces an extra edge B ->rfi C, and also for every
> 	dependency between H -> T in the old litmus test, it preserves
> 	between C -> D in the new litmus test.
> 
> 4)	for each <n> and <P> in 1), for each expression in the exist
> 	clause, say:
> 
> 		Expr
> 	
> 	rewrite it to
> 
> 		Expr[<P>:<n>/<n>_<P>]
> 
> For example, by these rules, the following litmus test:
> 
> 	P0(int *x, int *y) {
> 		int r = READ_ONCE(*x);
> 		WRITE_ONCE(*y, r);
> 	}
> 	exists(0:r = 1)
> 
> is rewritten into
> 
> 	P0(int *x, int *y, int *r_P0) {
> 		int computer_register_0 = READ_ONCE(*x); // by 2)
> 		*r_P0 = computer_register_0;             // by 2)
> 		int computer_register_1 = *r_P0;         // by 3)
> 		WRITE_ONCE(*y, computer_register_1);     // by 3)
> 	}
> 	exists(r_P0 = 1)                                 // by 4)
> 
> 
> It's obvious that the rewritten litmus tests are valid, and for every
> dependency
> 	
> 	H ->dep T in litmus tests of set L
> 
> there must exists a
> 
> 	A ->data ->rfi ->dep D in L'
> 
> ("dep" is not the same as in linux-kernel.cat).
> 
> And note since L' is a set of valid litmus tests, we can do another
> whole rewrite to L' and get L'' which will contains
> ->data->rfi->data->rfi->dep, and this covers the (data;rfi)* ;-)
> 
> 
> Now the hard part, since the rewrite is only one-direction, i.e L => L',
> or more preciselly the transverse closure of the rewrite is
> one-direction. We can only prove that if there exists a valid litmus
> test with dependency ->dep, we can construct a number of litmus tests
> with (->data ->rfi)*->dep.
> 
> But to accept the new rules in your patch, we need the opposite
> direction, that is, if there exists a (->data ->rfi)* ->dep, we can
> somehow find a litmus test that preserves anything else but reduce
> 
> 	(->data->rfi)*->dep
> into
> 
> 	->dep.
> 
> (I'm not sure the following is sound, since some more work is needed)
> 
> The first observation is that we can ignore ->data [Marked] ->rfi,
> because if we can reduce
> 
> 	(->data [Plain] ->rfi)* ->data [Marked] ->rfi -> (->data [Plain] ->rfi)* ->dep
> 
> to
> 	
> 	->data [Marked] ->rfi ->dep
> 
> we get "hb", which already has the ordering we want.
> 
> Now we can focus on ->data [Plain] ->rfi ->dep, e.g.
> 
> 	x = READ_ONCE(..); // A
> 	*<N> = <expr>; // B, contains x
> 	r = Expr; // C, Expr contains *<N>
> 	WRITE_ONCE(.., r); // D
> 
> We need to assume that the litmus tests are data-race free, since we
> only care about valid litmus tests, then it's OK. The rewritten looks
> like the following:
> 
> 	int computer_register_<seq>;
> 	x = READ_ONCE(..); // A
> 	computer_register = <expr>; // B, contains x
> 	// ^^^ basically reverse rewrite of the 2) above	
> 
> 	r = Expr[*<N>/computer_register_<seq>]; // C
> 	// ^^^ basically reverse rewrite of the 3) above	
> 
> 	*<N> = computer_register_<seq>; // R
> 	// ^^^ need this to keep the global memory effect
> 
> 	WRITE_ONCE(.., r); // D
> 	
> We need the data-race free assumption to replace a memory cell into a
> local variable.
> 
> By this rewrite, We reduce
> 	A ->data B ->rfi C ->dep D
> into
> 	A ->dep D.
> 
> A few details are missing here, for example we may have multip Cs and
> need to rewrite all of them, and we need normalization work like
> converting litmus tests into canonical forms (e.g change the += into
> temporary variables and assignment). Also we need to prove that no more
> edges (or edges we care) are added. But these look obvious to me, and
> enough formally digging for me today ;-)
> 
> As a result, the rewrite operation on L is proved to be isomorphic! ;-)
> In the sense that we think reducing to (->data [Marked] ->rfi)* ->dep
> is good enough ;-)
> 
> Now for every litmus test with
> 
> 	(->data ->rfi)* ->dep
> 
> we can rewrite it into another litmus test preserving all other edges
> except replacing the above with
> 
> 	(->data [Marked] ->rfi)* ->dep
> 
> and the rewrite only contains replacing non-data-race accesses with
> local variables and in the sense of C standards and maybe other model
> tools these litmus tests are equivalent.
> 
> > Please let me know if this is convincing and clear. If so we could
> > link to your e-mail and my response to provide context, without having
> > to put a huge explanation into the commit message. If that's not
> > desirable I can also think about how to compress this into a shorter
> > explanation.
> > 
> 
> My intution now is the rewrite explanation above, which is good enough
> for me but not sure for other audience. Thank you for keep explaining
> this to me ;-)
> 
> 
> Regards,
> Boqun
> 

Paul, I think Jonas and I have reached some degree of agreement on more
details of this change; at least I learned a lot ;-)

Could you add the following lines to the commit log if you haven't sent
a PR?

	More details can be found in the LKML discussion[1]

	[1]: https://lore.kernel.org/lkml/d86295788ad14a02874ab030ddb8a6f8@huawei.com/

I think Jonas' email covers most of the details behind the change and
can serve as a "Quick Quiz" for the readers of the commit log ;-)

Thoughts?

Regards,
Boqun

> > Best wishes,
> > Jonas
> > 
> > PS:
> > 
> > > sometimes I'm just slow to understand things ;-)
> > 
> > Well, welcome in the club :D
  
Alan Stern Dec. 8, 2022, 9:37 p.m. UTC | #19
On Tue, Dec 06, 2022 at 12:52:47PM -0800, Boqun Feng wrote:
> On Tue, Dec 06, 2022 at 12:46:58PM -0800, Boqun Feng wrote:
> > Thank you, Alan! One question though, can a "smart" compiler optimize
> > out the case below, with the same logic?
> > 
> > 	void P0(int *x, int *y, int *a)
> > 	{
> > 		int r1, r2;
> > 
> > 		r1 = READ_ONCE(*x); // A
> > 
> > 		*a = r1 & 0xffff; // B
> > 
> > 		r2 = *a & 0xffff0000; // C
> > 
> > 		WRITE_ONCE(*y, r2); // D
> > 
> > 	}
> > 
> > I think we have A ->data B ->rfi C ->data D, however a "smart" compiler
> > can figure out that r2 is actually zero, right? And the code get
> > optimized to:
> > 
> > 	r1 = READ_ONCE(*x);
> > 	r2 = 0;
> > 	WRITE_ONCE(*y, r2);
> > 	*a = r1 & 0xffff;
> > 
> > and break the dependency.

Yes, that could happen.

> > I know that our memory model is actually unware of the differences of
> > syntatics dependencies vs semantics syntatics, so one may argue that in
> > the (data; rfi) example above the compiler optimization is outside the
> > scope of LKMM, but won't the same reasoning apply to the (addr; rfi)
> > example from you? The WRITE_ONCE() _syntatically_ depends on load of
> > a[r1], therefore even a "smart" compiler can figure out the value, LKMM
> 
> I guess it should be that r2 (i.e. the load of a[r1]) _syntatically_
> depends on the value of r1.

Yes.  But more to the point, the LKMM already has this problem for 
ordinary dependencies.  If you do:

	r1 = READ_ONCE(*x);
	r2 = r1 & 0x0000ffff;
	r3 = r2 & 0xffff0000;
	WRITE_ONCE(*y, r3);

then the LKMM will think there is a dependency (because there is a 
_syntactic_ dependency), but the compiler is likely to realize that 
there isn't a _semantic_ dependency and will destroy the ordering.
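
For example, the compiler might plausibly emit the equivalent of the
following (a sketch, not the output of any particular compiler):

	r1 = READ_ONCE(*x);
	WRITE_ONCE(*y, 0);	/* r3 is provably zero, so no value flows
				   from r1 into the marked store */

at which point nothing in the generated code orders the marked load
before the marked store.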

We warn people about this possibility, and the same warning applies to 
dependencies carried by plain accesses.  So I don't think this is a 
reason to object to Jonas's carry-dep relation.

Alan
  
Paul E. McKenney Dec. 9, 2022, 12:01 a.m. UTC | #20
On Thu, Dec 08, 2022 at 01:06:20PM -0800, Boqun Feng wrote:
> On Tue, Dec 06, 2022 at 05:43:49PM -0800, Boqun Feng wrote:
> > On Mon, Dec 05, 2022 at 01:42:46PM +0000, Jonas Oberhauser wrote:
> > [...]
> > > > Besides, could you also explain a little bit why only "data;rfi" can
> > > > be "carry-dep" but "ctrl;rfi" and "addr;rfi" cannot? I think it's
> > > > because there are special cases when compilers can figure out a
> > > > condition being true or an address being constant therefore break
> > > > the dependency
> > > 
> > > Oh, good question. A bit hard for me to write down the answer clearly
> > > (which some people will claim that I don't understand it well myself,
> > > but I beg to differ :) :( :) ).
> > > 
> > > In a nutshell, it's because  x ->data  [Plain]  ->rfi  y ->... z
> > > fulfils the same role as storing something in a register and then
> > > using it in a subsequent computation; x ->ctrl y ->... z (and ->addr)
> > > don't. So it's not due to smart compilers, just the fact that the
> > > other two cases seem unrelated to the problem being solved, and
> > > including them might introduce some unsoundness (not that I have
> > > checked if they do).
> > > 
> > > Mathematically I imagine the computation graph between register
> > > accesses/computations r_1,...,r_n and memory accesses m_1,...m_k that
> > > has an unlabeled edge  m_i -> r_j when m_i loads some data that is an
> > > input to the access/computation r_j, similarly it has an unlabeled
> > > edge r_i -> r_j  when the result of r_i is used as an input of r_j,
> > > and finally r_i ->data/ctrl/addr m_j when the value computed by r_j is
> > > the address/data or affects the ctrl of m_j.  So for example, if 'int
> > > a' were to declare a register, then
> > >     int a = READ_ONCE(&x);
> > >     if (a == 5) { WRITE_ONCE(&y,5); }
> > > would have something like (*) 4 events  (*= since I'm avoiding fully
> > > formalizing the graph model here I don't really define to which extent
> > > one splits up register reads, computations, etc., so I'll do it
> > > somewhat arbitrarily)
> > > m_1 = READ_ONCE(&x)
> > > r_1 = store the result of m_1 in a
> > > r_2 = compare the content of a to 5
> > > m_2 = WRITE_ONCE(&y, 5)
> > > 
> > > with the edges m_1 -> r_1 -> r_2 ->ctrl m_2
> > > 
> > > Then in the LKMM graph, we would have m_i ->ctrl m_j (or data or addr)
> > > iff there is a path m_i -> r_x1 -> ... -> r_xl ->ctrl m_j (or data or
> > > addr).
> > > So in the example above, m_1 ->ctrl m_2.
> > > 
> > > Now what we would do is look at what happens if the compiler/a
> > > tool/me/whatever interprets 'int a' as a memory location. Instead of
> > > r_1 and r_2 from above, we would have something like
> > > 
> > > m_1 = READ_ONCE(&x)
> > > r_3 = the result of m_1 (stored in a CPU register)
> > > m_3 = a := the r_3 result
> > > m_4 = load from a
> > > r_4 = the result of m_4 (stored in a CPU register)
> > > m_2 = WRITE_ONCE(&y, 5)
> > > 
> > > with m_1 -> r_3 ->data -> m_3  ->rfi  m_4 -> r_4 ->ctrl m_2
> > > and hence
> > > m_1 ->data m_3 ->rfi m_4 ->ctrl m_2
> > > 
> > > What the patch does is make sure that even under this interpretation,
> > > we still have m_1 ->ctrl m_2
> > > Note that this ->data ->rfi applies in every case where something is
> > > considered a register is turned into a memory location, because those
> > > cases always introduce a store with a fixed address (that memory
> > > location) where the data is the current content of the register, which
> > > is then read (internally) at the next time that data is picked up. A
> > > store to register never becomes a ctrl or addr edge, hence they are
> > > not included in the patch.
> > > 
> > 
> > Let me see if I can describe your approach in a more formal way (look at
> > me, pretending to know formal methods ;-))
> > 
> > Bascially, what you are saying is that for every valid litmus test with
> > dependencies (carried by local variables i.e. registers), there exists a
> > way to rewrite the litmus test by treating all (or any number of) local
> > variables as memory locations.
> > 
> > Lets say the set of the litmus tests to start with is L and the set of
> > the rewritten ones is L'is. Here is a rewrite rule I come up with:
> > 
> > 1)	for every local named <n> on processor <P>, add an extra memory
> > 	location (maybe named <n>_<P>) in the processor function, for
> > 	example
> > 
> > 		P0(int *x, int *y) {
> > 			int a ...;
> > 		}
> > 
> > 	became
> > 		P0(int *x, int *y, int *a_P0) {
> > 		}
> > 
> > 2)	for each <n> in 1), for each assignment of <n>, say:
> > 
> > 		int <n> = <expr>; // H
> > 
> > 		or 
> > 
> > 		<n> = <expr>; // H
> > 	
> > 	rewrite it to
> > 	
> > 		int computer_register_<seq> = <expr>; // A
> > 		*<n>_<P> = computer_register_<seq>; // B
> > 	
> > 	where <seq> is a self increasing counter that increases every
> > 	step or the rewrite.
> > 
> > 	This step introduces an extra edge A ->data B.
> > 
> > 3)	for each <n> in 1), for each the read of <n>, say:
> > 
> > 		Expr; // T
> > 	
> > 	rewrite it to
> > 
> > 		int computer_register_<seq> = *<n>_<P>; // C
> > 		Expr[<n>/computer_register_<seq>]; // D
> > 
> > 	where <seq> is a self increasing counter that increases every
> > 	step or the rewrite.
> > 
> > 	"M[x/y]" means changing the expression by replace all appearance
> > 	of variable x into y, e.g. (n1 = n + 1)[n/y] is n1 = y + 1.
> > 
> > 	Note that for every rewrite 3), there must exists a
> > 	corresponding rewrite 2), that C reads the value stored by the
> > 	B of the rewrite 2). This is because the litmus tests in set L
> > 	are valid, all variables must be assigned a value before used.
> > 
> > 	This step introduces an extra edge B ->rfi C, and also for every
> > 	dependency between H -> T in the old litmus test, it preserves
> > 	between C -> D in the new litmus test.
> > 
> > 4)	for each <n> and <P> in 1), for each expression in the exist
> > 	clause, say:
> > 
> > 		Expr
> > 	
> > 	rewrite it to
> > 
> > 		Expr[<P>:<n>/<n>_<P>]
> > 
> > For example, by these rules, the following litmus test:
> > 
> > 	P0(int *x, int *y) {
> > 		int r = READ_ONCE(*x);
> > 		WRITE_ONCE(*y, r);
> > 	}
> > 	exists(0:r = 1)
> > 
> > is rewritten into
> > 
> > 	P0(int *x, int *y, int *r_P0) {
> > 		int computer_register_0 = READ_ONCE(*x); // by 2)
> > 		*r_P0 = computer_register_0;             // by 2)
> > 		int computer_register_1 = *r_P0;         // by 3)
> > 		WRITE_ONCE(*y, computer_register_1);     // by 3)
> > 	}
> > 	exists(r_P0 = 1)                                 // by 4)
> > 
> > 
> > It's obvious that the rewritten litmus tests are valid, and for every
> > dependency
> > 	
> > 	H ->dep T in litmus tests of set L
> > 
> > there must exists a
> > 
> > 	A ->data ->rfi ->dep D in L'
> > 
> > ("dep" is not the same as in linux-kernel.cat).
> > 
> > And note since L' is a set of valid litmus tests, we can do another
> > whole rewrite to L' and get L'' which will contains
> > ->data->rfi->data->rfi->dep, and this covers the (data;rfi)* ;-)
> > 
> > 
> > Now the hard part, since the rewrite is only one-direction, i.e L => L',
> > or more preciselly the transverse closure of the rewrite is
> > one-direction. We can only prove that if there exists a valid litmus
> > test with dependency ->dep, we can construct a number of litmus tests
> > with (->data ->rfi)*->dep.
> > 
> > But to accept the new rules in your patch, we need the opposite
> > direction, that is, if there exists a (->data ->rfi)* ->dep, we can
> > somehow find a litmus test that preserves anything else but reduce
> > 
> > 	(->data->rfi)*->dep
> > into
> > 
> > 	->dep.
> > 
> > (I'm not sure the following is sound, since some more work is needed)
> > 
> > The first observation is that we can ignore ->data [Marked] ->rfi,
> > because if we can reduce
> > 
> > 	(->data [Plain] ->rfi)* ->data [Marked] ->rfi -> (->data [Plain] ->rfi)* ->dep
> > 
> > to
> > 	
> > 	->data [Marked] ->rfi ->dep
> > 
> > we get "hb", which already has the ordering we want.
> > 
> > Now we can focus on ->data [Plain] ->rfi ->dep, e.g.
> > 
> > 	x = READ_ONCE(..); // A
> > 	*<N> = <expr>; // B, contains x
> > 	r = Expr; // C, Expr contains *<N>
> > 	WRITE_ONCE(.., r); // D
> > 
> > We need to assume that the litmus tests are data-race free, since we
> > only care about valid litmus tests, then it's OK. The rewritten looks
> > like the following:
> > 
> > 	int computer_register_<seq>;
> > 	x = READ_ONCE(..); // A
> > 	computer_register = <expr>; // B, contains x
> > 	// ^^^ basically reverse rewrite of the 2) above	
> > 
> > 	r = Expr[*<N>/computer_register_<seq>]; // C
> > 	// ^^^ basically reverse rewrite of the 3) above	
> > 
> > 	*<N> = computer_register_<seq>; // R
> > 	// ^^^ need this to keep the global memory effect
> > 
> > 	WRITE_ONCE(.., r); // D
> > 	
> > We need the data-race free assumption to replace a memory cell into a
> > local variable.
> > 
> > By this rewrite, We reduce
> > 	A ->data B ->rfi C ->dep D
> > into
> > 	A ->dep D.
> > 
> > A few details are missing here, for example we may have multip Cs and
> > need to rewrite all of them, and we need normalization work like
> > converting litmus tests into canonical forms (e.g change the += into
> > temporary variables and assignment). Also we need to prove that no more
> > edges (or edges we care) are added. But these look obvious to me, and
> > enough formally digging for me today ;-)
> > 
> > As a result, the rewrite operation on L is proved to be isomorphic! ;-)
> > In the sense that we think reducing to (->data [Marked] ->rfi)* ->dep
> > is good enough ;-)
> > 
> > Now for every litmus test with
> > 
> > 	(->data ->rfi)* ->dep
> > 
> > we can rewrite it into another litmus test preserving all other edges
> > except replacing the above with
> > 
> > 	(->data [Marked] ->rfi)* ->dep
> > 
> > and the rewrite only contains replacing non-data-race accesses with
> > local variables and in the sense of C standards and maybe other model
> > tools these litmus tests are equivalent.
> > 
> > > Please let me know if this is convincing and clear. If so we could
> > > link to your e-mail and my response to provide context, without having
> > > to put a huge explanation into the commit message. If that's not
> > > desirable I can also think about how to compress this into a shorter
> > > explanation.
> > > 
> > 
> > My intution now is the rewrite explanation above, which is good enough
> > for me but not sure for other audience. Thank you for keep explaining
> > this to me ;-)
> > 
> > 
> > Regards,
> > Boqun
> > 
> 
> Paul, I think Jonas and I reach out to some degree of argeement on more
> details of this change, at least I learned a lot ;-)
> 
> Could you add the following lines in the commit log if you haven't send
> a PR?
> 
> 	More deep details can be found at the LKML discussion[1]
> 
> 	[1]: https://lore.kernel.org/lkml/d86295788ad14a02874ab030ddb8a6f8@huawei.com/
> 
> I think Jonas' email covers most of the details behind the change and
> can server as a "Quick Quiz" for the readers of the commit log ;-)
> 
> Thoughts?

I am not so sure that any such quiz would be quick, but life is often
like that for memory models.  ;-)

Thank you all, and unless there are further objections, I will update
this on the next rebase.

						Thanx, Paul

> Regards,
> Boqun
> 
> > > Best wishes,
> > > Jonas
> > > 
> > > PS:
> > > 
> > > > sometimes I'm just slow to understand things ;-)
> > > 
> > > Well, welcome in the club :D
  

Patch

diff --git a/tools/memory-model/Documentation/explanation.txt b/tools/memory-model/Documentation/explanation.txt
index e901b47236c3..8e7085238470 100644
--- a/tools/memory-model/Documentation/explanation.txt
+++ b/tools/memory-model/Documentation/explanation.txt
@@ -2575,7 +2575,7 @@  smp_store_release() -- which is basically how the Linux kernel treats
 them.
 
 Although we said that plain accesses are not linked by the ppo
-relation, they do contribute to it indirectly.  Namely, when there is
+relation, they do contribute to it indirectly.  Firstly, when there is
 an address dependency from a marked load R to a plain store W,
 followed by smp_wmb() and then a marked store W', the LKMM creates a
 ppo link from R to W'.  The reasoning behind this is perhaps a little
@@ -2584,6 +2584,13 @@  for this source code in which W' could execute before R.  Just as with
 pre-bounding by address dependencies, it is possible for the compiler
 to undermine this relation if sufficient care is not taken.
 
+Secondly, plain accesses can carry dependencies: If a data dependency
+links a marked load R to a store W, and the store is read by a load R'
+from the same thread, then the data loaded by R' depends on the data
+loaded originally by R. Thus, if R' is linked to any access X by a
+dependency, R is also linked to access X by the same dependency, even
+if W or R' (or both!) are plain.
+
 There are a few oddball fences which need special treatment:
 smp_mb__before_atomic(), smp_mb__after_atomic(), and
 smp_mb__after_spinlock().  The LKMM uses fence events with special
diff --git a/tools/memory-model/linux-kernel.bell b/tools/memory-model/linux-kernel.bell
index 65c32ca9d5ea..5f0b98c1ab81 100644
--- a/tools/memory-model/linux-kernel.bell
+++ b/tools/memory-model/linux-kernel.bell
@@ -76,3 +76,9 @@  flag ~empty different-values(srcu-rscs) as srcu-bad-nesting
 let Marked = (~M) | IW | Once | Release | Acquire | domain(rmw) | range(rmw) |
 		LKR | LKW | UL | LF | RL | RU
 let Plain = M \ Marked
+
+(* Redefine dependencies to include those carried through plain accesses *)
+let carry-dep = (data ; rfi)*
+let addr = carry-dep ; addr
+let ctrl = carry-dep ; ctrl
+let data = carry-dep ; data
diff --git a/tools/memory-model/litmus-tests/dep+plain.litmus b/tools/memory-model/litmus-tests/dep+plain.litmus
new file mode 100644
index 000000000000..ebf84daa9a59
--- /dev/null
+++ b/tools/memory-model/litmus-tests/dep+plain.litmus
@@ -0,0 +1,31 @@ 
+C dep+plain
+
+(*
+ * Result: Never
+ *
+ * This litmus test demonstrates that in LKMM, plain accesses
+ * carry dependencies much like accesses to registers:
+ * The data stored to *z1 and *z2 by P0() originates from P0()'s
+ * READ_ONCE(), and therefore using that data to compute the
+ * conditional of P0()'s if-statement creates a control dependency
+ * from that READ_ONCE() to P0()'s WRITE_ONCE().
+ *)
+
+{}
+
+P0(int *x, int *y, int *z1, int *z2)
+{
+	int a = READ_ONCE(*x);
+	*z1 = a;
+	*z2 = *z1;
+	if (*z2 == 1)
+		WRITE_ONCE(*y, 1);
+}
+
+P1(int *x, int *y)
+{
+	int r = smp_load_acquire(y);
+	smp_store_release(x, r);
+}
+
+exists (x=1 /\ y=1)
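
To check the new litmus test locally (assuming herdtools7 is installed
as described in tools/memory-model/README), one would run, from
tools/memory-model:

	$ herd7 -conf linux-kernel.cfg litmus-tests/dep+plain.litmus

With this patch applied, herd7 should report the "exists" clause as
never satisfied, matching the "Result: Never" comment in the test.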