doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]

Message ID 20221115033559.66827-1-hongyu.wang@intel.com
State Accepted
Headers
Series doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676] |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Hongyu Wang Nov. 15, 2022, 3:35 a.m. UTC
  Hi,

According to PR 107676, the document of -mrelax-cmpxchg-loop is nonsensical.
Adjust the wording according to the comments.

Bootstrapped on x86_64-pc-linux-gnu, ok for trunk?

gcc/ChangeLog:

	PR target/107676
	* doc/invoke.texi: Reword the description of
	-mrelax-cmpxchg-loop.
---
 gcc/doc/invoke.texi | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
  

Comments

Jonathan Wakely Nov. 15, 2022, 10:53 a.m. UTC | #1
On 15/11/22 11:35 +0800, Hongyu Wang wrote:
>Hi,
>
>According to PR 107676, the document of -mrelax-cmpxchg-loop is nonsensical.
>Adjust the wording according to the comments.
>
>Bootstrapped on x86_64-pc-linux-gnu, ok for trunk?
>
>gcc/ChangeLog:
>
>	PR target/107676
>	* doc/invoke.texi: Reword the description of
>	-mrelax-cmpxchg-loop.
>---
> gcc/doc/invoke.texi | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
>diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>index 40f667a630a..bdd7c319aef 100644
>--- a/gcc/doc/invoke.texi
>+++ b/gcc/doc/invoke.texi
>@@ -33805,10 +33805,12 @@ registers.
>
> @item -mrelax-cmpxchg-loop
> @opindex mrelax-cmpxchg-loop
>-Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
>-execute pause if load value is not expected. This reduces excessive
>-cachline bouncing when and works for all atomic logic fetch builtins
>-that generates compare and swap loop.
>+For compare and swap loops that emitted by some __atomic_* builtins

s/that emitted/that are emitted/

>+(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch
>+counterparts), emit an atomic load before cmpxchg instruction. If the

s/before cmpxchg/before the cmpxchg/

>+loaded value is not equal to expected, execute a pause instead of

s/not equal to expected/not equal to the expected/

>+directly run the cmpxchg instruction. This might reduce excessive

s/directly run/directly running/

>+cacheline bouncing.
>
> @item -mindirect-branch=@var{choice}
> @opindex mindirect-branch
  
Alexander Monakov Nov. 15, 2022, 1:28 p.m. UTC | #2
On Tue, 15 Nov 2022, Jonathan Wakely via Gcc-patches wrote:

> > @item -mrelax-cmpxchg-loop
> > @opindex mrelax-cmpxchg-loop
> >-Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
> >-execute pause if load value is not expected. This reduces excessive
> >-cachline bouncing when and works for all atomic logic fetch builtins
> >-that generates compare and swap loop.
> >+For compare and swap loops that emitted by some __atomic_* builtins
> 
> s/that emitted/that are emitted/
> 
> >+(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch
> >+counterparts), emit an atomic load before cmpxchg instruction. If the
> 
> s/before cmpxchg/before the cmpxchg/
> 
> >+loaded value is not equal to expected, execute a pause instead of
> 
> s/not equal to expected/not equal to the expected/
> 
> >+directly run the cmpxchg instruction. This might reduce excessive
> 
> s/directly run/directly running/

This results in "... execute a pause instead of directly running the
cmpxchg instruction", which needs further TLC because:

* 'a pause' should be 'the PAUSE instruction';
* 'directly running [an instruction]' does not seem correct in context.

The option also applies to __sync builtins, not just __atomic.


How about the following:

When emitting a compare-and-swap loop for @ref{__sync Builtins}
and @ref{__atomic Builtins} lacking a native instruction, optimize
for the highly contended case by issuing an atomic load before the
@code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction
when restarting the loop.

Alexander
  
Jonathan Wakely Nov. 15, 2022, 1:44 p.m. UTC | #3
On Tue, 15 Nov 2022 at 13:34, Alexander Monakov <amonakov@ispras.ru> wrote:
>
> On Tue, 15 Nov 2022, Jonathan Wakely via Gcc-patches wrote:
>
> > > @item -mrelax-cmpxchg-loop
> > > @opindex mrelax-cmpxchg-loop
> > >-Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
> > >-execute pause if load value is not expected. This reduces excessive
> > >-cachline bouncing when and works for all atomic logic fetch builtins
> > >-that generates compare and swap loop.
> > >+For compare and swap loops that emitted by some __atomic_* builtins
> >
> > s/that emitted/that are emitted/
> >
> > >+(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch
> > >+counterparts), emit an atomic load before cmpxchg instruction. If the
> >
> > s/before cmpxchg/before the cmpxchg/
> >
> > >+loaded value is not equal to expected, execute a pause instead of
> >
> > s/not equal to expected/not equal to the expected/
> >
> > >+directly run the cmpxchg instruction. This might reduce excessive
> >
> > s/directly run/directly running/
>
> This results in "... execute a pause instead of directly running the
> cmpxchg instruction", which needs further TLC because:
>
> * 'a pause' should be 'the PAUSE instruction';
> * 'directly running [an instruction]' does not seem correct in context.
>
> The option also applies to __sync builtins, not just __atomic.
>
>
> How about the following:
>
> When emitting a compare-and-swap loop for @ref{__sync Builtins}
> and @ref{__atomic Builtins} lacking a native instruction, optimize
> for the highly contended case by issuing an atomic load before the
> @code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction
> when restarting the loop.

That's much better, thanks. My only remaining quibble would be that
"invoking" an instruction seems only marginally better than running
one. Emitting? Issuing? Using? Adding?
  
Alexander Monakov Nov. 15, 2022, 1:59 p.m. UTC | #4
On Tue, 15 Nov 2022, Jonathan Wakely wrote:

> > How about the following:
> >
> > When emitting a compare-and-swap loop for @ref{__sync Builtins}
> > and @ref{__atomic Builtins} lacking a native instruction, optimize
> > for the highly contended case by issuing an atomic load before the
> > @code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction
> > when restarting the loop.
> 
> That's much better, thanks. My only remaining quibble would be that
> "invoking" an instruction seems only marginally better than running
> one. Emitting? Issuing? Using? Adding?

Right, it should be 'using'; let me also add 'to save CPU power':

When emitting a compare-and-swap loop for @ref{__sync Builtins}
and @ref{__atomic Builtins} lacking a native instruction, optimize
for the highly contended case by issuing an atomic load before the
@code{CMPXCHG} instruction, and using the @code{PAUSE} instruction
to save CPU power when restarting the loop.

Alexander
  
Hongyu Wang Nov. 16, 2022, 5:15 a.m. UTC | #5
> When emitting a compare-and-swap loop for @ref{__sync Builtins}
> and @ref{__atomic Builtins} lacking a native instruction, optimize
> for the highly contended case by issuing an atomic load before the
> @code{CMPXCHG} instruction, and using the @code{PAUSE} instruction
> to save CPU power when restarting the loop.

Thanks for the correction, it looks quite clear now! Here is the
updated patch, ok for trunk?

Alexander Monakov via Gcc-patches <gcc-patches@gcc.gnu.org>
于2022年11月15日周二 21:59写道:
>
>
> On Tue, 15 Nov 2022, Jonathan Wakely wrote:
>
> > > How about the following:
> > >
> > > When emitting a compare-and-swap loop for @ref{__sync Builtins}
> > > and @ref{__atomic Builtins} lacking a native instruction, optimize
> > > for the highly contended case by issuing an atomic load before the
> > > @code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction
> > > when restarting the loop.
> >
> > That's much better, thanks. My only remaining quibble would be that
> > "invoking" an instruction seems only marginally better than running
> > one. Emitting? Issuing? Using? Adding?
>
> Right, it should be 'using'; let me also add 'to save CPU power':
>
> When emitting a compare-and-swap loop for @ref{__sync Builtins}
> and @ref{__atomic Builtins} lacking a native instruction, optimize
> for the highly contended case by issuing an atomic load before the
> @code{CMPXCHG} instruction, and using the @code{PAUSE} instruction
> to save CPU power when restarting the loop.
>
> Alexander
  
Alexander Monakov Nov. 16, 2022, 7:51 a.m. UTC | #6
On Wed, 16 Nov 2022, Hongyu Wang wrote:

> > When emitting a compare-and-swap loop for @ref{__sync Builtins}
> > and @ref{__atomic Builtins} lacking a native instruction, optimize
> > for the highly contended case by issuing an atomic load before the
> > @code{CMPXCHG} instruction, and using the @code{PAUSE} instruction
> > to save CPU power when restarting the loop.
> 
> Thanks for the correction, it looks quite clear now! Here is the
> updated patch, ok for trunk?

Please use 'git commit --author' to indicate authorship of the patch
(or simply let me push it once approved).

Jonathan, please let us know if the above wording looks fine to you?
Mainly I'm asking if '... and using' or '... and use' is easier to read.

Alexander
  
Hongyu Wang Nov. 16, 2022, 8:29 a.m. UTC | #7
> Please use 'git commit --author' to indicate authorship of the patch
> (or simply let me push it once approved).

Yes, just change the author and push it.
Thanks for your help!
  
Jonathan Wakely Nov. 16, 2022, 10:51 a.m. UTC | #8
On Wed, 16 Nov 2022 at 07:51, Alexander Monakov <amonakov@ispras.ru> wrote:
>
>
> On Wed, 16 Nov 2022, Hongyu Wang wrote:
>
> > > When emitting a compare-and-swap loop for @ref{__sync Builtins}
> > > and @ref{__atomic Builtins} lacking a native instruction, optimize
> > > for the highly contended case by issuing an atomic load before the
> > > @code{CMPXCHG} instruction, and using the @code{PAUSE} instruction
> > > to save CPU power when restarting the loop.
> >
> > Thanks for the correction, it looks quite clear now! Here is the
> > updated patch, ok for trunk?
>
> Please use 'git commit --author' to indicate authorship of the patch
> (or simply let me push it once approved).
>
> Jonathan, please let us know if the above wording looks fine to you?
> Mainly I'm asking if '... and using' or '... and use' is easier to read.


Your wording above ("and using...") looks good, it reads naturally and clearly.

It's quite a long sentence, so I considered suggesting:

"... by issuing an atomic load before the CMPXCHG instruction. Also
use the PAUSE instruction to save CPU power when restarting the loop."

But I think your original is better. The sentence is long, but it
flows better as a single sentence. As two sentences, the second one
just seems tacked onto the end and it's less clear how it relates to
the first.
  

Patch

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 40f667a630a..bdd7c319aef 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -33805,10 +33805,12 @@  registers.
 
 @item -mrelax-cmpxchg-loop
 @opindex mrelax-cmpxchg-loop
-Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
-execute pause if load value is not expected. This reduces excessive
-cachline bouncing when and works for all atomic logic fetch builtins
-that generates compare and swap loop.
+For compare and swap loops that emitted by some __atomic_* builtins
+(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch
+counterparts), emit an atomic load before cmpxchg instruction. If the
+loaded value is not equal to expected, execute a pause instead of
+directly run the cmpxchg instruction. This might reduce excessive
+cacheline bouncing.
 
 @item -mindirect-branch=@var{choice}
 @opindex mindirect-branch