diff mbox series

[v1] Mode-Switching: Add optional EMIT_AFTER hook

Message ID	20230821072627.3984748-1-pan2.li@intel.com
State	Accepted
Headers	Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 7E4933858291 To: gcc-patches@gcc.gnu.org Cc: juzhe.zhong@rivai.ai, jeffreyalaw@gmail.com, pan2.li@intel.com, yanzhang.wang@intel.com, kito.cheng@gmail.com Subject: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook Date: Mon, 21 Aug 2023 15:26:27 +0800 Message-Id: <20230821072627.3984748-1-pan2.li@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list From: Pan Li via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: pan2.li@intel.com Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org> X-getmail-retrieved-from-mailbox: INBOX
Series	[v1] Mode-Switching: Add optional EMIT_AFTER hook \| [v1] Mode-Switching: Add optional EMIT_AFTER hook

Checks

Context	Check	Description
snail/gcc-patch-check	success	Github commit url

Commit Message

Li, Pan2 via Gcc-patches Aug. 21, 2023, 7:26 a.m. UTC

  From: Pan Li <pan2.li@intel.com>

We have EMIT hook in mode switching already, which will insert the
insn before in most cases. However, in some arch like RISC-V, it
requires the additional insn to be inserted after when meet a call.

       |
       | <- EMIT HOOK, insert the insn before.
 +-----------+
 | ptr->insn |
 +-----------+
       | <- EMIT_AFTER HOOK, insert the insn after.
       |

Thus, this patch would like to add one optional EMIT_AFTER hook, which
will try to insert the emitted insn after. The end-user can either
implement this HOOK or leave it NULL as is.

If the backend ignore this optinal hook, there is no impact to the
original mode switching stuff. If the backend implement this optional
hook, the mode switching will try to insert the insn after. Please note
the EMIT_AFTER doen't have any impact to EMIT hook.

Passed both the regression and bootstrap test in x86.

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:

	* doc/tm.texi: Add hook def and update the description.
	* doc/tm.texi.in: Ditto.
	* mode-switching.cc (optimize_mode_switching): Insert the
	emitted insn after ptr->insn.
	* target.def (insn): Define emit_after hook.
---
 gcc/doc/tm.texi       | 12 ++++++++++--
 gcc/doc/tm.texi.in    |  6 ++++--
 gcc/mode-switching.cc | 45 +++++++++++++++++++++++++++++++++++++++++++
 gcc/target.def        |  9 +++++++++
 4 files changed, 68 insertions(+), 4 deletions(-)

Comments

Jeff Law Aug. 21, 2023, 2:24 p.m. UTC | #1

On 8/21/23 01:26, pan2.li@intel.com wrote:
> From: Pan Li <pan2.li@intel.com>
> 
> We have EMIT hook in mode switching already, which will insert the
> insn before in most cases. However, in some arch like RISC-V, it
> requires the additional insn to be inserted after when meet a call.
> 
>         |
>         | <- EMIT HOOK, insert the insn before.
>   +-----------+
>   | ptr->insn |
>   +-----------+
>         | <- EMIT_AFTER HOOK, insert the insn after.
>         |
> 
> Thus, this patch would like to add one optional EMIT_AFTER hook, which
> will try to insert the emitted insn after. The end-user can either
> implement this HOOK or leave it NULL as is.
> 
> If the backend ignore this optinal hook, there is no impact to the
> original mode switching stuff. If the backend implement this optional
> hook, the mode switching will try to insert the insn after. Please note
> the EMIT_AFTER doen't have any impact to EMIT hook.
> 
> Passed both the regression and bootstrap test in x86.
> 
> Signed-off-by: Pan Li <pan2.li@intel.com>
> 
> gcc/ChangeLog:
> 
> 	* doc/tm.texi: Add hook def and update the description.
> 	* doc/tm.texi.in: Ditto.
> 	* mode-switching.cc (optimize_mode_switching): Insert the
> 	emitted insn after ptr->insn.
> 	* target.def (insn): Define emit_after hook.
Not a full review.  I think I need to know a bit more about why you need 
these additional hooks.

Presumably you can't use the current ".emit" hook because it doesn't 
give you access to the block or insn that you can then iterate on for 
insertion on the outgoing edges?



> @@ -831,6 +833,49 @@ optimize_mode_switching (void)
>   			emit_insn_before (mode_set, ptr->insn_ptr);
>   		    }
>   
> +		  if (targetm.mode_switching.emit_after)
> +		    {
> +		      if (control_flow_insn_p (ptr->insn_ptr)
> +			&& ptr->insn_ptr == BB_END (bb))
I'm not aware of a case where we can have an insn with control flow that 
isn't the end of the block.  So perhaps then that second conditional 
into an assertion inside the true arm?


> +			{
> +			  edge eg;
> +			  edge_iterator eg_iterator;
> +
> +			  FOR_EACH_EDGE (eg, eg_iterator, bb->succs)
> +			    {
> +			      start_sequence ();
> +			      targetm.mode_switching.emit_after (entity_map[j],
> +				ptr->mode, cur_mode, ptr->regs_live);
> +			      mode_set = get_insns ();
> +			      end_sequence ();
> +
> +			      if (mode_set != NULL_RTX)
> +				{
> +				  if (eg->flags & EDGE_ABNORMAL)
> +				    insert_insn_end_basic_block (mode_set, bb);
> +				  else
> +				    insert_insn_on_edge (mode_set, eg);
Is this really correct for EDGE_ABNORMAL?  If the abnormal edge is 
created by, say a nonlocal goto, exception handling, etc, then the insn 
you insert at the end of the block will never be executed.

This is a classic problem with these classes of algorithms and I suspect 
there's code elsewhere to deal with these cases.



Jeff

Li, Pan2 via Gcc-patches Aug. 23, 2023, 6:03 a.m. UTC | #2

Thanks Jeff for comments, and sorry for late response.

The background comes from the CALL insn. For the RISC-V dynamic rounding mode we need to

1. restore the frm BEFORE call, to avoid the static rounding mode pollute the call.
2. Backup the frm AFTER call, to ensure the frm value after call is live.

Currently, we don’t take care of it elegantly but we would like to refine this part by the optional EMIT_AFTER.

> I'm not aware of a case where we can have an insn with control flow that 
> isn't the end of the block.  So perhaps then that second conditional 
> into an assertion inside the true arm?

Not very sure my understanding is correct, but there may be a call insn in the middle of the bb,
And can be considered as control flow?

> Is this really correct for EDGE_ABNORMAL?  If the abnormal edge is 
> created by, say a nonlocal goto, exception handling, etc, then the insn 
> you insert at the end of the block will never be executed.

Got it, let me have a try for this, as well as there is somewhere take care of this already.

Pan


-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Monday, August 21, 2023 10:24 PM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com
Subject: Re: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook



On 8/21/23 01:26, pan2.li@intel.com wrote:
> From: Pan Li <pan2.li@intel.com>
> 
> We have EMIT hook in mode switching already, which will insert the
> insn before in most cases. However, in some arch like RISC-V, it
> requires the additional insn to be inserted after when meet a call.
> 
>         |
>         | <- EMIT HOOK, insert the insn before.
>   +-----------+
>   | ptr->insn |
>   +-----------+
>         | <- EMIT_AFTER HOOK, insert the insn after.
>         |
> 
> Thus, this patch would like to add one optional EMIT_AFTER hook, which
> will try to insert the emitted insn after. The end-user can either
> implement this HOOK or leave it NULL as is.
> 
> If the backend ignore this optinal hook, there is no impact to the
> original mode switching stuff. If the backend implement this optional
> hook, the mode switching will try to insert the insn after. Please note
> the EMIT_AFTER doen't have any impact to EMIT hook.
> 
> Passed both the regression and bootstrap test in x86.
> 
> Signed-off-by: Pan Li <pan2.li@intel.com>
> 
> gcc/ChangeLog:
> 
> 	* doc/tm.texi: Add hook def and update the description.
> 	* doc/tm.texi.in: Ditto.
> 	* mode-switching.cc (optimize_mode_switching): Insert the
> 	emitted insn after ptr->insn.
> 	* target.def (insn): Define emit_after hook.
Not a full review.  I think I need to know a bit more about why you need 
these additional hooks.

Presumably you can't use the current ".emit" hook because it doesn't 
give you access to the block or insn that you can then iterate on for 
insertion on the outgoing edges?



> @@ -831,6 +833,49 @@ optimize_mode_switching (void)
>   			emit_insn_before (mode_set, ptr->insn_ptr);
>   		    }
>   
> +		  if (targetm.mode_switching.emit_after)
> +		    {
> +		      if (control_flow_insn_p (ptr->insn_ptr)
> +			&& ptr->insn_ptr == BB_END (bb))
I'm not aware of a case where we can have an insn with control flow that 
isn't the end of the block.  So perhaps then that second conditional 
into an assertion inside the true arm?


> +			{
> +			  edge eg;
> +			  edge_iterator eg_iterator;
> +
> +			  FOR_EACH_EDGE (eg, eg_iterator, bb->succs)
> +			    {
> +			      start_sequence ();
> +			      targetm.mode_switching.emit_after (entity_map[j],
> +				ptr->mode, cur_mode, ptr->regs_live);
> +			      mode_set = get_insns ();
> +			      end_sequence ();
> +
> +			      if (mode_set != NULL_RTX)
> +				{
> +				  if (eg->flags & EDGE_ABNORMAL)
> +				    insert_insn_end_basic_block (mode_set, bb);
> +				  else
> +				    insert_insn_on_edge (mode_set, eg);
Is this really correct for EDGE_ABNORMAL?  If the abnormal edge is 
created by, say a nonlocal goto, exception handling, etc, then the insn 
you insert at the end of the block will never be executed.

This is a classic problem with these classes of algorithms and I suspect 
there's code elsewhere to deal with these cases.



Jeff

Jeff Law Aug. 23, 2023, 2:25 p.m. UTC | #3

On 8/23/23 00:03, Li, Pan2 wrote:
> Thanks Jeff for comments, and sorry for late response.
> 
> The background comes from the CALL insn. For the RISC-V dynamic rounding mode we need to
> 
> 1. restore the frm BEFORE call, to avoid the static rounding mode pollute the call.
> 2. Backup the frm AFTER call, to ensure the frm value after call is live.
> 
> Currently, we don’t take care of it elegantly but we would like to refine this part by the optional EMIT_AFTER.
Understood.  So the natural question is why does x86/sh not need this 
for its mode switching?   Don't all the same issues exist on those 
targets as well?

> 
>> I'm not aware of a case where we can have an insn with control flow that
>> isn't the end of the block.  So perhaps then that second conditional
>> into an assertion inside the true arm?
> 
> Not very sure my understanding is correct, but there may be a call insn in the middle of the bb,
> And can be considered as control flow?
In the case where the call is control flow, then it'll end the block. 
Examples of this would be if the call could throw or perform a nonlocal 
goto.  For "normal" calls, they are not considered control flow and can 
show up in the middle of a block.

> 
>> Is this really correct for EDGE_ABNORMAL?  If the abnormal edge is
>> created by, say a nonlocal goto, exception handling, etc, then the insn
>> you insert at the end of the block will never be executed.
> 
> Got it, let me have a try for this, as well as there is somewhere take care of this already.
You might also peek at the RTL gcse/pre code which is also LCM based and 
has the same class of problems.

jeff

Li, Pan2 via Gcc-patches Aug. 23, 2023, 2:54 p.m. UTC | #4

Thanks Jeff for comments.

> Understood.  So the natural question is why does x86/sh not need this 
> for its mode switching?   Don't all the same issues exist on those 
> targets as well?

AFAIK, it comes from the different design principle between the risc-v and x86/arm intrinsic API.
The risc-v rvv FP rounding mode intrinsic API has one abstract level above the insn itself, while
the x86/arm only indicates the semantics of the insn.

For example, if one vector instruction VFADD doesn't have static rounding mode (aka encoding rm in insn),
there is no such a intrinsic API contains rounding mode argument in x86/arm. While the risc-v fp
vector intrinsic will always have static rounding mode API if the frm is honored.

In short, the risc-v intrinsic API is closer to the end-user, while the x86/arm instrinsic API is closer to insn itself.

For the rest part, will have a try based on your suggestion soon as I am in the middle of something.

Pan

-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Wednesday, August 23, 2023 10:25 PM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com
Subject: Re: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook



On 8/23/23 00:03, Li, Pan2 wrote:
> Thanks Jeff for comments, and sorry for late response.
> 
> The background comes from the CALL insn. For the RISC-V dynamic rounding mode we need to
> 
> 1. restore the frm BEFORE call, to avoid the static rounding mode pollute the call.
> 2. Backup the frm AFTER call, to ensure the frm value after call is live.
> 
> Currently, we don’t take care of it elegantly but we would like to refine this part by the optional EMIT_AFTER.
Understood.  So the natural question is why does x86/sh not need this 
for its mode switching?   Don't all the same issues exist on those 
targets as well?

> 
>> I'm not aware of a case where we can have an insn with control flow that
>> isn't the end of the block.  So perhaps then that second conditional
>> into an assertion inside the true arm?
> 
> Not very sure my understanding is correct, but there may be a call insn in the middle of the bb,
> And can be considered as control flow?
In the case where the call is control flow, then it'll end the block. 
Examples of this would be if the call could throw or perform a nonlocal 
goto.  For "normal" calls, they are not considered control flow and can 
show up in the middle of a block.

> 
>> Is this really correct for EDGE_ABNORMAL?  If the abnormal edge is
>> created by, say a nonlocal goto, exception handling, etc, then the insn
>> you insert at the end of the block will never be executed.
> 
> Got it, let me have a try for this, as well as there is somewhere take care of this already.
You might also peek at the RTL gcse/pre code which is also LCM based and 
has the same class of problems.

jeff

Jeff Law Aug. 23, 2023, 11:26 p.m. UTC | #5

On 8/23/23 08:54, Li, Pan2 wrote:
> Thanks Jeff for comments.
> 
>> Understood.  So the natural question is why does x86/sh not need this
>> for its mode switching?   Don't all the same issues exist on those
>> targets as well?
> 
> AFAIK, it comes from the different design principle between the risc-v and x86/arm intrinsic API.
> The risc-v rvv FP rounding mode intrinsic API has one abstract level above the insn itself, while
> the x86/arm only indicates the semantics of the insn.
> 
> For example, if one vector instruction VFADD doesn't have static rounding mode (aka encoding rm in insn),
> there is no such a intrinsic API contains rounding mode argument in x86/arm. While the risc-v fp
> vector intrinsic will always have static rounding mode API if the frm is honored.
> 
> In short, the risc-v intrinsic API is closer to the end-user, while the x86/arm instrinsic API is closer to insn itself.
OK, but I'm still strugging to see how the distinction is important 
here.  Ultimately there's a state at a call site.  We need to make sure 
that state from the current function doesn't impact the callee and we 
need to make sure that the callee doesn't impact the state in the caller.

That implies a save/restore pair around the call (possibly optimized so 
that we minimize the number of save/restores).  I would have expected 
x86 to already be doing this.  But maybe there's some ABI thing around 
mmx vs x86 state that allows it to be avoided....

> 
> For the rest part, will have a try based on your suggestion soon as I am in the middle of something.
No problem.  Get to it when you can.  I think it affects you more than 
me :-)

jeff

Li, Pan2 via Gcc-patches Aug. 24, 2023, 4:53 a.m. UTC | #6

Thanks Jeff.

> That implies a save/restore pair around the call (possibly optimized so 
> that we minimize the number of save/restores).  I would have expected 
> x86 to already be doing this.  But maybe there's some ABI thing around 
> mmx vs x86 state that allows it to be avoided....

Very similar to save/restore but optional.
If no static rounding mode instrinsic here, it is unnecessary to add save/restore
pair around the call. I bet mode-switching take care of this already.

Pan

-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Thursday, August 24, 2023 7:27 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com
Subject: Re: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook



On 8/23/23 08:54, Li, Pan2 wrote:
> Thanks Jeff for comments.
> 
>> Understood.  So the natural question is why does x86/sh not need this
>> for its mode switching?   Don't all the same issues exist on those
>> targets as well?
> 
> AFAIK, it comes from the different design principle between the risc-v and x86/arm intrinsic API.
> The risc-v rvv FP rounding mode intrinsic API has one abstract level above the insn itself, while
> the x86/arm only indicates the semantics of the insn.
> 
> For example, if one vector instruction VFADD doesn't have static rounding mode (aka encoding rm in insn),
> there is no such a intrinsic API contains rounding mode argument in x86/arm. While the risc-v fp
> vector intrinsic will always have static rounding mode API if the frm is honored.
> 
> In short, the risc-v intrinsic API is closer to the end-user, while the x86/arm instrinsic API is closer to insn itself.
OK, but I'm still strugging to see how the distinction is important 
here.  Ultimately there's a state at a call site.  We need to make sure 
that state from the current function doesn't impact the callee and we 
need to make sure that the callee doesn't impact the state in the caller.

That implies a save/restore pair around the call (possibly optimized so 
that we minimize the number of save/restores).  I would have expected 
x86 to already be doing this.  But maybe there's some ABI thing around 
mmx vs x86 state that allows it to be avoided....

> 
> For the rest part, will have a try based on your suggestion soon as I am in the middle of something.
No problem.  Get to it when you can.  I think it affects you more than 
me :-)

jeff

Li, Pan2 via Gcc-patches Aug. 25, 2023, 12:44 p.m. UTC | #7

Hi Jeff,

> You might also peek at the RTL gcse/pre code which is also LCM based and 
> has the same class of problems.

I found a similar approach to take care of this in gcse.cc/pre_edge_insert with some comments as below.

  /* We can't insert anything on an abnormal and
       critical edge, so we insert the insn at the end of
       the previous block. There are several alternatives
       detailed in Morgans book P277 (sec 10.5) for
       handling this situation.  This one is easiest for
       now.  */

if (eg->flags & EDGE_ABNORMAL)
  insert_insn_end_basic_block (index_map[j], bb);
else
  {
      insn = process_insert_insn (index_map[j]);
      insert_insn_on_edge (insn, eg);
  }

It looks the insert_insn_end_basic_block is designed to handle the ABNORMAL edge by inserting at end of previous block from the comments.

Pan

-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of Li, Pan2 via Gcc-patches
Sent: Thursday, August 24, 2023 12:54 PM
To: Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com
Subject: RE: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook

Thanks Jeff.

> That implies a save/restore pair around the call (possibly optimized so 
> that we minimize the number of save/restores).  I would have expected 
> x86 to already be doing this.  But maybe there's some ABI thing around 
> mmx vs x86 state that allows it to be avoided....

Very similar to save/restore but optional.
If no static rounding mode instrinsic here, it is unnecessary to add save/restore
pair around the call. I bet mode-switching take care of this already.

Pan

-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Thursday, August 24, 2023 7:27 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com
Subject: Re: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook



On 8/23/23 08:54, Li, Pan2 wrote:
> Thanks Jeff for comments.
> 
>> Understood.  So the natural question is why does x86/sh not need this
>> for its mode switching?   Don't all the same issues exist on those
>> targets as well?
> 
> AFAIK, it comes from the different design principle between the risc-v and x86/arm intrinsic API.
> The risc-v rvv FP rounding mode intrinsic API has one abstract level above the insn itself, while
> the x86/arm only indicates the semantics of the insn.
> 
> For example, if one vector instruction VFADD doesn't have static rounding mode (aka encoding rm in insn),
> there is no such a intrinsic API contains rounding mode argument in x86/arm. While the risc-v fp
> vector intrinsic will always have static rounding mode API if the frm is honored.
> 
> In short, the risc-v intrinsic API is closer to the end-user, while the x86/arm instrinsic API is closer to insn itself.
OK, but I'm still strugging to see how the distinction is important 
here.  Ultimately there's a state at a call site.  We need to make sure 
that state from the current function doesn't impact the callee and we 
need to make sure that the callee doesn't impact the state in the caller.

That implies a save/restore pair around the call (possibly optimized so 
that we minimize the number of save/restores).  I would have expected 
x86 to already be doing this.  But maybe there's some ABI thing around 
mmx vs x86 state that allows it to be avoided....

> 
> For the rest part, will have a try based on your suggestion soon as I am in the middle of something.
No problem.  Get to it when you can.  I think it affects you more than 
me :-)

jeff

Li, Pan2 via Gcc-patches Sept. 11, 2023, 8:36 a.m. UTC | #8

Hi Jeff,

Kindly ping for the Patch V2 as below.

https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628508.html

Pan

-----Original Message-----
From: Li, Pan2 <pan2.li@intel.com> 
Sent: Friday, August 25, 2023 8:45 PM
To: Li, Pan2 <pan2.li@intel.com>; Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com
Subject: RE: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook

Hi Jeff,

> You might also peek at the RTL gcse/pre code which is also LCM based and 
> has the same class of problems.

I found a similar approach to take care of this in gcse.cc/pre_edge_insert with some comments as below.

  /* We can't insert anything on an abnormal and
       critical edge, so we insert the insn at the end of
       the previous block. There are several alternatives
       detailed in Morgans book P277 (sec 10.5) for
       handling this situation.  This one is easiest for
       now.  */

if (eg->flags & EDGE_ABNORMAL)
  insert_insn_end_basic_block (index_map[j], bb);
else
  {
      insn = process_insert_insn (index_map[j]);
      insert_insn_on_edge (insn, eg);
  }

It looks the insert_insn_end_basic_block is designed to handle the ABNORMAL edge by inserting at end of previous block from the comments.

Pan

-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of Li, Pan2 via Gcc-patches
Sent: Thursday, August 24, 2023 12:54 PM
To: Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com
Subject: RE: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook

Thanks Jeff.

> That implies a save/restore pair around the call (possibly optimized so 
> that we minimize the number of save/restores).  I would have expected 
> x86 to already be doing this.  But maybe there's some ABI thing around 
> mmx vs x86 state that allows it to be avoided....

Very similar to save/restore but optional.
If no static rounding mode instrinsic here, it is unnecessary to add save/restore
pair around the call. I bet mode-switching take care of this already.

Pan

-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Thursday, August 24, 2023 7:27 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com
Subject: Re: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook



On 8/23/23 08:54, Li, Pan2 wrote:
> Thanks Jeff for comments.
> 
>> Understood.  So the natural question is why does x86/sh not need this
>> for its mode switching?   Don't all the same issues exist on those
>> targets as well?
> 
> AFAIK, it comes from the different design principle between the risc-v and x86/arm intrinsic API.
> The risc-v rvv FP rounding mode intrinsic API has one abstract level above the insn itself, while
> the x86/arm only indicates the semantics of the insn.
> 
> For example, if one vector instruction VFADD doesn't have static rounding mode (aka encoding rm in insn),
> there is no such a intrinsic API contains rounding mode argument in x86/arm. While the risc-v fp
> vector intrinsic will always have static rounding mode API if the frm is honored.
> 
> In short, the risc-v intrinsic API is closer to the end-user, while the x86/arm instrinsic API is closer to insn itself.
OK, but I'm still strugging to see how the distinction is important 
here.  Ultimately there's a state at a call site.  We need to make sure 
that state from the current function doesn't impact the callee and we 
need to make sure that the callee doesn't impact the state in the caller.

That implies a save/restore pair around the call (possibly optimized so 
that we minimize the number of save/restores).  I would have expected 
x86 to already be doing this.  But maybe there's some ABI thing around 
mmx vs x86 state that allows it to be avoided....

> 
> For the rest part, will have a try based on your suggestion soon as I am in the middle of something.
No problem.  Get to it when you can.  I think it affects you more than 
me :-)

jeff

Li, Pan2 Sept. 28, 2023, 1:07 a.m. UTC | #9

Almost forget about this patch, sorry for disturbing and kindly ping again.

Pan

-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of Li, Pan2 via Gcc-patches
Sent: Monday, September 11, 2023 4:37 PM
To: Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com
Subject: RE: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook

Hi Jeff,

Kindly ping for the Patch V2 as below.

https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628508.html

Pan

-----Original Message-----
From: Li, Pan2 <pan2.li@intel.com> 
Sent: Friday, August 25, 2023 8:45 PM
To: Li, Pan2 <pan2.li@intel.com>; Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com
Subject: RE: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook

Hi Jeff,

> You might also peek at the RTL gcse/pre code which is also LCM based and 
> has the same class of problems.

I found a similar approach to take care of this in gcse.cc/pre_edge_insert with some comments as below.

  /* We can't insert anything on an abnormal and
       critical edge, so we insert the insn at the end of
       the previous block. There are several alternatives
       detailed in Morgans book P277 (sec 10.5) for
       handling this situation.  This one is easiest for
       now.  */

if (eg->flags & EDGE_ABNORMAL)
  insert_insn_end_basic_block (index_map[j], bb);
else
  {
      insn = process_insert_insn (index_map[j]);
      insert_insn_on_edge (insn, eg);
  }

It looks the insert_insn_end_basic_block is designed to handle the ABNORMAL edge by inserting at end of previous block from the comments.

Pan

-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of Li, Pan2 via Gcc-patches
Sent: Thursday, August 24, 2023 12:54 PM
To: Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com
Subject: RE: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook

Thanks Jeff.

> That implies a save/restore pair around the call (possibly optimized so 
> that we minimize the number of save/restores).  I would have expected 
> x86 to already be doing this.  But maybe there's some ABI thing around 
> mmx vs x86 state that allows it to be avoided....

Very similar to save/restore but optional.
If no static rounding mode instrinsic here, it is unnecessary to add save/restore
pair around the call. I bet mode-switching take care of this already.

Pan

-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com> 
Sent: Thursday, August 24, 2023 7:27 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com
Subject: Re: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook



On 8/23/23 08:54, Li, Pan2 wrote:
> Thanks Jeff for comments.
> 
>> Understood.  So the natural question is why does x86/sh not need this
>> for its mode switching?   Don't all the same issues exist on those
>> targets as well?
> 
> AFAIK, it comes from the different design principle between the risc-v and x86/arm intrinsic API.
> The risc-v rvv FP rounding mode intrinsic API has one abstract level above the insn itself, while
> the x86/arm only indicates the semantics of the insn.
> 
> For example, if one vector instruction VFADD doesn't have static rounding mode (aka encoding rm in insn),
> there is no such a intrinsic API contains rounding mode argument in x86/arm. While the risc-v fp
> vector intrinsic will always have static rounding mode API if the frm is honored.
> 
> In short, the risc-v intrinsic API is closer to the end-user, while the x86/arm instrinsic API is closer to insn itself.
OK, but I'm still strugging to see how the distinction is important 
here.  Ultimately there's a state at a call site.  We need to make sure 
that state from the current function doesn't impact the callee and we 
need to make sure that the callee doesn't impact the state in the caller.

That implies a save/restore pair around the call (possibly optimized so 
that we minimize the number of save/restores).  I would have expected 
x86 to already be doing this.  But maybe there's some ABI thing around 
mmx vs x86 state that allows it to be avoided....

> 
> For the rest part, will have a try based on your suggestion soon as I am in the middle of something.
No problem.  Get to it when you can.  I think it affects you more than 
me :-)

jeff

Jeff Law Sept. 29, 2023, 8:29 p.m. UTC | #10

On 8/25/23 06:44, Li, Pan2 wrote:
> Hi Jeff,
> 
>> You might also peek at the RTL gcse/pre code which is also LCM based and
>> has the same class of problems.
> 
> I found a similar approach to take care of this in gcse.cc/pre_edge_insert with some comments as below.
> 
>    /* We can't insert anything on an abnormal and
>         critical edge, so we insert the insn at the end of
>         the previous block. There are several alternatives
>         detailed in Morgans book P277 (sec 10.5) for
>         handling this situation.  This one is easiest for
>         now.  */
> 
> if (eg->flags & EDGE_ABNORMAL)
>    insert_insn_end_basic_block (index_map[j], bb);
> else
>    {
>        insn = process_insert_insn (index_map[j]);
>        insert_insn_on_edge (insn, eg);
>    }
> 
> It looks the insert_insn_end_basic_block is designed to handle the ABNORMAL edge by inserting at end of previous block from the comments.
That's probably dead code at this point.  IIRC rth did further work in 
this space because inserting in the end of the block with the abnormal 
edge isn't semantically correct.

It's been 20+ years, but IIRC he adjusted the PRE bitmaps so that we 
never would need to do an insertion on an abnormal edge.  Search for 
EDGE_ABNORMAL in gcse.cc.

Jeff

Jeff Law Sept. 29, 2023, 8:49 p.m. UTC | #11

On 8/23/23 22:53, Li, Pan2 wrote:
> Thanks Jeff.
> 
>> That implies a save/restore pair around the call (possibly optimized so
>> that we minimize the number of save/restores).  I would have expected
>> x86 to already be doing this.  But maybe there's some ABI thing around
>> mmx vs x86 state that allows it to be avoided....
> 
> Very similar to save/restore but optional.
> If no static rounding mode instrinsic here, it is unnecessary to add save/restore
> pair around the call. I bet mode-switching take care of this already.
But I still fail to see how this is relevant.

If we go back to the x86 mode switching case.  We ultimately still have 
to ensure that the caller does not impact the callee and the callee does 
not impact the caller.  That implies there must be a mechanism to 
save/restore the mode at call sites unless the x86 ABI somehow defines 
that problem away.

Conceptually the rounding mode is just a property.  The call, in effect, 
should demand a "normal" rounding mode and set the rounding mode to 
unknown if I understand how this is supposed to work.  If my 
understanding is wrong, then maybe that's where we should start -- with 
a good description of the problem ;-)

jeff

Robin Dapp Oct. 2, 2023, 8:26 a.m. UTC | #12

> Conceptually the rounding mode is just a property.  The call, in
> effect, should demand a "normal" rounding mode and set the rounding
> mode to unknown if I understand how this is supposed to work.  If my
> understanding is wrong, then maybe that's where we should start --
> with a good description of the problem ;-)

That's also what I what struggled with last time this was discussed.

Normally, mode switching is used to switch to a requested mode for
an insn or a call and potentially switch back afterwards.

For those riscv intrinsics that specify a variable, non-default rounding
mode we have two options:

- Save and restore before and after each mode-changing intrinsic
 fegetround old_rounding
 fesetround new_rounding 
 actual instruction
 fesetround old_rounding)

- Have mode switching do it for us (lazily) to avoid most of the
storing of the old rounding mode by storing an (e.g.) function-level
rounding-mode backup value.  The backup value is used to lazily
restore the currently valid rounding mode.

The problem with this now is that whenever fesetround gets called
our backup is outdated.  Therefore we need to update our backup after
each function call (as fesetround can of course be present anywhere)
and this is where most of the complications come from.

So in that case the callee _does_ impact the caller via the backup
clobbering.  That was one of my complaints about the whole procedure
last time.  Besides, I didn't see the need for those intrinsics
anyway and would much rather have explicit fesetround calls but well :)

Having said that, it looks like Pan's patch just tries to move some of
the dirty work from the backend to the mode-switching pass by making it
easier to do something after a call.  I believe I asked for that back in
one of the reviews even?

Regards
 Robin

Li, Pan2 Oct. 5, 2023, 11:51 a.m. UTC | #13

Thanks Jeff and Robin for comments, sorry for late reply.

> Conceptually the rounding mode is just a property.  The call, in effect, 
> should demand a "normal" rounding mode and set the rounding mode to 
> unknown if I understand how this is supposed to work.  If my 
> understanding is wrong, then maybe that's where we should start -- with 
> a good description of the problem ;-)

I think we are on the same page of how it works, I may need to take a look at how x86 taking care of this.

> That's probably dead code at this point.  IIRC rth did further work in 
> this space because inserting in the end of the block with the abnormal 
> edge isn't semantically correct.

> It's been 20+ years, but IIRC he adjusted the PRE bitmaps so that we 
> never would need to do an insertion on an abnormal edge.  Search for 
> EDGE_ABNORMAL in gcse.cc.

That is quite old up to a point, will have a try for the EDGE_ABNORMAL case.

> Having said that, it looks like Pan's patch just tries to move some of
> the dirty work from the backend to the mode-switching pass by making it
> easier to do something after a call.  I believe I asked for that back in
> one of the reviews even?

Yes, that is what I would like to do in this PATCH, as the following up of some comments from Robin in previous.

Pan

-----Original Message-----
From: Robin Dapp <rdapp.gcc@gmail.com> 
Sent: Monday, October 2, 2023 4:26 PM
To: Jeff Law <jeffreyalaw@gmail.com>; Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: rdapp.gcc@gmail.com; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com
Subject: Re: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook

> Conceptually the rounding mode is just a property.  The call, in
> effect, should demand a "normal" rounding mode and set the rounding
> mode to unknown if I understand how this is supposed to work.  If my
> understanding is wrong, then maybe that's where we should start --
> with a good description of the problem ;-)

That's also what I what struggled with last time this was discussed.

Normally, mode switching is used to switch to a requested mode for
an insn or a call and potentially switch back afterwards.

For those riscv intrinsics that specify a variable, non-default rounding
mode we have two options:

- Save and restore before and after each mode-changing intrinsic
 fegetround old_rounding
 fesetround new_rounding 
 actual instruction
 fesetround old_rounding)

- Have mode switching do it for us (lazily) to avoid most of the
storing of the old rounding mode by storing an (e.g.) function-level
rounding-mode backup value.  The backup value is used to lazily
restore the currently valid rounding mode.

The problem with this now is that whenever fesetround gets called
our backup is outdated.  Therefore we need to update our backup after
each function call (as fesetround can of course be present anywhere)
and this is where most of the complications come from.

So in that case the callee _does_ impact the caller via the backup
clobbering.  That was one of my complaints about the whole procedure
last time.  Besides, I didn't see the need for those intrinsics
anyway and would much rather have explicit fesetround calls but well :)

Having said that, it looks like Pan's patch just tries to move some of
the dirty work from the backend to the mode-switching pass by making it
easier to do something after a call.  I believe I asked for that back in
one of the reviews even?

Regards
 Robin

diff mbox series

Patch

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index e4d0cc43f41..a0798d4468b 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -10334,8 +10334,8 @@  return nonzero for any @var{entity} that needs mode-switching.
 If you define this macro, you also have to define
 @code{NUM_MODES_FOR_MODE_SWITCHING}, @code{TARGET_MODE_NEEDED},
 @code{TARGET_MODE_PRIORITY} and @code{TARGET_MODE_EMIT}.
-@code{TARGET_MODE_AFTER}, @code{TARGET_MODE_ENTRY}, and @code{TARGET_MODE_EXIT}
-are optional.
+@code{TARGET_MODE_AFTER}, @code{TARGET_MODE_ENTRY}, @code{TARGET_MODE_EMIT_AFTER},
+and @code{TARGET_MODE_EXIT} are optional.
 @end defmac
 
 @defmac NUM_MODES_FOR_MODE_SWITCHING
@@ -10359,6 +10359,14 @@  to switch from. Sets of a lower numbered entity will be emitted before
 sets of a higher numbered entity to a mode of the same or lower priority.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_MODE_EMIT_AFTER (int @var{entity}, int @var{mode}, int @var{prev_mode}, HARD_REG_SET @var{regs_live})
+Generate one or more insns to set @var{entity} to @var{mode}.
+@var{hard_reg_live} is the set of hard registers live at the point where
+the insn(s) are to be inserted after. @var{prev_moxde} indicates the mode
+to switch from. Sets of a lower numbered entity will be emitted before
+sets of a higher numbered entity to a mode of the same or lower priority.
+@end deftypefn
+
 @deftypefn {Target Hook} int TARGET_MODE_NEEDED (int @var{entity}, rtx_insn *@var{insn})
 @var{entity} is an integer specifying a mode-switched entity.
 If @code{OPTIMIZE_MODE_SWITCHING} is defined, you must define this macro
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 4ac96dc357d..2942ce0be3b 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -6911,8 +6911,8 @@  return nonzero for any @var{entity} that needs mode-switching.
 If you define this macro, you also have to define
 @code{NUM_MODES_FOR_MODE_SWITCHING}, @code{TARGET_MODE_NEEDED},
 @code{TARGET_MODE_PRIORITY} and @code{TARGET_MODE_EMIT}.
-@code{TARGET_MODE_AFTER}, @code{TARGET_MODE_ENTRY}, and @code{TARGET_MODE_EXIT}
-are optional.
+@code{TARGET_MODE_AFTER}, @code{TARGET_MODE_ENTRY}, @code{TARGET_MODE_EMIT_AFTER},
+and @code{TARGET_MODE_EXIT} are optional.
 @end defmac
 
 @defmac NUM_MODES_FOR_MODE_SWITCHING
@@ -6930,6 +6930,8 @@  switch is needed / supplied.
 
 @hook TARGET_MODE_EMIT
 
+@hook TARGET_MODE_EMIT_AFTER
+
 @hook TARGET_MODE_NEEDED
 
 @hook TARGET_MODE_AFTER
diff --git a/gcc/mode-switching.cc b/gcc/mode-switching.cc
index f483c831c35..98051dff487 100644
--- a/gcc/mode-switching.cc
+++ b/gcc/mode-switching.cc
@@ -34,6 +34,8 @@  along with GCC; see the file COPYING3.  If not see
 #include "lcm.h"
 #include "cfgcleanup.h"
 #include "tree-pass.h"
+#include "cfgbuild.h"
+#include "gcse.h"
 
 /* We want target macros for the mode switching code to be able to refer
    to instruction attribute values.  */
@@ -831,6 +833,49 @@  optimize_mode_switching (void)
 			emit_insn_before (mode_set, ptr->insn_ptr);
 		    }
 
+		  if (targetm.mode_switching.emit_after)
+		    {
+		      if (control_flow_insn_p (ptr->insn_ptr)
+			&& ptr->insn_ptr == BB_END (bb))
+			{
+			  edge eg;
+			  edge_iterator eg_iterator;
+
+			  FOR_EACH_EDGE (eg, eg_iterator, bb->succs)
+			    {
+			      start_sequence ();
+			      targetm.mode_switching.emit_after (entity_map[j],
+				ptr->mode, cur_mode, ptr->regs_live);
+			      mode_set = get_insns ();
+			      end_sequence ();
+
+			      if (mode_set != NULL_RTX)
+				{
+				  if (eg->flags & EDGE_ABNORMAL)
+				    insert_insn_end_basic_block (mode_set, bb);
+				  else
+				    insert_insn_on_edge (mode_set, eg);
+
+				  emitted = true;
+				  need_commit = true;
+				}
+			    }
+			}
+		      else
+			{
+			  start_sequence ();
+			  targetm.mode_switching.emit_after (entity_map[j],
+			    ptr->mode, cur_mode, ptr->regs_live);
+			  mode_set = get_insns ();
+			  end_sequence ();
+
+			  if (mode_set != NULL_RTX)
+			    {
+			      emit_insn_after (mode_set, ptr->insn_ptr);
+			      emitted = true;
+			    }
+			}
+		    }
 		  default_rtl_profile ();
 		}
 
diff --git a/gcc/target.def b/gcc/target.def
index a500aeb6714..abcf9dc5c0e 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -7005,6 +7005,15 @@  to switch from. Sets of a lower numbered entity will be emitted before\n\
 sets of a higher numbered entity to a mode of the same or lower priority.",
  void, (int entity, int mode, int prev_mode, HARD_REG_SET regs_live), NULL)
 
+DEFHOOK
+(emit_after,
+ "Generate one or more insns to set @var{entity} to @var{mode}.\n\
+@var{hard_reg_live} is the set of hard registers live at the point where\n\
+the insn(s) are to be inserted after. @var{prev_moxde} indicates the mode\n\
+to switch from. Sets of a lower numbered entity will be emitted before\n\
+sets of a higher numbered entity to a mode of the same or lower priority.",
+ void, (int entity, int mode, int prev_mode, HARD_REG_SET regs_live), NULL)
+
 DEFHOOK
 (needed,
  "@var{entity} is an integer specifying a mode-switched entity.\n\