[v2] In the pipeline, USE or CLOBBER should delay execution if it starts a new live range.

Message ID 20230814112255.2071-1-jinma@linux.alibaba.com
State Accepted
Headers
Series [v2] In the pipeline, USE or CLOBBER should delay execution if it starts a new live range. |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Jin Ma Aug. 14, 2023, 11:22 a.m. UTC
  CLOBBER and USE does not represent real instructions, but in the
process of pipeline optimization, they will wait for transmission
in ready list like other insns, without considering resource
conflicts and cycles. This results in a multi-issue CPU architecture
that can be issued at any time if other regular insns have resource
conflicts or cannot be launched for other reasons. As a result,
its position is advanced in the generated insns sequence, which
will affect register allocation and often lead to more redundant
mov instructions.

A simple example:
https://github.com/majin2020/gcc-test/blob/master/test.c
This is a function in the dhrystone benchmark.

https://github.com/majin2020/gcc-test/blob/0b08c1a13de9663d7d9aba7539b960ec0607ca24/test.c.299r.sched1
This is a log of the pass 'sched1' When -mtune=rocket but issue_rate == 2.

The pipeline is:
;; | insn | prio |
;; |  17  |  3   | r142=a0 alu
;; |  14  |  0   | clobber r136 nothing
;; |  13  |  0   | clobber a0 nothing
;; |  18  |  2   | r143=a1 alu
...
;; |  12  |  0   | a0=r136 alu
;; |  15  |  0   | use a0 nothing

In this log, insn 13 and 14 are much ahead of schedule, which risks generating
redundant mov instructions, which seems unreasonable.

Therefore, I submit patch again on the basis of the last review
opinions to try to solve this problem.

https://github.com/majin2020/gcc-test/commit/efcb43e3369e771bde702955048bfe3f501263dd#diff-805031b1be5092a2322852a248d0b0f92eef7cad5784a8209f4dfc6221407457L189
This is the diff log of shed1 after patch is added.

The new pipeline is:
;; | insn | prio |
;; |  17  |  3   | r142=a0 alu
...
;; |  10  |  0   | [r144]=r141 alu
;; |  13  |  0   | clobber a0 nothing
;; |  14  |  0   | clobber r136 nothing
;; |  12  |  0   | a0=r136 alu
;; |  15  |  0   | use a0 nothing

gcc/ChangeLog:
	* haifa-sched.cc (use_or_clobber_starts_range_p): New.
	(prune_ready_list): USE or CLOBBER should delay execution
	if it starts a new live range.
---
 gcc/haifa-sched.cc | 55 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 50 insertions(+), 5 deletions(-)


base-commit: c944ded09595946290778a26794074e69cc65f3e
  

Comments

Jin Ma Aug. 29, 2023, 8 a.m. UTC | #1
ping

Ref:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627341.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621296.html
  
Jeff Law Nov. 11, 2023, 8:12 p.m. UTC | #2
On 8/14/23 05:22, Jin Ma wrote:
> CLOBBER and USE does not represent real instructions, but in the
> process of pipeline optimization, they will wait for transmission
> in ready list like other insns, without considering resource
> conflicts and cycles. This results in a multi-issue CPU architecture
> that can be issued at any time if other regular insns have resource
> conflicts or cannot be launched for other reasons. As a result,
> its position is advanced in the generated insns sequence, which
> will affect register allocation and often lead to more redundant
> mov instructions.
> 
> A simple example:
> https://github.com/majin2020/gcc-test/blob/master/test.c
> This is a function in the dhrystone benchmark.
> 
> https://github.com/majin2020/gcc-test/blob/0b08c1a13de9663d7d9aba7539b960ec0607ca24/test.c.299r.sched1
> This is a log of the pass 'sched1' When -mtune=rocket but issue_rate == 2.
> 
> The pipeline is:
> ;; | insn | prio |
> ;; |  17  |  3   | r142=a0 alu
> ;; |  14  |  0   | clobber r136 nothing
> ;; |  13  |  0   | clobber a0 nothing
> ;; |  18  |  2   | r143=a1 alu
> ...
> ;; |  12  |  0   | a0=r136 alu
> ;; |  15  |  0   | use a0 nothing
> 
> In this log, insn 13 and 14 are much ahead of schedule, which risks generating
> redundant mov instructions, which seems unreasonable.
> 
> Therefore, I submit patch again on the basis of the last review
> opinions to try to solve this problem.
> 
> https://github.com/majin2020/gcc-test/commit/efcb43e3369e771bde702955048bfe3f501263dd#diff-805031b1be5092a2322852a248d0b0f92eef7cad5784a8209f4dfc6221407457L189
> This is the diff log of shed1 after patch is added.
> 
> The new pipeline is:
> ;; | insn | prio |
> ;; |  17  |  3   | r142=a0 alu
> ...
> ;; |  10  |  0   | [r144]=r141 alu
> ;; |  13  |  0   | clobber a0 nothing
> ;; |  14  |  0   | clobber r136 nothing
> ;; |  12  |  0   | a0=r136 alu
> ;; |  15  |  0   | use a0 nothing
> 
> gcc/ChangeLog:
> 	* haifa-sched.cc (use_or_clobber_starts_range_p): New.
> 	(prune_ready_list): USE or CLOBBER should delay execution
> 	if it starts a new live range.
OK for the trunk.  It doesn't look like you have write access and I 
don't see anything about what testing was done.  Standard practice is to 
do a bootstrap and regression test on a primary platform such as x86, 
aarch64, ppc64.

I went ahead and did a bootstrap and regression test on x86_64, then 
pushed this to the trunk.

Thanks for your patience,

jeff
  
Xi Ruoyao Nov. 12, 2023, 5:41 p.m. UTC | #3
On Sat, 2023-11-11 at 13:12 -0700, Jeff Law wrote:
> 
> 
> On 8/14/23 05:22, Jin Ma wrote:
> > CLOBBER and USE does not represent real instructions, but in the
> > process of pipeline optimization, they will wait for transmission
> > in ready list like other insns, without considering resource
> > conflicts and cycles. This results in a multi-issue CPU architecture
> > that can be issued at any time if other regular insns have resource
> > conflicts or cannot be launched for other reasons. As a result,
> > its position is advanced in the generated insns sequence, which
> > will affect register allocation and often lead to more redundant
> > mov instructions.
> > 
> > A simple example:
> > https://github.com/majin2020/gcc-test/blob/master/test.c
> > This is a function in the dhrystone benchmark.
> > 
> > https://github.com/majin2020/gcc-test/blob/0b08c1a13de9663d7d9aba7539b960ec0607ca24/test.c.299r.sched1
> > This is a log of the pass 'sched1' When -mtune=rocket but issue_rate
> > == 2.
> > 
> > The pipeline is:
> > ;; | insn | prio |
> > ;; |  17  |  3   | r142=a0 alu
> > ;; |  14  |  0   | clobber r136 nothing
> > ;; |  13  |  0   | clobber a0 nothing
> > ;; |  18  |  2   | r143=a1 alu
> > ...
> > ;; |  12  |  0   | a0=r136 alu
> > ;; |  15  |  0   | use a0 nothing
> > 
> > In this log, insn 13 and 14 are much ahead of schedule, which risks
> > generating
> > redundant mov instructions, which seems unreasonable.
> > 
> > Therefore, I submit patch again on the basis of the last review
> > opinions to try to solve this problem.
> > 
> > https://github.com/majin2020/gcc-test/commit/efcb43e3369e771bde702955048bfe3f501263dd#diff-805031b1be5092a2322852a248d0b0f92eef7cad5784a8209f4dfc6221407457L189
> > This is the diff log of shed1 after patch is added.
> > 
> > The new pipeline is:
> > ;; | insn | prio |
> > ;; |  17  |  3   | r142=a0 alu
> > ...
> > ;; |  10  |  0   | [r144]=r141 alu
> > ;; |  13  |  0   | clobber a0 nothing
> > ;; |  14  |  0   | clobber r136 nothing
> > ;; |  12  |  0   | a0=r136 alu
> > ;; |  15  |  0   | use a0 nothing
> > 
> > gcc/ChangeLog:
> > 	* haifa-sched.cc (use_or_clobber_starts_range_p): New.
> > 	(prune_ready_list): USE or CLOBBER should delay execution
> > 	if it starts a new live range.
> OK for the trunk.  It doesn't look like you have write access and I 
> don't see anything about what testing was done.  Standard practice is
> to 
> do a bootstrap and regression test on a primary platform such as x86, 
> aarch64, ppc64.
> 
> I went ahead and did a bootstrap and regression test on x86_64, then 
> pushed this to the trunk.

Unfortunately this patch has triggered a bootstrap comparison failure on
loongarch64-linux-gnu: https://gcc.gnu.org/PR112497.
  
Jeff Law Nov. 12, 2023, 6:02 p.m. UTC | #4
On 11/12/23 10:41, Xi Ruoyao wrote:
> On Sat, 2023-11-11 at 13:12 -0700, Jeff Law wrote:
>>
>>
>> On 8/14/23 05:22, Jin Ma wrote:
>>> CLOBBER and USE does not represent real instructions, but in the
>>> process of pipeline optimization, they will wait for transmission
>>> in ready list like other insns, without considering resource
>>> conflicts and cycles. This results in a multi-issue CPU architecture
>>> that can be issued at any time if other regular insns have resource
>>> conflicts or cannot be launched for other reasons. As a result,
>>> its position is advanced in the generated insns sequence, which
>>> will affect register allocation and often lead to more redundant
>>> mov instructions.
>>>
>>> A simple example:
>>> https://github.com/majin2020/gcc-test/blob/master/test.c
>>> This is a function in the dhrystone benchmark.
>>>
>>> https://github.com/majin2020/gcc-test/blob/0b08c1a13de9663d7d9aba7539b960ec0607ca24/test.c.299r.sched1
>>> This is a log of the pass 'sched1' When -mtune=rocket but issue_rate
>>> == 2.
>>>
>>> The pipeline is:
>>> ;; | insn | prio |
>>> ;; |  17  |  3   | r142=a0 alu
>>> ;; |  14  |  0   | clobber r136 nothing
>>> ;; |  13  |  0   | clobber a0 nothing
>>> ;; |  18  |  2   | r143=a1 alu
>>> ...
>>> ;; |  12  |  0   | a0=r136 alu
>>> ;; |  15  |  0   | use a0 nothing
>>>
>>> In this log, insn 13 and 14 are much ahead of schedule, which risks
>>> generating
>>> redundant mov instructions, which seems unreasonable.
>>>
>>> Therefore, I submit patch again on the basis of the last review
>>> opinions to try to solve this problem.
>>>
>>> https://github.com/majin2020/gcc-test/commit/efcb43e3369e771bde702955048bfe3f501263dd#diff-805031b1be5092a2322852a248d0b0f92eef7cad5784a8209f4dfc6221407457L189
>>> This is the diff log of shed1 after patch is added.
>>>
>>> The new pipeline is:
>>> ;; | insn | prio |
>>> ;; |  17  |  3   | r142=a0 alu
>>> ...
>>> ;; |  10  |  0   | [r144]=r141 alu
>>> ;; |  13  |  0   | clobber a0 nothing
>>> ;; |  14  |  0   | clobber r136 nothing
>>> ;; |  12  |  0   | a0=r136 alu
>>> ;; |  15  |  0   | use a0 nothing
>>>
>>> gcc/ChangeLog:
>>> 	* haifa-sched.cc (use_or_clobber_starts_range_p): New.
>>> 	(prune_ready_list): USE or CLOBBER should delay execution
>>> 	if it starts a new live range.
>> OK for the trunk.  It doesn't look like you have write access and I
>> don't see anything about what testing was done.  Standard practice is
>> to
>> do a bootstrap and regression test on a primary platform such as x86,
>> aarch64, ppc64.
>>
>> I went ahead and did a bootstrap and regression test on x86_64, then
>> pushed this to the trunk.
> 
> Unfortunately this patch has triggered a bootstrap comparison failure on
> loongarch64-linux-gnu: https://gcc.gnu.org/PR112497.
It's also causing simple build failures on other targets.  For example 
c6x-elf aborts when compiling gcc.c-torture/execute/pr82210 (and others) 
with -O2 with that patch applied.

I've reverted it for now.  I'm not going to have time to investigate 
this week.

Jeff
>
  
Xi Ruoyao Nov. 12, 2023, 6:11 p.m. UTC | #5
On Sun, 2023-11-12 at 11:02 -0700, Jeff Law wrote:
> 
> 
> On 11/12/23 10:41, Xi Ruoyao wrote:
> > On Sat, 2023-11-11 at 13:12 -0700, Jeff Law wrote:
> > > 
> > > 
> > > On 8/14/23 05:22, Jin Ma wrote:
> > > > CLOBBER and USE does not represent real instructions, but in the
> > > > process of pipeline optimization, they will wait for
> > > > transmission
> > > > in ready list like other insns, without considering resource
> > > > conflicts and cycles. This results in a multi-issue CPU
> > > > architecture
> > > > that can be issued at any time if other regular insns have
> > > > resource
> > > > conflicts or cannot be launched for other reasons. As a result,
> > > > its position is advanced in the generated insns sequence, which
> > > > will affect register allocation and often lead to more redundant
> > > > mov instructions.
> > > > 
> > > > A simple example:
> > > > https://github.com/majin2020/gcc-test/blob/master/test.c
> > > > This is a function in the dhrystone benchmark.
> > > > 
> > > > https://github.com/majin2020/gcc-test/blob/0b08c1a13de9663d7d9aba7539b960ec0607ca24/test.c.299r.sched1
> > > > This is a log of the pass 'sched1' When -mtune=rocket but
> > > > issue_rate
> > > > == 2.
> > > > 
> > > > The pipeline is:
> > > > ;; | insn | prio |
> > > > ;; |  17  |  3   | r142=a0 alu
> > > > ;; |  14  |  0   | clobber r136 nothing
> > > > ;; |  13  |  0   | clobber a0 nothing
> > > > ;; |  18  |  2   | r143=a1 alu
> > > > ...
> > > > ;; |  12  |  0   | a0=r136 alu
> > > > ;; |  15  |  0   | use a0 nothing
> > > > 
> > > > In this log, insn 13 and 14 are much ahead of schedule, which
> > > > risks
> > > > generating
> > > > redundant mov instructions, which seems unreasonable.
> > > > 
> > > > Therefore, I submit patch again on the basis of the last review
> > > > opinions to try to solve this problem.
> > > > 
> > > > https://github.com/majin2020/gcc-test/commit/efcb43e3369e771bde702955048bfe3f501263dd#diff-805031b1be5092a2322852a248d0b0f92eef7cad5784a8209f4dfc6221407457L189
> > > > This is the diff log of shed1 after patch is added.
> > > > 
> > > > The new pipeline is:
> > > > ;; | insn | prio |
> > > > ;; |  17  |  3   | r142=a0 alu
> > > > ...
> > > > ;; |  10  |  0   | [r144]=r141 alu
> > > > ;; |  13  |  0   | clobber a0 nothing
> > > > ;; |  14  |  0   | clobber r136 nothing
> > > > ;; |  12  |  0   | a0=r136 alu
> > > > ;; |  15  |  0   | use a0 nothing
> > > > 
> > > > gcc/ChangeLog:
> > > > 	* haifa-sched.cc (use_or_clobber_starts_range_p): New.
> > > > 	(prune_ready_list): USE or CLOBBER should delay
> > > > execution
> > > > 	if it starts a new live range.
> > > OK for the trunk.  It doesn't look like you have write access and
> > > I
> > > don't see anything about what testing was done.  Standard practice
> > > is
> > > to
> > > do a bootstrap and regression test on a primary platform such as
> > > x86,
> > > aarch64, ppc64.
> > > 
> > > I went ahead and did a bootstrap and regression test on x86_64,
> > > then
> > > pushed this to the trunk.
> > 
> > Unfortunately this patch has triggered a bootstrap comparison
> > failure on
> > loongarch64-linux-gnu: https://gcc.gnu.org/PR112497.
> It's also causing simple build failures on other targets.  For example
> c6x-elf aborts when compiling gcc.c-torture/execute/pr82210 (and
> others) 
> with -O2 with that patch applied.
> 
> I've reverted it for now.  I'm not going to have time to investigate 
> this week.

So I'm marking the PR fixed.  Please CC me when iterating this patch for
another round so I can test it on loongarch64-linux-gnu.
  
Jin Ma Nov. 13, 2023, 2:16 a.m. UTC | #6
> > 
> > Unfortunately this patch has triggered a bootstrap comparison failure on
> > loongarch64-linux-gnu: https://gcc.gnu.org/PR112497.
> It's also causing simple build failures on other targets.  For example 
> c6x-elf aborts when compiling gcc.c-torture/execute/pr82210 (and others) 
> with -O2 with that patch applied.
> 
> I've reverted it for now.  I'm not going to have time to investigate 
> this week.

I'm sorry to have caused this and had a bad effect. This patch has
been a long time since I verified it, so I don't know what happened, I
will check it out :)

BR
Jin

> Jeff
> >
  
Jeff Law Nov. 13, 2023, 2:28 a.m. UTC | #7
On 11/12/23 19:16, Jin Ma wrote:
>>>
>>> Unfortunately this patch has triggered a bootstrap comparison failure on
>>> loongarch64-linux-gnu: https://gcc.gnu.org/PR112497.
>> It's also causing simple build failures on other targets.  For example
>> c6x-elf aborts when compiling gcc.c-torture/execute/pr82210 (and others)
>> with -O2 with that patch applied.
>>
>> I've reverted it for now.  I'm not going to have time to investigate
>> this week.
> 
> I'm sorry to have caused this and had a bad effect. This patch has
> been a long time since I verified it, so I don't know what happened, I
> will check it out :)
It happens to all of us.  It's been reverted, so it's not causing anyone 
problems anymore.   We also know that various ports have sensitivity to 
the patch, so we can do deeper testing on it once you think it's ready 
to go again.

Jeff
  

Patch

diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 8e8add709b3..47ad09457c7 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -765,6 +765,23 @@  real_insn_for_shadow (rtx_insn *insn)
   return pair->i1;
 }
 
+/* Return TRUE if INSN (a USE or CLOBBER) starts a new live
+    range, FALSE otherwise.  */
+
+static bool
+use_or_clobber_starts_range_p (rtx_insn *insn)
+{
+  gcc_assert (insn);
+
+  if ((GET_CODE (PATTERN (insn)) == CLOBBER
+       || GET_CODE (PATTERN (insn)) == USE)
+      && !sd_lists_empty_p (insn, SD_LIST_FORW)
+      && sd_lists_empty_p (insn, SD_LIST_BACK))
+    return true;
+
+  return false;
+}
+
 /* For a pair P of insns, return the fixed distance in cycles from the first
    insn after which the second must be scheduled.  */
 static int
@@ -6320,11 +6337,39 @@  prune_ready_list (state_t temp_state, bool first_cycle_insn_p,
 	    }
 	  else if (recog_memoized (insn) < 0)
 	    {
-	      if (!first_cycle_insn_p
-		  && (GET_CODE (PATTERN (insn)) == ASM_INPUT
-		      || asm_noperands (PATTERN (insn)) >= 0))
-		cost = 1;
-	      reason = "asm";
+	      if (GET_CODE (PATTERN (insn)) == ASM_INPUT
+		  || asm_noperands (PATTERN (insn)) >= 0)
+		{
+		  reason = "asm";
+		  if (!first_cycle_insn_p)
+		    cost = 1;
+		}
+	      else if (use_or_clobber_starts_range_p (insn))
+		{
+		  /* If USE or CLOBBER opens an active range, its execution should
+		     be delayed so as to be closer to the relevant instructions and
+		     avoid the generation of some redundant mov instructions.
+		     Otherwise, it should be executed as soon as possible.  */
+		  reason = "unrecog insn";
+		  if (!first_cycle_insn_p)
+		    /* If USE or CLOBBER is not in the first cycle, simply delay it
+		       by one cycle.  */
+		    cost = 1;
+		  else
+		    {
+		      /* If the USE or CLOBBER is in the first cycle and there are no
+			 other non-USE or non-CLOBBER instructions after it, we need
+			 to execute it immediately, otherwise we need to execute the
+			 non-USE or non-CLOBBER instructions first and postpone the
+			 execution of the USE or CLOBBER instructions.  */
+		      int j = i;
+		      while (n > ++j)
+			if (!use_or_clobber_starts_range_p (ready_element (&ready, j)))
+			  break;
+
+		      cost = (j == n) ? 0 : 1;
+		    }
+		}
 	    }
 	  else if (sched_pressure != SCHED_PRESSURE_NONE)
 	    {