Allow prologues and epilogues to be inserted later

Message ID mptmt8xsbrl.fsf@arm.com
State Accepted

Commit Message

Richard Sandiford Nov. 11, 2022, 4:21 p.m. UTC
  Arm's SME adds a new processor mode called streaming mode.
This mode enables some new (matrix-oriented) instructions and
disables several existing groups of instructions, such as most
Advanced SIMD vector instructions and a much smaller set of SVE
instructions.  It can also change the current vector length.

There are instructions to switch in and out of streaming mode.
However, their effect on the ISA and vector length can't be represented
directly in RTL, so they need to be emitted late in the pass pipeline,
close to md_reorg.

It's sometimes the responsibility of the prologue and epilogue to
switch modes, which means we need to emit the prologue and epilogue
sequences late as well.  (This loses shrink-wrapping and scheduling
opportunities, but that's a price worth paying.)

This patch therefore adds a target hook for forcing prologue
and epilogue insertion to happen later in the pipeline.

Tested on aarch64-linux-gnu (including with a follow-on patch)
and x86_64-linux-gnu.  OK to install?

Richard


gcc/
	* target.def (use_late_prologue_epilogue): New hook.
	* doc/gccint/target-macros/miscellaneous-parameters.rst: Add
	TARGET_USE_LATE_PROLOGUE_EPILOGUE.
	* doc/gccint/target-macros/tm.rst.in: Regenerate.
	* passes.def (pass_late_thread_prologue_and_epilogue): New pass.
	* tree-pass.h (make_pass_late_thread_prologue_and_epilogue): Declare.
	* function.cc (pass_thread_prologue_and_epilogue::gate): New function.
	(pass_data_late_thread_prologue_and_epilogue): New pass variable.
	(pass_late_thread_prologue_and_epilogue): New pass class.
	(make_pass_late_thread_prologue_and_epilogue): New function.
---
 .../miscellaneous-parameters.rst              |  5 +++
 gcc/doc/gccint/target-macros/tm.rst.in        | 22 ++++++++++
 gcc/function.cc                               | 43 +++++++++++++++++++
 gcc/passes.def                                |  3 ++
 gcc/target.def                                | 21 +++++++++
 gcc/tree-pass.h                               |  2 +
 6 files changed, 96 insertions(+)
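
As a reading aid, here is a minimal, self-contained sketch of how the hook is meant to choose between the two insertion points. It does not use GCC's headers. `target_hooks`, `pass` and `run_pipeline` are hypothetical stand-ins invented for illustration, and the `sched2` entry merely stands for "whatever runs between the two points". Only the hook name, the two complementary gates and the pass name "late_pro_and_epilogue" are taken from the patch.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for GCC's target hook vector (not the real targetm).
struct target_hooks
{
  // Mirrors the documented default: emit at the usual point.
  std::function<bool ()> use_late_prologue_epilogue = [] { return false; };
};

// Hypothetical stand-in for an RTL pass: a name plus a gate predicate.
struct pass
{
  std::string name;
  std::function<bool (const target_hooks &)> gate;
};

// Return the names of the passes whose gate lets them run, in pipeline order.
std::vector<std::string>
run_pipeline (const target_hooks &targetm)
{
  const std::vector<pass> pipeline = {
    // Usual point: runs unless the target asks for late insertion.
    { "pro_and_epilogue",
      [] (const target_hooks &t) { return !t.use_late_prologue_epilogue (); } },
    // Placeholder for the passes that sit between the two points.
    { "sched2",
      [] (const target_hooks &) { return true; } },
    // Late point: runs only when the target asks for it.
    { "late_pro_and_epilogue",
      [] (const target_hooks &t) { return t.use_late_prologue_epilogue (); } },
  };

  std::vector<std::string> executed;
  for (const pass &p : pipeline)
    if (p.gate (targetm))
      executed.push_back (p.name);
  return executed;
}
```

Because the two gates are exact complements, exactly one of the two prologue/epilogue passes runs for any given function in this model, whichever value the hook returns.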
  

Comments

Jeff Law Nov. 15, 2022, 5:57 p.m. UTC | #1
On 11/11/22 09:21, Richard Sandiford via Gcc-patches wrote:
> Arm's SME adds a new processor mode called streaming mode.
> This mode enables some new (matrix-oriented) instructions and
> disables several existing groups of instructions, such as most
> Advanced SIMD vector instructions and a much smaller set of SVE
> instructions.  It can also change the current vector length.
>
> There are instructions to switch in and out of streaming mode.
> However, their effect on the ISA and vector length can't be represented
> directly in RTL, so they need to be emitted late in the pass pipeline,
> close to md_reorg.
>
> It's sometimes the responsibility of the prologue and epilogue to
> switch modes, which means we need to emit the prologue and epilogue
> sequences late as well.  (This loses shrink-wrapping and scheduling
> opportunities, but that's a price worth paying.)
>
> This patch therefore adds a target hook for forcing prologue
> and epilogue insertion to happen later in the pipeline.
>
> Tested on aarch64-linux-gnu (including with a follow-on patch)
> and x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> gcc/
> 	* target.def (use_late_prologue_epilogue): New hook.
> 	* doc/gccint/target-macros/miscellaneous-parameters.rst: Add
> 	TARGET_USE_LATE_PROLOGUE_EPILOGUE.
> 	* doc/gccint/target-macros/tm.rst.in: Regenerate.
> 	* passes.def (pass_late_thread_prologue_and_epilogue): New pass.
> 	* tree-pass.h (make_pass_late_thread_prologue_and_epilogue): Declare.
> 	* function.cc (pass_thread_prologue_and_epilogue::gate): New function.
> 	(pass_data_late_thread_prologue_and_epilogue): New pass variable.
> 	(pass_late_thread_prologue_and_epilogue): New pass class.
> 	(make_pass_late_thread_prologue_and_epilogue): New function.

I'm not sure how we'll enforce the "no target-independent code motion"
limitation that this seems to need, and the exception made for reorg is
hackish in that it appears we just rely on the fact that reorg isn't run
for the one target where this matters.  That does make me wonder if we
should future-proof this ever so slightly: is there a reasonably easy
way to fail if a target were to define both delay slots and the need for
late prologue/epilogue?  If so, that seems advisable.


No objection to the meat of the patch, just wondering a bit about the 
additional sanity checking we can do...


Jeff
  
Richard Sandiford Nov. 18, 2022, 3:18 p.m. UTC | #2
Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On 11/11/22 09:21, Richard Sandiford via Gcc-patches wrote:
>> Arm's SME adds a new processor mode called streaming mode.
>> This mode enables some new (matrix-oriented) instructions and
>> disables several existing groups of instructions, such as most
>> Advanced SIMD vector instructions and a much smaller set of SVE
>> instructions.  It can also change the current vector length.
>>
>> There are instructions to switch in and out of streaming mode.
>> However, their effect on the ISA and vector length can't be represented
>> directly in RTL, so they need to be emitted late in the pass pipeline,
>> close to md_reorg.
>>
>> It's sometimes the responsibility of the prologue and epilogue to
>> switch modes, which means we need to emit the prologue and epilogue
>> sequences late as well.  (This loses shrink-wrapping and scheduling
>> opportunities, but that's a price worth paying.)
>>
>> This patch therefore adds a target hook for forcing prologue
>> and epilogue insertion to happen later in the pipeline.
>>
>> Tested on aarch64-linux-gnu (including with a follow-on patch)
>> and x86_64-linux-gnu.  OK to install?
>>
>> Richard
>>
>>
>> gcc/
>> 	* target.def (use_late_prologue_epilogue): New hook.
>> 	* doc/gccint/target-macros/miscellaneous-parameters.rst: Add
>> 	TARGET_USE_LATE_PROLOGUE_EPILOGUE.
>> 	* doc/gccint/target-macros/tm.rst.in: Regenerate.
>> 	* passes.def (pass_late_thread_prologue_and_epilogue): New pass.
>> 	* tree-pass.h (make_pass_late_thread_prologue_and_epilogue): Declare.
>> 	* function.cc (pass_thread_prologue_and_epilogue::gate): New function.
>> 	(pass_data_late_thread_prologue_and_epilogue): New pass variable.
>> 	(pass_late_thread_prologue_and_epilogue): New pass class.
>> 	(make_pass_late_thread_prologue_and_epilogue): New function.
>
> I'm not sure how we'll enforce the "no target-independent code motion"
> limitation that this seems to need, and the exception made for reorg is
> hackish in that it appears we just rely on the fact that reorg isn't run
> for the one target where this matters.  That does make me wonder if we
> should future-proof this ever so slightly: is there a reasonably easy
> way to fail if a target were to define both delay slots and the need for
> late prologue/epilogue?  If so, that seems advisable.
>
>
> No objection to the meat of the patch, just wondering a bit about the 
> additional sanity checking we can do...

Yeah, good point.  How does the version below look?  Tested as before.

I guess it's a philosophical question what distinguishes "late compilation"
from everything else, but I think it makes sense for it to mean "no code
motion" (among other things).  And it's useful if targets have a well-
defined point at which they can insert their own passes while guaranteeing
that:

- the CFG still exists and hasn't lost information
- no code motion occurs later
- alignments aren't nailed down yet
- variable tracking occurs later (and so will account for whatever the
  target does in its pass)

One of the SME patches uses it for that purpose, independently of this patch,
and also needs there to be no code motion.
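
To make the four guarantees above checkable, they can be expressed as a predicate over a pass list. This is only a toy model: `pass_info`, its flags and `is_safe_insertion_point` are hypothetical names, and the model says nothing about the real contents or order of passes.def.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of what a pass does to the function being compiled.
struct pass_info
{
  std::string name;
  bool moves_code;        // performs target-independent code motion
  bool frees_cfg;         // the CFG is unusable (or lossy) afterwards
  bool fixes_alignments;  // alignments are nailed down here
  bool tracks_variables;  // variable tracking happens here
};

// Return true if a target pass inserted just before PIPELINE[POS] would see
// all four properties from the list above.
bool
is_safe_insertion_point (const std::vector<pass_info> &pipeline,
                         std::size_t pos)
{
  pos = std::min (pos, pipeline.size ());

  // Everything that has already run must have left the CFG intact, must not
  // have fixed alignments, and must not have done variable tracking yet.
  for (std::size_t i = 0; i < pos; ++i)
    if (pipeline[i].frees_cfg
        || pipeline[i].fixes_alignments
        || pipeline[i].tracks_variables)
      return false;

  // Nothing that runs afterwards may move code.
  for (std::size_t i = pos; i < pipeline.size (); ++i)
    if (pipeline[i].moves_code)
      return false;

  return true;
}
```

In this model, a position directly after the last code-moving pass and before alignment computation, variable tracking and CFG teardown is the only kind of position that passes the check, which is the role the start of late compilation plays in the discussion here.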

I don't think it's controversial to say that delay-branch reorg should
happen as part of normal scheduling, with the later passes coping with
the SEQUENCEs generated from it, but there's no realistic chance of
that happening.  So unfortunately it's always likely to be a special
case...

Bernd did some nice work on avoiding dbr for bfin (IIRC), but without
the handling of SEQUENCEs in rtl passes, even that version had to
happen during md_reorg.

Thanks,
Richard


gcc/
	* target.def (use_late_prologue_epilogue): New hook.
	* doc/tm.texi.in: Add TARGET_USE_LATE_PROLOGUE_EPILOGUE.
	* doc/tm.texi: Regenerate.
	* passes.def (pass_late_thread_prologue_and_epilogue): New pass.
	* tree-pass.h (make_pass_late_thread_prologue_and_epilogue): Declare.
	* function.cc (pass_thread_prologue_and_epilogue::gate): New function.
	(pass_data_late_thread_prologue_and_epilogue): New pass variable.
	(pass_late_thread_prologue_and_epilogue): New pass class.
	(make_pass_late_thread_prologue_and_epilogue): New function.
---
 gcc/doc/tm.texi    | 19 ++++++++++++++++++
 gcc/doc/tm.texi.in |  2 ++
 gcc/function.cc    | 49 ++++++++++++++++++++++++++++++++++++++++++++++
 gcc/passes.def     |  3 +++
 gcc/target.def     | 21 ++++++++++++++++++++
 gcc/tree-pass.h    |  2 ++
 6 files changed, 96 insertions(+)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index af77d16030c..6624768d68c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11667,6 +11667,25 @@ of the if-block in the @code{struct ce_if_block} structure that is pointed
 to by @var{ce_info}.
 @end defmac
 
+@deftypefn {Target Hook} bool TARGET_USE_LATE_PROLOGUE_EPILOGUE ()
+Return true if the current function's prologue and epilogue should
+be emitted late in the pass pipeline, instead of at the usual point.
+
+Normally, the prologue and epilogue sequences are introduced soon after
+register allocation is complete.  The advantage of this approach is that
+it allows the prologue and epilogue instructions to be optimized and
+scheduled with other code in the function.  However, some targets
+require the prologue and epilogue to be the first and last sequences
+executed by the function, with no variation allowed.  This hook should
+return true on such targets.
+
+The default implementation returns false, which is correct for most
+targets.  The hook should only return true if there is a specific
+target limitation that cannot be described in RTL.  For example,
+the hook might return true if the prologue and epilogue need to switch
+between instruction sets.
+@end deftypefn
+
 @deftypefn {Target Hook} void TARGET_MACHINE_DEPENDENT_REORG (void)
 If non-null, this hook performs a target-specific pass over the
 instruction stream.  The compiler will run it at all optimization levels,
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 62c49ac46de..ca32737743b 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7708,6 +7708,8 @@ of the if-block in the @code{struct ce_if_block} structure that is pointed
 to by @var{ce_info}.
 @end defmac
 
+@hook TARGET_USE_LATE_PROLOGUE_EPILOGUE
+
 @hook TARGET_MACHINE_DEPENDENT_REORG
 
 @hook TARGET_INIT_BUILTINS
diff --git a/gcc/function.cc b/gcc/function.cc
index b54a1d81a3b..926a2a59a45 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -84,6 +84,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "function-abi.h"
 #include "value-range.h"
 #include "gimple-range.h"
+#include "insn-attr.h"
 
 /* So we can assign to cfun in this file.  */
 #undef cfun
@@ -6641,6 +6642,11 @@ public:
   {}
 
   /* opt_pass methods: */
+  bool gate (function *) final override
+    {
+      return !targetm.use_late_prologue_epilogue ();
+    }
+
   unsigned int execute (function *) final override
     {
       return rest_of_handle_thread_prologue_and_epilogue ();
@@ -6648,6 +6654,43 @@ public:
 
 }; // class pass_thread_prologue_and_epilogue
 
+const pass_data pass_data_late_thread_prologue_and_epilogue =
+{
+  RTL_PASS, /* type */
+  "late_pro_and_epilogue", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_THREAD_PROLOGUE_AND_EPILOGUE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  ( TODO_df_verify | TODO_df_finish ), /* todo_flags_finish */
+};
+
+class pass_late_thread_prologue_and_epilogue : public rtl_opt_pass
+{
+public:
+  pass_late_thread_prologue_and_epilogue (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_late_thread_prologue_and_epilogue, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  bool gate (function *) final override
+    {
+      return targetm.use_late_prologue_epilogue ();
+    }
+
+  unsigned int execute (function *) final override
+    {
+      /* It's not currently possible to have both delay slots and
+	 late prologue/epilogue, since the latter has to run before
+	 the former, and the former won't honor whatever restrictions
+	 the latter is trying to enforce.  */
+      gcc_assert (!DELAY_SLOTS);
+      return rest_of_handle_thread_prologue_and_epilogue ();
+    }
+}; // class pass_late_thread_prologue_and_epilogue
+
 } // anon namespace
 
 rtl_opt_pass *
@@ -6656,6 +6699,12 @@ make_pass_thread_prologue_and_epilogue (gcc::context *ctxt)
   return new pass_thread_prologue_and_epilogue (ctxt);
 }
 
+rtl_opt_pass *
+make_pass_late_thread_prologue_and_epilogue (gcc::context *ctxt)
+{
+  return new pass_late_thread_prologue_and_epilogue (ctxt);
+}
+
 namespace {
 
 const pass_data pass_data_zero_call_used_regs =
diff --git a/gcc/passes.def b/gcc/passes.def
index 462e9afad61..12c792b1ced 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -518,6 +518,9 @@ along with GCC; see the file COPYING3.  If not see
 	      NEXT_PASS (pass_stack_regs_run);
 	  POP_INSERT_PASSES ()
       POP_INSERT_PASSES ()
+      NEXT_PASS (pass_late_thread_prologue_and_epilogue);
+      /* No target-independent code motion is allowed beyond this point,
+         excepting the legacy delayed-branch pass.  */
       NEXT_PASS (pass_late_compilation);
       PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
 	  NEXT_PASS (pass_zero_call_used_regs);
diff --git a/gcc/target.def b/gcc/target.def
index d82606ff5ab..b6ebf96c494 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4089,6 +4089,27 @@ returns @code{VOIDmode}.",
  machine_mode, (machine_mode m1, machine_mode m2),
  default_cc_modes_compatible)
 
+DEFHOOK
+(use_late_prologue_epilogue,
+ "Return true if the current function's prologue and epilogue should\n\
+be emitted late in the pass pipeline, instead of at the usual point.\n\
+\n\
+Normally, the prologue and epilogue sequences are introduced soon after\n\
+register allocation is complete.  The advantage of this approach is that\n\
+it allows the prologue and epilogue instructions to be optimized and\n\
+scheduled with other code in the function.  However, some targets\n\
+require the prologue and epilogue to be the first and last sequences\n\
+executed by the function, with no variation allowed.  This hook should\n\
+return true on such targets.\n\
+\n\
+The default implementation returns false, which is correct for most\n\
+targets.  The hook should only return true if there is a specific\n\
+target limitation that cannot be described in RTL.  For example,\n\
+the hook might return true if the prologue and epilogue need to switch\n\
+between instruction sets.",
+ bool, (),
+ hook_bool_void_false)
+
 /* Do machine-dependent code transformations.  Called just before
      delayed-branch scheduling.  */
 DEFHOOK
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 8480d41384b..63177764ffa 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -612,6 +612,8 @@ extern rtl_opt_pass *make_pass_gcse2 (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_split_after_reload (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_thread_prologue_and_epilogue (gcc::context
 							     *ctxt);
+extern rtl_opt_pass *make_pass_late_thread_prologue_and_epilogue (gcc::context
+								  *ctxt);
 extern rtl_opt_pass *make_pass_zero_call_used_regs (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_stack_adjustments (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_sched_fusion (gcc::context *ctxt);
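
For readers who want the new sanity check in isolation, the sketch below models the interaction between the gate and the `gcc_assert (!DELAY_SLOTS)` in the late pass. It is a simplified stand-in, not GCC code: `target_config` and `late_prologue_pass_should_run` are hypothetical, and an exception takes the place of the internal compiler error that a failed `gcc_assert` would produce.

```cpp
#include <stdexcept>

// Hypothetical description of a target.  In GCC the delay-slot property
// comes from the DELAY_SLOTS constant in the generated insn-attr.h.
struct target_config
{
  bool has_delay_slots;
  bool use_late_prologue_epilogue;
};

// Model of the late pass: the gate decides whether the pass runs at all, and
// the check in execute rejects the unsupported combination.
bool
late_prologue_pass_should_run (const target_config &target)
{
  // Gate: targets that do not ask for late insertion never reach the check.
  if (!target.use_late_prologue_epilogue)
    return false;

  // Delay-slot filling runs later and would not honor the restrictions that
  // late prologue/epilogue insertion is trying to enforce.
  if (target.has_delay_slots)
    throw std::logic_error ("delay slots with late prologue/epilogue");

  return true;
}
```

Since the check sits in execute rather than in the gate, a target with delay slots is unaffected unless it also returns true from the hook.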
  
Jeff Law Nov. 18, 2022, 4:42 p.m. UTC | #3
On 11/18/22 08:18, Richard Sandiford wrote:
>
> Yeah, good point.  How does the version below look?  Tested as before.
>
> I guess it's a philosophical question what distinguishes "late compilation"
> from everything else, but I think it makes sense for it to mean "no code
> motion" (among other things).  And it's useful if targets have a well-
> defined point at which they can insert their own passes while guaranteeing
> that:
>
> - the CFG still exists and hasn't lost information
> - no code motion occurs later
> - alignments aren't nailed down yet
> - variable tracking occurs later (and so will account for whatever the
>    target does in its pass)

Seems like a reasonable set of properties.  Do we want to document this
somewhere so that it gets captured?  That can be independent of this
particular patch.


>
> I don't think it's controversial to say that delay-branch reorg should
> happen as part of normal scheduling, with the later passes coping with
> the SEQUENCEs generated from it, but there's no realistic chance of
> that happening.  So unfortunately it's always likely to be a special
> case...

I've been wanting the guts of dbr moved into sched2 for a long time.  
I've speculated that we could use the dependence analysis from sched2 to 
provide the candidates for delay slot filling and that doing so would 
probably pick up the vast majority of opportunities, but without the 
ad-hoc dependency bits in reorg.c.  But yeah, realistically nobody's going
to invest the time to revamp reorg.


>
> Bernd did some nice work on avoiding dbr for bfin (IIRC), but without
> the handling of SEQUENCEs in rtl passes, even that version had to
> happen during md_reorg.

Never really looked at it.



>
> Thanks,
> Richard
>
>
> gcc/
> 	* target.def (use_late_prologue_epilogue): New hook.
> 	* doc/tm.texi.in: Add TARGET_USE_LATE_PROLOGUE_EPILOGUE.
> 	* doc/tm.texi: Regenerate.
> 	* passes.def (pass_late_thread_prologue_and_epilogue): New pass.
> 	* tree-pass.h (make_pass_late_thread_prologue_and_epilogue): Declare.
> 	* function.cc (pass_thread_prologue_and_epilogue::gate): New function.
> 	(pass_data_late_thread_prologue_and_epilogue): New pass variable.
> 	(pass_late_thread_prologue_and_epilogue): New pass class.
> 	(make_pass_late_thread_prologue_and_epilogue): New function.

OK

jeff
  

Patch

diff --git a/gcc/doc/gccint/target-macros/miscellaneous-parameters.rst b/gcc/doc/gccint/target-macros/miscellaneous-parameters.rst
index e4e348c2adc..b48f91d3fd2 100644
--- a/gcc/doc/gccint/target-macros/miscellaneous-parameters.rst
+++ b/gcc/doc/gccint/target-macros/miscellaneous-parameters.rst
@@ -551,6 +551,11 @@  Here are several miscellaneous parameters.
   of the if-block in the ``struct ce_if_block`` structure that is pointed
   to by :samp:`{ce_info}`.
 
+.. include:: tm.rst.in
+  :start-after: [TARGET_USE_LATE_PROLOGUE_EPILOGUE]
+  :end-before: [TARGET_USE_LATE_PROLOGUE_EPILOGUE]
+
+
 .. include:: tm.rst.in
   :start-after: [TARGET_MACHINE_DEPENDENT_REORG]
   :end-before: [TARGET_MACHINE_DEPENDENT_REORG]
diff --git a/gcc/doc/gccint/target-macros/tm.rst.in b/gcc/doc/gccint/target-macros/tm.rst.in
index 44f3a3b2222..2e789f8723d 100644
--- a/gcc/doc/gccint/target-macros/tm.rst.in
+++ b/gcc/doc/gccint/target-macros/tm.rst.in
@@ -3702,6 +3702,28 @@ 
 
 [TARGET_CC_MODES_COMPATIBLE]
 
+[TARGET_USE_LATE_PROLOGUE_EPILOGUE]
+.. function:: bool TARGET_USE_LATE_PROLOGUE_EPILOGUE ()
+
+  Return true if the current function's prologue and epilogue should
+  be emitted late in the pass pipeline, instead of at the usual point.
+  
+  Normally, the prologue and epilogue sequences are introduced soon after
+  register allocation is complete.  The advantage of this approach is that
+  it allows the prologue and epilogue instructions to be optimized and
+  scheduled with other code in the function.  However, some targets
+  require the prologue and epilogue to be the first and last sequences
+  executed by the function, with no variation allowed.  This hook should
+  return true on such targets.
+  
+  The default implementation returns false, which is correct for most
+  targets.  The hook should only return true if there is a specific
+  target limitation that cannot be described in RTL.  For example,
+  the hook might return true if the prologue and epilogue need to switch
+  between instruction sets.
+
+[TARGET_USE_LATE_PROLOGUE_EPILOGUE]
+
 [TARGET_MACHINE_DEPENDENT_REORG]
 .. function:: void TARGET_MACHINE_DEPENDENT_REORG (void)
 
diff --git a/gcc/function.cc b/gcc/function.cc
index b54a1d81a3b..3b1ab5d09e5 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -6641,6 +6641,11 @@  public:
   {}
 
   /* opt_pass methods: */
+  bool gate (function *) final override
+    {
+      return !targetm.use_late_prologue_epilogue ();
+    }
+
   unsigned int execute (function *) final override
     {
       return rest_of_handle_thread_prologue_and_epilogue ();
@@ -6648,6 +6653,38 @@  public:
 
 }; // class pass_thread_prologue_and_epilogue
 
+const pass_data pass_data_late_thread_prologue_and_epilogue =
+{
+  RTL_PASS, /* type */
+  "late_pro_and_epilogue", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_THREAD_PROLOGUE_AND_EPILOGUE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  ( TODO_df_verify | TODO_df_finish ), /* todo_flags_finish */
+};
+
+class pass_late_thread_prologue_and_epilogue : public rtl_opt_pass
+{
+public:
+  pass_late_thread_prologue_and_epilogue (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_late_thread_prologue_and_epilogue, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  bool gate (function *) final override
+    {
+      return targetm.use_late_prologue_epilogue ();
+    }
+
+  unsigned int execute (function *) final override
+    {
+      return rest_of_handle_thread_prologue_and_epilogue ();
+    }
+}; // class pass_late_thread_prologue_and_epilogue
+
 } // anon namespace
 
 rtl_opt_pass *
@@ -6656,6 +6693,12 @@  make_pass_thread_prologue_and_epilogue (gcc::context *ctxt)
   return new pass_thread_prologue_and_epilogue (ctxt);
 }
 
+rtl_opt_pass *
+make_pass_late_thread_prologue_and_epilogue (gcc::context *ctxt)
+{
+  return new pass_late_thread_prologue_and_epilogue (ctxt);
+}
+
 namespace {
 
 const pass_data pass_data_zero_call_used_regs =
diff --git a/gcc/passes.def b/gcc/passes.def
index 193b5794749..822d5713f53 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -518,6 +518,9 @@  along with GCC; see the file COPYING3.  If not see
 	      NEXT_PASS (pass_stack_regs_run);
 	  POP_INSERT_PASSES ()
       POP_INSERT_PASSES ()
+      NEXT_PASS (pass_late_thread_prologue_and_epilogue);
+      /* No target-independent code motion is allowed beyond this point,
+         excepting the legacy delayed-branch pass.  */
       NEXT_PASS (pass_late_compilation);
       PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
 	  NEXT_PASS (pass_zero_call_used_regs);
diff --git a/gcc/target.def b/gcc/target.def
index aed1c1d3e22..8b8aef982e8 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4062,6 +4062,27 @@  returns ``VOIDmode``.",
  machine_mode, (machine_mode m1, machine_mode m2),
  default_cc_modes_compatible)
 
+DEFHOOK
+(use_late_prologue_epilogue,
+ "Return true if the current function's prologue and epilogue should\n\
+be emitted late in the pass pipeline, instead of at the usual point.\n\
+\n\
+Normally, the prologue and epilogue sequences are introduced soon after\n\
+register allocation is complete.  The advantage of this approach is that\n\
+it allows the prologue and epilogue instructions to be optimized and\n\
+scheduled with other code in the function.  However, some targets\n\
+require the prologue and epilogue to be the first and last sequences\n\
+executed by the function, with no variation allowed.  This hook should\n\
+return true on such targets.\n\
+\n\
+The default implementation returns false, which is correct for most\n\
+targets.  The hook should only return true if there is a specific\n\
+target limitation that cannot be described in RTL.  For example,\n\
+the hook might return true if the prologue and epilogue need to switch\n\
+between instruction sets.",
+ bool, (),
+ hook_bool_void_false)
+
 /* Do machine-dependent code transformations.  Called just before
      delayed-branch scheduling.  */
 DEFHOOK
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 8480d41384b..63177764ffa 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -612,6 +612,8 @@  extern rtl_opt_pass *make_pass_gcse2 (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_split_after_reload (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_thread_prologue_and_epilogue (gcc::context
 							     *ctxt);
+extern rtl_opt_pass *make_pass_late_thread_prologue_and_epilogue (gcc::context
+								  *ctxt);
 extern rtl_opt_pass *make_pass_zero_call_used_regs (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_stack_adjustments (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_sched_fusion (gcc::context *ctxt);