RISCV: Add -m(no)-omit-leaf-frame-pointer support.

Message ID 20230602070726.3807539-1-yanzhang.wang@intel.com
State Unresolved
Headers
Series RISCV: Add -m(no)-omit-leaf-frame-pointer support. |

Checks

Context Check Description
snail/gcc-patch-check warning Git am fail log

Commit Message

Li, Pan2 via Gcc-patches June 2, 2023, 7:07 a.m. UTC
  From: Yanzhang Wang <yanzhang.wang@intel.com>

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_save_reg_p): Save ra for leaf
	  when enabling -mno-omit-leaf-frame-pointer
	(riscv_option_override): Override omit-frame-pointer.
	(riscv_frame_pointer_required): Save s0 for non-leaf function
	(TARGET_FRAME_POINTER_REQUIRED): Override defination
	* config/riscv/riscv.opt: Add option support.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/omit-frame-pointer-1.c: New test.
	* gcc.target/riscv/omit-frame-pointer-2.c: New test.
	* gcc.target/riscv/omit-frame-pointer-3.c: New test.
	* gcc.target/riscv/omit-frame-pointer-4.c: New test.
	* gcc.target/riscv/omit-frame-pointer-test.c: New test.

Signed-off-by: Yanzhang Wang <yanzhang.wang@intel.com>
---
 gcc/config/riscv/riscv.cc                     | 31 ++++++++++++++++++-
 gcc/config/riscv/riscv.opt                    |  4 +++
 .../gcc.target/riscv/omit-frame-pointer-1.c   |  7 +++++
 .../gcc.target/riscv/omit-frame-pointer-2.c   |  7 +++++
 .../gcc.target/riscv/omit-frame-pointer-3.c   |  7 +++++
 .../gcc.target/riscv/omit-frame-pointer-4.c   |  7 +++++
 .../riscv/omit-frame-pointer-test.c           | 13 ++++++++
 7 files changed, 75 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-test.c
  

Comments

Jeff Law June 3, 2023, 2:43 a.m. UTC | #1
On 6/2/23 01:07, yanzhang.wang--- via Gcc-patches wrote:
> From: Yanzhang Wang <yanzhang.wang@intel.com>
> 
> gcc/ChangeLog:
> 
> 	* config/riscv/riscv.cc (riscv_save_reg_p): Save ra for leaf
> 	  when enabling -mno-omit-leaf-frame-pointer
> 	(riscv_option_override): Override omit-frame-pointer.
> 	(riscv_frame_pointer_required): Save s0 for non-leaf function
> 	(TARGET_FRAME_POINTER_REQUIRED): Override defination
> 	* config/riscv/riscv.opt: Add option support.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/omit-frame-pointer-1.c: New test.
> 	* gcc.target/riscv/omit-frame-pointer-2.c: New test.
> 	* gcc.target/riscv/omit-frame-pointer-3.c: New test.
> 	* gcc.target/riscv/omit-frame-pointer-4.c: New test.
> 	* gcc.target/riscv/omit-frame-pointer-test.c: New test.
Not ACKing or NAKing at this time.

Why do you want this feature?

jeff
  
Li, Pan2 via Gcc-patches June 5, 2023, 1:04 a.m. UTC | #2
Some nit comments.

+static bool
+riscv_frame_pointer_required (void)
+{
+  if (riscv_save_frame_pointer && !crtl->is_leaf)
+    return true;
+
+  return false;
+}

Can be simplified to return riscv_save_frame_pointer && !crtl->is_leaf;

+  riscv_save_frame_pointer = false;
+  if (TARGET_OMIT_LEAF_FRAME_POINTER_P (global_options.x_target_flags))
+    {
+      if (!global_options.x_flag_omit_frame_pointer)
+	riscv_save_frame_pointer = true;
+
+      global_options.x_flag_omit_frame_pointer = 1;
+    }

Does this mean if omit_leaf_frame will also set the omit_frame_pointer implicitly?

Pan


-----Original Message-----
From: Wang, Yanzhang <yanzhang.wang@intel.com> 
Sent: Friday, June 2, 2023 3:07 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; Li, Pan2 <pan2.li@intel.com>; Wang, Yanzhang <yanzhang.wang@intel.com>
Subject: [PATCH] RISCV: Add -m(no)-omit-leaf-frame-pointer support.

From: Yanzhang Wang <yanzhang.wang@intel.com>

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_save_reg_p): Save ra for leaf
	  when enabling -mno-omit-leaf-frame-pointer
	(riscv_option_override): Override omit-frame-pointer.
	(riscv_frame_pointer_required): Save s0 for non-leaf function
	(TARGET_FRAME_POINTER_REQUIRED): Override defination
	* config/riscv/riscv.opt: Add option support.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/omit-frame-pointer-1.c: New test.
	* gcc.target/riscv/omit-frame-pointer-2.c: New test.
	* gcc.target/riscv/omit-frame-pointer-3.c: New test.
	* gcc.target/riscv/omit-frame-pointer-4.c: New test.
	* gcc.target/riscv/omit-frame-pointer-test.c: New test.

Signed-off-by: Yanzhang Wang <yanzhang.wang@intel.com>
---
 gcc/config/riscv/riscv.cc                     | 31 ++++++++++++++++++-
 gcc/config/riscv/riscv.opt                    |  4 +++
 .../gcc.target/riscv/omit-frame-pointer-1.c   |  7 +++++
 .../gcc.target/riscv/omit-frame-pointer-2.c   |  7 +++++
 .../gcc.target/riscv/omit-frame-pointer-3.c   |  7 +++++
 .../gcc.target/riscv/omit-frame-pointer-4.c   |  7 +++++
 .../riscv/omit-frame-pointer-test.c           | 13 ++++++++
 7 files changed, 75 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-test.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5d2550871c7..e02f9cb50a4 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -408,6 +408,10 @@ static const struct riscv_tune_info riscv_tune_info_table[] = {
 #include "riscv-cores.def"
 };
 
+/* Global variable to distinguish whether we should save and restore s0/fp for
+   function.  */
+static bool riscv_save_frame_pointer;
+
 void riscv_frame_info::reset(void)
 {
   total_size = 0;
@@ -4744,7 +4748,11 @@ riscv_save_reg_p (unsigned int regno)
   if (regno == HARD_FRAME_POINTER_REGNUM && frame_pointer_needed)
     return true;
 
-  if (regno == RETURN_ADDR_REGNUM && crtl->calls_eh_return)
+  /* Need not to use ra for leaf when frame pointer is turned off by option
+     whatever the omit-leaf-frame's value.  */
+  bool keep_leaf_ra = frame_pointer_needed && crtl->is_leaf
+    && !TARGET_OMIT_LEAF_FRAME_POINTER;
+  if (regno == RETURN_ADDR_REGNUM && (crtl->calls_eh_return || keep_leaf_ra))
     return true;
 
   /* If this is an interrupt handler, then must save extra registers.  */
@@ -6287,6 +6295,15 @@ riscv_option_override (void)
   if (flag_pic)
     riscv_cmodel = CM_PIC;
 
+  riscv_save_frame_pointer = false;
+  if (TARGET_OMIT_LEAF_FRAME_POINTER_P (global_options.x_target_flags))
+    {
+      if (!global_options.x_flag_omit_frame_pointer)
+	riscv_save_frame_pointer = true;
+
+      global_options.x_flag_omit_frame_pointer = 1;
+    }
+
   /* We get better code with explicit relocs for CM_MEDLOW, but
      worse code for the others (for now).  Pick the best default.  */
   if ((target_flags_explicit & MASK_EXPLICIT_RELOCS) == 0)
@@ -7158,6 +7175,15 @@ riscv_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 							& ~zeroed_hardregs);
 }
 
+static bool
+riscv_frame_pointer_required (void)
+{
+  if (riscv_save_frame_pointer && !crtl->is_leaf)
+    return true;
+
+  return false;
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -7412,6 +7438,9 @@ riscv_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 #undef TARGET_ZERO_CALL_USED_REGS
 #define TARGET_ZERO_CALL_USED_REGS riscv_zero_call_used_regs
 
+#undef TARGET_FRAME_POINTER_REQUIRED
+#define TARGET_FRAME_POINTER_REQUIRED riscv_frame_pointer_required
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-riscv.h"
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index ff1dd4ddd4f..c3e2e7c1da4 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -138,6 +138,10 @@ Enable the CSR checking for the ISA-dependent CRS and the read-only CSR.
 The ISA-dependent CSR are only valid when the specific ISA is set.  The
 read-only CSR can not be written by the CSR instructions.
 
+momit-leaf-frame-pointer
+Target Mask (OMIT_LEAF_FRAME_POINTER) Save
+Omit the frame pointer in leaf functions.
+
 Mask(64BIT)
 
 Mask(MUL)
diff --git a/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-1.c b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-1.c
new file mode 100644
index 00000000000..c96123ea702
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-1.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64 -O2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -fno-inline" } */
+
+#include "omit-frame-pointer-test.c"
+
+/* { dg-final { scan-assembler-times "sd\tra,\[0-9\]+\\(sp\\)" 2 } } */
+/* { dg-final { scan-assembler-times "sd\ts0,\[0-9\]+\\(sp\\)" 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-2.c b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-2.c
new file mode 100644
index 00000000000..067148c6a58
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-2.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64 -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -fno-inline" } */
+
+#include "omit-frame-pointer-test.c"
+
+/* { dg-final { scan-assembler-times "sd\tra,\[0-9\]+\\(sp\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sd\ts0,\[0-9\]+\\(sp\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-3.c b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-3.c
new file mode 100644
index 00000000000..b4d7d6f4f0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-3.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64 -O2 -fomit-frame-pointer -mno-omit-leaf-frame-pointer -fno-inline" } */
+
+#include "omit-frame-pointer-test.c"
+
+/* { dg-final { scan-assembler-times "sd\tra,\[0-9\]+\\(sp\\)" 1 } } */
+/* { dg-final { scan-assembler-not "sd\ts0,\[0-9\]+\\(sp\\)"} } */
diff --git a/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-4.c b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-4.c
new file mode 100644
index 00000000000..5a5b540ef4e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-4.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64 -O2 -fomit-frame-pointer -momit-leaf-frame-pointer -fno-inline" } */
+
+#include "omit-frame-pointer-test.c"
+
+/* { dg-final { scan-assembler-times "sd\tra,\[0-9\]+\\(sp\\)" 1 } } */
+/* { dg-final { scan-assembler-not "sd\ts0,\[0-9\]+\\(sp\\)"} } */
diff --git a/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-test.c b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-test.c
new file mode 100644
index 00000000000..cf19f001e29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-test.c
@@ -0,0 +1,13 @@
+int inc(int n)
+{
+  return n + 1;
+}
+
+
+int bar(void)
+{
+  int n = 100;
+  n = inc(n);
+  n = inc(n) + 100;
+  return n;
+}
  
Li, Pan2 via Gcc-patches June 5, 2023, 2:49 a.m. UTC | #3
Hi Jeff,

Yes, there's a requirement to support backtrace based on the fp+ra.
And the unwind/cfa is not acceptable because it will add additional
sections to the binary. Currently, -fno-omit-frame-pointer can not
save the ra for the leaf function. So we need to add another option
like ARM/X86 to support consistent fp+ra stack layout for the leaf
and non-leaf functions.

Thanks,
Yanzhang
  
Li, Pan2 via Gcc-patches June 5, 2023, 3:36 a.m. UTC | #4
> +static bool
> +riscv_frame_pointer_required (void)
> +{
> +  if (riscv_save_frame_pointer && !crtl->is_leaf)
> +    return true;
> +
> +  return false;
> +}
> 
> Can be simplified to return riscv_save_frame_pointer && !crtl->is_leaf;

Nice. It's much simpler. Will modify in another patch.

> +  riscv_save_frame_pointer = false;
> +  if (TARGET_OMIT_LEAF_FRAME_POINTER_P (global_options.x_target_flags))
> +    {
> +      if (!global_options.x_flag_omit_frame_pointer)
> +	riscv_save_frame_pointer = true;
> +
> +      global_options.x_flag_omit_frame_pointer = 1;
> +    }
> 
> Does this mean if omit_leaf_frame will also set the omit_frame_pointer
> implicitly?
>

For the flag it's yes but for the behavior it's no. The behavior still is
based on the flag of omit-frame-pointer's value.

- ON, than the frame pointer of non-leaf functions will be omitted.
- OFF(no), than the frame pointer of non-leaf functions will not be omitted.

In the other words, if we want to omit the leaf frame pointers,

- if we want to omit the non-leaf fp too, we need only save the ra for the non-leaf.
- if we don't, we need to save the fp+ra for the non-leaf but no fp+ra for the leaf.

We need to override the option (x_flag_omit_frame_pointer) because it's the
first priority when determine whether the frame pointer is needed. If it's
turned off, the frame pointer will be saved for leaf functions too even
though we turn on the omit-leaf-frame-pointer.

To distinguish the two scenarios above, we need to add another variable to
save the flag user set originally otherwise it will be threw away.

Yanzhang

> -----Original Message-----
> From: Li, Pan2 <pan2.li@intel.com>
> Sent: Monday, June 5, 2023 9:04 AM
> To: Wang, Yanzhang <yanzhang.wang@intel.com>; gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com
> Subject: RE: [PATCH] RISCV: Add -m(no)-omit-leaf-frame-pointer support.
> 
> Some nit comments.
> 
> +static bool
> +riscv_frame_pointer_required (void)
> +{
> +  if (riscv_save_frame_pointer && !crtl->is_leaf)
> +    return true;
> +
> +  return false;
> +}
> 
> Can be simplified to return riscv_save_frame_pointer && !crtl->is_leaf;
> 
> +  riscv_save_frame_pointer = false;
> +  if (TARGET_OMIT_LEAF_FRAME_POINTER_P (global_options.x_target_flags))
> +    {
> +      if (!global_options.x_flag_omit_frame_pointer)
> +	riscv_save_frame_pointer = true;
> +
> +      global_options.x_flag_omit_frame_pointer = 1;
> +    }
> 
> Does this mean if omit_leaf_frame will also set the omit_frame_pointer
> implicitly?
> 
> Pan
> 
> 
> -----Original Message-----
> From: Wang, Yanzhang <yanzhang.wang@intel.com>
> Sent: Friday, June 2, 2023 3:07 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; Li, Pan2
> <pan2.li@intel.com>; Wang, Yanzhang <yanzhang.wang@intel.com>
> Subject: [PATCH] RISCV: Add -m(no)-omit-leaf-frame-pointer support.
> 
> From: Yanzhang Wang <yanzhang.wang@intel.com>
> 
> gcc/ChangeLog:
> 
> 	* config/riscv/riscv.cc (riscv_save_reg_p): Save ra for leaf
> 	  when enabling -mno-omit-leaf-frame-pointer
> 	(riscv_option_override): Override omit-frame-pointer.
> 	(riscv_frame_pointer_required): Save s0 for non-leaf function
> 	(TARGET_FRAME_POINTER_REQUIRED): Override defination
> 	* config/riscv/riscv.opt: Add option support.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/omit-frame-pointer-1.c: New test.
> 	* gcc.target/riscv/omit-frame-pointer-2.c: New test.
> 	* gcc.target/riscv/omit-frame-pointer-3.c: New test.
> 	* gcc.target/riscv/omit-frame-pointer-4.c: New test.
> 	* gcc.target/riscv/omit-frame-pointer-test.c: New test.
> 
> Signed-off-by: Yanzhang Wang <yanzhang.wang@intel.com>
> ---
>  gcc/config/riscv/riscv.cc                     | 31 ++++++++++++++++++-
>  gcc/config/riscv/riscv.opt                    |  4 +++
>  .../gcc.target/riscv/omit-frame-pointer-1.c   |  7 +++++
>  .../gcc.target/riscv/omit-frame-pointer-2.c   |  7 +++++
>  .../gcc.target/riscv/omit-frame-pointer-3.c   |  7 +++++
>  .../gcc.target/riscv/omit-frame-pointer-4.c   |  7 +++++
>  .../riscv/omit-frame-pointer-test.c           | 13 ++++++++
>  7 files changed, 75 insertions(+), 1 deletion(-)  create mode 100644
> gcc/testsuite/gcc.target/riscv/omit-frame-pointer-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-
> test.c
> 
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index
> 5d2550871c7..e02f9cb50a4 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -408,6 +408,10 @@ static const struct riscv_tune_info
> riscv_tune_info_table[] = {  #include "riscv-cores.def"
>  };
> 
> +/* Global variable to distinguish whether we should save and restore
> s0/fp for
> +   function.  */
> +static bool riscv_save_frame_pointer;
> +
>  void riscv_frame_info::reset(void)
>  {
>    total_size = 0;
> @@ -4744,7 +4748,11 @@ riscv_save_reg_p (unsigned int regno)
>    if (regno == HARD_FRAME_POINTER_REGNUM && frame_pointer_needed)
>      return true;
> 
> -  if (regno == RETURN_ADDR_REGNUM && crtl->calls_eh_return)
> +  /* Need not to use ra for leaf when frame pointer is turned off by
> option
> +     whatever the omit-leaf-frame's value.  */  bool keep_leaf_ra =
> + frame_pointer_needed && crtl->is_leaf
> +    && !TARGET_OMIT_LEAF_FRAME_POINTER;  if (regno ==
> + RETURN_ADDR_REGNUM && (crtl->calls_eh_return || keep_leaf_ra))
>      return true;
> 
>    /* If this is an interrupt handler, then must save extra registers.
> */ @@ -6287,6 +6295,15 @@ riscv_option_override (void)
>    if (flag_pic)
>      riscv_cmodel = CM_PIC;
> 
> +  riscv_save_frame_pointer = false;
> +  if (TARGET_OMIT_LEAF_FRAME_POINTER_P (global_options.x_target_flags))
> +    {
> +      if (!global_options.x_flag_omit_frame_pointer)
> +	riscv_save_frame_pointer = true;
> +
> +      global_options.x_flag_omit_frame_pointer = 1;
> +    }
> +
>    /* We get better code with explicit relocs for CM_MEDLOW, but
>       worse code for the others (for now).  Pick the best default.  */
>    if ((target_flags_explicit & MASK_EXPLICIT_RELOCS) == 0) @@ -7158,6
> +7175,15 @@ riscv_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>  							& ~zeroed_hardregs);
>  }
> 
> +static bool
> +riscv_frame_pointer_required (void)
> +{
> +  if (riscv_save_frame_pointer && !crtl->is_leaf)
> +    return true;
> +
> +  return false;
> +}
> +
>  /* Initialize the GCC target structure.  */  #undef
> TARGET_ASM_ALIGNED_HI_OP  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> @@ -7412,6 +7438,9 @@ riscv_zero_call_used_regs (HARD_REG_SET
> need_zeroed_hardregs)  #undef TARGET_ZERO_CALL_USED_REGS  #define
> TARGET_ZERO_CALL_USED_REGS riscv_zero_call_used_regs
> 
> +#undef TARGET_FRAME_POINTER_REQUIRED
> +#define TARGET_FRAME_POINTER_REQUIRED riscv_frame_pointer_required
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
> 
>  #include "gt-riscv.h"
> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> index ff1dd4ddd4f..c3e2e7c1da4 100644
> --- a/gcc/config/riscv/riscv.opt
> +++ b/gcc/config/riscv/riscv.opt
> @@ -138,6 +138,10 @@ Enable the CSR checking for the ISA-dependent CRS
> and the read-only CSR.
>  The ISA-dependent CSR are only valid when the specific ISA is set.  The
> read-only CSR can not be written by the CSR instructions.
> 
> +momit-leaf-frame-pointer
> +Target Mask (OMIT_LEAF_FRAME_POINTER) Save Omit the frame pointer in
> +leaf functions.
> +
>  Mask(64BIT)
> 
>  Mask(MUL)
> diff --git a/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-1.c
> b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-1.c
> new file mode 100644
> index 00000000000..c96123ea702
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-1.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64 -O2 -fno-omit-frame-pointer
> +-mno-omit-leaf-frame-pointer -fno-inline" } */
> +
> +#include "omit-frame-pointer-test.c"
> +
> +/* { dg-final { scan-assembler-times "sd\tra,\[0-9\]+\\(sp\\)" 2 } } */
> +/* { dg-final { scan-assembler-times "sd\ts0,\[0-9\]+\\(sp\\)" 2 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-2.c
> b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-2.c
> new file mode 100644
> index 00000000000..067148c6a58
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-2.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64 -O2 -fno-omit-frame-pointer
> +-momit-leaf-frame-pointer -fno-inline" } */
> +
> +#include "omit-frame-pointer-test.c"
> +
> +/* { dg-final { scan-assembler-times "sd\tra,\[0-9\]+\\(sp\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sd\ts0,\[0-9\]+\\(sp\\)" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-3.c
> b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-3.c
> new file mode 100644
> index 00000000000..b4d7d6f4f0d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-3.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64 -O2 -fomit-frame-pointer
> +-mno-omit-leaf-frame-pointer -fno-inline" } */
> +
> +#include "omit-frame-pointer-test.c"
> +
> +/* { dg-final { scan-assembler-times "sd\tra,\[0-9\]+\\(sp\\)" 1 } } */
> +/* { dg-final { scan-assembler-not "sd\ts0,\[0-9\]+\\(sp\\)"} } */
> diff --git a/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-4.c
> b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-4.c
> new file mode 100644
> index 00000000000..5a5b540ef4e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-4.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64 -O2 -fomit-frame-pointer
> +-momit-leaf-frame-pointer -fno-inline" } */
> +
> +#include "omit-frame-pointer-test.c"
> +
> +/* { dg-final { scan-assembler-times "sd\tra,\[0-9\]+\\(sp\\)" 1 } } */
> +/* { dg-final { scan-assembler-not "sd\ts0,\[0-9\]+\\(sp\\)"} } */
> diff --git a/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-test.c
> b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-test.c
> new file mode 100644
> index 00000000000..cf19f001e29
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-test.c
> @@ -0,0 +1,13 @@
> +int inc(int n)
> +{
> +  return n + 1;
> +}
> +
> +
> +int bar(void)
> +{
> +  int n = 100;
> +  n = inc(n);
> +  n = inc(n) + 100;
> +  return n;
> +}
> --
> 2.40.1
  
Jeff Law June 7, 2023, 2:13 a.m. UTC | #5
On 6/4/23 20:49, Wang, Yanzhang wrote:
> Hi Jeff,
> 
> Yes, there's a requirement to support backtrace based on the fp+ra.
> And the unwind/cfa is not acceptable because it will add additional
> sections to the binary. Currently, -fno-omit-frame-pointer can not
> save the ra for the leaf function. So we need to add another option
> like ARM/X86 to support consistent fp+ra stack layout for the leaf
> and non-leaf functions.
One of the things that needs to be upstreamed is long jump support 
within a function.  Essentially once a function reaches 1M in size we 
have the real possibility that a direct jump may not reach its target.

To support this I expect that $ra is going to become a fixed register 
(ie, not available to the register allocator as a temporary).  It'll be 
used as a scratch register for long jump sequences.

One of the consequences of this is $ra will need to be saved in leaf 
functions that are near or over 1M in size.

Note that at the time when we have to lay out the stack, we do not know 
the precise length of the function.  So there's a degree of "fuzz" in 
the decision whether or not to save $ra in a function that is close to 
the 1M limit.

I don't think you can reliably know if $ra is valid in an arbitrary leaf 
function or not.  You could implement some heuristics by looking at the 
symbol table (which I'm guessing you don't want to do) or by 
disassembling the prologue (again, I'm guessing you don't want to do 
that either).

Meaning that what you really want is to be using -fno-omit-frame-pointer 
and for $ra to always be saved in the stack, even in a leaf function.

Presumably you're not suggesting any of these options be used in general 
-- they're going to be used for things like embedded devices or 
firmware?  Also note there are low overhead unwinding schemes out there 
that are already supported in various tools -- ORC & SFRAME come 
immediately to mind.   Those may be better than building a bespoke 
solution for the embedded space.



Jeff
  
Li, Pan2 via Gcc-patches June 7, 2023, 3:50 a.m. UTC | #6
Hi Jeff,

Thanks your comments. I have few questions that I don't quite understand.

> One of the things that needs to be upstreamed is long jump support within
> a function.  Essentially once a function reaches 1M in size we have the
> real possibility that a direct jump may not reach its target.
> 
> To support this I expect that $ra is going to become a fixed register (ie,
> not available to the register allocator as a temporary).  It'll be used
> as a scratch register for long jump sequences.
> 
> One of the consequences of this is $ra will need to be saved in leaf
> functions that are near or over 1M in size.
> 
> Note that at the time when we have to lay out the stack, we do not know
> the precise length of the function.  So there's a degree of "fuzz" in the
> decision whether or not to save $ra in a function that is close to the 1M
> limit.

Do you mean that, long jump to more than 1M offset will need multiple jal
and each jal will save the $ra ?

If yes, I'm confused about what's the influence of the $ra saving for
function prologue. We will save the fp+ra at the prologue, the next $ra 
saving seems will not modify the $ra already saved.

> I don't think you can reliably know if $ra is valid in an arbitrary leaf
> function or not.  You could implement some heuristics by looking at the
> symbol table (which I'm guessing you don't want to do) or by
> disassembling the prologue (again, I'm guessing you don't want to do that
> either).

I think it's yes (not valid) when we want to get the return address to parent
function from $ra directly in the function body. But we can get the right
return address from fp with offset if we save them at prologue, is it right ?

> Meaning that what you really want is to be using -fno-omit-frame-pointer
> and for $ra to always be saved in the stack, even in a leaf function.

This is also another solution but will change the default behavior of
-fno-omit-frame-pointer.

> Presumably you're not suggesting any of these options be used in general
> -- they're going to be used for things like embedded devices or firmware?
> Also note there are low overhead unwinding schemes out there that are
> already supported in various tools -- ORC & SFRAME come
> immediately to mind.   Those may be better than building a bespoke
> solution for the embedded space.

Yes. You're right, I forget to introduce background of the requirement. It
will be used in the firmware where the dwarf or unwinding maybe not acceptable.

Yanzhang

> -----Original Message-----
> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: Wednesday, June 7, 2023 10:13 AM
> To: Wang, Yanzhang <yanzhang.wang@intel.com>; gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; Li, Pan2
> <pan2.li@intel.com>
> Subject: Re: [PATCH] RISCV: Add -m(no)-omit-leaf-frame-pointer support.
> 
> 
> 
> On 6/4/23 20:49, Wang, Yanzhang wrote:
> > Hi Jeff,
> >
> > Yes, there's a requirement to support backtrace based on the fp+ra.
> > And the unwind/cfa is not acceptable because it will add additional
> > sections to the binary. Currently, -fno-omit-frame-pointer can not
> > save the ra for the leaf function. So we need to add another option
> > like ARM/X86 to support consistent fp+ra stack layout for the leaf and
> > non-leaf functions.
> One of the things that needs to be upstreamed is long jump support within
> a function.  Essentially once a function reaches 1M in size we have the
> real possibility that a direct jump may not reach its target.
> 
> To support this I expect that $ra is going to become a fixed register (ie,
> not available to the register allocator as a temporary).  It'll be used
> as a scratch register for long jump sequences.
> 
> One of the consequences of this is $ra will need to be saved in leaf
> functions that are near or over 1M in size.
> 
> Note that at the time when we have to lay out the stack, we do not know
> the precise length of the function.  So there's a degree of "fuzz" in the
> decision whether or not to save $ra in a function that is close to the 1M
> limit.
> 
> I don't think you can reliably know if $ra is valid in an arbitrary leaf
> function or not.  You could implement some heuristics by looking at the
> symbol table (which I'm guessing you don't want to do) or by
> disassembling the prologue (again, I'm guessing you don't want to do that
> either).
> 
> Meaning that what you really want is to be using -fno-omit-frame-pointer
> and for $ra to always be saved in the stack, even in a leaf function.
> 
> Presumably you're not suggesting any of these options be used in general
> -- they're going to be used for things like embedded devices or firmware?
> Also note there are low overhead unwinding schemes out there that are
> already supported in various tools -- ORC & SFRAME come
> immediately to mind.   Those may be better than building a bespoke
> solution for the embedded space.
> 
> 
> 
> Jeff
  
Jeff Law June 8, 2023, 3:05 p.m. UTC | #7
On 6/6/23 21:50, Wang, Yanzhang wrote:
> Hi Jeff,
> 
> Thanks your comments. I have few questions that I don't quite understand.
> 
>> One of the things that needs to be upstreamed is long jump support within
>> a function.  Essentially once a function reaches 1M in size we have the
>> real possibility that a direct jump may not reach its target.
>>
>> To support this I expect that $ra is going to become a fixed register (ie,
>> not available to the register allocator as a temporary).  It'll be used
>> as a scratch register for long jump sequences.
>>
>> One of the consequences of this is $ra will need to be saved in leaf
>> functions that are near or over 1M in size.
>>
>> Note that at the time when we have to lay out the stack, we do not know
>> the precise length of the function.  So there's a degree of "fuzz" in the
>> decision whether or not to save $ra in a function that is close to the 1M
>> limit.
> 
> Do you mean that, long jump to more than 1M offset will need multiple jal
> and each jal will save the $ra ?
Long jumps are implemnted as an indirect jump which needs a scratch 
register to hold the high part of the jump target address.

> 
> If yes, I'm confused about what's the influence of the $ra saving for
> function prologue. We will save the fp+ra at the prologue, the next $ra
> saving seems will not modify the $ra already saved.
The long branch handling is done at the assembler level.  So the 
clobbering of $ra isn't visible to the compiler.  Thus the compiler has 
to be extremely careful to not hold values in $ra because the assembler 
may clobber $ra.

This ultimately comes back to the phase ordering problem.  At register 
allocation time we don't know if we need long jumps or not.  So we don't 
know if $ra is potentially clobbered by the assembler.   A similar phase 
ordering problems exists in the prologue/epilogue generation.

The other approach to long branch handling would be to do it all in the 
compiler.  I would actually prefer this approach, but it's not likely to 
land in the near term.


> 
> I think it's yes (not valid) when we want to get the return address to parent
> function from $ra directly in the function body. But we can get the right
> return address from fp with offset if we save them at prologue, is it right ?
Right.  You'll be able to get the value of $ra out of the stack.



> 
>> Meaning that what you really want is to be using -fno-omit-frame-pointer
>> and for $ra to always be saved in the stack, even in a leaf function.
> 
> This is also another solution but will change the default behavior of
> -fno-omit-frame-pointer.
That's OK.  While -f options are target independent options, targets are 
allowed to adjust certain behaviors based on those options.

If you're not going to use dwarf, then my recommendation is to ensure 
that the data you need is *always* available in the stack at known 
offsets.   That will mean your code isn't optimized as well.  It means 
hand written assembly code has to follow the conventions, you can't link 
against libraries that do not follow those conventions, etc etc.  But 
that's the price you pay for not using dwarf (or presumably ORC/SFRAME 
which I haven't studied in detail).

Jeff






Jeff
  
Li, Pan2 via Gcc-patches June 21, 2023, 8:14 a.m. UTC | #8
Hi Jeff, sorry for the late reply.

> The long branch handling is done at the assembler level.  So the clobbering
> of $ra isn't visible to the compiler.  Thus the compiler has to be
> extremely careful to not hold values in $ra because the assembler may
> clobber $ra.

If assembler will modify the $ra behavior, it seems the rules we defined in
the riscv.cc will be ignored. For example, the $ra saving generated by this
patch may be modified by the assmebler and all others depends on it will be
wrong. So implementing the long jump in the compiler is better.

Do I understand it correctly ?

> If you're not going to use dwarf, then my recommendation is to ensure that
> the data you need is *always* available in the stack at known
> offsets.   That will mean your code isn't optimized as well.  It means
> hand written assembly code has to follow the conventions, you can't link
> against libraries that do not follow those conventions, etc etc.  But
> that's the price you pay for not using dwarf (or presumably ORC/SFRAME
> which I haven't studied in detail).

Yes. That's right. All the libraries need to follow the same logic. But as
you said, this is the price if we choose this solution. And fortunately,
this will only be used in special scenarios.

---

And Jeff, do you have any other comments about this patch? Should we add
some descriptions somewhere in the doc?

Thanks,
Yanzhang

> -----Original Message-----
> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: Thursday, June 8, 2023 11:05 PM
> To: Wang, Yanzhang <yanzhang.wang@intel.com>; gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@sifive.com; Li, Pan2
> <pan2.li@intel.com>
> Subject: Re: [PATCH] RISCV: Add -m(no)-omit-leaf-frame-pointer support.
> 
> 
> 
> On 6/6/23 21:50, Wang, Yanzhang wrote:
> > Hi Jeff,
> >
> > Thanks your comments. I have few questions that I don't quite understand.
> >
> >> One of the things that needs to be upstreamed is long jump support
> >> within a function.  Essentially once a function reaches 1M in size we
> >> have the real possibility that a direct jump may not reach its target.
> >>
> >> To support this I expect that $ra is going to become a fixed register
> >> (ie, not available to the register allocator as a temporary).  It'll
> >> be used as a scratch register for long jump sequences.
> >>
> >> One of the consequences of this is $ra will need to be saved in leaf
> >> functions that are near or over 1M in size.
> >>
> >> Note that at the time when we have to lay out the stack, we do not
> >> know the precise length of the function.  So there's a degree of
> >> "fuzz" in the decision whether or not to save $ra in a function that
> >> is close to the 1M limit.
> >
> > Do you mean that, long jump to more than 1M offset will need multiple
> > jal and each jal will save the $ra ?
> Long jumps are implemnted as an indirect jump which needs a scratch
> register to hold the high part of the jump target address.
> 
> >
> > If yes, I'm confused about what's the influence of the $ra saving for
> > function prologue. We will save the fp+ra at the prologue, the next
> > $ra saving seems will not modify the $ra already saved.
> The long branch handling is done at the assembler level.  So the clobbering
> of $ra isn't visible to the compiler.  Thus the compiler has to be
> extremely careful to not hold values in $ra because the assembler may
> clobber $ra.
> 
> This ultimately comes back to the phase ordering problem.  At register
> allocation time we don't know if we need long jumps or not.  So we don't
> know if $ra is potentially clobbered by the assembler.   A similar phase
> ordering problems exists in the prologue/epilogue generation.
> 
> The other approach to long branch handling would be to do it all in the
> compiler.  I would actually prefer this approach, but it's not likely to
> land in the near term.
> 
> 
> >
> > I think it's yes (not valid) when we want to get the return address to
> > parent function from $ra directly in the function body. But we can get
> > the right return address from fp with offset if we save them at prologue,
> is it right ?
> Right.  You'll be able to get the value of $ra out of the stack.
> 
> 
> 
> >
> >> Meaning that what you really want is to be using
> >> -fno-omit-frame-pointer and for $ra to always be saved in the stack,
> even in a leaf function.
> >
> > This is also another solution but will change the default behavior of
> > -fno-omit-frame-pointer.
> That's OK.  While -f options are target independent options, targets are
> allowed to adjust certain behaviors based on those options.
> 
> If you're not going to use dwarf, then my recommendation is to ensure that
> the data you need is *always* available in the stack at known
> offsets.   That will mean your code isn't optimized as well.  It means
> hand written assembly code has to follow the conventions, you can't link
> against libraries that do not follow those conventions, etc etc.  But
> that's the price you pay for not using dwarf (or presumably ORC/SFRAME
> which I haven't studied in detail).
> 
> Jeff
> 
> 
> 
> 
> 
> 
> Jeff
  
Jeff Law June 24, 2023, 3:01 p.m. UTC | #9
On 6/21/23 02:14, Wang, Yanzhang wrote:
> Hi Jeff, sorry for the late reply.
> 
>> The long branch handling is done at the assembler level.  So the clobbering
>> of $ra isn't visible to the compiler.  Thus the compiler has to be
>> extremely careful to not hold values in $ra because the assembler may
>> clobber $ra.
> 
> If assembler will modify the $ra behavior, it seems the rules we defined in
> the riscv.cc will be ignored. For example, the $ra saving generated by this
> patch may be modified by the assmebler and all others depends on it will be
> wrong. So implementing the long jump in the compiler is better.
Basically correct.  The assembler potentially clobbers $ra.  That's why 
in the long jump patches $ra becomes a fixed register -- the compiler 
doesn't know when it's clobbered by the assembler.

Even if this were done in the compiler, we'd still have to do something 
special with $ra.  The point at which decisions about register 
allocation and such are made is before the point where we know the final 
positions of jumps/labels.  It's a classic problem in GCC's design.


>> If you're not going to use dwarf, then my recommendation is to ensure that
>> the data you need is *always* available in the stack at known
>> offsets.   That will mean your code isn't optimized as well.  It means
>> hand written assembly code has to follow the conventions, you can't link
>> against libraries that do not follow those conventions, etc etc.  But
>> that's the price you pay for not using dwarf (or presumably ORC/SFRAME
>> which I haven't studied in detail).
> 
> Yes. That's right. All the libraries need to follow the same logic. But as
> you said, this is the price if we choose this solution. And fortunately,
> this will only be used in special scenarios.
The key point is you want the location of the return pointer to be 
consistent in every function and you want to know that every function 
has a frame pointer.

Otherwise you end up having to either consult on-the-side tables (at 
which point you might as well look at ORC/SFRAME) or disassembling code 
in the executable to deduce where to find fp, ra, etc (which is a path 
to madness).

Thus for the usage scenario you're looking at, I would recommend always 
having a frame pointer, every function, no matter how trivial and that 
$ra always be saved into a suitable slot relative to the frame pointer, 
again, no matter how trivial the function.


> And Jeff, do you have any other comments about this patch? Should we add
> some descriptions somewhere in the doc?
We may need to adjust the documentation a bit since I think I'm 
suggesting slight changes in the behavior of existing -m options.

I'd like to see an updated patch before commenting further on 
implementation details.

jeff
  
Stefan O'Rear June 25, 2023, 1:40 a.m. UTC | #10
On Sat, Jun 24, 2023, at 11:01 AM, Jeff Law via Gcc-patches wrote:
> On 6/21/23 02:14, Wang, Yanzhang wrote:
>> Hi Jeff, sorry for the late reply.
>> 
>>> The long branch handling is done at the assembler level.  So the clobbering
>>> of $ra isn't visible to the compiler.  Thus the compiler has to be
>>> extremely careful to not hold values in $ra because the assembler may
>>> clobber $ra.
>> 
>> If assembler will modify the $ra behavior, it seems the rules we defined in
>> the riscv.cc will be ignored. For example, the $ra saving generated by this
>> patch may be modified by the assmebler and all others depends on it will be
>> wrong. So implementing the long jump in the compiler is better.
> Basically correct.  The assembler potentially clobbers $ra.  That's why 
> in the long jump patches $ra becomes a fixed register -- the compiler 
> doesn't know when it's clobbered by the assembler.
>
> Even if this were done in the compiler, we'd still have to do something 
> special with $ra.  The point at which decisions about register 
> allocation and such are made is before the point where we know the final 
> positions of jumps/labels.  It's a classic problem in GCC's design.

Do you have a reference for more information on the long jump patches?

I'm particularly curious about why $ra was selected as the temporary instead
of $t1 like the tail pseudoinstruction uses.

-s
  
Jeff Law June 25, 2023, 12:49 p.m. UTC | #11
On 6/24/23 19:40, Stefan O'Rear wrote:
> On Sat, Jun 24, 2023, at 11:01 AM, Jeff Law via Gcc-patches wrote:
>> On 6/21/23 02:14, Wang, Yanzhang wrote:
>>> Hi Jeff, sorry for the late reply.
>>>
>>>> The long branch handling is done at the assembler level.  So the clobbering
>>>> of $ra isn't visible to the compiler.  Thus the compiler has to be
>>>> extremely careful to not hold values in $ra because the assembler may
>>>> clobber $ra.
>>>
>>> If assembler will modify the $ra behavior, it seems the rules we defined in
>>> the riscv.cc will be ignored. For example, the $ra saving generated by this
>>> patch may be modified by the assmebler and all others depends on it will be
>>> wrong. So implementing the long jump in the compiler is better.
>> Basically correct.  The assembler potentially clobbers $ra.  That's why
>> in the long jump patches $ra becomes a fixed register -- the compiler
>> doesn't know when it's clobbered by the assembler.
>>
>> Even if this were done in the compiler, we'd still have to do something
>> special with $ra.  The point at which decisions about register
>> allocation and such are made is before the point where we know the final
>> positions of jumps/labels.  It's a classic problem in GCC's design.
> 
> Do you have a reference for more information on the long jump patches?
I can extract the patch Andrew wrote if that would be helpful.

> 
> I'm particularly curious about why $ra was selected as the temporary instead
> of $t1 like the tail pseudoinstruction uses.
$ra would be less disruptive from a code generation standpoint. 
Essentially whatever register is selected has to become a fixed 
register, meaning it's unavailable to the allocator.   Thus $t1 would be 
a horrible choice.  Ultimately this is defined by the assembler.

jeff
  
Stefan O'Rear June 25, 2023, 6:45 p.m. UTC | #12
On Sun, Jun 25, 2023, at 8:49 AM, Jeff Law wrote:
> On 6/24/23 19:40, Stefan O'Rear wrote:
>> On Sat, Jun 24, 2023, at 11:01 AM, Jeff Law via Gcc-patches wrote:
>>> On 6/21/23 02:14, Wang, Yanzhang wrote:
>>>> Hi Jeff, sorry for the late reply.
>>>>
>>>>> The long branch handling is done at the assembler level.  So the clobbering
>>>>> of $ra isn't visible to the compiler.  Thus the compiler has to be
>>>>> extremely careful to not hold values in $ra because the assembler may
>>>>> clobber $ra.
>>>>
>>>> If assembler will modify the $ra behavior, it seems the rules we defined in
>>>> the riscv.cc will be ignored. For example, the $ra saving generated by this
>>>> patch may be modified by the assmebler and all others depends on it will be
>>>> wrong. So implementing the long jump in the compiler is better.
>>> Basically correct.  The assembler potentially clobbers $ra.  That's why
>>> in the long jump patches $ra becomes a fixed register -- the compiler
>>> doesn't know when it's clobbered by the assembler.
>>>
>>> Even if this were done in the compiler, we'd still have to do something
>>> special with $ra.  The point at which decisions about register
>>> allocation and such are made is before the point where we know the final
>>> positions of jumps/labels.  It's a classic problem in GCC's design.
>> 
>> Do you have a reference for more information on the long jump patches?
> I can extract the patch Andrew wrote if that would be helpful.
>
>> 
>> I'm particularly curious about why $ra was selected as the temporary instead
>> of $t1 like the tail pseudoinstruction uses.
> $ra would be less disruptive from a code generation standpoint. 
> Essentially whatever register is selected has to become a fixed 
> register, meaning it's unavailable to the allocator.   Thus $t1 would be 
> a horrible choice.  Ultimately this is defined by the assembler.

To clarify: are you proposing to make ra (or t1 in the hypothetical) a fixed
register for all functions, or only those heuristically identified as potentially
larger than 1MiB?  And would this extend to forcing the creation of stack frames
for all functions, including very small functions?  I am concerned this would
result in a substantial performance regression.

Without seeing the patch I can't know if I'm missing something obvious but I
would say t1 has three advantages:

1. Consistency with tail, possibly simpler implementation.

2. Very few functions use all seven t-registers.  qemu linux-user in 2016 had an
off-by-one bug that corrupted t6 in sigreturn and it took months for anyone to
notice.  By contrast, ra has live data in every non-_Noreturn function.

3. Any jalr instruction which has rs1=ra has a hint effect on the return address
stack (call, return, or coroutine swap); a jalr which is intended to be treated
as a plain jump must have rs1!=ra, rs1!=t0.

-s

> jeff
  
Jeff Law June 26, 2023, 2:30 p.m. UTC | #13
On 6/25/23 12:45, Stefan O'Rear wrote:

> 
> To clarify: are you proposing to make ra (or t1 in the hypothetical) a fixed
> register for all functions, or only those heuristically identified as potentially
> larger than 1MiB?  And would this extend to forcing the creation of stack frames
> for all functions, including very small functions?  I am concerned this would
> result in a substantial performance regression.For the case Yanzhang is discussing (firmware and such), yes.  And 
that's simply the cost they're going to have to pay for wanting 
consistent backtraces without utilizing dwarf unwind info, sframe or orc.

Normal builds won't be using those options and thus won't suffer from 
those performance penalties.

> 
> Without seeing the patch I can't know if I'm missing something obvious but I
> would say t1 has three advantages:
> 
> 1. Consistency with tail, possibly simpler implementation.
And as I've already stated, this sequence is defined by the assembler. 
While I do want to revisit a compiler only solution, it's way down on my 
list of things to improve if I do a cost/benefit analysis.   If  someone 
wants to take a stab at it, I'm all for it.  But it's not a simple 
problem due the phase ordering issues.

> 
> 2. Very few functions use all seven t-registers.  qemu linux-user in 2016 had an
> off-by-one bug that corrupted t6 in sigreturn and it took months for anyone to
> notice.  By contrast, ra has live data in every non-_Noreturn function.
That's a terrible way to evaluate the impact.  The right way is to use 
real benchmarks.  Not synthetic benchmarks.  Not indirect observations 
that require triggering a bug in a sigreturn path.  Build and run a real 
benchmark.



> 
> 3. Any jalr instruction which has rs1=ra has a hint effect on the return address
> stack (call, return, or coroutine swap); a jalr which is intended to be treated
> as a plain jump must have rs1!=ra, rs1!=t0.
I'm well aware of these concerns.  We support disambiguating various 
jump forms to facilitate different branch predictors.

jeff
  
Kito Cheng June 26, 2023, 2:50 p.m. UTC | #14
LLVM will try to find scratch register even after RA to resolve the long
jump issue. so maybe we could consider similar approach? And I guess the
most complicate part would be the scratch register is not found, and
require spill/reload after RA.

Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org>於 2023年6月26日 週一,22:31寫道:

>
>
> On 6/25/23 12:45, Stefan O'Rear wrote:
>
> >
> > To clarify: are you proposing to make ra (or t1 in the hypothetical) a
> fixed
> > register for all functions, or only those heuristically identified as
> potentially
> > larger than 1MiB?  And would this extend to forcing the creation of
> stack frames
> > for all functions, including very small functions?  I am concerned this
> would
> > result in a substantial performance regression.For the case Yanzhang is
> discussing (firmware and such), yes.  And
> that's simply the cost they're going to have to pay for wanting
> consistent backtraces without utilizing dwarf unwind info, sframe or orc.
>
> Normal builds won't be using those options and thus won't suffer from
> those performance penalties.
>
> >
> > Without seeing the patch I can't know if I'm missing something obvious
> but I
> > would say t1 has three advantages:
> >
> > 1. Consistency with tail, possibly simpler implementation.
> And as I've already stated, this sequence is defined by the assembler.
> While I do want to revisit a compiler only solution, it's way down on my
> list of things to improve if I do a cost/benefit analysis.   If  someone
> wants to take a stab at it, I'm all for it.  But it's not a simple
> problem due the phase ordering issues.
>
> >
> > 2. Very few functions use all seven t-registers.  qemu linux-user in
> 2016 had an
> > off-by-one bug that corrupted t6 in sigreturn and it took months for
> anyone to
> > notice.  By contrast, ra has live data in every non-_Noreturn function.
> That's a terrible way to evaluate the impact.  The right way is to use
> real benchmarks.  Not synthetic benchmarks.  Not indirect observations
> that require triggering a bug in a sigreturn path.  Build and run a real
> benchmark.
>
>
>
> >
> > 3. Any jalr instruction which has rs1=ra has a hint effect on the return
> address
> > stack (call, return, or coroutine swap); a jalr which is intended to be
> treated
> > as a plain jump must have rs1!=ra, rs1!=t0.
> I'm well aware of these concerns.  We support disambiguating various
> jump forms to facilitate different branch predictors.
>
> jeff
>
  
Jeff Law June 26, 2023, 4:51 p.m. UTC | #15
On 6/26/23 08:50, Kito Cheng wrote:
> LLVM will try to find scratch register even after RA to resolve the long 
> jump issue. so maybe we could consider similar approach? And I guess the 
> most complicate part would be the scratch register is not found, and 
> require spill/reload after RA.
Right.  And the spill/reload after RA is ta problem unless you 
pre-allocate the space.  Of course in a function near 1M in size, odds 
are there were some calls in there and thus $ra would be saved.  In the 
exceedingly rare case where it wasn't, allocating a single stack slot 
isn't going to be a major performance driver.

There's other things you can do as well.  Register scavenging, jump 
trampolines, etc.  Examples of both exist.

The point I'm trying to make is that I suspect we're better off burning 
$ra right now to address the correctness issue, then coming back to one 
of the schemes noted above when the cost/benefit analysis shows it's a 
reasonably high priority relative to other optimizations we could be doing.

Jeff
  

Patch

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5d2550871c7..e02f9cb50a4 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -408,6 +408,10 @@  static const struct riscv_tune_info riscv_tune_info_table[] = {
 #include "riscv-cores.def"
 };
 
+/* Global variable to distinguish whether we should save and restore s0/fp for
+   function.  */
+static bool riscv_save_frame_pointer;
+
 void riscv_frame_info::reset(void)
 {
   total_size = 0;
@@ -4744,7 +4748,11 @@  riscv_save_reg_p (unsigned int regno)
   if (regno == HARD_FRAME_POINTER_REGNUM && frame_pointer_needed)
     return true;
 
-  if (regno == RETURN_ADDR_REGNUM && crtl->calls_eh_return)
+  /* Need not to use ra for leaf when frame pointer is turned off by option
+     whatever the omit-leaf-frame's value.  */
+  bool keep_leaf_ra = frame_pointer_needed && crtl->is_leaf
+    && !TARGET_OMIT_LEAF_FRAME_POINTER;
+  if (regno == RETURN_ADDR_REGNUM && (crtl->calls_eh_return || keep_leaf_ra))
     return true;
 
   /* If this is an interrupt handler, then must save extra registers.  */
@@ -6287,6 +6295,15 @@  riscv_option_override (void)
   if (flag_pic)
     riscv_cmodel = CM_PIC;
 
+  riscv_save_frame_pointer = false;
+  if (TARGET_OMIT_LEAF_FRAME_POINTER_P (global_options.x_target_flags))
+    {
+      if (!global_options.x_flag_omit_frame_pointer)
+	riscv_save_frame_pointer = true;
+
+      global_options.x_flag_omit_frame_pointer = 1;
+    }
+
   /* We get better code with explicit relocs for CM_MEDLOW, but
      worse code for the others (for now).  Pick the best default.  */
   if ((target_flags_explicit & MASK_EXPLICIT_RELOCS) == 0)
@@ -7158,6 +7175,15 @@  riscv_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 							& ~zeroed_hardregs);
 }
 
+static bool
+riscv_frame_pointer_required (void)
+{
+  if (riscv_save_frame_pointer && !crtl->is_leaf)
+    return true;
+
+  return false;
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
@@ -7412,6 +7438,9 @@  riscv_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
 #undef TARGET_ZERO_CALL_USED_REGS
 #define TARGET_ZERO_CALL_USED_REGS riscv_zero_call_used_regs
 
+#undef TARGET_FRAME_POINTER_REQUIRED
+#define TARGET_FRAME_POINTER_REQUIRED riscv_frame_pointer_required
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-riscv.h"
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index ff1dd4ddd4f..c3e2e7c1da4 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -138,6 +138,10 @@  Enable the CSR checking for the ISA-dependent CRS and the read-only CSR.
 The ISA-dependent CSR are only valid when the specific ISA is set.  The
 read-only CSR can not be written by the CSR instructions.
 
+momit-leaf-frame-pointer
+Target Mask (OMIT_LEAF_FRAME_POINTER) Save
+Omit the frame pointer in leaf functions.
+
 Mask(64BIT)
 
 Mask(MUL)
diff --git a/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-1.c b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-1.c
new file mode 100644
index 00000000000..c96123ea702
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-1.c
@@ -0,0 +1,7 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64 -O2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -fno-inline" } */
+
+#include "omit-frame-pointer-test.c"
+
+/* { dg-final { scan-assembler-times "sd\tra,\[0-9\]+\\(sp\\)" 2 } } */
+/* { dg-final { scan-assembler-times "sd\ts0,\[0-9\]+\\(sp\\)" 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-2.c b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-2.c
new file mode 100644
index 00000000000..067148c6a58
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-2.c
@@ -0,0 +1,7 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64 -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -fno-inline" } */
+
+#include "omit-frame-pointer-test.c"
+
+/* { dg-final { scan-assembler-times "sd\tra,\[0-9\]+\\(sp\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sd\ts0,\[0-9\]+\\(sp\\)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-3.c b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-3.c
new file mode 100644
index 00000000000..b4d7d6f4f0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-3.c
@@ -0,0 +1,7 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64 -O2 -fomit-frame-pointer -mno-omit-leaf-frame-pointer -fno-inline" } */
+
+#include "omit-frame-pointer-test.c"
+
+/* { dg-final { scan-assembler-times "sd\tra,\[0-9\]+\\(sp\\)" 1 } } */
+/* { dg-final { scan-assembler-not "sd\ts0,\[0-9\]+\\(sp\\)"} } */
diff --git a/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-4.c b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-4.c
new file mode 100644
index 00000000000..5a5b540ef4e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-4.c
@@ -0,0 +1,7 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64 -O2 -fomit-frame-pointer -momit-leaf-frame-pointer -fno-inline" } */
+
+#include "omit-frame-pointer-test.c"
+
+/* { dg-final { scan-assembler-times "sd\tra,\[0-9\]+\\(sp\\)" 1 } } */
+/* { dg-final { scan-assembler-not "sd\ts0,\[0-9\]+\\(sp\\)"} } */
diff --git a/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-test.c b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-test.c
new file mode 100644
index 00000000000..cf19f001e29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/omit-frame-pointer-test.c
@@ -0,0 +1,13 @@ 
+int inc(int n)
+{
+  return n + 1;
+}
+
+
+int bar(void)
+{
+  int n = 100;
+  n = inc(n);
+  n = inc(n) + 100;
+  return n;
+}