[Rust,front-end,v3,01/46] Use DW_ATE_UTF for the Rust 'char' type

Message ID 20221026081811.602573-2-arthur.cohen@embecosm.com
State Accepted
Headers
Series [Rust,front-end,v3,01/46] Use DW_ATE_UTF for the Rust 'char' type |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Arthur Cohen Oct. 26, 2022, 8:17 a.m. UTC
  From: Tom Tromey <tom@tromey.com>

The Rust 'char' type should use the DWARF DW_ATE_UTF encoding.
---
 gcc/dwarf2out.cc | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)
  

Comments

Jakub Jelinek Oct. 26, 2022, 8:39 a.m. UTC | #1
On Wed, Oct 26, 2022 at 10:17:26AM +0200, arthur.cohen@embecosm.com wrote:
> @@ -25201,6 +25215,13 @@ gen_compile_unit_die (const char *filename)
>      }
>    else if (strcmp (language_string, "GNU F77") == 0)
>      language = DW_LANG_Fortran77;
> +  else if (strcmp (language_string, "GNU Rust") == 0)
> +    {
> +      if (dwarf_version >= 5 || !dwarf_strict)
> +	language = DW_LANG_Rust;
> +      else
> +	language = DW_LANG_Rust_old;
> +    }

I must say I don't understand nor like this DW_LANG_Rust_old stuff at all.
Other languages don't do similar dances.
Look for D, or Go.  Neither of them has any non-standard lang code as
fallback, they use the DWARF assigned DW_LANG_* code, and DW_LANG_C as
fallback.  On most arches, DWARF 5 is the default anyway, or non-strict
DWARF at least.  Where neither is enabled because of prehistoric or buggy
DWARF consumers, it is unlikely they'd handle Rust sanely anyway.
Just follow what Go does in the same function.

	Jakub
  
Mark Wielaard Oct. 30, 2022, 3:22 p.m. UTC | #2
Hi,

On Wed, Oct 26, 2022 at 10:39:09AM +0200, Jakub Jelinek wrote:
> I must say I don't understand nor like this DW_LANG_Rust_old stuff at all.
> Other languages don't do similar dances.
> Look for D, or Go.  Neither of them has any non-standard lang code as
> fallback, they use the DWARF assigned DW_LANG_* code, and DW_LANG_C as
> fallback.  On most arches, DWARF 5 is the default anyway, or non-strict
> DWARF at least.  Where neither is enabled because of prehistoric or buggy
> DWARF consumers, it is unlikely they'd handle Rust sanely anyway.
> Just follow what Go does in the same function.

DW_LANG_Rust_old was used by old rustc compilers <= 2016 before DWARF5
assigned an official number. It might be recognized by some
debuggers. But I agree that these days it doesn't really make sense to
emit it. When producing strict DWARF it is also slightly odd to emit a
non-standard language code. So I agree that it makes sense to do what
Go does, always emit DW_LANG_Rust unless we emit strict DWARF for
versions before 5 (and then just fall back to DW_LANG_C).

The attached patch (against "upstream gccrs") does that. I kept the
oldlang.rs testcase just to see that the -gstrict-dwarf -gdwarf-3 case
does something sane.

The only "issue" is that is_rust () depends on the comp_unit_die
DW_AT_language being DW_LANG_Rust. But the only usage of is_rust
already depends on strict DWARF.

https://code.wildebeest.org/git/user/mjw/gccrs/commit/?h=no-Rust-old
if someone wants to push that, to merge for a v4.

Thanks,

Mark
  
Jakub Jelinek Oct. 30, 2022, 5:25 p.m. UTC | #3
On Sun, Oct 30, 2022 at 04:22:34PM +0100, Mark Wielaard wrote:
> Hi,
> 
> On Wed, Oct 26, 2022 at 10:39:09AM +0200, Jakub Jelinek wrote:
> > I must say I don't understand nor like this DW_LANG_Rust_old stuff at all.
> > Other languages don't do similar dances.
> > Look for D, or Go.  Neither of them has any non-standard lang code as
> > fallback, they use the DWARF assigned DW_LANG_* code, and DW_LANG_C as
> > fallback.  On most arches, DWARF 5 is the default anyway, or non-strict
> > DWARF at least.  Where neither is enabled because of prehistoric or buggy
> > DWARF consumers, it is unlikely they'd handle Rust sanely anyway.
> > Just follow what Go does in the same function.
> 
> DW_LANG_Rust_old was used by old rustc compilers <= 2016 before DWARF5
> assigned an official number. It might be recognized by some
> debuggers. But I agree that these days it doesn't really make sense to
> emit it. When producing strict DWARF it is also slightly odd to emit a
> non-standard language code. So I agree that it makes sense to do what
> Go does, always emit DW_LANG_Rust unless we emit strict DWARF for
> versions before 5 (and then just fall back to DW_LANG_C).
> 
> The attached patch (against "upstream gccrs") does that. I kept the
> oldlang.rs testcase just to see that the -gstrict-dwarf -gdwarf-3 case
> does something sane.
> 
> The only "issue" is that is_rust () depends on the comp_unit_die
> DW_AT_language being DW_LANG_Rust. But the only usage of is_rust
> already depends on strict DWARF.
> 
> https://code.wildebeest.org/git/user/mjw/gccrs/commit/?h=no-Rust-old
> if someone wants to push that, to merge for a v4.

LGTM, thanks.

	Jakub
  
Tom Tromey Oct. 31, 2022, 2:19 p.m. UTC | #4
>>>>> "Mark" == Mark Wielaard <mark@klomp.org> writes:

Mark> DW_LANG_Rust_old was used by old rustc compilers <= 2016 before DWARF5
Mark> assigned an official number. It might be recognized by some
Mark> debuggers.

FWIW I wouldn't worry about it any more.
We could probably just remove the '_old' constant.

Tom
  
Marc Poulhiès Nov. 15, 2022, 8:37 p.m. UTC | #5
Mark Wielaard <mark@klomp.org> writes:

> https://code.wildebeest.org/git/user/mjw/gccrs/commit/?h=no-Rust-old
> if someone wants to push that, to merge for a v4.

Sorry, missed that part, taking care of merging it right now :)

https://github.com/Rust-GCC/gccrs/pull/1649

Thanks,
Marc
  

Patch

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index e3920c898f5..a8bccbabca4 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -5600,6 +5600,16 @@  is_fortran (const_tree decl)
   return is_fortran ();
 }
 
+/* Return TRUE if the language is Rust.  */
+
+static inline bool
+is_rust ()
+{
+  unsigned int lang = get_AT_unsigned (comp_unit_die (), DW_AT_language);
+
+  return lang == DW_LANG_Rust || lang == DW_LANG_Rust_old;
+}
+
 /* Return TRUE if the language is Ada.  */
 
 static inline bool
@@ -13231,7 +13241,11 @@  base_type_die (tree type, bool reverse)
 	}
       if (TYPE_STRING_FLAG (type))
 	{
-	  if (TYPE_UNSIGNED (type))
+	  if ((dwarf_version >= 4 || !dwarf_strict)
+	      && is_rust ()
+	      && int_size_in_bytes (type) == 4)
+	    encoding = DW_ATE_UTF;
+	  else if (TYPE_UNSIGNED (type))
 	    encoding = DW_ATE_unsigned_char;
 	  else
 	    encoding = DW_ATE_signed_char;
@@ -25201,6 +25215,13 @@  gen_compile_unit_die (const char *filename)
     }
   else if (strcmp (language_string, "GNU F77") == 0)
     language = DW_LANG_Fortran77;
+  else if (strcmp (language_string, "GNU Rust") == 0)
+    {
+      if (dwarf_version >= 5 || !dwarf_strict)
+	language = DW_LANG_Rust;
+      else
+	language = DW_LANG_Rust_old;
+    }
   else if (dwarf_version >= 3 || !dwarf_strict)
     {
       if (strcmp (language_string, "GNU Ada") == 0)