[v2,5/7] RISC-V: relax post-relocation-operator separator expectation

Message ID 89f892c8-e378-b81c-7b13-322a7876a252@suse.com
State Accepted
Headers
Series RISC-V/gas: insn operand parsing |

Checks

Context Check Description
snail/binutils-gdb-check success Github commit url

Commit Message

Jan Beulich March 10, 2023, 9:27 a.m. UTC
  With a blank being okay as a separator, constructs like

	lui	t0, %hi sym
	lui	t0, %hi 0x1000

are accepted. But then it makes little sense to not also accept e.g.

	lui	t0, %hi +sym
	lui	t0, %hi -0x1000

Therefore instead of looking for a blank or opening parenthesis, merely
check whether what follows wouldn't continue an identifier.

Note that we don't need to also check is_name_ender(), as LEX_END_NAME
isn't used for any character.
---
v2: New.
  

Comments

Nelson Chu April 25, 2023, 8:13 a.m. UTC | #1
On Fri, Mar 10, 2023 at 5:27 PM Jan Beulich <jbeulich@suse.com> wrote:

> With a blank being okay as a separator, constructs like
>
>         lui     t0, %hi sym
>         lui     t0, %hi 0x1000
>

I never noticed this kind of usage.  I always thought we should have a ()
after the %hi/%pcrel_hi/..., so perhaps this is just a mistake that should
be fixed.  Maybe other experts can help to clarify this behavior.

Thanks
Nelson


> are accepted. But then it makes little sense to not also accept e.g.
>
>         lui     t0, %hi +sym
>         lui     t0, %hi -0x1000
>
> Therefore instead of looking for a blank or opening parenthesis, merely
> check whether what follows wouldn't continue an identifier.
>
> Note that we don't need to also check is_name_ender(), as LEX_END_NAME
> isn't used for any character.
> ---
> v2: New.
>
> --- a/gas/config/tc-riscv.c
> +++ b/gas/config/tc-riscv.c
> @@ -2184,7 +2184,7 @@ parse_relocation (char **str, bfd_reloc_
>        {
>         size_t len = 1 + strlen (percent_op->str);
>
> -       if (!ISSPACE ((*str)[len]) && (*str)[len] != '(')
> +       if (is_part_of_name ((*str)[len]))
>           continue;
>
>         *str += len;
> --- a/gas/testsuite/gas/riscv/auipc-parsing.d
> +++ b/gas/testsuite/gas/riscv/auipc-parsing.d
> @@ -1,3 +1,3 @@
> -#as:
> +#as: -al
>  #source: auipc-parsing.s
>  #error_output: auipc-parsing.l
> --- a/gas/testsuite/gas/riscv/auipc-parsing.l
> +++ b/gas/testsuite/gas/riscv/auipc-parsing.l
> @@ -3,3 +3,7 @@
>  .*: Error: illegal operands `lui x10,x11'
>  .*: Error: illegal operands `auipc x12,symbol'
>  .*: Error: illegal operands `lui x13,symbol'
> +#...
> + *[0-9]+[      ]+# Accept unary .*
> + *[0-9]+[      ]+\?\?\?\? 17070000[    ]+auipc x14,%hi \+symbol
> + *[0-9]+[      ]+\?\?\?\? B7B7CBED[    ]+lui   x15,%hi -0x12345678
> --- a/gas/testsuite/gas/riscv/auipc-parsing.s
> +++ b/gas/testsuite/gas/riscv/auipc-parsing.s
> @@ -4,3 +4,6 @@
>  # Don't accept a symbol without %hi() for 'u' operands.
>         auipc   x12,symbol
>         lui     x13,symbol
> +# Accept unary operators starting the subject expression.
> +       auipc   x14,%hi +symbol
> +       lui     x15,%hi -0x12345678
>
>
  
Andreas Schwab April 25, 2023, 8:37 a.m. UTC | #2
On Mär 10 2023, Jan Beulich via Binutils wrote:

> With a blank being okay as a separator, constructs like
>
> 	lui	t0, %hi sym
> 	lui	t0, %hi 0x1000
>
> are accepted. But then it makes little sense to not also accept e.g.
>
> 	lui	t0, %hi +sym
> 	lui	t0, %hi -0x1000

https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md
always includes the parens around the expression, and I don't think it
should be accepted without them.
  
Nelson Chu April 25, 2023, 8:44 a.m. UTC | #3
Thanks for the information, Andreas.  The riscv-asm should be the right
place to see the expected behaviors, and discuss the related stuff :-)

Nelson

On Tue, Apr 25, 2023 at 4:37 PM Andreas Schwab <schwab@linux-m68k.org>
wrote:

> On Mär 10 2023, Jan Beulich via Binutils wrote:
>
> > With a blank being okay as a separator, constructs like
> >
> >       lui     t0, %hi sym
> >       lui     t0, %hi 0x1000
> >
> > are accepted. But then it makes little sense to not also accept e.g.
> >
> >       lui     t0, %hi +sym
> >       lui     t0, %hi -0x1000
>
> https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md
> always includes the parens around the expression, and I don't think it
> should be accepted without them.
>
> --
> Andreas Schwab, schwab@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."
>
  
Jan Beulich April 25, 2023, 8:52 a.m. UTC | #4
On 25.04.2023 10:44, Nelson Chu wrote:
> Thanks for the information, Andreas.  The riscv-asm should be the right
> place to see the expected behaviors, and discuss the related stuff :-)

Just to clarify: Is this a request to replace this patch by one insisting
on the use of a parenthesis? If so, I can certainly do that, and commit
only the remaining patches (as per your other reply).

Jan

> On Tue, Apr 25, 2023 at 4:37 PM Andreas Schwab <schwab@linux-m68k.org>
> wrote:
> 
>> On Mär 10 2023, Jan Beulich via Binutils wrote:
>>
>>> With a blank being okay as a separator, constructs like
>>>
>>>       lui     t0, %hi sym
>>>       lui     t0, %hi 0x1000
>>>
>>> are accepted. But then it makes little sense to not also accept e.g.
>>>
>>>       lui     t0, %hi +sym
>>>       lui     t0, %hi -0x1000
>>
>> https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md
>> always includes the parens around the expression, and I don't think it
>> should be accepted without them.
>>
>> --
>> Andreas Schwab, schwab@linux-m68k.org
>> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
>> "And now for something completely different."
>>
>
  
Nelson Chu April 28, 2023, 1:29 a.m. UTC | #5
Sorry for missing this one.  No, not a request.  I'm fine with both, since
this patch won't break the risc-gnu-toolchain regression, so most of the
stuff already used looks good.  It seems that the risc v-asm doesn't say we
"must '' use a parenthesis after the modifiers, it just shows examples that
all use it.

Thanks
Nelson

On Tue, Apr 25, 2023 at 4:52 PM Jan Beulich <jbeulich@suse.com> wrote:

> On 25.04.2023 10:44, Nelson Chu wrote:
> > Thanks for the information, Andreas.  The riscv-asm should be the right
> > place to see the expected behaviors, and discuss the related stuff :-)
>
> Just to clarify: Is this a request to replace this patch by one insisting
> on the use of a parenthesis? If so, I can certainly do that, and commit
> only the remaining patches (as per your other reply).
>
> Jan
>
> > On Tue, Apr 25, 2023 at 4:37 PM Andreas Schwab <schwab@linux-m68k.org>
> > wrote:
> >
> >> On Mär 10 2023, Jan Beulich via Binutils wrote:
> >>
> >>> With a blank being okay as a separator, constructs like
> >>>
> >>>       lui     t0, %hi sym
> >>>       lui     t0, %hi 0x1000
> >>>
> >>> are accepted. But then it makes little sense to not also accept e.g.
> >>>
> >>>       lui     t0, %hi +sym
> >>>       lui     t0, %hi -0x1000
> >>
> >>
> https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md
> >> always includes the parens around the expression, and I don't think it
> >> should be accepted without them.
> >>
> >> --
> >> Andreas Schwab, schwab@linux-m68k.org
> >> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> >> "And now for something completely different."
> >>
> >
>
>
  
Jan Beulich April 28, 2023, 6:05 a.m. UTC | #6
On 28.04.2023 03:29, Nelson Chu wrote:
> Sorry for missing this one.  No, not a request.  I'm fine with both, since
> this patch won't break the risc-gnu-toolchain regression, so most of the
> stuff already used looks good.  It seems that the risc v-asm doesn't say we
> "must '' use a parenthesis after the modifiers, it just shows examples that
> all use it.

Hmm, interesting. My interpretation of especially the table in the doc (not
so much the examples) would be that it pretty much mandates parentheses. One
might even go as far as taking it to mandate no blank between reloc specifier
and paren (but I wouldn't feel well going that far without it being said
explicitly). So my plan was to make the alternative patch, and then leave it
to you (incl other maintainers) to decide (with my personal preference being
the yet to be created alternative one, now that I've seen the doc).

Jan
  
Fangrui Song April 28, 2023, 8:03 a.m. UTC | #7
On Thu, Apr 27, 2023 at 11:05 PM Jan Beulich via Binutils
<binutils@sourceware.org> wrote:
>
> On 28.04.2023 03:29, Nelson Chu wrote:
> > Sorry for missing this one.  No, not a request.  I'm fine with both, since
> > this patch won't break the risc-gnu-toolchain regression, so most of the
> > stuff already used looks good.  It seems that the risc v-asm doesn't say we
> > "must '' use a parenthesis after the modifiers, it just shows examples that
> > all use it.
>
> Hmm, interesting. My interpretation of especially the table in the doc (not
> so much the examples) would be that it pretty much mandates parentheses. One
> might even go as far as taking it to mandate no blank between reloc specifier
> and paren (but I wouldn't feel well going that far without it being said
> explicitly). So my plan was to make the alternative patch, and then leave it
> to you (incl other maintainers) to decide (with my personal preference being
> the yet to be created alternative one, now that I've seen the doc).
>
> Jan

I agree that rejecting `%hi sym` shall be a reasonable change in gas.
The LLVM integrated assembler (used by Clang and llvm-mc) rejects the
syntax.

% cat a.s
 lui     t0, %hi sym
% clang --target=riscv64 -c a.s
a.s:1:18: error: expected '('
 lui     t0, %hi sym
                 ^
a.s:1:18: error: unknown operand
 lui     t0, %hi sym
                 ^
  
Jan Beulich May 11, 2023, 11:27 a.m. UTC | #8
On 25.04.2023 10:37, Andreas Schwab wrote:
> On Mär 10 2023, Jan Beulich via Binutils wrote:
> 
>> With a blank being okay as a separator, constructs like
>>
>> 	lui	t0, %hi sym
>> 	lui	t0, %hi 0x1000
>>
>> are accepted. But then it makes little sense to not also accept e.g.
>>
>> 	lui	t0, %hi +sym
>> 	lui	t0, %hi -0x1000
> 
> https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md
> always includes the parens around the expression, and I don't think it
> should be accepted without them.

Just to mention it: Someone (validly) said that much of this code (and
the underlying syntax) was derived from MIPS. Just now it happened that
I had to look at the MIPS testsuite for a different reason, when it
caught my eye that they're actually testing cases like the earlier two
above (and I'm pretty sure they would similarly fail on the latter two).

Jan
  

Patch

--- a/gas/config/tc-riscv.c
+++ b/gas/config/tc-riscv.c
@@ -2184,7 +2184,7 @@  parse_relocation (char **str, bfd_reloc_
       {
 	size_t len = 1 + strlen (percent_op->str);
 
-	if (!ISSPACE ((*str)[len]) && (*str)[len] != '(')
+	if (is_part_of_name ((*str)[len]))
 	  continue;
 
 	*str += len;
--- a/gas/testsuite/gas/riscv/auipc-parsing.d
+++ b/gas/testsuite/gas/riscv/auipc-parsing.d
@@ -1,3 +1,3 @@ 
-#as:
+#as: -al
 #source: auipc-parsing.s
 #error_output: auipc-parsing.l
--- a/gas/testsuite/gas/riscv/auipc-parsing.l
+++ b/gas/testsuite/gas/riscv/auipc-parsing.l
@@ -3,3 +3,7 @@ 
 .*: Error: illegal operands `lui x10,x11'
 .*: Error: illegal operands `auipc x12,symbol'
 .*: Error: illegal operands `lui x13,symbol'
+#...
+ *[0-9]+[ 	]+# Accept unary .*
+ *[0-9]+[ 	]+\?\?\?\? 17070000[ 	]+auipc	x14,%hi \+symbol
+ *[0-9]+[ 	]+\?\?\?\? B7B7CBED[ 	]+lui	x15,%hi -0x12345678
--- a/gas/testsuite/gas/riscv/auipc-parsing.s
+++ b/gas/testsuite/gas/riscv/auipc-parsing.s
@@ -4,3 +4,6 @@ 
 # Don't accept a symbol without %hi() for 'u' operands.
 	auipc	x12,symbol
 	lui	x13,symbol
+# Accept unary operators starting the subject expression.
+	auipc	x14,%hi +symbol
+	lui	x15,%hi -0x12345678