x86: Don't check if AVX512 template requires AVX512VL

Message ID 20230620164436.432481-1-hjl.tools@gmail.com
State Accepted
Headers
Series x86: Don't check if AVX512 template requires AVX512VL |

Checks

Context Check Description
snail/binutils-gdb-check success Github commit url

Commit Message

H.J. Lu June 20, 2023, 4:44 p.m. UTC
  If the ZMM operand in an AVX12 template also allows XMM or ZMM, this
template must require AVX512VL.  Drop the AVX512VL requirement check
on such AVX512 templates.

	* config/tc-i386.c (check_VecOperands): Don't check if AVX512
	template requires AVX512VL.
---
 gas/config/tc-i386.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
  

Comments

Jan Beulich June 21, 2023, 6:52 a.m. UTC | #1
On 20.06.2023 18:44, H.J. Lu via Binutils wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -6288,11 +6288,10 @@ check_VecOperands (const insn_template *t)
>    /* Templates allowing for ZMMword as well as YMMword and/or XMMword for
>       any one operand are implicity requiring AVX512VL support if the actual
>       operand size is YMMword or XMMword.  Since this function runs after
> -     template matching, there's no need to check for YMMword/XMMword in
> -     the template.  */
> +     template matching, there's no need to check for YMMword/XMMword nor
> +     AVX512VL in the template.  */
>    cpu = cpu_flags_and (t->cpu_flags, avx512);
>    if (!cpu_flags_all_zero (&cpu)
> -      && !t->cpu_flags.bitfield.cpuavx512vl
>        && !cpu_arch_flags.bitfield.cpuavx512vl)
>      {
>        for (op = 0; op < t->operands; ++op)

While the code change is correct afaict, I don't think we want it,
which is related to your imo wrong editing of the comment: For one we
check for the flag to be clear (unlike the xmm/ymm checks). And then
the check is there to skip a pointless (in that case) loop. IOW the
loop is only needed for templates which do not explicitly name
AVX512VL as a required feature; templates which do can't validly have
a mix of zmm and one or both of xmm/ymm in any one operand. In fact if
you grep i386-opc.tbl for "AVX612VL.*ZMM" you won't find any single
match.

Jan
  
Jan Beulich June 21, 2023, 10:04 a.m. UTC | #2
On 20.06.2023 18:44, H.J. Lu via Binutils wrote:
> If the ZMM operand in an AVX12 template also allows XMM or ZMM, this
> template must require AVX512VL.  Drop the AVX512VL requirement check
> on such AVX512 templates.
> 
> 	* config/tc-i386.c (check_VecOperands): Don't check if AVX512
> 	template requires AVX512VL.
> ---
>  gas/config/tc-i386.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)

I notice you've committed this change already. May I please ask that
you wait a little with committing any patches where there is a chance
that there are going to be comments? (Typically I wait for at least a
week unless it's a clear bugfix.)

Jan
  
H.J. Lu June 21, 2023, 4:06 p.m. UTC | #3
On Tue, Jun 20, 2023 at 11:52 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 20.06.2023 18:44, H.J. Lu via Binutils wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -6288,11 +6288,10 @@ check_VecOperands (const insn_template *t)
> >    /* Templates allowing for ZMMword as well as YMMword and/or XMMword for
> >       any one operand are implicity requiring AVX512VL support if the actual
> >       operand size is YMMword or XMMword.  Since this function runs after
> > -     template matching, there's no need to check for YMMword/XMMword in
> > -     the template.  */
> > +     template matching, there's no need to check for YMMword/XMMword nor
> > +     AVX512VL in the template.  */
> >    cpu = cpu_flags_and (t->cpu_flags, avx512);
> >    if (!cpu_flags_all_zero (&cpu)
> > -      && !t->cpu_flags.bitfield.cpuavx512vl
> >        && !cpu_arch_flags.bitfield.cpuavx512vl)
> >      {
> >        for (op = 0; op < t->operands; ++op)
>
> While the code change is correct afaict, I don't think we want it,
> which is related to your imo wrong editing of the comment: For one we
> check for the flag to be clear (unlike the xmm/ymm checks). And then
> the check is there to skip a pointless (in that case) loop. IOW the
> loop is only needed for templates which do not explicitly name
> AVX512VL as a required feature; templates which do can't validly have
> a mix of zmm and one or both of xmm/ymm in any one operand. In fact if
> you grep i386-opc.tbl for "AVX612VL.*ZMM" you won't find any single
> match.
>

That is why the check isn't needed.
  
Jan Beulich June 22, 2023, 9 a.m. UTC | #4
On 21.06.2023 18:06, H.J. Lu wrote:
> On Tue, Jun 20, 2023 at 11:52 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 20.06.2023 18:44, H.J. Lu via Binutils wrote:
>>> --- a/gas/config/tc-i386.c
>>> +++ b/gas/config/tc-i386.c
>>> @@ -6288,11 +6288,10 @@ check_VecOperands (const insn_template *t)
>>>    /* Templates allowing for ZMMword as well as YMMword and/or XMMword for
>>>       any one operand are implicity requiring AVX512VL support if the actual
>>>       operand size is YMMword or XMMword.  Since this function runs after
>>> -     template matching, there's no need to check for YMMword/XMMword in
>>> -     the template.  */
>>> +     template matching, there's no need to check for YMMword/XMMword nor
>>> +     AVX512VL in the template.  */
>>>    cpu = cpu_flags_and (t->cpu_flags, avx512);
>>>    if (!cpu_flags_all_zero (&cpu)
>>> -      && !t->cpu_flags.bitfield.cpuavx512vl
>>>        && !cpu_arch_flags.bitfield.cpuavx512vl)
>>>      {
>>>        for (op = 0; op < t->operands; ++op)
>>
>> While the code change is correct afaict, I don't think we want it,
>> which is related to your imo wrong editing of the comment: For one we
>> check for the flag to be clear (unlike the xmm/ymm checks). And then
>> the check is there to skip a pointless (in that case) loop. IOW the
>> loop is only needed for templates which do not explicitly name
>> AVX512VL as a required feature; templates which do can't validly have
>> a mix of zmm and one or both of xmm/ymm in any one operand. In fact if
>> you grep i386-opc.tbl for "AVX612VL.*ZMM" you won't find any single
>> match.
> 
> That is why the check isn't needed.

I'm confused now: You explicitly want to make gas perform more work for
no gain? The flag being clear is the common case, and hence in the
common case we can avoid getting into this loop. So yes, as said, your
code change is functionally correct (as in: not breaking things), but
it is at the same time making overall behavior worse. Please revert.

Jan
  
Jan Beulich June 23, 2023, 5:59 a.m. UTC | #5
On 22.06.2023 11:00, Jan Beulich wrote:
> On 21.06.2023 18:06, H.J. Lu wrote:
>> On Tue, Jun 20, 2023 at 11:52 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>
>>> On 20.06.2023 18:44, H.J. Lu via Binutils wrote:
>>>> --- a/gas/config/tc-i386.c
>>>> +++ b/gas/config/tc-i386.c
>>>> @@ -6288,11 +6288,10 @@ check_VecOperands (const insn_template *t)
>>>>    /* Templates allowing for ZMMword as well as YMMword and/or XMMword for
>>>>       any one operand are implicity requiring AVX512VL support if the actual
>>>>       operand size is YMMword or XMMword.  Since this function runs after
>>>> -     template matching, there's no need to check for YMMword/XMMword in
>>>> -     the template.  */
>>>> +     template matching, there's no need to check for YMMword/XMMword nor
>>>> +     AVX512VL in the template.  */
>>>>    cpu = cpu_flags_and (t->cpu_flags, avx512);
>>>>    if (!cpu_flags_all_zero (&cpu)
>>>> -      && !t->cpu_flags.bitfield.cpuavx512vl
>>>>        && !cpu_arch_flags.bitfield.cpuavx512vl)
>>>>      {
>>>>        for (op = 0; op < t->operands; ++op)
>>>
>>> While the code change is correct afaict, I don't think we want it,
>>> which is related to your imo wrong editing of the comment: For one we
>>> check for the flag to be clear (unlike the xmm/ymm checks). And then
>>> the check is there to skip a pointless (in that case) loop. IOW the
>>> loop is only needed for templates which do not explicitly name
>>> AVX512VL as a required feature; templates which do can't validly have
>>> a mix of zmm and one or both of xmm/ymm in any one operand. In fact if
>>> you grep i386-opc.tbl for "AVX612VL.*ZMM" you won't find any single
>>> match.
>>
>> That is why the check isn't needed.
> 
> I'm confused now: You explicitly want to make gas perform more work for
> no gain? The flag being clear is the common case, and hence in the
> common case we can avoid getting into this loop.

I guess I managed to contradict myself here: The flag being clear is the
common case, and hence the effect of avoiding to enter the loop is more
limited than I first inferred.

Jan

> So yes, as said, your
> code change is functionally correct (as in: not breaking things), but
> it is at the same time making overall behavior worse. Please revert.
> 
> Jan
  

Patch

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index de35ee2a2c6..dcafac0c0cd 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -6288,11 +6288,10 @@  check_VecOperands (const insn_template *t)
   /* Templates allowing for ZMMword as well as YMMword and/or XMMword for
      any one operand are implicity requiring AVX512VL support if the actual
      operand size is YMMword or XMMword.  Since this function runs after
-     template matching, there's no need to check for YMMword/XMMword in
-     the template.  */
+     template matching, there's no need to check for YMMword/XMMword nor
+     AVX512VL in the template.  */
   cpu = cpu_flags_and (t->cpu_flags, avx512);
   if (!cpu_flags_all_zero (&cpu)
-      && !t->cpu_flags.bitfield.cpuavx512vl
       && !cpu_arch_flags.bitfield.cpuavx512vl)
     {
       for (op = 0; op < t->operands; ++op)