x86: Don't check if AVX512 template requires AVX512VL
Checks
Commit Message
If the ZMM operand in an AVX12 template also allows XMM or ZMM, this
template must require AVX512VL. Drop the AVX512VL requirement check
on such AVX512 templates.
* config/tc-i386.c (check_VecOperands): Don't check if AVX512
template requires AVX512VL.
---
gas/config/tc-i386.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
Comments
On 20.06.2023 18:44, H.J. Lu via Binutils wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -6288,11 +6288,10 @@ check_VecOperands (const insn_template *t)
> /* Templates allowing for ZMMword as well as YMMword and/or XMMword for
> any one operand are implicity requiring AVX512VL support if the actual
> operand size is YMMword or XMMword. Since this function runs after
> - template matching, there's no need to check for YMMword/XMMword in
> - the template. */
> + template matching, there's no need to check for YMMword/XMMword nor
> + AVX512VL in the template. */
> cpu = cpu_flags_and (t->cpu_flags, avx512);
> if (!cpu_flags_all_zero (&cpu)
> - && !t->cpu_flags.bitfield.cpuavx512vl
> && !cpu_arch_flags.bitfield.cpuavx512vl)
> {
> for (op = 0; op < t->operands; ++op)
While the code change is correct afaict, I don't think we want it,
which is related to your imo wrong editing of the comment: For one we
check for the flag to be clear (unlike the xmm/ymm checks). And then
the check is there to skip a pointless (in that case) loop. IOW the
loop is only needed for templates which do not explicitly name
AVX512VL as a required feature; templates which do can't validly have
a mix of zmm and one or both of xmm/ymm in any one operand. In fact if
you grep i386-opc.tbl for "AVX612VL.*ZMM" you won't find any single
match.
Jan
On 20.06.2023 18:44, H.J. Lu via Binutils wrote:
> If the ZMM operand in an AVX12 template also allows XMM or ZMM, this
> template must require AVX512VL. Drop the AVX512VL requirement check
> on such AVX512 templates.
>
> * config/tc-i386.c (check_VecOperands): Don't check if AVX512
> template requires AVX512VL.
> ---
> gas/config/tc-i386.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
I notice you've committed this change already. May I please ask that
you wait a little with committing any patches where there is a chance
that there are going to be comments? (Typically I wait for at least a
week unless it's a clear bugfix.)
Jan
On Tue, Jun 20, 2023 at 11:52 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 20.06.2023 18:44, H.J. Lu via Binutils wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -6288,11 +6288,10 @@ check_VecOperands (const insn_template *t)
> > /* Templates allowing for ZMMword as well as YMMword and/or XMMword for
> > any one operand are implicity requiring AVX512VL support if the actual
> > operand size is YMMword or XMMword. Since this function runs after
> > - template matching, there's no need to check for YMMword/XMMword in
> > - the template. */
> > + template matching, there's no need to check for YMMword/XMMword nor
> > + AVX512VL in the template. */
> > cpu = cpu_flags_and (t->cpu_flags, avx512);
> > if (!cpu_flags_all_zero (&cpu)
> > - && !t->cpu_flags.bitfield.cpuavx512vl
> > && !cpu_arch_flags.bitfield.cpuavx512vl)
> > {
> > for (op = 0; op < t->operands; ++op)
>
> While the code change is correct afaict, I don't think we want it,
> which is related to your imo wrong editing of the comment: For one we
> check for the flag to be clear (unlike the xmm/ymm checks). And then
> the check is there to skip a pointless (in that case) loop. IOW the
> loop is only needed for templates which do not explicitly name
> AVX512VL as a required feature; templates which do can't validly have
> a mix of zmm and one or both of xmm/ymm in any one operand. In fact if
> you grep i386-opc.tbl for "AVX612VL.*ZMM" you won't find any single
> match.
>
That is why the check isn't needed.
On 21.06.2023 18:06, H.J. Lu wrote:
> On Tue, Jun 20, 2023 at 11:52 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 20.06.2023 18:44, H.J. Lu via Binutils wrote:
>>> --- a/gas/config/tc-i386.c
>>> +++ b/gas/config/tc-i386.c
>>> @@ -6288,11 +6288,10 @@ check_VecOperands (const insn_template *t)
>>> /* Templates allowing for ZMMword as well as YMMword and/or XMMword for
>>> any one operand are implicity requiring AVX512VL support if the actual
>>> operand size is YMMword or XMMword. Since this function runs after
>>> - template matching, there's no need to check for YMMword/XMMword in
>>> - the template. */
>>> + template matching, there's no need to check for YMMword/XMMword nor
>>> + AVX512VL in the template. */
>>> cpu = cpu_flags_and (t->cpu_flags, avx512);
>>> if (!cpu_flags_all_zero (&cpu)
>>> - && !t->cpu_flags.bitfield.cpuavx512vl
>>> && !cpu_arch_flags.bitfield.cpuavx512vl)
>>> {
>>> for (op = 0; op < t->operands; ++op)
>>
>> While the code change is correct afaict, I don't think we want it,
>> which is related to your imo wrong editing of the comment: For one we
>> check for the flag to be clear (unlike the xmm/ymm checks). And then
>> the check is there to skip a pointless (in that case) loop. IOW the
>> loop is only needed for templates which do not explicitly name
>> AVX512VL as a required feature; templates which do can't validly have
>> a mix of zmm and one or both of xmm/ymm in any one operand. In fact if
>> you grep i386-opc.tbl for "AVX612VL.*ZMM" you won't find any single
>> match.
>
> That is why the check isn't needed.
I'm confused now: You explicitly want to make gas perform more work for
no gain? The flag being clear is the common case, and hence in the
common case we can avoid getting into this loop. So yes, as said, your
code change is functionally correct (as in: not breaking things), but
it is at the same time making overall behavior worse. Please revert.
Jan
On 22.06.2023 11:00, Jan Beulich wrote:
> On 21.06.2023 18:06, H.J. Lu wrote:
>> On Tue, Jun 20, 2023 at 11:52 PM Jan Beulich <jbeulich@suse.com> wrote:
>>>
>>> On 20.06.2023 18:44, H.J. Lu via Binutils wrote:
>>>> --- a/gas/config/tc-i386.c
>>>> +++ b/gas/config/tc-i386.c
>>>> @@ -6288,11 +6288,10 @@ check_VecOperands (const insn_template *t)
>>>> /* Templates allowing for ZMMword as well as YMMword and/or XMMword for
>>>> any one operand are implicity requiring AVX512VL support if the actual
>>>> operand size is YMMword or XMMword. Since this function runs after
>>>> - template matching, there's no need to check for YMMword/XMMword in
>>>> - the template. */
>>>> + template matching, there's no need to check for YMMword/XMMword nor
>>>> + AVX512VL in the template. */
>>>> cpu = cpu_flags_and (t->cpu_flags, avx512);
>>>> if (!cpu_flags_all_zero (&cpu)
>>>> - && !t->cpu_flags.bitfield.cpuavx512vl
>>>> && !cpu_arch_flags.bitfield.cpuavx512vl)
>>>> {
>>>> for (op = 0; op < t->operands; ++op)
>>>
>>> While the code change is correct afaict, I don't think we want it,
>>> which is related to your imo wrong editing of the comment: For one we
>>> check for the flag to be clear (unlike the xmm/ymm checks). And then
>>> the check is there to skip a pointless (in that case) loop. IOW the
>>> loop is only needed for templates which do not explicitly name
>>> AVX512VL as a required feature; templates which do can't validly have
>>> a mix of zmm and one or both of xmm/ymm in any one operand. In fact if
>>> you grep i386-opc.tbl for "AVX612VL.*ZMM" you won't find any single
>>> match.
>>
>> That is why the check isn't needed.
>
> I'm confused now: You explicitly want to make gas perform more work for
> no gain? The flag being clear is the common case, and hence in the
> common case we can avoid getting into this loop.
I guess I managed to contradict myself here: The flag being clear is the
common case, and hence the effect of avoiding to enter the loop is more
limited than I first inferred.
Jan
> So yes, as said, your
> code change is functionally correct (as in: not breaking things), but
> it is at the same time making overall behavior worse. Please revert.
>
> Jan
@@ -6288,11 +6288,10 @@ check_VecOperands (const insn_template *t)
/* Templates allowing for ZMMword as well as YMMword and/or XMMword for
any one operand are implicity requiring AVX512VL support if the actual
operand size is YMMword or XMMword. Since this function runs after
- template matching, there's no need to check for YMMword/XMMword in
- the template. */
+ template matching, there's no need to check for YMMword/XMMword nor
+ AVX512VL in the template. */
cpu = cpu_flags_and (t->cpu_flags, avx512);
if (!cpu_flags_all_zero (&cpu)
- && !t->cpu_flags.bitfield.cpuavx512vl
&& !cpu_arch_flags.bitfield.cpuavx512vl)
{
for (op = 0; op < t->operands; ++op)