x86: make better use of VBROADCASTSS / VPBROADCASTD
Checks
Commit Message
... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are
never longer (yet sometimes shorter) than the corresponding VSHUFPS /
VPSHUFD, due to the immediate operand of the shuffle insns balancing the
need for VEX3 in the broadcast ones. When EVEX encoding is required the
broadcast insns are always shorter.
Add two new alternatives each, one covering the AVX2 case and one
covering AVX512.
gcc/
* config/i386/sse.md (vec_dupv4sf): New AVX2 and AVX512F
alternatives using vbroadcastss.
(*vec_dupv4si): New AVX2 and AVX512F alternatives using
vpbroadcastd.
---
I'm working from the assumption that the isa attributes to the original
1st and 2nd alternatives don't need further restricting (to sse2_noavx2
or avx_noavx2 as applicable), as the new earlier alternatives cover all
operand forms already when at least AVX2 is enabled.
Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss
use? (Same further down in *vec_dupv4si and avx2_vbroadcasti128_<mode>
and elsewhere.)
Is use of Yv for the source operand really necessary in *vec_dupv4si?
I.e. would scalar integer values be put in XMM{16...31} when AVX512VL
isn't enabled? If so (*movsi_internal / *movdi_internal suggest they
might), wouldn't *vec_dupv2di need to use Yv as well in its 3rd
alternative (or just m, as Yv is already covered by the 2nd one)?
Comments
On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are
> never longer (yet sometimes shorter) than the corresponding VSHUFPS /
> VPSHUFD, due to the immediate operand of the shuffle insns balancing the
> need for VEX3 in the broadcast ones. When EVEX encoding is required the
> broadcast insns are always shorter.
>
> Add two new alternatives each, one covering the AVX2 case and one
> covering AVX512.
I think you can just change assemble output for this first alternative
when TARGET_AVX2, use vbroadcastss, else use vshufps since
vbroadcastss only accept register operand when TARGET_AVX2. And no
need to support 2 extra alternatives which doesn't make sense just
make RA more confused about the same meaning of different
alternatives.
>
> gcc/
>
> * config/i386/sse.md (vec_dupv4sf): New AVX2 and AVX512F
> alternatives using vbroadcastss.
> (*vec_dupv4si): New AVX2 and AVX512F alternatives using
> vpbroadcastd.
> ---
> I'm working from the assumption that the isa attributes to the original
> 1st and 2nd alternatives don't need further restricting (to sse2_noavx2
> or avx_noavx2 as applicable), as the new earlier alternatives cover all
> operand forms already when at least AVX2 is enabled.
>
> Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss
> use? (Same further down in *vec_dupv4si and avx2_vbroadcasti128_<mode>
> and elsewhere.)
Not sure about this part. I grep prefix_extra, seems only used by
znver.md/znver4.md for schedule, and only for comi instructions(?the
reservation name seems so).
>
> Is use of Yv for the source operand really necessary in *vec_dupv4si?
> I.e. would scalar integer values be put in XMM{16...31} when AVX512VL
Yes, You can look at ix86_hard_regno_mode_ok, EXT_REX_SSE_REGNO is
allowed for scalar mode, but not for 128/256-bit vector modes.
20204 if (TARGET_AVX512F
20205 && (VALID_AVX512F_REG_OR_XI_MODE (mode)
20206 || VALID_AVX512F_SCALAR_MODE (mode)))
20207 return true;
> isn't enabled? If so (*movsi_internal / *movdi_internal suggest they
> might), wouldn't *vec_dupv2di need to use Yv as well in its 3rd
> alternative (or just m, as Yv is already covered by the 2nd one)?
I guess xm is more suitable since we still want to allocate
operands[1] to register when sse3_noavx.
It didn't hit any error since for avx and above, alternative 1(2rd
one) is always matched than alternative 2.
>
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -25798,38 +25798,42 @@
> (const_int 1)))])
>
> (define_insn "vec_dupv4sf"
> - [(set (match_operand:V4SF 0 "register_operand" "=v,v,x")
> + [(set (match_operand:V4SF 0 "register_operand" "=Yv,v,v,v,x")
> (vec_duplicate:V4SF
> - (match_operand:SF 1 "nonimmediate_operand" "Yv,m,0")))]
> + (match_operand:SF 1 "nonimmediate_operand" "v,vm,Yv,m,0")))]
> "TARGET_SSE"
> "@
> + vbroadcastss\t{%1, %0|%0, %1}
> + vbroadcastss\t{%1, %g0|%g0, %1}
> vshufps\t{$0, %1, %1, %0|%0, %1, %1, 0}
> vbroadcastss\t{%1, %0|%0, %1}
> shufps\t{$0, %0, %0|%0, %0, 0}"
> - [(set_attr "isa" "avx,avx,noavx")
> - (set_attr "type" "sseshuf1,ssemov,sseshuf1")
> - (set_attr "length_immediate" "1,0,1")
> - (set_attr "prefix_extra" "0,1,*")
> - (set_attr "prefix" "maybe_evex,maybe_evex,orig")
> - (set_attr "mode" "V4SF")])
> + [(set_attr "isa" "avx2,avx512f,avx,avx,noavx")
> + (set_attr "type" "ssemov,ssemov,sseshuf1,ssemov,sseshuf1")
> + (set_attr "length_immediate" "0,0,1,0,1")
> + (set_attr "prefix_extra" "*,*,0,1,*")
> + (set_attr "prefix" "maybe_evex,evex,maybe_evex,maybe_evex,orig")
> + (set_attr "mode" "V4SF,V16SF,V4SF,V4SF,V4SF")])
>
> (define_insn "*vec_dupv4si"
> - [(set (match_operand:V4SI 0 "register_operand" "=v,v,x")
> + [(set (match_operand:V4SI 0 "register_operand" "=Yv,v,v,v,x")
> (vec_duplicate:V4SI
> - (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0")))]
> + (match_operand:SI 1 "nonimmediate_operand" "vm,vm,Yv,m,0")))]
> "TARGET_SSE"
> "@
> + vpbroadcastd\t{%1, %0|%0, %1}
> + vpbroadcastd\t{%1, %g0|%g0, %1}
> %vpshufd\t{$0, %1, %0|%0, %1, 0}
> vbroadcastss\t{%1, %0|%0, %1}
> shufps\t{$0, %0, %0|%0, %0, 0}"
> - [(set_attr "isa" "sse2,avx,noavx")
> - (set_attr "type" "sselog1,ssemov,sselog1")
> - (set_attr "length_immediate" "1,0,1")
> - (set_attr "prefix_extra" "0,1,*")
> - (set_attr "prefix" "maybe_vex,maybe_evex,orig")
> - (set_attr "mode" "TI,V4SF,V4SF")
> + [(set_attr "isa" "avx2,avx512f,sse2,avx,noavx")
> + (set_attr "type" "ssemov,ssemov,sselog1,ssemov,sselog1")
> + (set_attr "length_immediate" "0,0,1,0,1")
> + (set_attr "prefix_extra" "*,*,0,1,*")
> + (set_attr "prefix" "maybe_evex,evex,maybe_vex,maybe_evex,orig")
> + (set_attr "mode" "TI,XI,TI,V4SF,V4SF")
> (set (attr "preferred_for_speed")
> - (cond [(eq_attr "alternative" "1")
> + (cond [(eq_attr "alternative" "3")
> (symbol_ref "!TARGET_INTER_UNIT_MOVES_TO_VEC")
> ]
> (symbol_ref "true")))])
On 14.06.2023 09:41, Hongtao Liu wrote:
> On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are
>> never longer (yet sometimes shorter) than the corresponding VSHUFPS /
>> VPSHUFD, due to the immediate operand of the shuffle insns balancing the
>> need for VEX3 in the broadcast ones. When EVEX encoding is required the
>> broadcast insns are always shorter.
>>
>> Add two new alternatives each, one covering the AVX2 case and one
>> covering AVX512.
> I think you can just change assemble output for this first alternative
> when TARGET_AVX2, use vbroadcastss, else use vshufps since
> vbroadcastss only accept register operand when TARGET_AVX2. And no
> need to support 2 extra alternatives which doesn't make sense just
> make RA more confused about the same meaning of different
> alternatives.
You mean by switching from "@ ..." to C code using "switch
(which_alternative)"? I can do that, sure. Yet that'll make for a
more complicated "length_immediate" attribute then. Would be nice
if you could confirm that this is what you want, as I may well
have misunderstood you.
But that'll be for vec_dupv4sf only, as vec_dupv4si is subtly
different.
>> ---
>> I'm working from the assumption that the isa attributes to the original
>> 1st and 2nd alternatives don't need further restricting (to sse2_noavx2
>> or avx_noavx2 as applicable), as the new earlier alternatives cover all
>> operand forms already when at least AVX2 is enabled.
>>
>> Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss
>> use? (Same further down in *vec_dupv4si and avx2_vbroadcasti128_<mode>
>> and elsewhere.)
> Not sure about this part. I grep prefix_extra, seems only used by
> znver.md/znver4.md for schedule, and only for comi instructions(?the
> reservation name seems so).
define_attr "length_vex" and define_attr "length" use it, too.
Otherwise I would have asked whether the attribute couldn't be
purged from most insns.
My present understanding is that the attribute is wrong on
vec_dupv4sf (and hence wants dropping from there altogether), and it
should be "prefix_data16" instead on *vec_dupv4si, evaluating to 1
only for the non-AVX pshufd case. I suspect at least the latter
would be going to far for doing it "while here" right in this patch.
Plus I think I have seen various other questionable uses of that
attribute.
>> Is use of Yv for the source operand really necessary in *vec_dupv4si?
>> I.e. would scalar integer values be put in XMM{16...31} when AVX512VL
> Yes, You can look at ix86_hard_regno_mode_ok, EXT_REX_SSE_REGNO is
> allowed for scalar mode, but not for 128/256-bit vector modes.
>
> 20204 if (TARGET_AVX512F
> 20205 && (VALID_AVX512F_REG_OR_XI_MODE (mode)
> 20206 || VALID_AVX512F_SCALAR_MODE (mode)))
> 20207 return true;
Okay, so I need to switch input constraints for relevant new
alternatives to Yv (I actually wonder why I did use v in
vec_dupv4sf, as it was clear to me that SFmode can be in the high
16 xmm registers with just AVX512F).
>> isn't enabled? If so (*movsi_internal / *movdi_internal suggest they
>> might), wouldn't *vec_dupv2di need to use Yv as well in its 3rd
>> alternative (or just m, as Yv is already covered by the 2nd one)?
> I guess xm is more suitable since we still want to allocate
> operands[1] to register when sse3_noavx.
> It didn't hit any error since for avx and above, alternative 1(2rd
> one) is always matched than alternative 2.
I'm afraid I don't follow: With just -mavx512f the source operand
can be in, say, %xmm16 (as per your clarification above). This
would not match Yv, but it would match vm. And hence wrongly
create an AVX512VL form of vmovddup. I didn't try it out earlier,
because unlike for SFmode / DFmode I thought it's not really clear
how to get the compiler to reliably put a DImode variable in an xmm
reg, but it just occurred to me that this can be done the same way
there. And voila,
typedef long long __attribute__((vector_size(16))) v2di;
v2di bcst(long long ll) {
register long long x asm("xmm16") = ll;
asm("nop %%esp" : "+v" (x));
return (v2di){x, x};
}
compiled with just -mavx512f (and -O2) produces an AVX512VL insn.
I'll make another patch, yet for that I'm then also not sure why
you say xm would be more suitable. Yvm allows for registers (with
or without AVX, merely SSE being required) just as much as vm
does, doesn't it? And I don't think I've found any combination of
destination being v and source being xm anywhere. Plus we want to
allow for the higher registers when AVX512VL is enabled.
Jan
On Wed, Jun 14, 2023 at 5:03 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 14.06.2023 09:41, Hongtao Liu wrote:
> > On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are
> >> never longer (yet sometimes shorter) than the corresponding VSHUFPS /
> >> VPSHUFD, due to the immediate operand of the shuffle insns balancing the
> >> need for VEX3 in the broadcast ones. When EVEX encoding is required the
> >> broadcast insns are always shorter.
> >>
> >> Add two new alternatives each, one covering the AVX2 case and one
> >> covering AVX512.
> > I think you can just change assemble output for this first alternative
> > when TARGET_AVX2, use vbroadcastss, else use vshufps since
> > vbroadcastss only accept register operand when TARGET_AVX2. And no
> > need to support 2 extra alternatives which doesn't make sense just
> > make RA more confused about the same meaning of different
> > alternatives.
>
> You mean by switching from "@ ..." to C code using "switch
> (which_alternative)"? I can do that, sure. Yet that'll make for a
> more complicated "length_immediate" attribute then. Would be nice
Yes, you can also do something like
(set (attr "length_immediate")
(cond [(eq_attr "alternative" "0")
(if_then_else (match_test "TARGET_AVX2)
(const_string "")
(const_string "1"))
...]
> if you could confirm that this is what you want, as I may well
> have misunderstood you.
>
> But that'll be for vec_dupv4sf only, as vec_dupv4si is subtly
> different.
Yes, but can we use vpbroadcastd for vec_dupv4si similarly?
>
> >> ---
> >> I'm working from the assumption that the isa attributes to the original
> >> 1st and 2nd alternatives don't need further restricting (to sse2_noavx2
> >> or avx_noavx2 as applicable), as the new earlier alternatives cover all
> >> operand forms already when at least AVX2 is enabled.
> >>
> >> Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss
> >> use? (Same further down in *vec_dupv4si and avx2_vbroadcasti128_<mode>
> >> and elsewhere.)
> > Not sure about this part. I grep prefix_extra, seems only used by
> > znver.md/znver4.md for schedule, and only for comi instructions(?the
> > reservation name seems so).
>
> define_attr "length_vex" and define_attr "length" use it, too.
> Otherwise I would have asked whether the attribute couldn't be
> purged from most insns.
>
> My present understanding is that the attribute is wrong on
> vec_dupv4sf (and hence wants dropping from there altogether), and it
> should be "prefix_data16" instead on *vec_dupv4si, evaluating to 1
> only for the non-AVX pshufd case. I suspect at least the latter
> would be going to far for doing it "while here" right in this patch.
> Plus I think I have seen various other questionable uses of that
> attribute.
>
> >> Is use of Yv for the source operand really necessary in *vec_dupv4si?
> >> I.e. would scalar integer values be put in XMM{16...31} when AVX512VL
> > Yes, You can look at ix86_hard_regno_mode_ok, EXT_REX_SSE_REGNO is
> > allowed for scalar mode, but not for 128/256-bit vector modes.
> >
> > 20204 if (TARGET_AVX512F
> > 20205 && (VALID_AVX512F_REG_OR_XI_MODE (mode)
> > 20206 || VALID_AVX512F_SCALAR_MODE (mode)))
> > 20207 return true;
>
> Okay, so I need to switch input constraints for relevant new
> alternatives to Yv (I actually wonder why I did use v in
> vec_dupv4sf, as it was clear to me that SFmode can be in the high
> 16 xmm registers with just AVX512F).
>
> >> isn't enabled? If so (*movsi_internal / *movdi_internal suggest they
> >> might), wouldn't *vec_dupv2di need to use Yv as well in its 3rd
> >> alternative (or just m, as Yv is already covered by the 2nd one)?
> > I guess xm is more suitable since we still want to allocate
> > operands[1] to register when sse3_noavx.
> > It didn't hit any error since for avx and above, alternative 1(2rd
> > one) is always matched than alternative 2.
>
> I'm afraid I don't follow: With just -mavx512f the source operand
> can be in, say, %xmm16 (as per your clarification above). This
> would not match Yv, but it would match vm. And hence wrongly
> create an AVX512VL form of vmovddup. I didn't try it out earlier,
> because unlike for SFmode / DFmode I thought it's not really clear
> how to get the compiler to reliably put a DImode variable in an xmm
> reg, but it just occurred to me that this can be done the same way
> there. And voila,
>
> typedef long long __attribute__((vector_size(16))) v2di;
>
> v2di bcst(long long ll) {
> register long long x asm("xmm16") = ll;
>
> asm("nop %%esp" : "+v" (x));
> return (v2di){x, x};
> }
>
> compiled with just -mavx512f (and -O2) produces an AVX512VL insn.
Ah, I see, indeed it's a potential bug for -mavx512f -mavx512vl
I meant with -mavx512vl,
<mask_codefor><avx512>_vec_dup_gpr<mode><mask_name> will be matched
instead of vec_dupv2di since it's put before vec_dupv2di in .md file,
first will be matched first for exactly the same pattern available. So
we just need to handle non-avx512 cases.
Also even w/o <mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>, in
the same pattern vec_dupv2di, with -mavx512vl, Yv(alternative 1) will
always be match instead of Yvm(alternative 2) for register operand, so
Yv in Yvm is never be matched that's why I said xm is more suitable
(or enough).
You can use -dp to check .s file which alternative will be matched.
> I'll make another patch, yet for that I'm then also not sure why
> you say xm would be more suitable. Yvm allows for registers (with
> or without AVX, merely SSE being required) just as much as vm
> does, doesn't it? And I don't think I've found any combination of
> destination being v and source being xm anywhere. Plus we want to
> allow for the higher registers when AVX512VL is enabled.
>
> Jan
On Thu, Jun 15, 2023 at 1:23 PM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Wed, Jun 14, 2023 at 5:03 PM Jan Beulich <jbeulich@suse.com> wrote:
> >
> > On 14.06.2023 09:41, Hongtao Liu wrote:
> > > On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > >>
> > >> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are
> > >> never longer (yet sometimes shorter) than the corresponding VSHUFPS /
> > >> VPSHUFD, due to the immediate operand of the shuffle insns balancing the
> > >> need for VEX3 in the broadcast ones. When EVEX encoding is required the
> > >> broadcast insns are always shorter.
> > >>
> > >> Add two new alternatives each, one covering the AVX2 case and one
> > >> covering AVX512.
> > > I think you can just change assemble output for this first alternative
> > > when TARGET_AVX2, use vbroadcastss, else use vshufps since
> > > vbroadcastss only accept register operand when TARGET_AVX2. And no
> > > need to support 2 extra alternatives which doesn't make sense just
> > > make RA more confused about the same meaning of different
> > > alternatives.
> >
> > You mean by switching from "@ ..." to C code using "switch
> > (which_alternative)"? I can do that, sure. Yet that'll make for a
> > more complicated "length_immediate" attribute then. Would be nice
> Yes, you can also do something like
> (set (attr "length_immediate")
> (cond [(eq_attr "alternative" "0")
> (if_then_else (match_test "TARGET_AVX2)
> (const_string "")
> (const_string "1"))
> ...]
>
> > if you could confirm that this is what you want, as I may well
> > have misunderstood you.
> >
> > But that'll be for vec_dupv4sf only, as vec_dupv4si is subtly
> > different.
> Yes, but can we use vpbroadcastd for vec_dupv4si similarly?
> >
> > >> ---
> > >> I'm working from the assumption that the isa attributes to the original
> > >> 1st and 2nd alternatives don't need further restricting (to sse2_noavx2
> > >> or avx_noavx2 as applicable), as the new earlier alternatives cover all
> > >> operand forms already when at least AVX2 is enabled.
> > >>
> > >> Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss
> > >> use? (Same further down in *vec_dupv4si and avx2_vbroadcasti128_<mode>
> > >> and elsewhere.)
> > > Not sure about this part. I grep prefix_extra, seems only used by
> > > znver.md/znver4.md for schedule, and only for comi instructions(?the
> > > reservation name seems so).
> >
> > define_attr "length_vex" and define_attr "length" use it, too.
> > Otherwise I would have asked whether the attribute couldn't be
> > purged from most insns.
> >
> > My present understanding is that the attribute is wrong on
> > vec_dupv4sf (and hence wants dropping from there altogether), and it
> > should be "prefix_data16" instead on *vec_dupv4si, evaluating to 1
> > only for the non-AVX pshufd case. I suspect at least the latter
> > would be going to far for doing it "while here" right in this patch.
> > Plus I think I have seen various other questionable uses of that
> > attribute.
> >
> > >> Is use of Yv for the source operand really necessary in *vec_dupv4si?
> > >> I.e. would scalar integer values be put in XMM{16...31} when AVX512VL
> > > Yes, You can look at ix86_hard_regno_mode_ok, EXT_REX_SSE_REGNO is
> > > allowed for scalar mode, but not for 128/256-bit vector modes.
> > >
> > > 20204 if (TARGET_AVX512F
> > > 20205 && (VALID_AVX512F_REG_OR_XI_MODE (mode)
> > > 20206 || VALID_AVX512F_SCALAR_MODE (mode)))
> > > 20207 return true;
> >
> > Okay, so I need to switch input constraints for relevant new
> > alternatives to Yv (I actually wonder why I did use v in
> > vec_dupv4sf, as it was clear to me that SFmode can be in the high
> > 16 xmm registers with just AVX512F).
> >
> > >> isn't enabled? If so (*movsi_internal / *movdi_internal suggest they
> > >> might), wouldn't *vec_dupv2di need to use Yv as well in its 3rd
> > >> alternative (or just m, as Yv is already covered by the 2nd one)?
> > > I guess xm is more suitable since we still want to allocate
> > > operands[1] to register when sse3_noavx.
> > > It didn't hit any error since for avx and above, alternative 1(2rd
> > > one) is always matched than alternative 2.
> >
> > I'm afraid I don't follow: With just -mavx512f the source operand
> > can be in, say, %xmm16 (as per your clarification above). This
> > would not match Yv, but it would match vm. And hence wrongly
> > create an AVX512VL form of vmovddup. I didn't try it out earlier,
> > because unlike for SFmode / DFmode I thought it's not really clear
> > how to get the compiler to reliably put a DImode variable in an xmm
> > reg, but it just occurred to me that this can be done the same way
> > there. And voila,
> >
> > typedef long long __attribute__((vector_size(16))) v2di;
> >
> > v2di bcst(long long ll) {
> > register long long x asm("xmm16") = ll;
> >
> > asm("nop %%esp" : "+v" (x));
> > return (v2di){x, x};
> > }
> >
> > compiled with just -mavx512f (and -O2) produces an AVX512VL insn.
> Ah, I see, indeed it's a potential bug for -mavx512f -mavx512vl
correct typo: -mno-avx512vl
> I meant with -mavx512vl,
> <mask_codefor><avx512>_vec_dup_gpr<mode><mask_name> will be matched
> instead of vec_dupv2di since it's put before vec_dupv2di in .md file,
> first will be matched first for exactly the same pattern available. So
> we just need to handle non-avx512 cases.
> Also even w/o <mask_codefor><avx512>_vec_dup_gpr<mode><mask_name>, in
> the same pattern vec_dupv2di, with -mavx512vl, Yv(alternative 1) will
> always be match instead of Yvm(alternative 2) for register operand, so
> Yv in Yvm is never be matched that's why I said xm is more suitable
> (or enough).
> You can use -dp to check .s file which alternative will be matched.
> > I'll make another patch, yet for that I'm then also not sure why
> > you say xm would be more suitable. Yvm allows for registers (with
> > or without AVX, merely SSE being required) just as much as vm
> > does, doesn't it? And I don't think I've found any combination of
> > destination being v and source being xm anywhere. Plus we want to
> > allow for the higher registers when AVX512VL is enabled.
> >
> > Jan
>
>
>
> --
> BR,
> Hongtao
On 15.06.2023 07:23, Hongtao Liu wrote:
> On Wed, Jun 14, 2023 at 5:03 PM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 14.06.2023 09:41, Hongtao Liu wrote:
>>> On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches
>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>
>>>> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are
>>>> never longer (yet sometimes shorter) than the corresponding VSHUFPS /
>>>> VPSHUFD, due to the immediate operand of the shuffle insns balancing the
>>>> need for VEX3 in the broadcast ones. When EVEX encoding is required the
>>>> broadcast insns are always shorter.
>>>>
>>>> Add two new alternatives each, one covering the AVX2 case and one
>>>> covering AVX512.
>>> I think you can just change assemble output for this first alternative
>>> when TARGET_AVX2, use vbroadcastss, else use vshufps since
>>> vbroadcastss only accept register operand when TARGET_AVX2. And no
>>> need to support 2 extra alternatives which doesn't make sense just
>>> make RA more confused about the same meaning of different
>>> alternatives.
>>
>> You mean by switching from "@ ..." to C code using "switch
>> (which_alternative)"? I can do that, sure. Yet that'll make for a
>> more complicated "length_immediate" attribute then. Would be nice
> Yes, you can also do something like
> (set (attr "length_immediate")
> (cond [(eq_attr "alternative" "0")
> (if_then_else (match_test "TARGET_AVX2)
> (const_string "")
> (const_string "1"))
> ...]
Yes, that's along the lines of what I was thinking of. I'm uncertain
about one aspect of what you spelled out above, though: What is the
meaning of the empty string in (const_string "")? Shouldn't this be
"0" or "*"?
>> But that'll be for vec_dupv4sf only, as vec_dupv4si is subtly
>> different.
> Yes, but can we use vpbroadcastd for vec_dupv4si similarly?
Well, the use there is similar, but the folding with the shuffle
alternative won't be possible, because of the new first alternative
also allowing m for the source, when the shuffle one allows for only
Yv. The extra m is pointless to have in vec_dupv4sf (because a later
alternative with a wider ISA [avx] has it already), while in
vec_dupv4si the similar later alternative resolves to vbroadcastss,
not vpbroadcastd. I should be able to fold the two vpbroadcastd
alternatives, along the lines of what I've done in the vec_dupv2di
patch just sent. (As I just realized the m in what are alternatives
1 each in patch v1 is pointless, since already taken care of by
other alternatives.)
Jan
On Thu, Jun 15, 2023 at 2:41 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 15.06.2023 07:23, Hongtao Liu wrote:
> > On Wed, Jun 14, 2023 at 5:03 PM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 14.06.2023 09:41, Hongtao Liu wrote:
> >>> On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches
> >>> <gcc-patches@gcc.gnu.org> wrote:
> >>>>
> >>>> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are
> >>>> never longer (yet sometimes shorter) than the corresponding VSHUFPS /
> >>>> VPSHUFD, due to the immediate operand of the shuffle insns balancing the
> >>>> need for VEX3 in the broadcast ones. When EVEX encoding is required the
> >>>> broadcast insns are always shorter.
> >>>>
> >>>> Add two new alternatives each, one covering the AVX2 case and one
> >>>> covering AVX512.
> >>> I think you can just change assemble output for this first alternative
> >>> when TARGET_AVX2, use vbroadcastss, else use vshufps since
> >>> vbroadcastss only accept register operand when TARGET_AVX2. And no
> >>> need to support 2 extra alternatives which doesn't make sense just
> >>> make RA more confused about the same meaning of different
> >>> alternatives.
> >>
> >> You mean by switching from "@ ..." to C code using "switch
> >> (which_alternative)"? I can do that, sure. Yet that'll make for a
> >> more complicated "length_immediate" attribute then. Would be nice
> > Yes, you can also do something like
> > (set (attr "length_immediate")
> > (cond [(eq_attr "alternative" "0")
> > (if_then_else (match_test "TARGET_AVX2)
> > (const_string "")
> > (const_string "1"))
> > ...]
>
> Yes, that's along the lines of what I was thinking of. I'm uncertain
> about one aspect of what you spelled out above, though: What is the
> meaning of the empty string in (const_string "")? Shouldn't this be
> "0" or "*"?
Yes, sorry for the typo, should be 0 or *.
>
> >> But that'll be for vec_dupv4sf only, as vec_dupv4si is subtly
> >> different.
> > Yes, but can we use vpbroadcastd for vec_dupv4si similarly?
>
> Well, the use there is similar, but the folding with the shuffle
> alternative won't be possible, because of the new first alternative
> also allowing m for the source, when the shuffle one allows for only
> Yv. The extra m is pointless to have in vec_dupv4sf (because a later
> alternative with a wider ISA [avx] has it already), while in
> vec_dupv4si the similar later alternative resolves to vbroadcastss,
> not vpbroadcastd. I should be able to fold the two vpbroadcastd
> alternatives, along the lines of what I've done in the vec_dupv2di
> patch just sent. (As I just realized the m in what are alternatives
> 1 each in patch v1 is pointless, since already taken care of by
> other alternatives.)
>
> Jan
@@ -25798,38 +25798,42 @@
(const_int 1)))])
(define_insn "vec_dupv4sf"
- [(set (match_operand:V4SF 0 "register_operand" "=v,v,x")
+ [(set (match_operand:V4SF 0 "register_operand" "=Yv,v,v,v,x")
(vec_duplicate:V4SF
- (match_operand:SF 1 "nonimmediate_operand" "Yv,m,0")))]
+ (match_operand:SF 1 "nonimmediate_operand" "v,vm,Yv,m,0")))]
"TARGET_SSE"
"@
+ vbroadcastss\t{%1, %0|%0, %1}
+ vbroadcastss\t{%1, %g0|%g0, %1}
vshufps\t{$0, %1, %1, %0|%0, %1, %1, 0}
vbroadcastss\t{%1, %0|%0, %1}
shufps\t{$0, %0, %0|%0, %0, 0}"
- [(set_attr "isa" "avx,avx,noavx")
- (set_attr "type" "sseshuf1,ssemov,sseshuf1")
- (set_attr "length_immediate" "1,0,1")
- (set_attr "prefix_extra" "0,1,*")
- (set_attr "prefix" "maybe_evex,maybe_evex,orig")
- (set_attr "mode" "V4SF")])
+ [(set_attr "isa" "avx2,avx512f,avx,avx,noavx")
+ (set_attr "type" "ssemov,ssemov,sseshuf1,ssemov,sseshuf1")
+ (set_attr "length_immediate" "0,0,1,0,1")
+ (set_attr "prefix_extra" "*,*,0,1,*")
+ (set_attr "prefix" "maybe_evex,evex,maybe_evex,maybe_evex,orig")
+ (set_attr "mode" "V4SF,V16SF,V4SF,V4SF,V4SF")])
(define_insn "*vec_dupv4si"
- [(set (match_operand:V4SI 0 "register_operand" "=v,v,x")
+ [(set (match_operand:V4SI 0 "register_operand" "=Yv,v,v,v,x")
(vec_duplicate:V4SI
- (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0")))]
+ (match_operand:SI 1 "nonimmediate_operand" "vm,vm,Yv,m,0")))]
"TARGET_SSE"
"@
+ vpbroadcastd\t{%1, %0|%0, %1}
+ vpbroadcastd\t{%1, %g0|%g0, %1}
%vpshufd\t{$0, %1, %0|%0, %1, 0}
vbroadcastss\t{%1, %0|%0, %1}
shufps\t{$0, %0, %0|%0, %0, 0}"
- [(set_attr "isa" "sse2,avx,noavx")
- (set_attr "type" "sselog1,ssemov,sselog1")
- (set_attr "length_immediate" "1,0,1")
- (set_attr "prefix_extra" "0,1,*")
- (set_attr "prefix" "maybe_vex,maybe_evex,orig")
- (set_attr "mode" "TI,V4SF,V4SF")
+ [(set_attr "isa" "avx2,avx512f,sse2,avx,noavx")
+ (set_attr "type" "ssemov,ssemov,sselog1,ssemov,sselog1")
+ (set_attr "length_immediate" "0,0,1,0,1")
+ (set_attr "prefix_extra" "*,*,0,1,*")
+ (set_attr "prefix" "maybe_evex,evex,maybe_vex,maybe_evex,orig")
+ (set_attr "mode" "TI,XI,TI,V4SF,V4SF")
(set (attr "preferred_for_speed")
- (cond [(eq_attr "alternative" "1")
+ (cond [(eq_attr "alternative" "3")
(symbol_ref "!TARGET_INTER_UNIT_MOVES_TO_VEC")
]
(symbol_ref "true")))])