x86: slightly enhance "vec_dupv2df<mask_name>"

Message ID 620c187a-e6a6-0bd9-fe17-ecee0d00d6ea@suse.com
State Accepted
Headers
Series x86: slightly enhance "vec_dupv2df<mask_name>" |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Jan Beulich July 14, 2023, 9:40 a.m. UTC
  Introduce a new alternative permitting all 32 registers to be used as
source without AVX512VL, by broadcasting to the full 512 bits in that
case. (The insn would also permit all registers to be used as
destination, but V2DFmode doesn't.)

gcc/

	* config/i386/sse.md (vec_dupv2df<mask_name>): Add new AVX512F
	alternative. Move AVX512VL part of condition to new "enabled"
	attribute.
---
Because of the V2DF restriction, in principle the new source constraint
could also omit 'm'.

Can't the latter two of the original alternatives be folded, by using
Yvm instead of xm/vm?
  

Comments

Hongtao Liu July 17, 2023, 6:09 a.m. UTC | #1
On Fri, Jul 14, 2023 at 5:40 PM Jan Beulich via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Introduce a new alternative permitting all 32 registers to be used as
> source without AVX512VL, by broadcasting to the full 512 bits in that
> case. (The insn would also permit all registers to be used as
> destination, but V2DFmode doesn't.)
The patch looks technically ok, but considering we don't have a real
CPU with only AVX512F but no AVX512VL, these optimisations for AVX512F
only don't make much sense, but rather increase the burden for
maintenance.
(For now, AVX512VL+AVX512CD+AVX512BW+AVX512DQ is a base set after
skylake-avx512, users are more likely to use -march=$PROCESSOR for
avx512.)
For this(and those previous AVX512F only) optimised patch, I think
it's helpful to help understand the pattern, so I'll approve of this
patch. But I hope we don't spend too much time on such optimisations
(unless there is an AVX512F only processor).
>
> gcc/
>
>         * config/i386/sse.md (vec_dupv2df<mask_name>): Add new AVX512F
>         alternative. Move AVX512VL part of condition to new "enabled"
>         attribute.
> ---
> Because of the V2DF restriction, in principle the new source constraint
> could also omit 'm'.
>
> Can't the latter two of the original alternatives be folded, by using
> Yvm instead of xm/vm?
I think yes.
>
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -13761,18 +13761,27 @@
>     (set_attr "mode" "DF,DF,V1DF,V1DF,V1DF,V2DF,V1DF,V1DF,V1DF")])
>
>  (define_insn "vec_dupv2df<mask_name>"
> -  [(set (match_operand:V2DF 0 "register_operand"     "=x,x,v")
> +  [(set (match_operand:V2DF 0 "register_operand"     "=x,x,v,v")
>         (vec_duplicate:V2DF
> -         (match_operand:DF 1 "nonimmediate_operand" " 0,xm,vm")))]
> -  "TARGET_SSE2 && <mask_avx512vl_condition>"
> +         (match_operand:DF 1 "nonimmediate_operand" "0,xm,vm,vm")))]
> +  "TARGET_SSE2"
>    "@
>     unpcklpd\t%0, %0
>     %vmovddup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}
> -   vmovddup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
> -  [(set_attr "isa" "noavx,sse3,avx512vl")
> -   (set_attr "type" "sselog1")
> -   (set_attr "prefix" "orig,maybe_vex,evex")
> -   (set_attr "mode" "V2DF,DF,DF")])
> +   vmovddup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}
> +   vbroadcastsd\t{%1, }%g0<mask_operand2>{|, %1}"
> +  [(set_attr "isa" "noavx,sse3,avx512vl,*")
> +   (set_attr "type" "sselog1,ssemov,ssemov,ssemov")
> +   (set_attr "prefix" "orig,maybe_vex,evex,evex")
> +   (set_attr "mode" "V2DF,DF,DF,V8DF")
> +   (set (attr "enabled")
> +       (cond [(eq_attr "alternative" "3")
> +                (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
> +                             && !TARGET_PREFER_AVX256")
> +              (match_test "<mask_avx512vl_condition>")
> +                (const_string "*")
> +             ]
> +             (symbol_ref "false")))])
>
>  (define_insn "vec_concatv2df"
>    [(set (match_operand:V2DF 0 "register_operand"     "=x,x,v,x,x, v,x,x")
  
Jan Beulich July 17, 2023, 6:20 a.m. UTC | #2
On 17.07.2023 08:09, Hongtao Liu wrote:
> On Fri, Jul 14, 2023 at 5:40 PM Jan Beulich via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> Introduce a new alternative permitting all 32 registers to be used as
>> source without AVX512VL, by broadcasting to the full 512 bits in that
>> case. (The insn would also permit all registers to be used as
>> destination, but V2DFmode doesn't.)
> The patch looks technically ok, but considering we don't have a real
> CPU with only AVX512F but no AVX512VL, these optimisations for AVX512F
> only don't make much sense, but rather increase the burden for
> maintenance.

Well, I can of course ignore this aspect going forward. It seemed
relevant to me for two reasons: For one, I expect I'm not the only
one to simply pass -mavx512f when caring about basic AVX512. And
then isn't the Knights line of processors (Xeon Phi) lacking VL?
(I'm getting the impression though that this line is discontinued
now.)

>> Can't the latter two of the original alternatives be folded, by using
>> Yvm instead of xm/vm?
> I think yes.

I guess I'll make a follow-on patch for that then.

Jan
  
Hongtao Liu July 17, 2023, 6:48 a.m. UTC | #3
On Mon, Jul 17, 2023 at 2:20 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 17.07.2023 08:09, Hongtao Liu wrote:
> > On Fri, Jul 14, 2023 at 5:40 PM Jan Beulich via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> Introduce a new alternative permitting all 32 registers to be used as
> >> source without AVX512VL, by broadcasting to the full 512 bits in that
> >> case. (The insn would also permit all registers to be used as
> >> destination, but V2DFmode doesn't.)
> > The patch looks technically ok, but considering we don't have a real
> > CPU with only AVX512F but no AVX512VL, these optimisations for AVX512F
> > only don't make much sense, but rather increase the burden for
> > maintenance.
>
> Well, I can of course ignore this aspect going forward. It seemed
> relevant to me for two reasons: For one, I expect I'm not the only
> one to simply pass -mavx512f when caring about basic AVX512. And
You're not, AFAIK, some users used target("avx512f") for FMV.  But I'd
rather persuade them to use target ("arch=x86-64-v4") rather than
optimizing for AVX512F only.
> then isn't the Knights line of processors (Xeon Phi) lacking VL?
> (I'm getting the impression though that this line is discontinued
> now.)
KNL is deprecated, and yes it doesn't support AVX512VL.
>
> >> Can't the latter two of the original alternatives be folded, by using
> >> Yvm instead of xm/vm?
> > I think yes.
>
> I guess I'll make a follow-on patch for that then.
>
> Jan
  

Patch

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -13761,18 +13761,27 @@ 
    (set_attr "mode" "DF,DF,V1DF,V1DF,V1DF,V2DF,V1DF,V1DF,V1DF")])
 
 (define_insn "vec_dupv2df<mask_name>"
-  [(set (match_operand:V2DF 0 "register_operand"     "=x,x,v")
+  [(set (match_operand:V2DF 0 "register_operand"     "=x,x,v,v")
 	(vec_duplicate:V2DF
-	  (match_operand:DF 1 "nonimmediate_operand" " 0,xm,vm")))]
-  "TARGET_SSE2 && <mask_avx512vl_condition>"
+	  (match_operand:DF 1 "nonimmediate_operand" "0,xm,vm,vm")))]
+  "TARGET_SSE2"
   "@
    unpcklpd\t%0, %0
    %vmovddup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}
-   vmovddup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
-  [(set_attr "isa" "noavx,sse3,avx512vl")
-   (set_attr "type" "sselog1")
-   (set_attr "prefix" "orig,maybe_vex,evex")
-   (set_attr "mode" "V2DF,DF,DF")])
+   vmovddup\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}
+   vbroadcastsd\t{%1, }%g0<mask_operand2>{|, %1}"
+  [(set_attr "isa" "noavx,sse3,avx512vl,*")
+   (set_attr "type" "sselog1,ssemov,ssemov,ssemov")
+   (set_attr "prefix" "orig,maybe_vex,evex,evex")
+   (set_attr "mode" "V2DF,DF,DF,V8DF")
+   (set (attr "enabled")
+	(cond [(eq_attr "alternative" "3")
+		 (symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
+			      && !TARGET_PREFER_AVX256")
+	       (match_test "<mask_avx512vl_condition>")
+	         (const_string "*")
+	      ]
+	      (symbol_ref "false")))])
 
 (define_insn "vec_concatv2df"
   [(set (match_operand:V2DF 0 "register_operand"     "=x,x,v,x,x, v,x,x")