From patchwork Fri Oct 13 12:00:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Biener X-Patchwork-Id: 152553 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:612c:2908:b0:403:3b70:6f57 with SMTP id ib8csp1835697vqb; Fri, 13 Oct 2023 05:01:23 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGQQY3j/uCuB2OCQ9qAitoMBz/y4bM9ntr+cBHCRHQeDF1IIDPUndMV3OEcy5TPTe5/CYlB X-Received: by 2002:ac8:5e4c:0:b0:419:82fa:7102 with SMTP id i12-20020ac85e4c000000b0041982fa7102mr27633394qtx.38.1697198482745; Fri, 13 Oct 2023 05:01:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1697198482; cv=none; d=google.com; s=arc-20160816; b=sxahuPANl0eeJnFyqzC6F+4F/5G9FedvykcOk+pq0zQf4GL982SjcQFcMsNl2XzK7e 5rsUPSfMMB9C46qDoItna1CpmqmlGGS+VCtjw9rxkjE9TZC8I6A9jZ+SlreiDKowwwFO pVDUzmTYCezgQgLdk0LwSklHwIkb4fq69dE+NJ841tfpsmWvhvN9EpHJvBZynGQybpj9 m0MAe4DXSZqQr+BeRxEc/+Q7xDON9/qe+RVGqM4TrgAgY8D5dsvmZSY28KdaBnhpdSg4 oV3ptmt91EHDdiguI6Nsq4Pb8l8ZHOEM1D5vRdcB5+rDsuG1fmo+h+TiwkB1fASo7nIx bFZg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:message-id:mime-version:subject :to:from:date:dkim-signature:dkim-signature:dmarc-filter :delivered-to; bh=ONPhCQ3tKhGK217NSdVoQTGFiMZS6ibRGkRLVHSmh6E=; fh=hPrbWPhweUx4V0GV9uXJqbyAzg2ABmTz7kczrAQqMmM=; b=EhhuvTVfgTNa6G4V25y6lVKtapVRDCX6HCNAL8hDpHpilwKvB1yTzvAyLMtfGP6kZS vC7rJLa8msq2d+Qn1OmSOPliLutUqGscmLZaeCZdZrfm1YeXIfXkUmiahO9OhfM9ykA7 1y2APi/fNj1tI0KevjQfDvN+Dc/JZAKWs/fUtbBfTGsOWFB6g6yuS4HrtboTvvifkBUC AywKU5LIVzcJKc6KApRxpvwihPQZd4ym8j49qOHbdWAD6/mzsY3qDxQkNqvAyoqHj12T 0g6yF9nM5Qb4lDFFDC4qqGkJRXjqU/+ZshqAE4hk12hXHJQ/OXzTakIVGThF12ZjQbyl X+zQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.de header.s=susede2_rsa header.b=d6sQkpTJ; dkim=neutral (no key) header.i=@suse.de header.s=susede2_ed25519 header.b=RVxp+Uwt; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=suse.de Received: from server2.sourceware.org (ip-8-43-85-97.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id d16-20020a05622a05d000b00417ff49777csi1057016qtb.357.2023.10.13.05.01.22 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 13 Oct 2023 05:01:22 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.de header.s=susede2_rsa header.b=d6sQkpTJ; dkim=neutral (no key) header.i=@suse.de header.s=susede2_ed25519 header.b=RVxp+Uwt; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=suse.de Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 7D85E3856DF4 for ; Fri, 13 Oct 2023 12:01:22 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtp-out1.suse.de (smtp-out1.suse.de [IPv6:2001:67c:2178:6::1c]) by sourceware.org (Postfix) with ESMTPS id B17763858D35 for ; Fri, 13 Oct 2023 12:00:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B17763858D35 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 84912219B7 for ; Fri, 13 Oct 2023 12:00:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1697198456; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version:content-type:content-type; bh=ONPhCQ3tKhGK217NSdVoQTGFiMZS6ibRGkRLVHSmh6E=; b=d6sQkpTJvYgZ1XM/BE/UF7rmm5yZpl0qOV9nD2Y1x3kRPwhwVjKJB1/ZaU2YGlyH6Fpusn KKUxdv3TjXVnHJWOBCaIbglikum4AtS2onOwP1g51SEhFi7OJJOIw9uevFbWOVwmJf6LuA XqVQC8VxHHOQrw6DT/mhv2SolgBiBfU= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1697198456; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version:content-type:content-type; bh=ONPhCQ3tKhGK217NSdVoQTGFiMZS6ibRGkRLVHSmh6E=; b=RVxp+Uwth8qXiliK4QXfUiQxwS0/hViU6lP2rW7q15mMPBFLSm4eICob45M20SJ+9rWyw0 lXIse9zxb0ZC//Bg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 6C3FF1358F for ; Fri, 13 Oct 2023 12:00:56 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id Sg0BGHgxKWXrYgAAMHmgww (envelope-from ) for ; Fri, 13 Oct 2023 12:00:56 +0000 Date: Fri, 13 Oct 2023 14:00:55 +0200 (CEST) From: Richard Biener To: gcc-patches@gcc.gnu.org Subject: [PATCH] OMP SIMD inbranch call vectorization for AVX512 style masks MIME-Version: 1.0 Message-Id: <20231013120056.6C3FF1358F@imap2.suse-dmz.suse.de> Authentication-Results: smtp-out1.suse.de; none X-Spam-Level: X-Spam-Score: -7.10 X-Spamd-Result: default: False [-7.10 / 50.00]; ARC_NA(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; FROM_HAS_DN(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; NEURAL_HAM_LONG(-3.00)[-1.000]; MIME_GOOD(-0.10)[text/plain]; TO_DN_NONE(0.00)[]; PREVIOUSLY_DELIVERED(0.00)[gcc-patches@gcc.gnu.org]; RCPT_COUNT_ONE(0.00)[1]; MID_RHS_MATCH_FROMTLD(0.00)[]; DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519]; NEURAL_HAM_SHORT(-1.00)[-1.000]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_COUNT_TWO(0.00)[2]; RCVD_TLS_ALL(0.00)[]; BAYES_HAM(-3.00)[100.00%] X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1779641596154414081 X-GMAIL-MSGID: 1779641596154414081 The following teaches vectorizable_simd_clone_call to handle integer mode masks. The tricky bit is to second-guess the number of lanes represented by a single mask argument - the following uses simdlen and the number of mask arguments to calculate that, assuming ABIs have them uniform. Similar to the VOIDmode handling there's a restriction on not supporting splitting/merging of incoming vector masks to more/less SIMD call arguments. Bootstrapped and tested on x86_64-unknown-linux-gnu, re-testing after a minor change. Will push later. Richard. PR tree-optimization/111795 * tree-vect-stmts.cc (vectorizable_simd_clone_call): Handle integer mode mask arguments. * gcc.target/i386/vect-simd-clone-avx512-1.c: New testcase. * gcc.target/i386/vect-simd-clone-avx512-2.c: Likewise. * gcc.target/i386/vect-simd-clone-avx512-3.c: Likewise. --- .../i386/vect-simd-clone-avx512-1.c | 43 +++++ .../i386/vect-simd-clone-avx512-2.c | 6 + .../i386/vect-simd-clone-avx512-3.c | 6 + gcc/tree-vect-stmts.cc | 150 ++++++++++++++---- 4 files changed, 175 insertions(+), 30 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-1.c create mode 100644 gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-2.c create mode 100644 gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-3.c diff --git a/gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-1.c b/gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-1.c new file mode 100644 index 00000000000..e350996439e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-1.c @@ -0,0 +1,43 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx512vl } */ +/* { dg-options "-O2 -fopenmp-simd -mavx512vl" } */ + +#include "avx512vl-check.h" + +#ifndef SIMDLEN +#define SIMDLEN 4 +#endif + +int x[1024]; + +#pragma omp declare simd simdlen(SIMDLEN) +__attribute__((noinline)) int +foo (int a, int b) +{ + return a + b; +} + +void __attribute__((noipa)) +bar (void) +{ +#pragma omp simd + for (int i = 0; i < 1024; i++) + if (x[i] < 20) + x[i] = foo (x[i], x[i]); +} + +void avx512vl_test () +{ + int i; +#pragma GCC novector + for (i = 0; i < 1024; i++) + x[i] = i; + + bar (); + +#pragma GCC novector + for (i = 0; i < 1024; i++) + if ((i < 20 && x[i] != i + i) + || (i >= 20 && x[i] != i)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-2.c b/gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-2.c new file mode 100644 index 00000000000..d9968ae30f5 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-2.c @@ -0,0 +1,6 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx512vl } */ +/* { dg-options "-O2 -fopenmp-simd -mavx512vl" } */ + +#define SIMDLEN 8 +#include "vect-simd-clone-avx512-1.c" diff --git a/gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-3.c b/gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-3.c new file mode 100644 index 00000000000..c05f6c8ce91 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vect-simd-clone-avx512-3.c @@ -0,0 +1,6 @@ +/* { dg-do run } */ +/* { dg-require-effective-target avx512vl } */ +/* { dg-options "-O2 -fopenmp-simd -mavx512vl" } */ + +#define SIMDLEN 16 +#include "vect-simd-clone-avx512-1.c" diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 0fb6fc3394a..abc8603f67c 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -4492,6 +4492,9 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info, i = -1; break; case SIMD_CLONE_ARG_TYPE_MASK: + if (SCALAR_INT_MODE_P (n->simdclone->mask_mode) + != SCALAR_INT_MODE_P (TYPE_MODE (arginfo[i].vectype))) + i = -1; break; } if (i == (size_t) -1) @@ -4517,6 +4520,12 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info, if (bestn == NULL) return false; + unsigned int num_mask_args = 0; + if (SCALAR_INT_MODE_P (bestn->simdclone->mask_mode)) + for (i = 0; i < nargs; i++) + if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK) + num_mask_args++; + for (i = 0; i < nargs; i++) { if ((arginfo[i].dt == vect_constant_def @@ -4541,30 +4550,50 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info, return false; } - if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK - && bestn->simdclone->mask_mode == VOIDmode - && (simd_clone_subparts (bestn->simdclone->args[i].vector_type) - != simd_clone_subparts (arginfo[i].vectype))) + if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK) { - /* FORNOW we only have partial support for vector-type masks that - can't hold all of simdlen. */ - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, - vect_location, - "in-branch vector clones are not yet" - " supported for mismatched vector sizes.\n"); - return false; - } - if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK - && bestn->simdclone->mask_mode != VOIDmode) - { - /* FORNOW don't support integer-type masks. */ - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, - vect_location, - "in-branch vector clones are not yet" - " supported for integer mask modes.\n"); - return false; + if (bestn->simdclone->mask_mode == VOIDmode) + { + if (simd_clone_subparts (bestn->simdclone->args[i].vector_type) + != simd_clone_subparts (arginfo[i].vectype)) + { + /* FORNOW we only have partial support for vector-type masks + that can't hold all of simdlen. */ + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, + vect_location, + "in-branch vector clones are not yet" + " supported for mismatched vector sizes.\n"); + return false; + } + } + else if (SCALAR_INT_MODE_P (bestn->simdclone->mask_mode)) + { + if (!SCALAR_INT_MODE_P (TYPE_MODE (arginfo[i].vectype)) + || maybe_ne (exact_div (bestn->simdclone->simdlen, + num_mask_args), + simd_clone_subparts (arginfo[i].vectype))) + { + /* FORNOW we only have partial support for integer-type masks + that represent the same number of lanes as the + vectorized mask inputs. */ + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, + vect_location, + "in-branch vector clones are not yet " + "supported for mismatched vector sizes.\n"); + return false; + } + } + else + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, + vect_location, + "in-branch vector clones not supported" + " on this target.\n"); + return false; + } } } @@ -4781,14 +4810,9 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info, } break; case SIMD_CLONE_ARG_TYPE_MASK: - atype = bestn->simdclone->args[i].vector_type; - if (bestn->simdclone->mask_mode != VOIDmode) - { - /* FORNOW: this is disabled above. */ - gcc_unreachable (); - } - else + if (bestn->simdclone->mask_mode == VOIDmode) { + atype = bestn->simdclone->args[i].vector_type; tree elt_type = TREE_TYPE (atype); tree one = fold_convert (elt_type, integer_one_node); tree zero = fold_convert (elt_type, integer_zero_node); @@ -4839,6 +4863,72 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info, } } } + else if (SCALAR_INT_MODE_P (bestn->simdclone->mask_mode)) + { + atype = bestn->simdclone->args[i].vector_type; + /* Guess the number of lanes represented by atype. */ + unsigned HOST_WIDE_INT atype_subparts + = exact_div (bestn->simdclone->simdlen, + num_mask_args).to_constant (); + o = vector_unroll_factor (nunits, atype_subparts); + for (m = j * o; m < (j + 1) * o; m++) + { + if (m == 0) + { + if (!slp_node) + vect_get_vec_defs_for_operand (vinfo, stmt_info, + o * ncopies, + op, + &vec_oprnds[i]); + vec_oprnds_i[i] = 0; + } + if (atype_subparts + < simd_clone_subparts (arginfo[i].vectype)) + { + /* The mask argument has fewer elements than the + input vector. */ + /* FORNOW */ + gcc_unreachable (); + } + else if (atype_subparts + == simd_clone_subparts (arginfo[i].vectype)) + { + /* The vector mask argument matches the input + in the number of lanes, but not necessarily + in the mode. */ + vec_oprnd0 = vec_oprnds[i][vec_oprnds_i[i]++]; + tree st = lang_hooks.types.type_for_mode + (TYPE_MODE (TREE_TYPE (vec_oprnd0)), 1); + vec_oprnd0 = build1 (VIEW_CONVERT_EXPR, st, + vec_oprnd0); + gassign *new_stmt + = gimple_build_assign (make_ssa_name (st), + vec_oprnd0); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + if (!types_compatible_p (atype, st)) + { + new_stmt + = gimple_build_assign (make_ssa_name (atype), + NOP_EXPR, + gimple_assign_lhs + (new_stmt)); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + } + vargs.safe_push (gimple_assign_lhs (new_stmt)); + } + else + { + /* The mask argument has more elements than the + input vector. */ + /* FORNOW */ + gcc_unreachable (); + } + } + } + else + gcc_unreachable (); break; case SIMD_CLONE_ARG_TYPE_UNIFORM: vargs.safe_push (op);