From patchwork Tue Aug 9 13:23:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 450 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:6a10:20da:b0:2d3:3019:e567 with SMTP id n26csp2515504pxc; Tue, 9 Aug 2022 06:25:16 -0700 (PDT) X-Google-Smtp-Source: AA6agR7rHbTj3/zll3OejFf4XyjYY/n9yj63SxAsfRKuJrNBBiEPZnoImCvh7q3+PtBPJiqQABxd X-Received: by 2002:a17:907:3e03:b0:722:e694:438 with SMTP id hp3-20020a1709073e0300b00722e6940438mr17243896ejc.755.1660051516589; Tue, 09 Aug 2022 06:25:16 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1660051516; cv=none; d=google.com; s=arc-20160816; b=KR2o59fLa8mxeHcI+Kxkh950MOhNXZVex4KW1MP7w8QRTEfqvdbQO6XGJrxqrxQPHi TNf19q+enzl5QLqQq42A9QT+IuOljUP3S39B6gJ3U8xHFqjYhDWZ9u+QpkBkaq/4PiUR 8qJrBGX4XGsUKCTFogB9wHeq/VTyyRchd5lpm27dhzDNcNTjRTYBGFKTxQlhm7D1I21A jNk8IET6c+A8AgrHJznSyQkMhKhTiJ5WwmtjcWCaP54MEP3rCmIo0XLz4TJCvSMHRDqZ D5CynAXO1B40WgeUalVHpME3oggmG6T/V+LKUIhOtIDNMQ9HNEhemtJxG39yuqZJ5TqO /+HQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :ironport-sdr:dmarc-filter:delivered-to; bh=muSho0wfBaeq7aBDKnBKHlYb1Njar0oH7QLG9s9aYbw=; b=rp7d7jTLUJGCdyeSP7s8QG/DJlvg1gNSr0vLVjDaC5T5GsKfOxUMuSJmbXiR1IuLgM pN2R7uPUTHssKwnr1jdz/O9uyVcUPXceW67dTRT1iFPBooldvn9nF0m/11gmwtpZ/9/z Kktm2/K58mgYEINiWdyH4JycbbL1xOGl9iqQcpPgrqmAIfWBL+roLsE32je7lujeTSRR jkS7UZrZEw1TnPmU03MRRkMPehMllKFWlBqpi2lzL9Qhprel/ZDeiS+vgu/MTCJDBJl0 QVkZD6LK1jDTiyLKsT3nD/JADTHIHN8UIMn7ndTGRxNLnvOXa5vV2wrRROiFBkD6H1Ct z86A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id g19-20020a056402425300b0043de9c6edd7si9689327edb.340.2022.08.09.06.25.16 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Aug 2022 06:25:16 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id F3E44385AE47 for ; Tue, 9 Aug 2022 13:25:02 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id A39B4385702C for ; Tue, 9 Aug 2022 13:24:13 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A39B4385702C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.93,224,1654588800"; d="scan'208";a="80994419" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa2.mentor.iphmx.com with ESMTP; 09 Aug 2022 05:24:08 -0800 IronPort-SDR: dwfF63Ks1sLczyWjZzKggvdOxpWKLe+TKIg0JwgV/iVfF5F/FoyBiRhvRTFQGYchK9yp4N9yfz fupUnNN7fHZ0e7ND21zPyKr66llI700imbuXd30SZONVWoQUGDJBg5n8BJvT//WIl27QZ9wQxW jPrrkOQv5Q9wiVlk7c9IYrkxkYXEbVjWPqrgUYe7wvpth8IZLNK+MSi7rDrXZiyBx57jheVY9i rgxL9UmvJPXA8W2s9VBlsUmJZ9O4qzVN+BJdH6MmVJGHhWDmv0tG0n5/j83WtoF56eWif8sUHi 2is= From: Andrew Stubbs To: Subject: [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors Date: Tue, 9 Aug 2022 14:23:48 +0100 Message-ID: X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com (139.181.222.15) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1740690179155289212?= X-GMAIL-MSGID: =?utf-8?q?1740690179155289212?= The vecsize_int/vecsize_float has an assumption that all arguments will use the same bitsize, and vary the number of lanes according to the element size, but this is inappropriate on targets where the number of lanes is fixed and the bitsize varies (i.e. amdgcn). With this change the vecsize can be left zero and the vectorization factor will be the same for all types. gcc/ChangeLog: * doc/tm.texi: Regenerate. * omp-simd-clone.cc (simd_clone_adjust_return_type): Allow zero vecsize. (simd_clone_adjust_argument_types): Likewise. * target.def (compute_vecsize_and_simdlen): Document the new vecsize_int and vecsize_float semantics. --- gcc/doc/tm.texi | 3 +++ gcc/omp-simd-clone.cc | 20 +++++++++++++++----- gcc/target.def | 3 +++ 3 files changed, 21 insertions(+), 5 deletions(-) diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 92bda1a7e14..c3001c6ded9 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -6253,6 +6253,9 @@ stores. This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float} fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also @var{simdlen} field if it was previously 0. +@var{vecsize_mangle} is a marker for the backend only. @var{vecsize_int} and +@var{vecsize_float} should be left zero on targets where the number of lanes is +not determined by the bitsize (in which case @var{simdlen} is always used). The hook should return 0 if SIMD clones shouldn't be emitted, or number of @var{vecsize_mangle} variants that should be emitted. @end deftypefn diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc index 58bd68b129b..258d3c6377f 100644 --- a/gcc/omp-simd-clone.cc +++ b/gcc/omp-simd-clone.cc @@ -504,7 +504,10 @@ simd_clone_adjust_return_type (struct cgraph_node *node) veclen = node->simdclone->vecsize_int; else veclen = node->simdclone->vecsize_float; - veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t))); + if (known_eq (veclen, 0)) + veclen = node->simdclone->simdlen; + else + veclen = exact_div (veclen, GET_MODE_BITSIZE (SCALAR_TYPE_MODE (t))); if (multiple_p (veclen, node->simdclone->simdlen)) veclen = node->simdclone->simdlen; if (POINTER_TYPE_P (t)) @@ -618,8 +621,12 @@ simd_clone_adjust_argument_types (struct cgraph_node *node) veclen = sc->vecsize_int; else veclen = sc->vecsize_float; - veclen = exact_div (veclen, - GET_MODE_BITSIZE (SCALAR_TYPE_MODE (parm_type))); + if (known_eq (veclen, 0)) + veclen = sc->simdlen; + else + veclen = exact_div (veclen, + GET_MODE_BITSIZE + (SCALAR_TYPE_MODE (parm_type))); if (multiple_p (veclen, sc->simdlen)) veclen = sc->simdlen; adj.op = IPA_PARAM_OP_NEW; @@ -669,8 +676,11 @@ simd_clone_adjust_argument_types (struct cgraph_node *node) veclen = sc->vecsize_int; else veclen = sc->vecsize_float; - veclen = exact_div (veclen, - GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type))); + if (known_eq (veclen, 0)) + veclen = sc->simdlen; + else + veclen = exact_div (veclen, + GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type))); if (multiple_p (veclen, sc->simdlen)) veclen = sc->simdlen; if (sc->mask_mode != VOIDmode) diff --git a/gcc/target.def b/gcc/target.def index 2a7fa68f83d..4d49ffc2c88 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -1629,6 +1629,9 @@ DEFHOOK "This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}\n\ fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also\n\ @var{simdlen} field if it was previously 0.\n\ +@var{vecsize_mangle} is a marker for the backend only. @var{vecsize_int} and\n\ +@var{vecsize_float} should be left zero on targets where the number of lanes is\n\ +not determined by the bitsize (in which case @var{simdlen} is always used).\n\ The hook should return 0 if SIMD clones shouldn't be emitted,\n\ or number of @var{vecsize_mangle} variants that should be emitted.", int, (struct cgraph_node *, struct cgraph_simd_clone *, tree, int), NULL) From patchwork Tue Aug 9 13:23:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 451 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:6a10:20da:b0:2d3:3019:e567 with SMTP id n26csp2515727pxc; Tue, 9 Aug 2022 06:25:50 -0700 (PDT) X-Google-Smtp-Source: AA6agR4cjMPnAYRQOhw7ol0uTWKp0Qj33UD2XY8IFaW4lZ2oC8Y5uK3y9zOcSo5WTBVspDHQFXmi X-Received: by 2002:a05:6402:2b8d:b0:43a:5410:a9fc with SMTP id fj13-20020a0564022b8d00b0043a5410a9fcmr22467959edb.99.1660051549701; Tue, 09 Aug 2022 06:25:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1660051549; cv=none; d=google.com; s=arc-20160816; b=ZLzazk0TiMBbbPhXT+0jo2vCl9VxsHavOEY5a3XCQgaZiEy48Fc0v1W6LEO7rnYhXt cdw86wmnDY2QfSd8atUzQCYSVmaOIEs73dYS/94F3aL5zacpzRoRKSJqlvMDxlYcWY0U XNJwUFBjJ2C1ES+UoPZs8XcLwAPnypi8A2Zz5AhZNTVhk5GHJapRl8Ggi4ArJmMAQAW/ DjZeZct8bItDg1wpgp0m0SNlp+fRq5BLsQfcMMBpBvEs+Zh25JbBYVeTkPMUlnoROv7C vjMjkKAPWY/n7S7Fw6va9e/jG0fb3lGOutSQdgrCzWh1RUGIg59Nm6dapsgPqxHzi5S3 kw0w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :ironport-sdr:dmarc-filter:delivered-to; bh=XKyem1nAUo1n7Zx02p2EInlGtRfZsOZcBB5tlNTKIWo=; b=urnItwa3MoaIX0BRZObjV04E0wt0dVXw2xnFe9xGGNtGYJdjpV2PT+2+T9gMtW/kHE DwC2z+VjV5Jyl1obl2BviRszHyrZyXlmlt1pGmH9ruzv2WrmB079kMKKbgz7m2EZAb92 MD3er42TTWwKmz6bVGdVt4aFMMMLgAVPfl6hunYitf+kHLiFmluIgWS4LFvdTj+vvaqq th2/EEvNtHReedJWj59rAod8vHvxu+A8330ixZ+VqrTzfEUeFk88dq5uUymucasdwfFw ThdZiXywawJniKVO4gVNI/vsKhlridjUWyqI6IOXzOR3qckNQlO/4CCLY2gy7aFSyG49 sK3w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from sourceware.org (ip-8-43-85-97.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id hd33-20020a17090796a100b00732fa7c5058si1096209ejc.300.2022.08.09.06.25.49 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Aug 2022 06:25:49 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 2C4963856262 for ; Tue, 9 Aug 2022 13:25:35 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id 6049E3856DE2 for ; Tue, 9 Aug 2022 13:24:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6049E3856DE2 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.93,224,1654588800"; d="scan'208";a="80994424" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa2.mentor.iphmx.com with ESMTP; 09 Aug 2022 05:24:13 -0800 IronPort-SDR: kYUeIiRu+kzynfRQvftP/SMnw869ClMTQ3s9QZ6bs6R0f//EJZReNLdwDC4r8Zn10q7UE1kL+A WkU1bZM/bXJj++OqAVYoQR8oeQfiOGl3JxynO8/aGN64Hs9M8TG9DmEgf08bImpSLwxMz5ZEAK 9j9FmeUm2I8rW4yDXWy39tMDK47G9VvMU5q37uszBC6VUzrcO0hYd/bi8eGWRuKECPV9My6G/n 1b9R9jX9SrUbAb3gge1Zl2MB9wevDQLbhY686mWYwl2YUgQgagowgdWf+QInIKo1F+n61//+1/ EBw= From: Andrew Stubbs To: Subject: [PATCH 2/3] amdgcn: OpenMP SIMD routine support Date: Tue, 9 Aug 2022 14:23:49 +0100 Message-ID: X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com (139.181.222.15) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1740690213470280558?= X-GMAIL-MSGID: =?utf-8?q?1740690213470280558?= Enable and configure SIMD clones for amdgcn. This affects both the __simd__ function attribute, and the OpenMP "declare simd" directive. Note that the masked SIMD variants are generated, but the middle end doesn't actually support calling them yet. gcc/ChangeLog: * config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen): New. (gcn_simd_clone_adjust): New. (gcn_simd_clone_usable): New. (TARGET_SIMD_CLONE_ADJUST): New. (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN): New. (TARGET_SIMD_CLONE_USABLE): New. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-simd-clone-1.c: Add dg-warning. * gcc.dg/vect/vect-simd-clone-2.c: Add dg-warning. * gcc.dg/vect/vect-simd-clone-3.c: Add dg-warning. * gcc.dg/vect/vect-simd-clone-4.c: Add dg-warning. * gcc.dg/vect/vect-simd-clone-5.c: Add dg-warning. * gcc.dg/vect/vect-simd-clone-8.c: Add dg-warning. --- gcc/config/gcn/gcn.cc | 63 +++++++++++++++++++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c | 2 + gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c | 2 + gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c | 1 + gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c | 1 + gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c | 1 + gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c | 2 + 7 files changed, 72 insertions(+) diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index 96295e23aad..ceb69000807 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -52,6 +52,7 @@ #include "rtl-iter.h" #include "dwarf2.h" #include "gimple.h" +#include "cgraph.h" /* This file should be included last. */ #include "target-def.h" @@ -4555,6 +4556,61 @@ gcn_vectorization_cost (enum vect_cost_for_stmt ARG_UNUSED (type_of_cost), return 1; } +/* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN. */ + +static int +gcn_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *ARG_UNUSED (node), + struct cgraph_simd_clone *clonei, + tree base_type, + int ARG_UNUSED (num)) +{ + unsigned int elt_bits = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)); + + if (known_eq (clonei->simdlen, 0U)) + clonei->simdlen = 64; + else if (maybe_ne (clonei->simdlen, 64U)) + { + /* Note that x86 has a similar message that is likely to trigger on + sizes that are OK for gcn; the user can't win. */ + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, + "unsupported simdlen %wd (amdgcn)", + clonei->simdlen.to_constant ()); + return 0; + } + + clonei->vecsize_mangle = 'n'; + clonei->vecsize_int = 0; + clonei->vecsize_float = 0; + + /* DImode ought to be more natural here, but VOIDmode produces better code, + at present, due to the shift-and-test steps not being optimized away + inside the in-branch clones. */ + clonei->mask_mode = VOIDmode; + + return 1; +} + +/* Implement TARGET_SIMD_CLONE_ADJUST. */ + +static void +gcn_simd_clone_adjust (struct cgraph_node *ARG_UNUSED (node)) +{ + /* This hook has to be defined when + TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN is defined, but we don't + need it to do anything yet. */ +} + +/* Implement TARGET_SIMD_CLONE_USABLE. */ + +static int +gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node)) +{ + /* We don't need to do anything here because + gcn_simd_clone_compute_vecsize_and_simdlen currently only returns one + possibility. */ + return 0; +} + /* }}} */ /* {{{ md_reorg pass. */ @@ -6643,6 +6699,13 @@ gcn_dwarf_register_span (rtx rtl) #define TARGET_SECTION_TYPE_FLAGS gcn_section_type_flags #undef TARGET_SCALAR_MODE_SUPPORTED_P #define TARGET_SCALAR_MODE_SUPPORTED_P gcn_scalar_mode_supported_p +#undef TARGET_SIMD_CLONE_ADJUST +#define TARGET_SIMD_CLONE_ADJUST gcn_simd_clone_adjust +#undef TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN +#define TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN \ + gcn_simd_clone_compute_vecsize_and_simdlen +#undef TARGET_SIMD_CLONE_USABLE +#define TARGET_SIMD_CLONE_USABLE gcn_simd_clone_usable #undef TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P \ gcn_small_register_classes_for_mode_p diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c index 50429049500..cd65fc343f1 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c @@ -56,3 +56,5 @@ main () return 0; } +/* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* } 18 } */ +/* { dg-warning {unsupported simdlen 4 \(amdgcn\)} "" { target amdgcn*-*-* } 18 } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c index f89c73a961b..ffcbf9380d6 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c @@ -50,3 +50,5 @@ main () return 0; } +/* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* } 18 } */ +/* { dg-warning {unsupported simdlen 4 \(amdgcn\)} "" { target amdgcn*-*-* } 18 } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c index 75ce696ed66..18d68779cc5 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c @@ -43,3 +43,4 @@ main () return 0; } +/* { dg-warning {unsupported simdlen 4 \(amdgcn\)} "" { target amdgcn*-*-* } 15 } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c index debbe77b79d..e9af0b83162 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c @@ -46,3 +46,4 @@ main () return 0; } +/* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* } 17 } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c index 6a098d9a51a..46da496524d 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c @@ -41,3 +41,4 @@ main () return 0; } +/* { dg-warning {unsupported simdlen 4 \(amdgcn\)} "" { target amdgcn*-*-* } 15 } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c index 1bfd19dc8ab..f414285a170 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c @@ -92,3 +92,5 @@ main () return 0; } +/* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* } 17 } */ +/* { dg-warning {unsupported simdlen 8 \(amdgcn\)} "" { target amdgcn*-*-* } 24 } */ From patchwork Tue Aug 9 13:23:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 452 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:6a10:20da:b0:2d3:3019:e567 with SMTP id n26csp2516047pxc; Tue, 9 Aug 2022 06:26:26 -0700 (PDT) X-Google-Smtp-Source: AA6agR43jJDl4VqjldKlVrtcsbDLHUrbQPAy4H3P6afiehMXTAV22UJi8hh4ZfENLBvU2GC+GW11 X-Received: by 2002:a05:6402:268d:b0:43d:b9d0:9efc with SMTP id w13-20020a056402268d00b0043db9d09efcmr22276828edd.92.1660051586174; Tue, 09 Aug 2022 06:26:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1660051586; cv=none; d=google.com; s=arc-20160816; b=tfpXwRVRFkI0R+cP05Kxhw/79zUMe2GhaBSXPbQf8VTE5wmo8/NX7idA9rHPjJP8yX m1YuPlXtzItGYuQqn6ani3Xq7yVGfpBcFE6yQpPKoWZ5J66CYj80lrHXzjOJsGxlAAOA 2gudJhM/ziu3PNp8vg4s/VnBM6IyMc7BnNuCfTOOZ1dT9ojv2ieGbfhj2h3c9xihZvaq WIAe4pogk84EsKgVxFkfHWgT5WFYIC7iGyPDhIxE/oyTuLxY1k60dhhxbQr3C62mZ5LY QkAFhIA6HCJGXQ27gcyyytWFeQrY4eT3/qGgp4qwH4vOTX019ni6yN9ZMp2se0A/zpx5 a/4A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :ironport-sdr:dmarc-filter:delivered-to; bh=kH1+ceNRcjSi+GbzcX0JAiB0EcZqp6AuF6zC2WcXuho=; b=vdFQWsmf0oqpOqnh9sKwSnZUbLD7/gFcK28eYu9eqJfn1DexOkCM69/b0YBuYBmpj9 CdkSSh8ndEJSqgmCk7aQb6WgtkXm2a2xcb5y7de7HJ//0qOg2kk9NLvnh1KBZea9WGK4 iP+6ygAFKC3Mz/Eo/hZ2VLJzy0iDS1xH4FSw8ooDp57KU+NjIDJU41BZPMn6xY4kvlsD 3WRBsyBKwDDwPqrorzVOTujWZ9D7HsdKkxJiybMAC1CTqTTZFiOnKtghYuykTPc6R/BT Azqrlb6N5dXFjS32gw1F0EyJZH6DVcj7UythvZ4nefmgE9Dz/gOpBrgByv4Nko9Xi/uP 9UvA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from sourceware.org (ip-8-43-85-97.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id ga33-20020a1709070c2100b007306f35b498si2182617ejc.711.2022.08.09.06.26.25 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 09 Aug 2022 06:26:26 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 135BB3856091 for ; Tue, 9 Aug 2022 13:26:08 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id E64363856944 for ; Tue, 9 Aug 2022 13:24:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E64363856944 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.93,224,1654588800"; d="scan'208";a="80994427" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa2.mentor.iphmx.com with ESMTP; 09 Aug 2022 05:24:13 -0800 IronPort-SDR: jLhJbrtUh0Q38rGTBCTa6ksRNTMrHR0UpqKyXwYsgxYeGS9U328ao0NEn2gezilm7fMoi3bcR7 B5AxiHUnSTwKonmA68+zHUZrMqhYZJapPeNh+OZx1ny635SEfpQrVKedThgruYQ4HKoQiuB1OU NUHfHkDoFJH7ErUJI0+kcGAaKpzTZ8/ertO7xbQfy9gXAcKaz72TBMJwGE1W2lPhOK7zSRfSBp oQ4KLn2pd1zyliuJ8yIdAtbUubpUirzGIWxTmhKN8A0vPhlBzkGFPkxnAJqUvBaHW5zXlfInSm 8Is= From: Andrew Stubbs To: Subject: [PATCH 3/3] vect: inbranch SIMD clones Date: Tue, 9 Aug 2022 14:23:50 +0100 Message-ID: X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-15.mgc.mentorg.com (139.181.222.15) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1740690252125772389?= X-GMAIL-MSGID: =?utf-8?q?1740690252125772389?= There has been support for generating "inbranch" SIMD clones for a long time, but nothing actually uses them (as far as I can see). This patch add supports for a sub-set of possible cases (those using mask_mode == VOIDmode). The other cases fail to vectorize, just as before, so there should be no regressions. The sub-set of support should cover all cases needed by amdgcn, at present. gcc/ChangeLog: * omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type for mask arguments also. * tree-if-conv.cc: Include cgraph.h. (if_convertible_stmt_p): Do if conversions for calls to SIMD calls. (predicate_statements): Pass the predicate to SIMD functions. * tree-vect-stmts.cc (vectorizable_simd_clone_call): Permit calls to clones with mask arguments, in some cases. Generate the mask vector arguments. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-simd-clone-16.c: New test. * gcc.dg/vect/vect-simd-clone-16b.c: New test. * gcc.dg/vect/vect-simd-clone-16c.c: New test. * gcc.dg/vect/vect-simd-clone-16d.c: New test. * gcc.dg/vect/vect-simd-clone-16e.c: New test. * gcc.dg/vect/vect-simd-clone-16f.c: New test. * gcc.dg/vect/vect-simd-clone-17.c: New test. * gcc.dg/vect/vect-simd-clone-17b.c: New test. * gcc.dg/vect/vect-simd-clone-17c.c: New test. * gcc.dg/vect/vect-simd-clone-17d.c: New test. * gcc.dg/vect/vect-simd-clone-17e.c: New test. * gcc.dg/vect/vect-simd-clone-17f.c: New test. * gcc.dg/vect/vect-simd-clone-18.c: New test. * gcc.dg/vect/vect-simd-clone-18b.c: New test. * gcc.dg/vect/vect-simd-clone-18c.c: New test. * gcc.dg/vect/vect-simd-clone-18d.c: New test. * gcc.dg/vect/vect-simd-clone-18e.c: New test. * gcc.dg/vect/vect-simd-clone-18f.c: New test. --- gcc/omp-simd-clone.cc | 1 + .../gcc.dg/vect/vect-simd-clone-16.c | 89 ++++++++++++ .../gcc.dg/vect/vect-simd-clone-16b.c | 14 ++ .../gcc.dg/vect/vect-simd-clone-16c.c | 16 +++ .../gcc.dg/vect/vect-simd-clone-16d.c | 16 +++ .../gcc.dg/vect/vect-simd-clone-16e.c | 14 ++ .../gcc.dg/vect/vect-simd-clone-16f.c | 16 +++ .../gcc.dg/vect/vect-simd-clone-17.c | 89 ++++++++++++ .../gcc.dg/vect/vect-simd-clone-17b.c | 14 ++ .../gcc.dg/vect/vect-simd-clone-17c.c | 16 +++ .../gcc.dg/vect/vect-simd-clone-17d.c | 16 +++ .../gcc.dg/vect/vect-simd-clone-17e.c | 14 ++ .../gcc.dg/vect/vect-simd-clone-17f.c | 16 +++ .../gcc.dg/vect/vect-simd-clone-18.c | 89 ++++++++++++ .../gcc.dg/vect/vect-simd-clone-18b.c | 14 ++ .../gcc.dg/vect/vect-simd-clone-18c.c | 16 +++ .../gcc.dg/vect/vect-simd-clone-18d.c | 16 +++ .../gcc.dg/vect/vect-simd-clone-18e.c | 14 ++ .../gcc.dg/vect/vect-simd-clone-18f.c | 16 +++ gcc/tree-if-conv.cc | 39 ++++- gcc/tree-vect-stmts.cc | 134 ++++++++++++++---- 21 files changed, 641 insertions(+), 28 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c create mode 100644 gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc index 258d3c6377f..58e3dc8b2e9 100644 --- a/gcc/omp-simd-clone.cc +++ b/gcc/omp-simd-clone.cc @@ -716,6 +716,7 @@ simd_clone_adjust_argument_types (struct cgraph_node *node) } sc->args[i].orig_type = base_type; sc->args[i].arg_type = SIMD_CLONE_ARG_TYPE_MASK; + sc->args[i].vector_type = adj.type; } if (node->definition) diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c new file mode 100644 index 00000000000..ffaabb30d1e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c @@ -0,0 +1,89 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +/* Test that simd inbranch clones work correctly. */ + +#ifndef TYPE +#define TYPE int +#endif + +/* A simple function that will be cloned. */ +#pragma omp declare simd +TYPE __attribute__((noinline)) +foo (TYPE a) +{ + return a + 1; +} + +/* Check that "inbranch" clones are called correctly. */ + +void __attribute__((noinline)) +masked (TYPE * __restrict a, TYPE * __restrict b, int size) +{ + #pragma omp simd + for (int i = 0; i < size; i++) + b[i] = a[i]<1 ? foo(a[i]) : a[i]; +} + +/* Check that "inbranch" works when there might be unrolling. */ + +void __attribute__((noinline)) +masked_fixed (TYPE * __restrict a, TYPE * __restrict b) +{ + #pragma omp simd + for (int i = 0; i < 128; i++) + b[i] = a[i]<1 ? foo(a[i]) : a[i]; +} + +/* Validate the outputs. */ + +void +check_masked (TYPE *b, int size) +{ + for (int i = 0; i < size; i++) + if (((TYPE)i < 1 && b[i] != (TYPE)(i + 1)) + || ((TYPE)i >= 1 && b[i] != (TYPE)i)) + { + __builtin_printf ("error at %d\n", i); + __builtin_exit (1); + } +} + +int +main () +{ + TYPE a[1024]; + TYPE b[1024]; + + for (int i = 0; i < 1024; i++) + a[i] = i; + + masked_fixed (a, b); + check_masked (b, 128); + + /* Test various sizes to cover machines with different vectorization + factors. */ + for (int size = 8; size <= 1024; size *= 2) + { + masked (a, b, size); + check_masked (b, size); + } + + /* Test sizes that might exercise the partial vector code-path. */ + for (int size = 8; size <= 1024; size *= 2) + { + masked (a, b, size-4); + check_masked (b, size-4); + } + + return 0; +} + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c new file mode 100644 index 00000000000..a503ef85238 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16b.c @@ -0,0 +1,14 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE float +#include "vect-simd-clone-16.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c new file mode 100644 index 00000000000..6563879df71 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16c.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE short +#include "vect-simd-clone-16.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ + +/* Fails to use in-branch clones for TYPE=short. */ +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c new file mode 100644 index 00000000000..6c5e69482e5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16d.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE char +#include "vect-simd-clone-16.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ + +/* Fails to use in-branch clones for TYPE=char. */ +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c new file mode 100644 index 00000000000..6690844deae --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16e.c @@ -0,0 +1,14 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE double +#include "vect-simd-clone-16.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c new file mode 100644 index 00000000000..e7b35a6a2dc --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16f.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE __INT64_TYPE__ +#include "vect-simd-clone-16.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-final { scan-tree-dump-times "simdclone" 7 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ + +/* Fails to use in-branch clones for TYPE=int64. */ +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c new file mode 100644 index 00000000000..6f5d374a417 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17.c @@ -0,0 +1,89 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +/* Test that simd inbranch clones work correctly. */ + +#ifndef TYPE +#define TYPE int +#endif + +/* A simple function that will be cloned. */ +#pragma omp declare simd uniform(b) +TYPE __attribute__((noinline)) +foo (TYPE a, TYPE b) +{ + return a + b; +} + +/* Check that "inbranch" clones are called correctly. */ + +void __attribute__((noinline)) +masked (TYPE * __restrict a, TYPE * __restrict b, int size) +{ + #pragma omp simd + for (int i = 0; i < size; i++) + b[i] = a[i]<1 ? foo(a[i], 1) : a[i]; +} + +/* Check that "inbranch" works when there might be unrolling. */ + +void __attribute__((noinline)) +masked_fixed (TYPE * __restrict a, TYPE * __restrict b) +{ + #pragma omp simd + for (int i = 0; i < 128; i++) + b[i] = a[i]<1 ? foo(a[i], 1) : a[i]; +} + +/* Validate the outputs. */ + +void +check_masked (TYPE *b, int size) +{ + for (int i = 0; i < size; i++) + if (((TYPE)i < 1 && b[i] != (TYPE)(i + 1)) + || ((TYPE)i >= 1 && b[i] != (TYPE)i)) + { + __builtin_printf ("error at %d\n", i); + __builtin_exit (1); + } +} + +int +main () +{ + TYPE a[1024]; + TYPE b[1024]; + + for (int i = 0; i < 1024; i++) + a[i] = i; + + masked_fixed (a, b); + check_masked (b, 128); + + /* Test various sizes to cover machines with different vectorization + factors. */ + for (int size = 8; size <= 1024; size *= 2) + { + masked (a, b, size); + check_masked (b, size); + } + + /* Test sizes that might exercise the partial vector code-path. */ + for (int size = 8; size <= 1024; size *= 2) + { + masked (a, b, size-4); + check_masked (b, size-4); + } + + return 0; +} + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c new file mode 100644 index 00000000000..1e2c3ab11b3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17b.c @@ -0,0 +1,14 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE float +#include "vect-simd-clone-17.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c new file mode 100644 index 00000000000..007001de669 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17c.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE short +#include "vect-simd-clone-17.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ + +/* Fails to use in-branch clones for TYPE=short. */ +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c new file mode 100644 index 00000000000..abb85a4ceee --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17d.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE char +#include "vect-simd-clone-17.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ + +/* Fails to use in-branch clones for TYPE=char. */ +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c new file mode 100644 index 00000000000..2c1d8a659bd --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17e.c @@ -0,0 +1,14 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE double +#include "vect-simd-clone-17.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c new file mode 100644 index 00000000000..582e690304f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-17f.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE __INT64_TYPE__ +#include "vect-simd-clone-17.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ + +/* Fails to use in-branch clones for TYPE=int64. */ +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c new file mode 100644 index 00000000000..750a3f92b62 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18.c @@ -0,0 +1,89 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +/* Test that simd inbranch clones work correctly. */ + +#ifndef TYPE +#define TYPE int +#endif + +/* A simple function that will be cloned. */ +#pragma omp declare simd uniform(b) +TYPE __attribute__((noinline)) +foo (TYPE b, TYPE a) +{ + return a + b; +} + +/* Check that "inbranch" clones are called correctly. */ + +void __attribute__((noinline)) +masked (TYPE * __restrict a, TYPE * __restrict b, int size) +{ + #pragma omp simd + for (int i = 0; i < size; i++) + b[i] = a[i]<1 ? foo(1, a[i]) : a[i]; +} + +/* Check that "inbranch" works when there might be unrolling. */ + +void __attribute__((noinline)) +masked_fixed (TYPE * __restrict a, TYPE * __restrict b) +{ + #pragma omp simd + for (int i = 0; i < 128; i++) + b[i] = a[i]<1 ? foo(1, a[i]) : a[i]; +} + +/* Validate the outputs. */ + +void +check_masked (TYPE *b, int size) +{ + for (int i = 0; i < size; i++) + if (((TYPE)i < 1 && b[i] != (TYPE)(i + 1)) + || ((TYPE)i >= 1 && b[i] != (TYPE)i)) + { + __builtin_printf ("error at %d\n", i); + __builtin_exit (1); + } +} + +int +main () +{ + TYPE a[1024]; + TYPE b[1024]; + + for (int i = 0; i < 1024; i++) + a[i] = i; + + masked_fixed (a, b); + check_masked (b, 128); + + /* Test various sizes to cover machines with different vectorization + factors. */ + for (int size = 8; size <= 1024; size *= 2) + { + masked (a, b, size); + check_masked (b, size); + } + + /* Test sizes that might exercise the partial vector code-path. */ + for (int size = 8; size <= 1024; size *= 2) + { + masked (a, b, size-4); + check_masked (b, size-4); + } + + return 0; +} + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c new file mode 100644 index 00000000000..a77ccf3bfcc --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18b.c @@ -0,0 +1,14 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE float +#include "vect-simd-clone-18.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c new file mode 100644 index 00000000000..bee5f338abe --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18c.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE short +#include "vect-simd-clone-18.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ + +/* Fails to use in-branch clones for TYPE=short. */ +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c new file mode 100644 index 00000000000..a749edefdd7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18d.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE char +#include "vect-simd-clone-18.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ + +/* Fails to use in-branch clones for TYPE=char. */ +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 16 "optimized" { target x86_64-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c new file mode 100644 index 00000000000..061e0dc2621 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18e.c @@ -0,0 +1,14 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE double +#include "vect-simd-clone-18.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 19 "optimized" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c new file mode 100644 index 00000000000..a3037f5809a --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-18f.c @@ -0,0 +1,16 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -fdump-tree-optimized" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#define TYPE __INT64_TYPE__ +#include "vect-simd-clone-18.c" + +/* Ensure the the in-branch simd clones are used on targets that support + them. These counts include all call and definitions. */ + +/* { dg-final { scan-tree-dump-times "simdclone" 6 "optimized" { target amdgcn-*-* } } } */ +/* TODO: aarch64 */ + +/* Fails to use in-branch clones for TYPE=int64. */ +/* { dg-skip-if "" { x86_64-*-* } { "-flto" } { "" } } */ +/* { dg-final { scan-tree-dump-times "simdclone" 18 "optimized" { target x86_64-*-* } } } */ diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index 1c8e1a45234..82b21add802 100644 --- a/gcc/tree-if-conv.cc +++ b/gcc/tree-if-conv.cc @@ -122,6 +122,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-ssa-dse.h" #include "tree-vectorizer.h" #include "tree-eh.h" +#include "cgraph.h" /* Only handle PHIs with no more arguments unless we are asked to by simd pragma. */ @@ -1054,7 +1055,8 @@ if_convertible_gimple_assign_stmt_p (gimple *stmt, A statement is if-convertible if: - it is an if-convertible GIMPLE_ASSIGN, - it is a GIMPLE_LABEL or a GIMPLE_COND, - - it is builtins call. */ + - it is builtins call. + - it is a call to a function with a SIMD clone. */ static bool if_convertible_stmt_p (gimple *stmt, vec refs) @@ -1074,13 +1076,19 @@ if_convertible_stmt_p (gimple *stmt, vec refs) tree fndecl = gimple_call_fndecl (stmt); if (fndecl) { + /* We can vectorize some builtins and functions with SIMD + clones. */ int flags = gimple_call_flags (stmt); + struct cgraph_node *node = cgraph_node::get (fndecl); if ((flags & ECF_CONST) && !(flags & ECF_LOOPING_CONST_OR_PURE) - /* We can only vectorize some builtins at the moment, - so restrict if-conversion to those. */ && fndecl_built_in_p (fndecl)) return true; + else if (node && node->simd_clones != NULL) + { + need_to_predicate = true; + return true; + } } return false; } @@ -2614,6 +2622,31 @@ predicate_statements (loop_p loop) gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi)); update_stmt (stmt); } + + /* Add a predicate parameter to functions that have a SIMD clone. + This will cause the vectorizer to match the "in branch" clone + variants because they also have the extra parameter, and serves + to build the mask vector in a natural way. */ + gcall *call = dyn_cast (gsi_stmt (gsi)); + if (call && !gimple_call_internal_p (call)) + { + tree orig_fndecl = gimple_call_fndecl (call); + int orig_nargs = gimple_call_num_args (call); + auto_vec args; + for (int i=0; i < orig_nargs; i++) + args.safe_push (gimple_call_arg (call, i)); + args.safe_push (cond); + + /* Replace the call with a new one that has the extra + parameter. The FUNCTION_DECL remains unchanged so that + the vectorizer can find the SIMD clones. This call will + either be deleted or replaced at that time, so the + mismatch is short-lived and we can live with it. */ + gcall *new_call = gimple_build_call_vec (orig_fndecl, args); + gimple_call_set_lhs (new_call, gimple_call_lhs (call)); + gsi_replace (&gsi, new_call, true); + } + lhs = gimple_get_lhs (gsi_stmt (gsi)); if (lhs && TREE_CODE (lhs) == SSA_NAME) ssa_names.add (lhs); diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index f582d238984..2214d216c15 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -4049,16 +4049,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info, || thisarginfo.dt == vect_external_def) gcc_assert (thisarginfo.vectype == NULL_TREE); else - { - gcc_assert (thisarginfo.vectype != NULL_TREE); - if (VECTOR_BOOLEAN_TYPE_P (thisarginfo.vectype)) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, - "vector mask arguments are not supported\n"); - return false; - } - } + gcc_assert (thisarginfo.vectype != NULL_TREE); /* For linear arguments, the analyze phase should have saved the base and step in STMT_VINFO_SIMD_CLONE_INFO. */ @@ -4151,9 +4142,6 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info, if (target_badness < 0) continue; this_badness += target_badness * 512; - /* FORNOW: Have to add code to add the mask argument. */ - if (n->simdclone->inbranch) - continue; for (i = 0; i < nargs; i++) { switch (n->simdclone->args[i].arg_type) @@ -4191,7 +4179,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info, i = -1; break; case SIMD_CLONE_ARG_TYPE_MASK: - gcc_unreachable (); + break; } if (i == (size_t) -1) break; @@ -4217,18 +4205,55 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info, return false; for (i = 0; i < nargs; i++) - if ((arginfo[i].dt == vect_constant_def - || arginfo[i].dt == vect_external_def) - && bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR) - { - tree arg_type = TREE_TYPE (gimple_call_arg (stmt, i)); - arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type, - slp_node); - if (arginfo[i].vectype == NULL - || !constant_multiple_p (bestn->simdclone->simdlen, - simd_clone_subparts (arginfo[i].vectype))) + { + if ((arginfo[i].dt == vect_constant_def + || arginfo[i].dt == vect_external_def) + && bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR) + { + tree arg_type = TREE_TYPE (gimple_call_arg (stmt, i)); + arginfo[i].vectype = get_vectype_for_scalar_type (vinfo, arg_type, + slp_node); + if (arginfo[i].vectype == NULL + || !constant_multiple_p (bestn->simdclone->simdlen, + simd_clone_subparts (arginfo[i].vectype))) + return false; + } + + if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR + && VECTOR_BOOLEAN_TYPE_P (bestn->simdclone->args[i].vector_type)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "vector mask arguments are not supported.\n"); return false; - } + } + + if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK + && bestn->simdclone->mask_mode == VOIDmode + && (simd_clone_subparts (bestn->simdclone->args[i].vector_type) + != simd_clone_subparts (arginfo[i].vectype))) + { + /* FORNOW we only have partial support for vector-type masks that + can't hold all of simdlen. */ + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, + vect_location, + "in-branch vector clones are not yet" + " supported for mismatched vector sizes.\n"); + return false; + } + if (bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK + && bestn->simdclone->mask_mode != VOIDmode) + { + /* FORNOW don't support integer-type masks. */ + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, + vect_location, + "in-branch vector clones are not yet" + " supported for integer mask modes.\n"); + return false; + } + } fndecl = bestn->decl; nunits = bestn->simdclone->simdlen; @@ -4417,6 +4442,65 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info, } } break; + case SIMD_CLONE_ARG_TYPE_MASK: + atype = bestn->simdclone->args[i].vector_type; + if (bestn->simdclone->mask_mode != VOIDmode) + { + /* FORNOW: this is disabled above. */ + gcc_unreachable (); + } + else + { + tree elt_type = TREE_TYPE (atype); + tree one = fold_convert (elt_type, integer_one_node); + tree zero = fold_convert (elt_type, integer_zero_node); + o = vector_unroll_factor (nunits, + simd_clone_subparts (atype)); + for (m = j * o; m < (j + 1) * o; m++) + { + if (simd_clone_subparts (atype) + < simd_clone_subparts (arginfo[i].vectype)) + { + /* The mask type has fewer elements than simdlen. */ + + /* FORNOW */ + gcc_unreachable (); + } + else if (simd_clone_subparts (atype) + == simd_clone_subparts (arginfo[i].vectype)) + { + /* The SIMD clone function has the same number of + elements as the current function. */ + if (m == 0) + { + vect_get_vec_defs_for_operand (vinfo, stmt_info, + o * ncopies, + op, + &vec_oprnds[i]); + vec_oprnds_i[i] = 0; + } + vec_oprnd0 = vec_oprnds[i][vec_oprnds_i[i]++]; + vec_oprnd0 + = build3 (VEC_COND_EXPR, atype, vec_oprnd0, + build_vector_from_val (atype, one), + build_vector_from_val (atype, zero)); + gassign *new_stmt + = gimple_build_assign (make_ssa_name (atype), + vec_oprnd0); + vect_finish_stmt_generation (vinfo, stmt_info, + new_stmt, gsi); + vargs.safe_push (gimple_assign_lhs (new_stmt)); + } + else + { + /* The mask type has more elements than simdlen. */ + + /* FORNOW */ + gcc_unreachable (); + } + } + } + break; case SIMD_CLONE_ARG_TYPE_UNIFORM: vargs.safe_push (op); break;