From patchwork Wed Jun 28 13:41:21 2023
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 113880
Date: Wed, 28 Jun 2023 14:41:21 +0100
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de, jlaw@ventanamicro.com
Subject: [PATCH 1/19]middle-end ifcvt: Support bitfield lowering of multiple-exit loops
From: Tamar Christina

Hi,

With the patch enabling the vectorization of early breaks, we'd like to allow
bitfield lowering in such loops, which requires relaxing the lowering code to
allow multiple exits.

In order to avoid an issue similar to PR107275, the code that rejects loops
containing certain kinds of gimple statements was hoisted from
'if_convertible_loop_p_1' to 'get_loop_body_in_if_conv_order', so that we avoid
trying to lower bitfields in loops we are not going to vectorize anyway.  This
also ensures 'ifcvt_local_dce' doesn't accidentally remove statements it
shouldn't, as it will never come across them.  I added a comment to make clear
that there is a direct connection between the two: if we ever enable
vectorization of any other kind of gimple statement, both places must handle it.

NOTE: This patch was accepted before but never committed because it is a no-op
without the early-break patch.

This is a respun version of Andre's patch, rebased onto the current ifcvt code
and updated to handle multiple exits.

Bootstrapped and regression tested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu with no issues.

gcc/ChangeLog:

	* tree-if-conv.cc (if_convertible_loop_p_1): Move check from here ...
	(get_loop_body_in_if_conv_order): ... to here.
	(if_convertible_loop_p): Remove single_exit check.
	(tree_if_conversion): Move single_exit check to if-conversion part
	and support multiple exits.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-bitfield-read-1-not.c: New test.
	* gcc.dg/vect/vect-bitfield-read-2-not.c: New test.
	* gcc.dg/vect/vect-bitfield-read-8.c: New test.
	* gcc.dg/vect/vect-bitfield-read-9.c: New test.
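For reference, the shape of loop this change targets is an early-exit loop
whose body reads a bitfield, as in the new vect-bitfield-read-8.c test below
(condensed here purely for illustration):

    struct s { int i : 31; };

    int f (struct s *ptr, unsigned n)
    {
      int res = 0;
      for (unsigned i = 0; i < n; ++i)
	{
	  if (ptr[i].i == 4)	/* Early break: the loop has two exits.  */
	    return res;
	  res += ptr[i].i;	/* Bitfield read that can now be lowered.  */
	}
      return res;
    }

Previously if-conversion gave up on such loops at the single_exit check; with
this patch bitfield lowering can still run, and only if-conversion proper
keeps the single-exit requirement.
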
Co-Authored-By: Andre Vieira --- inline copy of patch -- diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c new file mode 100644 index 0000000000000000000000000000000000000000..0d91067ebb27b1db2b2352975c43bce8b4171e3f --- diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c new file mode 100644 index 0000000000000000000000000000000000000000..0d91067ebb27b1db2b2352975c43bce8b4171e3f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c @@ -0,0 +1,60 @@ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_long_long } */ +/* { dg-additional-options { "-fdump-tree-ifcvt-all" } } */ + +#include +#include "tree-vect.h" + +extern void abort(void); + +struct s { + char a : 4; +}; + +#define N 32 +#define ELT0 {0} +#define ELT1 {1} +#define ELT2 {2} +#define ELT3 {3} +#define RES 56 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + { + switch (ptr[i].a) + { + case 0: + res += ptr[i].a + 1; + break; + case 1: + case 2: + case 3: + res += ptr[i].a; + break; + default: + return 0; + } + } + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-not "Bitfield OK to lower." "ifcvt" } } */ + + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2-not.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2-not.c new file mode 100644 index 0000000000000000000000000000000000000000..4ac7b3fc0dfd1c9d0b5e94a2ba6a745545577ec1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-2-not.c @@ -0,0 +1,49 @@ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_long_long } */ +/* { dg-additional-options { "-fdump-tree-ifcvt-all" } } */ + +#include +#include "tree-vect.h" + +extern void abort(void); + +struct s { + char a : 4; +}; + +#define N 32 +#define ELT0 {0} +#define ELT1 {1} +#define ELT2 {2} +#define ELT3 {3} +#define RES 48 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + { + asm volatile ("" ::: "memory"); + res += ptr[i].a; + } + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump-not "Bitfield OK to lower." 
"ifcvt" } } */ + + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-8.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-8.c new file mode 100644 index 0000000000000000000000000000000000000000..52cfd33d937ae90f3fe9556716c90e098b768ac8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-8.c @@ -0,0 +1,49 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-additional-options { "-fdump-tree-ifcvt-all" } } */ + +#include +#include "tree-vect.h" + +extern void abort(void); + +struct s { int i : 31; }; + +#define ELT0 {0} +#define ELT1 {1} +#define ELT2 {2} +#define ELT3 {3} +#define ELT4 {4} +#define N 32 +#define RES 25 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT4, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + { + if (ptr[i].i == 4) + return res; + res += ptr[i].i; + } + + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump "Bitfield OK to lower." "ifcvt" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-9.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-9.c new file mode 100644 index 0000000000000000000000000000000000000000..ab814698131a5905def181eeed85d8a3c62b924b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-9.c @@ -0,0 +1,51 @@ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_long_long } */ +/* { dg-additional-options { "-fdump-tree-ifcvt-all" } } */ + +#include +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char a : 4; +}; + +#define N 32 +#define ELT0 {0x7FFFFFFFUL, 0} +#define ELT1 {0x7FFFFFFFUL, 1} +#define ELT2 {0x7FFFFFFFUL, 2} +#define ELT3 {0x7FFFFFFFUL, 3} +#define ELT4 {0x7FFFFFFFUL, 4} +#define RES 9 +struct s A[N] + = { ELT0, ELT4, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + { + if (ptr[i].a) + return 9; + res += ptr[i].a; + } + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump "Bitfield OK to lower." "ifcvt" } } */ + diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index e342532a343a3c066142adeec5fdfaf736a653e5..cdb0fe4c29dfa531e3277925022d127b13ffcc16 100644 --- a/gcc/tree-if-conv.cc +++ b/gcc/tree-if-conv.cc @@ -586,7 +586,7 @@ add_to_dst_predicate_list (class loop *loop, edge e, /* Return true if one of the successor edges of BB exits LOOP. */ static bool -bb_with_exit_edge_p (class loop *loop, basic_block bb) +bb_with_exit_edge_p (const class loop *loop, basic_block bb) { edge e; edge_iterator ei; @@ -1268,6 +1268,44 @@ get_loop_body_in_if_conv_order (const class loop *loop) } free (blocks_in_bfs_order); BITMAP_FREE (visited); + + /* Go through loop and reject if-conversion or lowering of bitfields if we + encounter statements we do not believe the vectorizer will be able to + handle. If adding a new type of statement here, make sure + 'ifcvt_local_dce' is also able to handle it propertly. 
*/ + for (index = 0; index < loop->num_nodes; index++) + { + basic_block bb = blocks[index]; + gimple_stmt_iterator gsi; + + bool may_have_nonlocal_labels + = bb_with_exit_edge_p (loop, bb) || bb == loop->latch; + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + switch (gimple_code (gsi_stmt (gsi))) + { + case GIMPLE_LABEL: + if (!may_have_nonlocal_labels) + { + tree label + = gimple_label_label (as_a (gsi_stmt (gsi))); + if (DECL_NONLOCAL (label) || FORCED_LABEL (label)) + { + free (blocks); + return NULL; + } + } + /* Fallthru. */ + case GIMPLE_ASSIGN: + case GIMPLE_CALL: + case GIMPLE_DEBUG: + case GIMPLE_COND: + gimple_set_uid (gsi_stmt (gsi), 0); + break; + default: + free (blocks); + return NULL; + } + } return blocks; } @@ -1438,36 +1476,6 @@ if_convertible_loop_p_1 (class loop *loop, vec *refs) exit_bb = bb; } - for (i = 0; i < loop->num_nodes; i++) - { - basic_block bb = ifc_bbs[i]; - gimple_stmt_iterator gsi; - - bool may_have_nonlocal_labels - = bb_with_exit_edge_p (loop, bb) || bb == loop->latch; - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) - switch (gimple_code (gsi_stmt (gsi))) - { - case GIMPLE_LABEL: - if (!may_have_nonlocal_labels) - { - tree label - = gimple_label_label (as_a (gsi_stmt (gsi))); - if (DECL_NONLOCAL (label) || FORCED_LABEL (label)) - return false; - } - /* Fallthru. */ - case GIMPLE_ASSIGN: - case GIMPLE_CALL: - case GIMPLE_DEBUG: - case GIMPLE_COND: - gimple_set_uid (gsi_stmt (gsi), 0); - break; - default: - return false; - } - } - data_reference_p dr; innermost_DR_map @@ -1579,14 +1587,6 @@ if_convertible_loop_p (class loop *loop, vec *refs) return false; } - /* More than one loop exit is too much to handle. */ - if (!single_exit (loop)) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "multiple exits\n"); - return false; - } - /* If one of the loop header's edge is an exit edge then do not apply if-conversion. */ FOR_EACH_EDGE (e, ei, loop->header->succs) @@ -3566,9 +3566,6 @@ tree_if_conversion (class loop *loop, vec *preds) aggressive_if_conv = true; } - if (!single_exit (loop)) - goto cleanup; - /* If there are more than two BBs in the loop then there is at least one if to convert. */ if (loop->num_nodes > 2 @@ -3588,15 +3585,25 @@ tree_if_conversion (class loop *loop, vec *preds) if (loop->num_nodes > 2) { - need_to_ifcvt = true; + /* More than one loop exit is too much to handle. */ + if (!single_exit (loop)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Can not ifcvt due to multiple exits\n"); + } + else + { + need_to_ifcvt = true; - if (!if_convertible_loop_p (loop, &refs) || !dbg_cnt (if_conversion_tree)) - goto cleanup; + if (!if_convertible_loop_p (loop, &refs) + || !dbg_cnt (if_conversion_tree)) + goto cleanup; - if ((need_to_predicate || any_complicated_phi) - && ((!flag_tree_loop_vectorize && !loop->force_vectorize) - || loop->dont_vectorize)) - goto cleanup; + if ((need_to_predicate || any_complicated_phi) + && ((!flag_tree_loop_vectorize && !loop->force_vectorize) + || loop->dont_vectorize)) + goto cleanup; + } } if ((flag_tree_loop_vectorize || loop->force_vectorize) @@ -3687,7 +3694,8 @@ tree_if_conversion (class loop *loop, vec *preds) PHIs, those are to be kept in sync with the non-if-converted copy. ??? We'll still keep dead stores though. 
*/ exit_bbs = BITMAP_ALLOC (NULL); - bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index); + for (edge exit : get_loop_exit_edges (loop)) + bitmap_set_bit (exit_bbs, exit->dest->index); bitmap_set_bit (exit_bbs, loop->latch->index); std::pair *name_pair;
"ifcvt" } } */ + + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-8.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-8.c new file mode 100644 index 0000000000000000000000000000000000000000..52cfd33d937ae90f3fe9556716c90e098b768ac8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-8.c @@ -0,0 +1,49 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-additional-options { "-fdump-tree-ifcvt-all" } } */ + +#include +#include "tree-vect.h" + +extern void abort(void); + +struct s { int i : 31; }; + +#define ELT0 {0} +#define ELT1 {1} +#define ELT2 {2} +#define ELT3 {3} +#define ELT4 {4} +#define N 32 +#define RES 25 +struct s A[N] + = { ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT4, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + { + if (ptr[i].i == 4) + return res; + res += ptr[i].i; + } + + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump "Bitfield OK to lower." "ifcvt" } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-9.c b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-9.c new file mode 100644 index 0000000000000000000000000000000000000000..ab814698131a5905def181eeed85d8a3c62b924b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-9.c @@ -0,0 +1,51 @@ +/* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_long_long } */ +/* { dg-additional-options { "-fdump-tree-ifcvt-all" } } */ + +#include +#include "tree-vect.h" + +extern void abort(void); + +struct s { + unsigned i : 31; + char a : 4; +}; + +#define N 32 +#define ELT0 {0x7FFFFFFFUL, 0} +#define ELT1 {0x7FFFFFFFUL, 1} +#define ELT2 {0x7FFFFFFFUL, 2} +#define ELT3 {0x7FFFFFFFUL, 3} +#define ELT4 {0x7FFFFFFFUL, 4} +#define RES 9 +struct s A[N] + = { ELT0, ELT4, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3, + ELT0, ELT1, ELT2, ELT3, ELT0, ELT1, ELT2, ELT3}; + +int __attribute__ ((noipa)) +f(struct s *ptr, unsigned n) { + int res = 0; + for (int i = 0; i < n; ++i) + { + if (ptr[i].a) + return 9; + res += ptr[i].a; + } + return res; +} + +int main (void) +{ + check_vect (); + + if (f(&A[0], N) != RES) + abort (); + + return 0; +} + +/* { dg-final { scan-tree-dump "Bitfield OK to lower." "ifcvt" } } */ + diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index e342532a343a3c066142adeec5fdfaf736a653e5..cdb0fe4c29dfa531e3277925022d127b13ffcc16 100644 --- a/gcc/tree-if-conv.cc +++ b/gcc/tree-if-conv.cc @@ -586,7 +586,7 @@ add_to_dst_predicate_list (class loop *loop, edge e, /* Return true if one of the successor edges of BB exits LOOP. */ static bool -bb_with_exit_edge_p (class loop *loop, basic_block bb) +bb_with_exit_edge_p (const class loop *loop, basic_block bb) { edge e; edge_iterator ei; @@ -1268,6 +1268,44 @@ get_loop_body_in_if_conv_order (const class loop *loop) } free (blocks_in_bfs_order); BITMAP_FREE (visited); + + /* Go through loop and reject if-conversion or lowering of bitfields if we + encounter statements we do not believe the vectorizer will be able to + handle. If adding a new type of statement here, make sure + 'ifcvt_local_dce' is also able to handle it propertly. 
*/ + for (index = 0; index < loop->num_nodes; index++) + { + basic_block bb = blocks[index]; + gimple_stmt_iterator gsi; + + bool may_have_nonlocal_labels + = bb_with_exit_edge_p (loop, bb) || bb == loop->latch; + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) + switch (gimple_code (gsi_stmt (gsi))) + { + case GIMPLE_LABEL: + if (!may_have_nonlocal_labels) + { + tree label + = gimple_label_label (as_a (gsi_stmt (gsi))); + if (DECL_NONLOCAL (label) || FORCED_LABEL (label)) + { + free (blocks); + return NULL; + } + } + /* Fallthru. */ + case GIMPLE_ASSIGN: + case GIMPLE_CALL: + case GIMPLE_DEBUG: + case GIMPLE_COND: + gimple_set_uid (gsi_stmt (gsi), 0); + break; + default: + free (blocks); + return NULL; + } + } return blocks; } @@ -1438,36 +1476,6 @@ if_convertible_loop_p_1 (class loop *loop, vec *refs) exit_bb = bb; } - for (i = 0; i < loop->num_nodes; i++) - { - basic_block bb = ifc_bbs[i]; - gimple_stmt_iterator gsi; - - bool may_have_nonlocal_labels - = bb_with_exit_edge_p (loop, bb) || bb == loop->latch; - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) - switch (gimple_code (gsi_stmt (gsi))) - { - case GIMPLE_LABEL: - if (!may_have_nonlocal_labels) - { - tree label - = gimple_label_label (as_a (gsi_stmt (gsi))); - if (DECL_NONLOCAL (label) || FORCED_LABEL (label)) - return false; - } - /* Fallthru. */ - case GIMPLE_ASSIGN: - case GIMPLE_CALL: - case GIMPLE_DEBUG: - case GIMPLE_COND: - gimple_set_uid (gsi_stmt (gsi), 0); - break; - default: - return false; - } - } - data_reference_p dr; innermost_DR_map @@ -1579,14 +1587,6 @@ if_convertible_loop_p (class loop *loop, vec *refs) return false; } - /* More than one loop exit is too much to handle. */ - if (!single_exit (loop)) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "multiple exits\n"); - return false; - } - /* If one of the loop header's edge is an exit edge then do not apply if-conversion. */ FOR_EACH_EDGE (e, ei, loop->header->succs) @@ -3566,9 +3566,6 @@ tree_if_conversion (class loop *loop, vec *preds) aggressive_if_conv = true; } - if (!single_exit (loop)) - goto cleanup; - /* If there are more than two BBs in the loop then there is at least one if to convert. */ if (loop->num_nodes > 2 @@ -3588,15 +3585,25 @@ tree_if_conversion (class loop *loop, vec *preds) if (loop->num_nodes > 2) { - need_to_ifcvt = true; + /* More than one loop exit is too much to handle. */ + if (!single_exit (loop)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Can not ifcvt due to multiple exits\n"); + } + else + { + need_to_ifcvt = true; - if (!if_convertible_loop_p (loop, &refs) || !dbg_cnt (if_conversion_tree)) - goto cleanup; + if (!if_convertible_loop_p (loop, &refs) + || !dbg_cnt (if_conversion_tree)) + goto cleanup; - if ((need_to_predicate || any_complicated_phi) - && ((!flag_tree_loop_vectorize && !loop->force_vectorize) - || loop->dont_vectorize)) - goto cleanup; + if ((need_to_predicate || any_complicated_phi) + && ((!flag_tree_loop_vectorize && !loop->force_vectorize) + || loop->dont_vectorize)) + goto cleanup; + } } if ((flag_tree_loop_vectorize || loop->force_vectorize) @@ -3687,7 +3694,8 @@ tree_if_conversion (class loop *loop, vec *preds) PHIs, those are to be kept in sync with the non-if-converted copy. ??? We'll still keep dead stores though. 
From patchwork Wed Jun 28 13:41:46 2023
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 113882
Date: Wed, 28 Jun 2023 14:41:46 +0100
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, joseph@codesourcery.com, rguenther@suse.de, nathan@acm.org,
 jason@redhat.com
Subject: [PATCH 2/19][front-end] C/C++ front-end: add pragma GCC novector
From: Tamar Christina

Hi All,

FORTRAN currently has a NOVECTOR pragma for indicating that vectorization
should not be applied to a particular loop.  ICC/ICX also has such a pragma
for C and C++, called #pragma novector.

As part of this patch series I need a way to easily turn off vectorization of
particular loops, particularly for testsuite reasons.  This patch proposes a
#pragma GCC novector that does the same for C and C++ as gfortran does for
FORTRAN and as ICC/ICX does for C and C++.

I added only some basic tests here, but the next patch in the series uses this
in the testsuite in about 800 tests.

Bootstrapped and regression tested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/c-family/ChangeLog:

	* c-pragma.h (enum pragma_kind): Add PRAGMA_NOVECTOR.
	* c-pragma.cc (init_pragma): Use it.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_while_statement, c_parser_do_statement,
	c_parser_for_statement, c_parser_statement_after_labels,
	c_parse_pragma_novector, c_parser_pragma): Wire through novector
	and default to false.

gcc/cp/ChangeLog:

	* cp-tree.def (RANGE_FOR_STMT): Update comment.
	* cp-tree.h (RANGE_FOR_NOVECTOR): New.
	(cp_convert_range_for, finish_while_stmt_cond, finish_do_stmt,
	finish_for_cond): Add novector param.
	* init.cc (build_vec_init): Default novector to false.
	* method.cc (build_comparison_op): Likewise.
	* parser.cc (cp_parser_statement): Likewise.
	(cp_parser_for, cp_parser_c_for, cp_parser_range_for,
	cp_convert_range_for, cp_parser_iteration_statement,
	cp_parser_omp_for_loop, cp_parser_pragma): Support novector.
	(cp_parser_pragma_novector): New.
	* pt.cc (tsubst_expr): Likewise.
	* semantics.cc (finish_while_stmt_cond, finish_do_stmt,
	finish_for_cond): Likewise.

gcc/ChangeLog:

	* doc/extend.texi: Document it.
	* tree-core.h (struct tree_base): Add lang_flag_7 and reduce spare0.
	* tree.h (TREE_LANG_FLAG_7): New.

gcc/testsuite/ChangeLog:

	* g++.dg/vect/vect-novector-pragma.cc: New test.
	* gcc.dg/vect/vect-novector-pragma.c: New test.
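For illustration, a minimal sketch of the intended usage (the new
vect-novector-pragma tests added by this patch exercise the same form; the
function and variable names here are just placeholders):

    void
    scale (int *a, int *b, int n)
    {
    #pragma GCC novector
      for (int i = 0; i < n; i++)
	a[i] += b[i];	/* Loop is annotated and left unvectorized.  */
    }

The pragma attaches an annot_expr_no_vector_kind annotation to the loop
condition, which the vectorizer already honours, mirroring the existing
handling of #pragma GCC ivdep and #pragma GCC unroll.
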
--- inline copy of patch -- diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h index 9cc95ab3ee376628dbef2485b84e6008210fa8fc..99cf2e8bd1c05537c198470f1aaa0a5a9da4e576 100644 --- diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h index 9cc95ab3ee376628dbef2485b84e6008210fa8fc..99cf2e8bd1c05537c198470f1aaa0a5a9da4e576 100644 --- a/gcc/c-family/c-pragma.h +++ b/gcc/c-family/c-pragma.h @@ -87,6 +87,7 @@ enum pragma_kind { PRAGMA_GCC_PCH_PREPROCESS, PRAGMA_IVDEP, PRAGMA_UNROLL, + PRAGMA_NOVECTOR, PRAGMA_FIRST_EXTERNAL }; diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc index 0d2b333cebbed32423d5dc6fd2a3ac0ce0bf8b94..848a850b8e123ff1c6ae1ec4b7f8ccbd599b1a88 100644 --- a/gcc/c-family/c-pragma.cc +++ b/gcc/c-family/c-pragma.cc @@ -1862,6 +1862,10 @@ init_pragma (void) cpp_register_deferred_pragma (parse_in, "GCC", "unroll", PRAGMA_UNROLL, false, false); + if (!flag_preprocess_only) + cpp_register_deferred_pragma (parse_in, "GCC", "novector", PRAGMA_NOVECTOR, + false, false); + #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION c_register_pragma_with_expansion (0, "pack", handle_pragma_pack); #else diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc index 24a6eb6e4596f32c477e3f1c3f98b9792f7bc92c..9d35fe68704c8aca197bcd4805a146c655959621 100644 --- a/gcc/c/c-parser.cc +++ b/gcc/c/c-parser.cc @@ -1572,9 +1572,11 @@ static tree c_parser_c99_block_statement (c_parser *, bool *, location_t * = NULL); static void c_parser_if_statement (c_parser *, bool *, vec *); static void c_parser_switch_statement (c_parser *, bool *); -static void c_parser_while_statement (c_parser *, bool, unsigned short, bool *); -static void c_parser_do_statement (c_parser *, bool, unsigned short); -static void c_parser_for_statement (c_parser *, bool, unsigned short, bool *); +static void c_parser_while_statement (c_parser *, bool, unsigned short, bool, + bool *); +static void c_parser_do_statement (c_parser *, bool, unsigned short, bool); +static void c_parser_for_statement (c_parser *, bool, unsigned short, bool, + bool *); static tree c_parser_asm_statement (c_parser *); static tree c_parser_asm_operands (c_parser *); static tree c_parser_asm_goto_operands (c_parser *); @@ -6644,13 +6646,13 @@ c_parser_statement_after_labels (c_parser *parser, bool *if_p, c_parser_switch_statement (parser, if_p); break; case RID_WHILE: - c_parser_while_statement (parser, false, 0, if_p); + c_parser_while_statement (parser, false, 0, false, if_p); break; case RID_DO: - c_parser_do_statement (parser, false, 0); + c_parser_do_statement (parser, false, 0, false); break; case RID_FOR: - c_parser_for_statement (parser, false, 0, if_p); + c_parser_for_statement (parser, false, 0, false, if_p); break; case RID_GOTO: c_parser_consume_token (parser); @@ -7146,7 +7148,7 @@ c_parser_switch_statement (c_parser *parser, bool *if_p) static void c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll, - bool *if_p) + bool novector, bool *if_p) { tree block, cond, body; unsigned char save_in_statement; @@ -7168,6 +7170,11 @@ c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll, build_int_cst (integer_type_node, annot_expr_unroll_kind), build_int_cst (integer_type_node, unroll)); + if (novector && cond != error_mark_node) + cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, + build_int_cst (integer_type_node, + annot_expr_no_vector_kind), + integer_zero_node); save_in_statement = in_statement; in_statement = IN_ITERATION_STMT; @@ -7199,7 +7206,8 @@ c_parser_while_statement (c_parser 
*parser, bool ivdep, unsigned short unroll, */ static void -c_parser_do_statement (c_parser *parser, bool ivdep, unsigned short unroll) +c_parser_do_statement (c_parser *parser, bool ivdep, unsigned short unroll, + bool novector) { tree block, cond, body; unsigned char save_in_statement; @@ -7228,6 +7236,11 @@ c_parser_do_statement (c_parser *parser, bool ivdep, unsigned short unroll) build_int_cst (integer_type_node, annot_expr_unroll_kind), build_int_cst (integer_type_node, unroll)); + if (novector && cond != error_mark_node) + cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, + build_int_cst (integer_type_node, + annot_expr_no_vector_kind), + integer_zero_node); if (!c_parser_require (parser, CPP_SEMICOLON, "expected %<;%>")) c_parser_skip_to_end_of_block_or_statement (parser); @@ -7296,7 +7309,7 @@ c_parser_do_statement (c_parser *parser, bool ivdep, unsigned short unroll) static void c_parser_for_statement (c_parser *parser, bool ivdep, unsigned short unroll, - bool *if_p) + bool novector, bool *if_p) { tree block, cond, incr, body; unsigned char save_in_statement; @@ -7430,6 +7443,12 @@ c_parser_for_statement (c_parser *parser, bool ivdep, unsigned short unroll, "with % pragma"); cond = error_mark_node; } + else if (novector) + { + c_parser_error (parser, "missing loop condition in loop " + "with % pragma"); + cond = error_mark_node; + } else { c_parser_consume_token (parser); @@ -7452,6 +7471,11 @@ c_parser_for_statement (c_parser *parser, bool ivdep, unsigned short unroll, build_int_cst (integer_type_node, annot_expr_unroll_kind), build_int_cst (integer_type_node, unroll)); + if (novector && cond != error_mark_node) + cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, + build_int_cst (integer_type_node, + annot_expr_no_vector_kind), + integer_zero_node); } /* Parse the increment expression (the third expression in a for-statement). In the case of a foreach-statement, this is @@ -13037,6 +13061,16 @@ c_parse_pragma_ivdep (c_parser *parser) return true; } +/* Parse a pragma GCC novector. */ + +static bool +c_parse_pragma_novector (c_parser *parser) +{ + c_parser_consume_pragma (parser); + c_parser_skip_to_pragma_eol (parser); + return true; +} + /* Parse a pragma GCC unroll. 
*/ static unsigned short @@ -13264,11 +13298,12 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p) case PRAGMA_IVDEP: { const bool ivdep = c_parse_pragma_ivdep (parser); - unsigned short unroll; + unsigned short unroll = 0; + bool novector = false; if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_UNROLL) unroll = c_parser_pragma_unroll (parser); - else - unroll = 0; + if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_NOVECTOR) + novector = c_parse_pragma_novector (parser); if (!c_parser_next_token_is_keyword (parser, RID_FOR) && !c_parser_next_token_is_keyword (parser, RID_WHILE) && !c_parser_next_token_is_keyword (parser, RID_DO)) @@ -13277,22 +13312,48 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p) return false; } if (c_parser_next_token_is_keyword (parser, RID_FOR)) - c_parser_for_statement (parser, ivdep, unroll, if_p); + c_parser_for_statement (parser, ivdep, unroll, novector, if_p); else if (c_parser_next_token_is_keyword (parser, RID_WHILE)) - c_parser_while_statement (parser, ivdep, unroll, if_p); + c_parser_while_statement (parser, ivdep, unroll, novector, if_p); else - c_parser_do_statement (parser, ivdep, unroll); + c_parser_do_statement (parser, ivdep, unroll, novector); } return true; case PRAGMA_UNROLL: { unsigned short unroll = c_parser_pragma_unroll (parser); - bool ivdep; + bool ivdep = false; + bool novector = false; if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_IVDEP) ivdep = c_parse_pragma_ivdep (parser); + if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_NOVECTOR) + novector = c_parse_pragma_novector (parser); + if (!c_parser_next_token_is_keyword (parser, RID_FOR) + && !c_parser_next_token_is_keyword (parser, RID_WHILE) + && !c_parser_next_token_is_keyword (parser, RID_DO)) + { + c_parser_error (parser, "for, while or do statement expected"); + return false; + } + if (c_parser_next_token_is_keyword (parser, RID_FOR)) + c_parser_for_statement (parser, ivdep, unroll, novector, if_p); + else if (c_parser_next_token_is_keyword (parser, RID_WHILE)) + c_parser_while_statement (parser, ivdep, unroll, novector, if_p); else - ivdep = false; + c_parser_do_statement (parser, ivdep, unroll, novector); + } + return true; + + case PRAGMA_NOVECTOR: + { + bool novector = c_parse_pragma_novector (parser); + unsigned short unroll = 0; + bool ivdep = false; + if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_IVDEP) + ivdep = c_parse_pragma_ivdep (parser); + if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_UNROLL) + unroll = c_parser_pragma_unroll (parser); if (!c_parser_next_token_is_keyword (parser, RID_FOR) && !c_parser_next_token_is_keyword (parser, RID_WHILE) && !c_parser_next_token_is_keyword (parser, RID_DO)) @@ -13301,11 +13362,11 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p) return false; } if (c_parser_next_token_is_keyword (parser, RID_FOR)) - c_parser_for_statement (parser, ivdep, unroll, if_p); + c_parser_for_statement (parser, ivdep, unroll, novector, if_p); else if (c_parser_next_token_is_keyword (parser, RID_WHILE)) - c_parser_while_statement (parser, ivdep, unroll, if_p); + c_parser_while_statement (parser, ivdep, unroll, novector, if_p); else - c_parser_do_statement (parser, ivdep, unroll); + c_parser_do_statement (parser, ivdep, unroll, novector); } return true; diff --git a/gcc/cp/cp-tree.def b/gcc/cp/cp-tree.def index 0e66ca70e00caa1dc4beada1024ace32954e2aaf..c13c8ea98a523c4ef1c55a11e02d5da9db7e367e 100644 --- a/gcc/cp/cp-tree.def 
+++ b/gcc/cp/cp-tree.def @@ -305,8 +305,8 @@ DEFTREECODE (IF_STMT, "if_stmt", tcc_statement, 4) /* Used to represent a range-based `for' statement. The operands are RANGE_FOR_DECL, RANGE_FOR_EXPR, RANGE_FOR_BODY, RANGE_FOR_SCOPE, - RANGE_FOR_UNROLL, and RANGE_FOR_INIT_STMT, respectively. Only used in - templates. */ + RANGE_FOR_UNROLL, RANGE_FOR_NOVECTOR and RANGE_FOR_INIT_STMT, + respectively. Only used in templates. */ DEFTREECODE (RANGE_FOR_STMT, "range_for_stmt", tcc_statement, 6) /* Used to represent an expression statement. Use `EXPR_STMT_EXPR' to diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 8398223311194837441107cb335d497ff5f5ec1c..50b0f20817a168b5e9ac58db59ad44233f079e11 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -5377,6 +5377,7 @@ get_vec_init_expr (tree t) #define RANGE_FOR_UNROLL(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 4) #define RANGE_FOR_INIT_STMT(NODE) TREE_OPERAND (RANGE_FOR_STMT_CHECK (NODE), 5) #define RANGE_FOR_IVDEP(NODE) TREE_LANG_FLAG_6 (RANGE_FOR_STMT_CHECK (NODE)) +#define RANGE_FOR_NOVECTOR(NODE) TREE_LANG_FLAG_7 (RANGE_FOR_STMT_CHECK (NODE)) /* STMT_EXPR accessor. */ #define STMT_EXPR_STMT(NODE) TREE_OPERAND (STMT_EXPR_CHECK (NODE), 0) @@ -7286,7 +7287,7 @@ extern bool maybe_clone_body (tree); /* In parser.cc */ extern tree cp_convert_range_for (tree, tree, tree, tree, unsigned int, bool, - unsigned short); + unsigned short, bool); extern void cp_convert_omp_range_for (tree &, vec *, tree &, tree &, tree &, tree &, tree &, tree &); extern void cp_finish_omp_range_for (tree, tree); @@ -7609,16 +7610,19 @@ extern void begin_else_clause (tree); extern void finish_else_clause (tree); extern void finish_if_stmt (tree); extern tree begin_while_stmt (void); -extern void finish_while_stmt_cond (tree, tree, bool, unsigned short); +extern void finish_while_stmt_cond (tree, tree, bool, unsigned short, + bool); extern void finish_while_stmt (tree); extern tree begin_do_stmt (void); extern void finish_do_body (tree); -extern void finish_do_stmt (tree, tree, bool, unsigned short); +extern void finish_do_stmt (tree, tree, bool, unsigned short, + bool); extern tree finish_return_stmt (tree); extern tree begin_for_scope (tree *); extern tree begin_for_stmt (tree, tree); extern void finish_init_stmt (tree); -extern void finish_for_cond (tree, tree, bool, unsigned short); +extern void finish_for_cond (tree, tree, bool, unsigned short, + bool); extern void finish_for_expr (tree, tree); extern void finish_for_stmt (tree); extern tree begin_range_for_stmt (tree, tree); diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc index af6e30f511e142c7a594e742d128b2bf0aa8fb8d..5b735b27e6f5bc6b439ae64665902f4f1ca76f95 100644 --- a/gcc/cp/init.cc +++ b/gcc/cp/init.cc @@ -4846,7 +4846,7 @@ build_vec_init (tree base, tree maxindex, tree init, finish_init_stmt (for_stmt); finish_for_cond (build2 (GT_EXPR, boolean_type_node, iterator, build_int_cst (TREE_TYPE (iterator), -1)), - for_stmt, false, 0); + for_stmt, false, 0, false); /* We used to pass this decrement to finish_for_expr; now we add it to elt_init below so it's part of the same full-expression as the initialization, and thus happens before any potentially throwing diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc index 91cf943f11089c0e6bcbe8377daa4e016f956d56..fce49c796199c2c65cd70684e2942fea1b6b2ebd 100644 --- a/gcc/cp/method.cc +++ b/gcc/cp/method.cc @@ -1645,7 +1645,8 @@ build_comparison_op (tree fndecl, bool defining, tsubst_flags_t complain) add_stmt (idx); finish_init_stmt (for_stmt); finish_for_cond (build2 
(LE_EXPR, boolean_type_node, idx, - maxval), for_stmt, false, 0); + maxval), for_stmt, false, 0, + false); finish_for_expr (cp_build_unary_op (PREINCREMENT_EXPR, TARGET_EXPR_SLOT (idx), false, complain), diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index dd3665c8ccf48a8a0b1ba2c06400fe50999ea240..0bc110121d51ee13258b7ff0e4ad7851b4eae78e 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -2324,15 +2324,15 @@ static tree cp_parser_selection_statement static tree cp_parser_condition (cp_parser *); static tree cp_parser_iteration_statement - (cp_parser *, bool *, bool, unsigned short); + (cp_parser *, bool *, bool, unsigned short, bool); static bool cp_parser_init_statement (cp_parser *, tree *decl); static tree cp_parser_for - (cp_parser *, bool, unsigned short); + (cp_parser *, bool, unsigned short, bool); static tree cp_parser_c_for - (cp_parser *, tree, tree, bool, unsigned short); + (cp_parser *, tree, tree, bool, unsigned short, bool); static tree cp_parser_range_for - (cp_parser *, tree, tree, tree, bool, unsigned short, bool); + (cp_parser *, tree, tree, tree, bool, unsigned short, bool, bool); static void do_range_for_auto_deduction (tree, tree, tree, unsigned int); static tree cp_parser_perform_range_for_lookup @@ -12414,7 +12414,8 @@ cp_parser_statement (cp_parser* parser, tree in_statement_expr, case RID_DO: case RID_FOR: std_attrs = process_stmt_hotness_attribute (std_attrs, attrs_loc); - statement = cp_parser_iteration_statement (parser, if_p, false, 0); + statement = cp_parser_iteration_statement (parser, if_p, false, 0, + false); break; case RID_BREAK: @@ -13594,7 +13595,8 @@ cp_parser_condition (cp_parser* parser) not included. */ static tree -cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll) +cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll, + bool novector) { tree init, scope, decl; bool is_range_for; @@ -13624,14 +13626,14 @@ cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll) if (is_range_for) return cp_parser_range_for (parser, scope, init, decl, ivdep, unroll, - false); + novector, false); else - return cp_parser_c_for (parser, scope, init, ivdep, unroll); + return cp_parser_c_for (parser, scope, init, ivdep, unroll, novector); } static tree cp_parser_c_for (cp_parser *parser, tree scope, tree init, bool ivdep, - unsigned short unroll) + unsigned short unroll, bool novector) { /* Normal for loop */ tree condition = NULL_TREE; @@ -13658,7 +13660,13 @@ cp_parser_c_for (cp_parser *parser, tree scope, tree init, bool ivdep, "%<GCC ivdep%> pragma"); condition = error_mark_node; } - finish_for_cond (condition, stmt, ivdep, unroll); + else if (novector) + { + cp_parser_error (parser, "missing loop condition in loop with " + "%<GCC novector%> pragma"); + condition = error_mark_node; + } + finish_for_cond (condition, stmt, ivdep, unroll, novector); /* Look for the `;'. 
*/ cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON); @@ -13682,7 +13690,8 @@ cp_parser_c_for (cp_parser *parser, tree scope, tree init, bool ivdep, static tree cp_parser_range_for (cp_parser *parser, tree scope, tree init, tree range_decl, - bool ivdep, unsigned short unroll, bool is_omp) + bool ivdep, unsigned short unroll, bool novector, + bool is_omp) { tree stmt, range_expr; auto_vec bindings; @@ -13758,6 +13767,8 @@ cp_parser_range_for (cp_parser *parser, tree scope, tree init, tree range_decl, RANGE_FOR_IVDEP (stmt) = 1; if (unroll) RANGE_FOR_UNROLL (stmt) = build_int_cst (integer_type_node, unroll); + if (novector) + RANGE_FOR_NOVECTOR (stmt) = 1; finish_range_for_decl (stmt, range_decl, range_expr); if (!type_dependent_expression_p (range_expr) /* do_auto_deduction doesn't mess with template init-lists. */ @@ -13770,7 +13781,7 @@ cp_parser_range_for (cp_parser *parser, tree scope, tree init, tree range_decl, stmt = begin_for_stmt (scope, init); stmt = cp_convert_range_for (stmt, range_decl, range_expr, decomp_first_name, decomp_cnt, ivdep, - unroll); + unroll, novector); } return stmt; } @@ -13948,7 +13959,7 @@ warn_for_range_copy (tree decl, tree expr) tree cp_convert_range_for (tree statement, tree range_decl, tree range_expr, tree decomp_first_name, unsigned int decomp_cnt, - bool ivdep, unsigned short unroll) + bool ivdep, unsigned short unroll, bool novector) { tree begin, end; tree iter_type, begin_expr, end_expr; @@ -14008,7 +14019,7 @@ cp_convert_range_for (tree statement, tree range_decl, tree range_expr, begin, ERROR_MARK, end, ERROR_MARK, NULL_TREE, NULL, tf_warning_or_error); - finish_for_cond (condition, statement, ivdep, unroll); + finish_for_cond (condition, statement, ivdep, unroll, novector); /* The new increment expression. */ expression = finish_unary_op_expr (input_location, @@ -14175,7 +14186,7 @@ cp_parser_range_for_member_function (tree range, tree identifier) static tree cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep, - unsigned short unroll) + unsigned short unroll, bool novector) { cp_token *token; enum rid keyword; @@ -14209,7 +14220,7 @@ cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep, parens.require_open (parser); /* Parse the condition. */ condition = cp_parser_condition (parser); - finish_while_stmt_cond (condition, statement, ivdep, unroll); + finish_while_stmt_cond (condition, statement, ivdep, unroll, novector); /* Look for the `)'. */ parens.require_close (parser); /* Parse the dependent statement. */ @@ -14244,7 +14255,7 @@ cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep, /* Parse the expression. */ expression = cp_parser_expression (parser); /* We're done with the do-statement. */ - finish_do_stmt (expression, statement, ivdep, unroll); + finish_do_stmt (expression, statement, ivdep, unroll, novector); /* Look for the `)'. */ parens.require_close (parser); /* Look for the `;'. */ @@ -14258,7 +14269,7 @@ cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep, matching_parens parens; parens.require_open (parser); - statement = cp_parser_for (parser, ivdep, unroll); + statement = cp_parser_for (parser, ivdep, unroll, novector); /* Look for the `)'. 
*/ parens.require_close (parser); @@ -43815,7 +43826,7 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses, cp_parser_require (parser, CPP_COLON, RT_COLON); init = cp_parser_range_for (parser, NULL_TREE, NULL_TREE, decl, - false, 0, true); + false, 0, true, false); cp_convert_omp_range_for (this_pre_body, for_block, decl, orig_decl, init, orig_init, @@ -49300,6 +49311,15 @@ cp_parser_pragma_unroll (cp_parser *parser, cp_token *pragma_tok) return unroll; } +/* Parse a pragma GCC novector. */ + +static bool +cp_parser_pragma_novector (cp_parser *parser, cp_token *pragma_tok) +{ + cp_parser_skip_to_pragma_eol (parser, pragma_tok); + return true; +} + /* Normal parsing of a pragma token. Here we can (and must) use the regular lexer. */ @@ -49613,17 +49633,33 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p) break; } const bool ivdep = cp_parser_pragma_ivdep (parser, pragma_tok); - unsigned short unroll; + unsigned short unroll = 0; + bool novector = false; cp_token *tok = cp_lexer_peek_token (the_parser->lexer); - if (tok->type == CPP_PRAGMA - && cp_parser_pragma_kind (tok) == PRAGMA_UNROLL) + + while (tok->type == CPP_PRAGMA) { - tok = cp_lexer_consume_token (parser->lexer); - unroll = cp_parser_pragma_unroll (parser, tok); - tok = cp_lexer_peek_token (the_parser->lexer); + switch (cp_parser_pragma_kind (tok)) + { + case PRAGMA_UNROLL: + { + tok = cp_lexer_consume_token (parser->lexer); + unroll = cp_parser_pragma_unroll (parser, tok); + tok = cp_lexer_peek_token (the_parser->lexer); + break; + } + case PRAGMA_NOVECTOR: + { + tok = cp_lexer_consume_token (parser->lexer); + novector = cp_parser_pragma_novector (parser, tok); + tok = cp_lexer_peek_token (the_parser->lexer); + break; + } + default: + gcc_unreachable (); + } } - else - unroll = 0; + if (tok->type != CPP_KEYWORD || (tok->keyword != RID_FOR && tok->keyword != RID_WHILE @@ -49632,7 +49668,7 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p) cp_parser_error (parser, "for, while or do statement expected"); return false; } - cp_parser_iteration_statement (parser, if_p, ivdep, unroll); + cp_parser_iteration_statement (parser, if_p, ivdep, unroll, novector); return true; } @@ -49646,17 +49682,82 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p) } const unsigned short unroll = cp_parser_pragma_unroll (parser, pragma_tok); - bool ivdep; + bool ivdep = false; + bool novector = false; cp_token *tok = cp_lexer_peek_token (the_parser->lexer); - if (tok->type == CPP_PRAGMA - && cp_parser_pragma_kind (tok) == PRAGMA_IVDEP) + + while (tok->type == CPP_PRAGMA) { - tok = cp_lexer_consume_token (parser->lexer); - ivdep = cp_parser_pragma_ivdep (parser, tok); - tok = cp_lexer_peek_token (the_parser->lexer); + switch (cp_parser_pragma_kind (tok)) + { + case PRAGMA_IVDEP: + { + tok = cp_lexer_consume_token (parser->lexer); + ivdep = cp_parser_pragma_ivdep (parser, tok); + tok = cp_lexer_peek_token (the_parser->lexer); + break; + } + case PRAGMA_NOVECTOR: + { + tok = cp_lexer_consume_token (parser->lexer); + novector = cp_parser_pragma_novector (parser, tok); + tok = cp_lexer_peek_token (the_parser->lexer); + break; + } + default: + gcc_unreachable (); + } } - else - ivdep = false; + + if (tok->type != CPP_KEYWORD + || (tok->keyword != RID_FOR + && tok->keyword != RID_WHILE + && tok->keyword != RID_DO)) + { + cp_parser_error (parser, "for, while or do statement expected"); + return false; + } + cp_parser_iteration_statement (parser, 
if_p, ivdep, unroll, novector); + return true; + } + + case PRAGMA_NOVECTOR: + { + if (context == pragma_external) + { + error_at (pragma_tok->location, + "%<#pragma GCC novector%> must be inside a function"); + break; + } + const bool novector + = cp_parser_pragma_novector (parser, pragma_tok); + bool ivdep = false; + unsigned short unroll; + cp_token *tok = cp_lexer_peek_token (the_parser->lexer); + + while (tok->type == CPP_PRAGMA) + { + switch (cp_parser_pragma_kind (tok)) + { + case PRAGMA_IVDEP: + { + tok = cp_lexer_consume_token (parser->lexer); + ivdep = cp_parser_pragma_ivdep (parser, tok); + tok = cp_lexer_peek_token (the_parser->lexer); + break; + } + case PRAGMA_UNROLL: + { + tok = cp_lexer_consume_token (parser->lexer); + unroll = cp_parser_pragma_unroll (parser, tok); + tok = cp_lexer_peek_token (the_parser->lexer); + break; + } + default: + gcc_unreachable (); + } + } + if (tok->type != CPP_KEYWORD || (tok->keyword != RID_FOR && tok->keyword != RID_WHILE @@ -49665,7 +49766,7 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p) cp_parser_error (parser, "for, while or do statement expected"); return false; } - cp_parser_iteration_statement (parser, if_p, ivdep, unroll); + cp_parser_iteration_statement (parser, if_p, ivdep, unroll, novector); return true; } diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index 2345a18becc1160b9d12f3d88cccb66c8917373c..7b0d01a90e3c4012ec603ebe04cbbb31a7dd1570 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -19036,7 +19036,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl) RECUR (FOR_INIT_STMT (t)); finish_init_stmt (stmt); tmp = RECUR (FOR_COND (t)); - finish_for_cond (tmp, stmt, false, 0); + finish_for_cond (tmp, stmt, false, 0, false); tmp = RECUR (FOR_EXPR (t)); finish_for_expr (tmp, stmt); { @@ -19073,6 +19073,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl) { RANGE_FOR_IVDEP (stmt) = RANGE_FOR_IVDEP (t); RANGE_FOR_UNROLL (stmt) = RANGE_FOR_UNROLL (t); + RANGE_FOR_NOVECTOR (stmt) = RANGE_FOR_NOVECTOR (t); finish_range_for_decl (stmt, decl, expr); if (decomp_first && decl != error_mark_node) cp_finish_decomp (decl, decomp_first, decomp_cnt); @@ -19083,7 +19084,8 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl) ? 
tree_to_uhwi (RANGE_FOR_UNROLL (t)) : 0); stmt = cp_convert_range_for (stmt, decl, expr, decomp_first, decomp_cnt, - RANGE_FOR_IVDEP (t), unroll); + RANGE_FOR_IVDEP (t), unroll, + RANGE_FOR_NOVECTOR (t)); } bool prev = note_iteration_stmt_body_start (); @@ -19096,7 +19098,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl) case WHILE_STMT: stmt = begin_while_stmt (); tmp = RECUR (WHILE_COND (t)); - finish_while_stmt_cond (tmp, stmt, false, 0); + finish_while_stmt_cond (tmp, stmt, false, 0, false); { bool prev = note_iteration_stmt_body_start (); RECUR (WHILE_BODY (t)); @@ -19114,7 +19116,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl) } finish_do_body (stmt); tmp = RECUR (DO_COND (t)); - finish_do_stmt (tmp, stmt, false, 0); + finish_do_stmt (tmp, stmt, false, 0, false); break; case IF_STMT: diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 8fb47fd179eb2af2e82bf31d188023e9b9d41de9..b79975109c22ebcfcb060b4f20f32f69f3c3c444 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -1148,7 +1148,7 @@ begin_while_stmt (void) void finish_while_stmt_cond (tree cond, tree while_stmt, bool ivdep, - unsigned short unroll) + unsigned short unroll, bool novector) { cond = maybe_convert_cond (cond); finish_cond (&WHILE_COND (while_stmt), cond); @@ -1168,6 +1168,13 @@ finish_while_stmt_cond (tree cond, tree while_stmt, bool ivdep, annot_expr_unroll_kind), build_int_cst (integer_type_node, unroll)); + if (novector && cond != error_mark_node) + WHILE_COND (while_stmt) = build3 (ANNOTATE_EXPR, + TREE_TYPE (WHILE_COND (while_stmt)), + WHILE_COND (while_stmt), + build_int_cst (integer_type_node, + annot_expr_no_vector_kind), + integer_zero_node); simplify_loop_decl_cond (&WHILE_COND (while_stmt), WHILE_BODY (while_stmt)); } @@ -1212,7 +1219,8 @@ finish_do_body (tree do_stmt) COND is as indicated. */ void -finish_do_stmt (tree cond, tree do_stmt, bool ivdep, unsigned short unroll) +finish_do_stmt (tree cond, tree do_stmt, bool ivdep, unsigned short unroll, + bool novector) { cond = maybe_convert_cond (cond); end_maybe_infinite_loop (cond); @@ -1229,6 +1237,10 @@ finish_do_stmt (tree cond, tree do_stmt, bool ivdep, unsigned short unroll) cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, build_int_cst (integer_type_node, annot_expr_unroll_kind), build_int_cst (integer_type_node, unroll)); + if (novector && cond != error_mark_node) + cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, + build_int_cst (integer_type_node, annot_expr_no_vector_kind), + integer_zero_node); DO_COND (do_stmt) = cond; } @@ -1325,7 +1337,7 @@ finish_init_stmt (tree for_stmt) FOR_STMT. 
*/ void -finish_for_cond (tree cond, tree for_stmt, bool ivdep, unsigned short unroll) +finish_for_cond (tree cond, tree for_stmt, bool ivdep, unsigned short unroll, bool novector) { cond = maybe_convert_cond (cond); finish_cond (&FOR_COND (for_stmt), cond); @@ -1345,6 +1357,13 @@ finish_for_cond (tree cond, tree for_stmt, bool ivdep, unsigned short unroll) annot_expr_unroll_kind), build_int_cst (integer_type_node, unroll)); + if (novector && cond != error_mark_node) + FOR_COND (for_stmt) = build3 (ANNOTATE_EXPR, + TREE_TYPE (FOR_COND (for_stmt)), + FOR_COND (for_stmt), + build_int_cst (integer_type_node, + annot_expr_no_vector_kind), + integer_zero_node); simplify_loop_decl_cond (&FOR_COND (for_stmt), FOR_BODY (for_stmt)); } diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 3040a9bdea65d27f8d20572b4ed37375f5fe949b..baac6643d1abbf33d592e68aca49ac83e3c29188 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -24349,6 +24349,25 @@ void ignore_vec_dep (int *a, int k, int c, int m) @} @end smallexample +@cindex pragma GCC novector +@item #pragma GCC novector + +With this pragma, the programmer asserts that the following loop should be +prevented from executing concurrently with SIMD (single instruction multiple +data) instructions. + +For example, the compiler cannot vectorize the following loop with the pragma: + +@smallexample +void foo (int n, int *a, int *b, int *c) +@{ + int i, j; +#pragma GCC novector + for (i = 0; i < n; ++i) + a[i] = b[i] + c[i]; +@} +@end smallexample + @cindex pragma GCC unroll @var{n} @item #pragma GCC unroll @var{n} diff --git a/gcc/testsuite/g++.dg/vect/vect-novector-pragma.cc b/gcc/testsuite/g++.dg/vect/vect-novector-pragma.cc new file mode 100644 index 0000000000000000000000000000000000000000..4667935b641a06e3004904dc86c4513a78736f04 --- /dev/null +++ b/gcc/testsuite/g++.dg/vect/vect-novector-pragma.cc @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ + +#include <vector> + +void f4 (std::vector<int> a, std::vector<int> b, int n) +{ + int i = 0; +#pragma GCC novector + while (i < (n & -8)) + { + a[i] += b[i]; + i++; + } +} + + +void f5 (std::vector<int> a, std::vector<int> b, int n) +{ + int i = 0; +#pragma GCC novector + for (auto x : b) + { + a[i] += x; + i++; + } +} + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-novector-pragma.c b/gcc/testsuite/gcc.dg/vect/vect-novector-pragma.c new file mode 100644 index 0000000000000000000000000000000000000000..c4b3957711db8f78d26a32634e9bbfdc11a33302 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-novector-pragma.c @@ -0,0 +1,34 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ + +void f1 (int * restrict a, int * restrict b, int n) +{ +#pragma GCC novector + for (int i = 0; i < (n & -8); i++) + a[i] += b[i]; +} + +void f2 (int * restrict a, int * restrict b, int n) +{ + int i = 0; +#pragma GCC novector + do + { + a[i] += b[i]; + i++; + } + while (i < (n & -8)); +} + +void f3 (int * restrict a, int * restrict b, int n) +{ + int i = 0; +#pragma GCC novector + while (i < (n & -8)) + { + a[i] += b[i]; + i++; + } +} + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/tree-core.h b/gcc/tree-core.h index c48a12b378f0b3086747bee43b38e2da3f90b24d..9268a0668390192caac9efaade0a53d9359cf9a7 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -1063,6 +1063,7 @@ struct GTY(()) tree_base { unsigned lang_flag_4 : 1; unsigned lang_flag_5 : 1; unsigned lang_flag_6 : 1; + 
unsigned lang_flag_7 : 1; unsigned saturating_flag : 1; unsigned unsigned_flag : 1; @@ -1071,7 +1072,7 @@ struct GTY(()) tree_base { unsigned nameless_flag : 1; unsigned atomic_flag : 1; unsigned unavailable_flag : 1; - unsigned spare0 : 2; + unsigned spare0 : 1; unsigned spare1 : 8; diff --git a/gcc/tree.h b/gcc/tree.h index 1854fe4a7d4d25b0cb55ee70402d5721f8b629ba..e96e8884bf68de77d19c95a87ae1c147460c23df 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -1112,6 +1112,8 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int, (TREE_NOT_CHECK2 (NODE, TREE_VEC, SSA_NAME)->base.u.bits.lang_flag_5) #define TREE_LANG_FLAG_6(NODE) \ (TREE_NOT_CHECK2 (NODE, TREE_VEC, SSA_NAME)->base.u.bits.lang_flag_6) +#define TREE_LANG_FLAG_7(NODE) \ + (TREE_NOT_CHECK2 (NODE, TREE_VEC, SSA_NAME)->base.u.bits.lang_flag_7) /* Define additional fields and accessors for nodes representing constants. */
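
For reference, a minimal usage sketch of the new pragma from the source-language side (illustrative only, not part of the patch or its testsuite; the function name and loop body are made up). The pragma applies to the for, while or do loop that immediately follows it, and, as the parser changes above show, it can be combined with the existing #pragma GCC ivdep and #pragma GCC unroll in any order. The front ends lower it to an ANNOTATE_EXPR of kind annot_expr_no_vector_kind on the loop condition, which later tells the loop optimizers not to auto-vectorize that loop.

/* Usage sketch only -- assumed example, not taken from the patch.  Each
   pragma annotates the loop that immediately follows; all three may be
   stacked in any order.  */
void
scale_add (int n, const float *x, float *restrict y)
{
#pragma GCC ivdep
#pragma GCC unroll 4
#pragma GCC novector
  for (int i = 0; i < n; i++)
    y[i] = y[i] + 2.0f * x[i];
}

Compiled with -O3 -fdump-tree-vect-details, the vect dump for such a loop should not report "LOOP VECTORIZED", which is what the new testcases above check with scan-tree-dump-not.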
From patchwork Mon Nov 6 07:37:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tamar Christina X-Patchwork-Id: 161848
Date: Mon, 6 Nov 2023 07:37:55 +0000 From: Tamar Christina To: gcc-patches@gcc.gnu.org Cc: nd@arm.com, rguenther@suse.de, jlaw@ventanamicro.com Subject: [PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks Message-ID: Content-Disposition: inline In-Reply-To: MIME-Version: 1.0
SFS:(13230031)(396003)(376002)(136003)(346002)(39860400002)(366004)(230922051799003)(64100799003)(451199024)(186009)(1800799009)(66899024)(6506007)(478600001)(2616005)(44144004)(6512007)(6486002)(33964004)(36756003)(38100700002)(86362001)(66946007)(5660300002)(235185007)(44832011)(316002)(66476007)(66556008)(41300700001)(4743002)(83380400001)(2906002)(30864003)(26005)(8676002)(4326008)(6916009)(8936002)(2700100001); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: DU0PR08MB9581 Original-Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: DB5PEPF00014B90.eurprd02.prod.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: afafc0eb-478a-491f-a5b5-08dbde9b4bbc X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: pxWBLM0MV6Nf4Qs2ib47YweQR4DLv+sWa3nP5MZle71nObEdIFldYaDkjlmfN7jmh/ezIIl/mUAudo3FhiveEmaO7b1LIGvLnQ+jVucmDsXNkTHu+nSu/OeWGrDwoicsIKcy57WJtbC6jadZTivQY+6/UtzajNijBBC0pRP3HSd0UiCYKkZbHHuoTNsi5vM+QzuH4PsrxxGvQ/SpNkx9SPppF7QwNnkEjz+l4mAFzLj2SA7eUaP7QEz037hA15t+t9++ioF21453CXfbKSnTx4M/KMVJanK7aU13kXn/8+4IunrL7u15RH6avLF+berdaMx4OYVBH/HUy60gLSOGhY6jBH3xFzKL33QlizBBMhYLJPF4uTp6eNT3QtShiGG8Q4/SHmkTSyoOFoOiIx972Wf4c/vqdjlpvMgPw4LIPdumI4SUKW1RTa/52pEz+Tksb1AX7dzmC9SwY75GxNT0Ciwmx+zmpOdUvJT+P2jgxCbtQgSleturIpM1+otjz+KUCcia5UwIubjt91o+ItrvXMG0pUB/PaK+kQXzE02yCUH2+YkTwSU2MHgdbm3yfBwb/xafZBz2XrsfH2oRdw7MiwCZJZn0HGgceGgPUsbaZF7WPaFnUZOcSTqUXjDxUHMWvFij1BOCZmnpsEdOR6hQXV1uzCDY5SjdIYru/CJZlHtUXx3aQwJEc/kP8iE2GxW5d6mRLVX29T5+ku0waeAutvZREPbtPIYkFQDLSp/+lEk6/77ExuDRlYVM8OuFrGW+CDtTTB4RIrmvCB8Itj5NPw== X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230031)(4636009)(136003)(396003)(376002)(346002)(39860400002)(230922051799003)(186009)(64100799003)(1800799009)(82310400011)(451199024)(46966006)(36840700001)(40470700004)(6916009)(4326008)(8676002)(8936002)(41300700001)(36756003)(40480700001)(2906002)(235185007)(36860700001)(5660300002)(6486002)(66899024)(6512007)(356005)(316002)(86362001)(44832011)(47076005)(30864003)(81166007)(4743002)(70206006)(70586007)(83380400001)(336012)(40460700003)(26005)(82740400003)(2616005)(107886003)(6506007)(478600001)(33964004)(44144004)(2700100001); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 06 Nov 2023 07:38:07.3514 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: d865993d-d1f8-4ea4-1ad4-08dbde9b519e X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: DB5PEPF00014B90.eurprd02.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS8PR08MB9266 X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, FORGED_SPF_HELO, GIT_PATCH_0, KAM_DMARC_NONE, KAM_LOTSOFHASH, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on 
Hi All,

When performing early break vectorization we need to be sure that the vector
operations are safe to perform.  A simple example is e.g.

for (int i = 0; i < N; i++)
{
  vect_b[i] = x + i;
  if (vect_a[i]*2 != x)
    break;
  vect_a[i] = x;
}

where the store to vect_b is not allowed to be executed unconditionally, since
if we exit through the early break it wouldn't have been done for the full VF
iteration.

Effectively, the code motion determines:
  - whether it is safe/possible to vectorize the function,
  - what updates to the VUSES should be performed if we do,
  - which statements need to be moved,
  - which statements can't be moved:
    * values that are live must be reachable through all exits,
    * values that aren't single use and are shared by the use/def chain of the cond,
  - the final insertion point of the instructions.  In the case of multiple
    early exit statements this should be the one closest to the loop latch
    itself.

After motion the loop above becomes:

for (int i = 0; i < N; i++)
{
  ...
  y = x + i;
  if (vect_a[i]*2 != x)
    break;
  vect_b[i] = y;
  vect_a[i] = x;
}

The operation is split into two parts: during data-ref analysis we determine
the validity of the operation and generate a worklist of actions to perform if
we vectorize.  After peeling, and just before statement transformation, we
replay this worklist, which moves the statements and updates the bookkeeping
only in the main loop that's to be vectorized.  This includes updating of USES
in exit blocks.  At the moment we don't support this for epilogue no-mask
vectorization, since the additional vectorized epilogue's stmt UIDs are not
found.  (A short scalar sketch of the ordering rule this enforces follows the
inline patch below.)

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-data-refs.cc (validate_early_exit_stmts): New.
	(vect_analyze_early_break_dependences): New.
	(vect_analyze_data_ref_dependences): Use them.
	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
	early_breaks.
	(move_early_exit_stmts): New.
	(vect_transform_loop): Use it.
	* tree-vect-stmts.cc (vect_is_simple_use): Use vect_early_exit_def.
	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
	(class _loop_vec_info): Add early_breaks, early_break_conflict,
	early_break_vuses.
	(LOOP_VINFO_EARLY_BREAKS): New.
	(LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS): New.
	(LOOP_VINFO_EARLY_BRK_DEST_BB): New.
	(LOOP_VINFO_EARLY_BRK_VUSES): New.

--- inline copy of patch --
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc index d5c9c4a11c2e5d8fd287f412bfa86d081c2f8325..0fc4f325980be0474f628c32b9ce7be77f3e1d60 100644 --- a/gcc/tree-vect-data-refs.cc +++ b/gcc/tree-vect-data-refs.cc @@ -613,6 +613,332 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr, return opt_result::success (); } +/* This function tries to validate whether an early break vectorization + is possible for the current instruction sequence. Returns True if + possible, otherwise False. + + Requirements: + - Any memory access must be to a fixed size buffer.
+ - There must not be any loads and stores to the same object. + - Multiple loads are allowed as long as they don't alias. + + NOTE: + This implemementation is very conservative. Any overlappig loads/stores + that take place before the early break statement gets rejected aside from + WAR dependencies. + + i.e.: + + a[i] = 8 + c = a[i] + if (b[i]) + ... + + is not allowed, but + + c = a[i] + a[i] = 8 + if (b[i]) + ... + + is which is the common case. + + Arguments: + - LOOP_VINFO: loop information for the current loop. + - CHAIN: Currently detected sequence of instructions that need to be moved + if we are to vectorize this early break. + - FIXED: Sequences of SSA_NAMEs that must not be moved, they are reachable from + one or more cond conditions. If this set overlaps with CHAIN then FIXED + takes precedence. This deals with non-single use cases. + - LOADS: List of all loads found during traversal. + - BASES: List of all load data references found during traversal. + - GSTMT: Current position to inspect for validity. The sequence + will be moved upwards from this point. + - REACHING_VUSE: The dominating VUSE found so far. */ + +static bool +validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set *chain, + hash_set *fixed, vec *loads, + vec *bases, tree *reaching_vuse, + gimple_stmt_iterator *gstmt) +{ + if (gsi_end_p (*gstmt)) + return true; + + gimple *stmt = gsi_stmt (*gstmt); + /* ?? Do I need to move debug statements? not quite sure.. */ + if (gimple_has_ops (stmt) + && !is_gimple_debug (stmt)) + { + tree dest = NULL_TREE; + /* Try to find the SSA_NAME being defined. For Statements with an LHS + use the LHS, if not, assume that the first argument of a call is the + value being defined. e.g. MASKED_LOAD etc. */ + if (gimple_has_lhs (stmt)) + dest = gimple_get_lhs (stmt); + else if (const gcall *call = dyn_cast (stmt)) + dest = gimple_arg (call, 0); + else if (const gcond *cond = dyn_cast (stmt)) + { + /* Operands of conds are ones we can't move. */ + fixed->add (gimple_cond_lhs (cond)); + fixed->add (gimple_cond_rhs (cond)); + } + + bool move = false; + + stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt); + if (!stmt_vinfo) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "early breaks not supported. Unknown" + " statement: %G", stmt); + return false; + } + + auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo); + if (dr_ref) + { + /* We currently only support statically allocated objects due to + not having first-faulting loads support or peeling for alignment + support. Compute the size of the referenced object (it could be + dynamically allocated). 
*/ + tree obj = DR_BASE_ADDRESS (dr_ref); + if (!obj || TREE_CODE (obj) != ADDR_EXPR) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "early breaks only supported on statically" + " allocated objects.\n"); + return false; + } + + tree refop = TREE_OPERAND (obj, 0); + tree refbase = get_base_address (refop); + if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase) + || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "early breaks only supported on statically" + " allocated objects.\n"); + return false; + } + + if (DR_IS_READ (dr_ref)) + { + loads->safe_push (dest); + bases->safe_push (dr_ref); + } + else if (DR_IS_WRITE (dr_ref)) + { + for (auto dr : bases) + if (same_data_refs_base_objects (dr, dr_ref)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, + vect_location, + "early breaks only supported," + " overlapping loads and stores found" + " before the break statement.\n"); + return false; + } + /* Any writes starts a new chain. */ + move = true; + } + } + + /* If a statement is live and escapes the loop through usage in the loop + epilogue then we can't move it since we need to maintain its + reachability through all exits. */ + bool skip = false; + if (STMT_VINFO_LIVE_P (stmt_vinfo) + && !(dr_ref && DR_IS_WRITE (dr_ref))) + { + imm_use_iterator imm_iter; + use_operand_p use_p; + FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest) + { + basic_block bb = gimple_bb (USE_STMT (use_p)); + skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest; + if (skip) + break; + } + } + + /* If we found the defining statement of a something that's part of the + chain then expand the chain with the new SSA_VARs being used. */ + if (!skip && (chain->contains (dest) || move)) + { + move = true; + for (unsigned x = 0; x < gimple_num_args (stmt); x++) + { + tree var = gimple_arg (stmt, x); + if (TREE_CODE (var) == SSA_NAME) + { + if (fixed->contains (dest)) + { + move = false; + fixed->add (var); + } + else + chain->add (var); + } + else + { + use_operand_p use_p; + ssa_op_iter iter; + FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE) + { + tree op = USE_FROM_PTR (use_p); + gcc_assert (TREE_CODE (op) == SSA_NAME); + if (fixed->contains (dest)) + { + move = false; + fixed->add (op); + } + else + chain->add (op); + } + } + } + + if (dump_enabled_p ()) + { + if (move) + dump_printf_loc (MSG_NOTE, vect_location, + "found chain %G", stmt); + else + dump_printf_loc (MSG_NOTE, vect_location, + "ignored chain %G, not single use", stmt); + } + } + + if (move) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "==> recording stmt %G", stmt); + + for (tree ref : loads) + if (stmt_may_clobber_ref_p (stmt, ref, true)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "early breaks not supported as memory used" + " may alias.\n"); + return false; + } + + /* If we've moved a VDEF, extract the defining MEM and update + usages of it. */ + tree vdef; + if ((vdef = gimple_vdef (stmt))) + { + /* This statement is to be moved. 
*/ + LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).safe_push (stmt); + *reaching_vuse = gimple_vuse (stmt); + } + } + } + + gsi_prev (gstmt); + + if (!validate_early_exit_stmts (loop_vinfo, chain, fixed, loads, bases, + reaching_vuse, gstmt)) + return false; + + if (gimple_vuse (stmt) && !gimple_vdef (stmt)) + { + LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_push (stmt); + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "marked statement for vUSE update: %G", stmt); + } + + return true; +} + +/* Funcion vect_analyze_early_break_dependences. + + Examime all the data references in the loop and make sure that if we have + mulitple exits that we are able to safely move stores such that they become + safe for vectorization. The function also calculates the place where to move + the instructions to and computes what the new vUSE chain should be. + + This works in tandem with the CFG that will be produced by + slpeel_tree_duplicate_loop_to_edge_cfg later on. */ + +static opt_result +vect_analyze_early_break_dependences (loop_vec_info loop_vinfo) +{ + DUMP_VECT_SCOPE ("vect_analyze_early_break_dependences"); + + hash_set chain, fixed; + auto_vec loads; + auto_vec bases; + basic_block dest_bb = NULL; + tree vuse = NULL; + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "loop contains multiple exits, analyzing" + " statement dependencies.\n"); + + for (gcond *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo)) + { + stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c); + if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type) + continue; + + gimple *stmt = STMT_VINFO_STMT (loop_cond_info); + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); + + /* Initiaze the vuse chain with the one at the early break. */ + if (!vuse) + vuse = gimple_vuse (c); + + if (!validate_early_exit_stmts (loop_vinfo, &chain, &fixed, &loads, + &bases, &vuse, &gsi)) + return opt_result::failure_at (stmt, + "can't safely apply code motion to " + "dependencies of %G to vectorize " + "the early exit.\n", stmt); + + /* Save destination as we go, BB are visited in order and the last one + is where statements should be moved to. */ + if (!dest_bb) + dest_bb = gimple_bb (c); + else + { + basic_block curr_bb = gimple_bb (c); + if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb)) + dest_bb = curr_bb; + } + } + + dest_bb = FALLTHRU_EDGE (dest_bb)->dest; + gcc_assert (dest_bb); + LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb; + + /* TODO: Remove? It's useful debug statement but may be too much. */ + for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "updated use: %T, mem_ref: %G", + vuse, g); + } + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "recorded statements to be moved to BB %d\n", + LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index); + + return opt_result::success (); +} + /* Function vect_analyze_data_ref_dependences. Examine all the data references in the loop, and make sure there do not @@ -657,6 +983,11 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo, return res; } + /* If we have early break statements in the loop, check to see if they + are of a form we can vectorizer. 
*/ + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) + return vect_analyze_early_break_dependences (loop_vinfo); + return opt_result::success (); } diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 40f167d279589a5b97f618720cfbc0d41b7f2342..c123398aad207082384a2079c5234033c3d825ea 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -1040,6 +1040,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared) partial_load_store_bias (0), peeling_for_gaps (false), peeling_for_niter (false), + early_breaks (false), no_data_dependencies (false), has_mask_store (false), scalar_loop_scaling (profile_probability::uninitialized ()), @@ -11392,6 +11393,55 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance) epilogue_vinfo->shared->save_datarefs (); } +/* When vectorizing early break statements instructions that happen before + the early break in the current BB need to be moved to after the early + break. This function deals with that and assumes that any validity + checks has already been performed. + + While moving the instructions if it encounters a VUSE or VDEF it then + corrects the VUSES as it moves the statements along. GDEST is the location + in which to insert the new statements. */ + +static void +move_early_exit_stmts (loop_vec_info loop_vinfo) +{ + if (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).is_empty ()) + return; + + /* Move all stmts that need moving. */ + basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo); + gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb); + + for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo)) + { + /* Check to see if statement is still required for vect or has been + elided. */ + auto stmt_info = loop_vinfo->lookup_stmt (stmt); + if (!stmt_info) + continue; + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt); + + gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt); + gsi_move_before (&stmt_gsi, &dest_gsi); + gsi_prev (&dest_gsi); + update_stmt (stmt); + } + + /* Update all the stmts with their new reaching VUSES. */ + tree vuse = gimple_vuse (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).last ()); + for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "updating vuse to %T for stmt %G", vuse, p); + unlink_stmt_vdef (p); + gimple_set_vuse (p, vuse); + update_stmt (p); + } +} + /* Function vect_transform_loop. The analysis phase has determined that the loop is vectorizable. @@ -11541,6 +11591,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call) vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo)); } + /* Handle any code motion that we need to for early-break vectorization after + we've done peeling but just before we start vectorizing. */ + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) + move_early_exit_stmts (loop_vinfo); + /* FORNOW: the vectorizer supports only loops which body consist of one basic block (header + empty latch). 
When the vectorizer will support more involved loop forms, the order by which the BBs are diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 99ba75e98c0d185edd78c7b8b9947618d18576cc..42cebb92789247434a91cb8e74c0557e75d1ea2c 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -13511,6 +13511,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt, case vect_first_order_recurrence: dump_printf (MSG_NOTE, "first order recurrence\n"); break; + case vect_early_exit_def: + dump_printf (MSG_NOTE, "early exit\n"); + break; case vect_unknown_def_type: dump_printf (MSG_NOTE, "unknown\n"); break; diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index a4043e4a6568a9e8cfaf9298fe940289e165f9e2..1418913d2c308b0cf78352e29dc9958746fb9c94 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -66,6 +66,7 @@ enum vect_def_type { vect_double_reduction_def, vect_nested_cycle, vect_first_order_recurrence, + vect_early_exit_def, vect_unknown_def_type }; @@ -888,6 +889,10 @@ public: we need to peel off iterations at the end to form an epilogue loop. */ bool peeling_for_niter; + /* When the loop has early breaks that we can vectorize we need to peel + the loop for the break finding loop. */ + bool early_breaks; + /* List of loop additional IV conditionals found in the loop. */ auto_vec conds; @@ -942,6 +947,20 @@ public: /* The controlling loop IV for the scalar loop being vectorized. This IV controls the natural exits of the loop. */ edge scalar_loop_iv_exit; + + /* Used to store the list of statements needing to be moved if doing early + break vectorization as they would violate the scalar loop semantics if + vectorized in their current location. These are stored in order that they need + to be moved. */ + auto_vec early_break_conflict; + + /* The final basic block where to move statements to. In the case of + multiple exits this could be pretty far away. */ + basic_block early_break_dest_bb; + + /* Statements whose VUSES need updating if early break vectorization is to + happen. */ + auto_vec early_break_vuses; } *loop_vec_info; /* Access Functions. */ @@ -996,6 +1015,10 @@ public: #define LOOP_VINFO_REDUCTION_CHAINS(L) (L)->reduction_chains #define LOOP_VINFO_PEELING_FOR_GAPS(L) (L)->peeling_for_gaps #define LOOP_VINFO_PEELING_FOR_NITER(L) (L)->peeling_for_niter +#define LOOP_VINFO_EARLY_BREAKS(L) (L)->early_breaks +#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict +#define LOOP_VINFO_EARLY_BRK_DEST_BB(L) (L)->early_break_dest_bb +#define LOOP_VINFO_EARLY_BRK_VUSES(L) (L)->early_break_vuses #define LOOP_VINFO_LOOP_CONDS(L) (L)->conds #define LOOP_VINFO_LOOP_IV_COND(L) (L)->loop_iv_cond #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
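As a reading aid, here is a minimal scalar C rendering of the load/store
ordering rule described in the comment of validate_early_exit_stmts above.
The function and array names are made up for illustration; the real analysis
works on GIMPLE statements, not on source code, so treat this only as a
sketch of the two shapes the cover letter distinguishes.

/* Rejected shape: the store to a[i] is followed by a load of the same
   object before the early-break condition (a RAW dependence), so sinking
   the store below the break would change what the loop reads.  */
int
rejected (int *a, int *b, int n)
{
  int c = 0;
  for (int i = 0; i < n; i++)
    {
      a[i] = 8;
      c = a[i];		/* Load of the object just stored to ...  */
      if (b[i])		/* ... ahead of the early break: not handled.  */
	break;
    }
  return c;
}

/* Accepted shape: the load happens before the store (a WAR dependence),
   so the code-motion worklist can safely sink the store to after the
   break check, which is the transformation the cover letter describes.  */
int
accepted (int *a, int *b, int n)
{
  int c = 0;
  for (int i = 0; i < n; i++)
    {
      c = a[i];		/* Load first ...  */
      a[i] = 8;		/* ... then the store, which can be moved down.  */
      if (b[i])
	break;
    }
  return c;
}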
From patchwork Wed Jun 28 13:43:33 2023
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 113886
Date: Wed, 28 Jun 2023 14:43:33 +0100
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de, jlaw@ventanamicro.com
Subject: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
MIME-Version: 1.0
Hi All,

There's an existing bug in loop frequency scaling where the if statement
checks to see if there's a single exit and records a dump-file note, but then
continues anyway.  It then tries to access the null exit pointer, which of
course fails.

For loops with multiple exits it's not really clear how to scale the exit
probabilities, as it's unknown which exit is the most probable.
For that reason I ignore the exit edges during scaling but still adjust the
loop body.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* cfgloopmanip.cc (scale_loop_frequencies): Fix typo.
	(scale_loop_profile): Don't access null pointer.

--- inline copy of patch --
diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc index 6e09dcbb0b1864bc64ffd570a4b923f50c3819b5..b10ef3d2be82902ccd74e52a4318217b2db13bcb 100644 --- a/gcc/cfgloopmanip.cc +++ b/gcc/cfgloopmanip.cc @@ -501,7 +501,7 @@ scale_loop_frequencies (class loop *loop, profile_probability p) /* Scale profile in LOOP by P. If ITERATION_BOUND is non-zero, scale even further if loop is predicted to iterate too many times. - Before caling this function, preheader block profile should be already + Before calling this function, preheader block profile should be already scaled to final count. This is necessary because loop iterations are determined by comparing header edge count to latch ege count and thus they need to be scaled synchronously. */ @@ -597,14 +597,14 @@ scale_loop_profile (class loop *loop, profile_probability p, /* If latch exists, change its count, since we changed probability of exit. Theoretically we should update everything from source of exit edge to latch, but for vectorizer this is enough. */ - if (loop->latch && loop->latch != e->src) + if (e && loop->latch && loop->latch != e->src) loop->latch->count += count_delta; /* Scale the probabilities. */ scale_loop_frequencies (loop, p); /* Change latch's count back. */ - if (loop->latch && loop->latch != e->src) + if (e && loop->latch && loop->latch != e->src) loop->latch->count -= count_delta; if (dump_file && (dump_flags & TDF_DETAILS))
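To make the failure mode concrete, below is an illustrative C loop shape (made
up for this note, not taken from the patch or the testsuite).  A loop like this
has two exit edges, so the single-exit lookup that scale_loop_profile keys off
yields no edge; with the guard added above, the exit-related latch adjustment
is simply skipped instead of dereferencing a null pointer.

/* Two exit edges: the i < n bound check and the early return.  */
int
first_match (int *a, int n, int key)
{
  for (int i = 0; i < n; i++)
    if (a[i] == key)
      return i;		/* Early exit out of the loop.  */
  return -1;
}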

From patchwork Wed Jun 28 13:43:58 2023
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 113888
From: Tamar Christina
Date: Wed, 28 Jun 2023 14:43:58 +0100
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de, jlaw@ventanamicro.com
Subject: [PATCH 5/19]middle-end: Enable bit-field vectorization to work correctly when we're vectoring inside conds

Hi All,

The bitfield vectorization support does not currently recognize bitfields
inside gconds.  This means they can't be used as conditions for early-break
vectorization, which is functionality we require.  This patch adds support for
them by explicitly matching and handling gcond as a source.

Testcases are added in the testsuite update patch, as the only way to reach
this code is through the early-break vectorization.  See tests:

  - vect-early-break_20.c
  - vect-early-break_21.c

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-patterns.cc (vect_init_pattern_stmt): Copy STMT_VINFO_TYPE
	from original statement.
	(vect_recog_bitfield_ref_pattern): Support bitfields in gcond.
Co-Authored-By: Andre Vieira

--- inline copy of patch --
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 60bc9be6819af9bd28a81430869417965ba9d82d..c221b1d64449ce3b6c8864bbec4b17ddf938c2d6 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -128,6 +128,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
   STMT_VINFO_RELATED_STMT (pattern_stmt_info) = orig_stmt_info;
   STMT_VINFO_DEF_TYPE (pattern_stmt_info)
     = STMT_VINFO_DEF_TYPE (orig_stmt_info);
+  STMT_VINFO_TYPE (pattern_stmt_info) = STMT_VINFO_TYPE (orig_stmt_info);
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
     {
       gcc_assert (!vectype
@@ -2488,27 +2489,37 @@ static gimple *
 vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
                                  tree *type_out)
 {
-  gassign *first_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  gassign *conv_stmt = dyn_cast <gassign *> (stmt_info->stmt);
+  gcond *cond_stmt = dyn_cast <gcond *> (stmt_info->stmt);

-  if (!first_stmt)
-    return NULL;
-
-  gassign *bf_stmt;
-  if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (first_stmt))
-      && TREE_CODE (gimple_assign_rhs1 (first_stmt)) == SSA_NAME)
+  gimple *bf_stmt = NULL;
+  tree cond_cst = NULL_TREE;
+  if (cond_stmt)
     {
-      gimple *second_stmt
-	= SSA_NAME_DEF_STMT (gimple_assign_rhs1 (first_stmt));
-      bf_stmt = dyn_cast <gassign *> (second_stmt);
-      if (!bf_stmt
-	  || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
+      tree op = gimple_cond_lhs (cond_stmt);
+      if (TREE_CODE (op) != SSA_NAME)
+	return NULL;
+      bf_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (op));
+      cond_cst = gimple_cond_rhs (cond_stmt);
+      if (TREE_CODE (cond_cst) != INTEGER_CST)
	return NULL;
     }
-  else
+  else if (conv_stmt
+	   && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (conv_stmt))
+	   && TREE_CODE (gimple_assign_rhs1 (conv_stmt)) == SSA_NAME)
+    {
+      gimple *second_stmt = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (conv_stmt));
+      bf_stmt = dyn_cast <gassign *> (second_stmt);
+    }
+
+  if (!bf_stmt
+      || gimple_assign_rhs_code (bf_stmt) != BIT_FIELD_REF)
     return NULL;

   tree bf_ref = gimple_assign_rhs1 (bf_stmt);
   tree container = TREE_OPERAND (bf_ref, 0);
+  tree ret_type = cond_cst ? TREE_TYPE (container)
+			   : TREE_TYPE (gimple_assign_lhs (conv_stmt));

   if (!bit_field_offset (bf_ref).is_constant ()
       || !bit_field_size (bf_ref).is_constant ()
@@ -2522,8 +2533,6 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,

   gimple *use_stmt, *pattern_stmt;
   use_operand_p use_p;
-  tree ret = gimple_assign_lhs (first_stmt);
-  tree ret_type = TREE_TYPE (ret);
   bool shift_first = true;
   tree container_type = TREE_TYPE (container);
   tree vectype = get_vectype_for_scalar_type (vinfo, container_type);
@@ -2560,7 +2569,8 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
   /* If the only use of the result of this BIT_FIELD_REF + CONVERT is a
      PLUS_EXPR then do the shift last as some targets can combine the shift and
      add into a single instruction.  */
-  if (single_imm_use (gimple_assign_lhs (first_stmt), &use_p, &use_stmt))
+  if (conv_stmt
+      && single_imm_use (gimple_assign_lhs (conv_stmt), &use_p, &use_stmt))
     {
       if (gimple_code (use_stmt) == GIMPLE_ASSIGN
	   && gimple_assign_rhs_code (use_stmt) == PLUS_EXPR)
@@ -2620,7 +2630,21 @@ vect_recog_bitfield_ref_pattern (vec_info *vinfo, stmt_vec_info stmt_info,
				 NOP_EXPR, result);
     }

-  *type_out = STMT_VINFO_VECTYPE (stmt_info);
+  if (cond_cst)
+    {
+      append_pattern_def_seq (vinfo, stmt_info, pattern_stmt, vectype);
+      pattern_stmt
+	= gimple_build_cond (gimple_cond_code (cond_stmt),
+			     gimple_get_lhs (pattern_stmt),
+			     fold_convert (ret_type, cond_cst),
+			     gimple_cond_true_label (cond_stmt),
+			     gimple_cond_false_label (cond_stmt));
+      *type_out = STMT_VINFO_VECTYPE (stmt_info);
+    }
+  else
+    *type_out
+      = get_vectype_for_scalar_type (vinfo,
+				     TREE_TYPE (gimple_get_lhs (pattern_stmt)));
   vect_pattern_detected ("bitfield_ref pattern", stmt_info->stmt);

   return pattern_stmt;
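
For illustration only (the real vect-early-break_20.c and vect-early-break_21.c
live in the testsuite patch and may look different; the struct and function
names below are made up), a sketch of the new shape being matched, where the
bit-field read feeds the gcond that controls the early exit rather than a
conversion statement:

/* Hypothetical testcase sketch: the bit-field comparison is the loop's exit
   condition, so vect_recog_bitfield_ref_pattern now has to recognize the
   BIT_FIELD_REF behind a gcond.  */
struct packet
{
  unsigned int kind : 4;
  unsigned int len : 12;
};

int
find_kind (struct packet *p, int n, unsigned int kind)
{
  for (int i = 0; i < n; i++)
    if (p[i].kind == kind)	/* Bit-field used inside the exit gcond.  */
      return i;
  return -1;
}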

From patchwork Wed Jun 28 13:44:26 2023
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 113891
From: Tamar Christina
Date: Wed, 28 Jun 2023 14:44:26 +0100
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de, jlaw@ventanamicro.com
Subject: [PATCH 6/19]middle-end: Don't enter piecewise expansion if VF is not constant.

Hi All,

expand_vector_piecewise does not support VLA expansion, as it has a hard
assert that the type is not VLA.  So instead of just failing to expand, which
would mark the call as unsupported, we ICE.  This patch adjusts it so we don't
ICE and can gracefully handle the expansion in the support checks.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-generic.cc (expand_vector_comparison): Skip piecewise if
	not constant.

--- inline copy of patch --
diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index df04a0db68da3222f43dd938f8e7adb186cd93c9..da1fd2f40d82a9fa301e6ed0b2f4c3c222d58a8d 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -481,7 +481,7 @@ expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
	    }
	  t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
	}
-      else
+      else if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
	t = expand_vector_piecewise (gsi, do_compare, type,
				     TREE_TYPE (TREE_TYPE (op0)),
				     op0, op1, code, false);
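
As a hedged illustration of why the guard is needed (not a testcase from this
series): expand_vector_piecewise lowers an unsupported vector operation one
lane at a time, which is only possible when the number of lanes is a
compile-time constant.  For a fixed-width type like the one below the fallback
can iterate over the lanes; for a scalable (VLA) type, e.g. an SVE mode,
TYPE_VECTOR_SUBPARTS is a poly_int whose .is_constant () is false, so the
fallback has to be skipped instead of tripping the assert.

/* Sketch using the GNU vector extension: a comparison a target may not
   support directly and that tree-vect-generic would then lower.  With this
   fixed-width type the piecewise fallback can walk the four lanes; a
   variable-length vector type has no constant lane count to walk.  */
typedef int v4si __attribute__ ((vector_size (16)));

v4si
cmp_mask (v4si a, v4si b)
{
  return a > b;		/* May be expanded lane by lane by the fallback.  */
}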

From patchwork Wed Jun 28 13:44:59 2023
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 113894
From: Tamar Christina
Date: Wed, 28 Jun 2023 14:44:59 +0100
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de, jlaw@ventanamicro.com
Subject: [PATCH 7/19]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables

Hi All,

This patch splits the vectorizer's understanding of the main loop exit off
from the normal loop infrastructure.  Essentially we're relaxing the use of
single_exit () in the vectorizer, as we will no longer have a single exit and
need a well-defined split between the main and secondary exits of loops for
vectorization.

These new values were added to the loop class, even though they're only used
by the vectorizer, for a couple of reasons:
  - We need access to them in places where we have no loop_vinfo.
  - We only have a single loop_vinfo for each loop under consideration, but
    that same loop can have different copies, e.g. peeled/versioned copies or
    the scalar variant of the loop.  For each of these we still need a
    coherent exit definition.

For these reasons the placement in the loop class was the only way to keep the
bookkeeping together with the loops and avoid possibly expensive lookups.

For this version of the patch the `main` exit of a loop is defined as the exit
that is closest to the loop latch.  This is stored in vec_loop_iv.  The
remaining exits which are relevant for the vectorizer are stored inside
vec_loop_alt_exits.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* cfgloop.cc (alloc_loop): Initialize vec_loop_iv.
	* cfgloop.h (class loop): Add vec_loop_iv and vec_loop_alt_exits.
	* doc/loop.texi: Document get_edge_condition.
	* tree-loop-distribution.cc (loop_distribution::distribute_loop):
	Initialize vec_loop_iv since loop distribution calls loop peeling,
	which only understands vec_loop_iv now.
	* tree-scalar-evolution.cc (get_edge_condition): New.
	(get_loop_exit_condition): Refactor into get_edge_condition.
	* tree-scalar-evolution.h (get_edge_condition): New.
	* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment): Update
	use of single_exit.
	* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors,
	vect_set_loop_condition_normal, vect_set_loop_condition,
	slpeel_tree_duplicate_loop_to_edge_cfg, slpeel_can_duplicate_loop_p,
	find_loop_location, vect_update_ivs_after_vectorizer,
	vect_gen_vector_loop_niters_mult_vf, find_guard_arg, vect_do_peeling):
	Replace usages of single_exit.
	(vec_init_exit_info): New.
* tree-vect-loop.cc (vect_analyze_loop_form, vect_create_epilog_for_reduction, vectorizable_live_operation, scale_profile_for_vect_loop, vect_transform_loop): New. * tree-vectorizer.h (LOOP_VINFO_IV_EXIT, LOOP_VINFO_ALT_EXITS, vec_init_exit_info): New. --- inline copy of patch -- diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h index e7ac2b5f3db55de3dbbab7bd2bfe08388f4ec533..cab82d7960e5be517bba2621f7f4888e7bf3c295 100644 --- diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h index e7ac2b5f3db55de3dbbab7bd2bfe08388f4ec533..cab82d7960e5be517bba2621f7f4888e7bf3c295 100644 --- a/gcc/cfgloop.h +++ b/gcc/cfgloop.h @@ -272,6 +272,14 @@ public: the basic-block from being collected but its index can still be reused. */ basic_block former_header; + + /* The controlling loop IV for the current loop when vectorizing. This IV + controls the natural exits of the loop. */ + edge GTY ((skip (""))) vec_loop_iv; + + /* If the loop has multiple exits this structure contains the alternate + exits of the loop which are relevant for vectorization. */ + vec GTY ((skip (""))) vec_loop_alt_exits; }; /* Set if the loop is known to be infinite. */ diff --git a/gcc/cfgloop.cc b/gcc/cfgloop.cc index ccda7415d7037e26048425b5d85f3633a39fd325..98123f7dce98227c8dffe4833e159fbb05596831 100644 --- a/gcc/cfgloop.cc +++ b/gcc/cfgloop.cc @@ -355,6 +355,7 @@ alloc_loop (void) loop->nb_iterations_upper_bound = 0; loop->nb_iterations_likely_upper_bound = 0; loop->nb_iterations_estimate = 0; + loop->vec_loop_iv = NULL; return loop; } diff --git a/gcc/doc/loop.texi b/gcc/doc/loop.texi index b357e9de7bcb1898ab9dda25738b9f003ca6f9f5..4ba6bb2585c81f7af34943b0493b94d5c3a8bf60 100644 --- a/gcc/doc/loop.texi +++ b/gcc/doc/loop.texi @@ -212,6 +212,7 @@ relation, and breath-first search order, respectively. @code{NULL} if the loop has more than one exit. You can only use this function if @code{LOOPS_HAVE_RECORDED_EXITS} is used. @item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop. +@item @code{get_edge_condition}: Get the condition belonging to an exit edge. @item @code{just_once_each_iteration_p}: Returns true if the basic block is executed exactly once during each iteration of a loop (that is, it does not belong to a sub-loop, and it dominates the latch of the loop). diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc index cf7c197aaf7919a0ecd56a10db0a42f93707ca58..97879498db46dd3c34181ae9aa6e5476004dd5b5 100644 --- a/gcc/tree-loop-distribution.cc +++ b/gcc/tree-loop-distribution.cc @@ -3042,6 +3042,24 @@ loop_distribution::distribute_loop (class loop *loop, return 0; } + /* Loop distribution only does prologue peeling but we still need to + initialize loop exit information. However we only support single exits at + the moment. As such, should exit information not have been provided and we + have more than one exit, bail out. 
*/ + if (!(loop->vec_loop_iv = single_exit (loop))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, + "Loop %d not distributed: too many exits.\n", + loop->num); + + free_rdg (rdg); + loop_nest.release (); + free_data_refs (datarefs_vec); + delete ddrs_table; + return 0; + } + data_reference_p dref; for (i = 0; datarefs_vec.iterate (i, &dref); ++i) dref->aux = (void *) (uintptr_t) i; diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h index c58a8a16e81573aada38e912b7c58b3e1b23b66d..2e83836911ec8e968e90cf9b489dc7fe121ff80e 100644 --- a/gcc/tree-scalar-evolution.h +++ b/gcc/tree-scalar-evolution.h @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3. If not see extern tree number_of_latch_executions (class loop *); extern gcond *get_loop_exit_condition (const class loop *); +extern gcond *get_edge_condition (edge); extern void scev_initialize (void); extern bool scev_initialized_p (void); diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc index ba47a684f4b373fb4f2dc16ddb8edb0ef39da6ed..af8be618b0748258132ccbef2d387bfddbe3c16b 100644 --- a/gcc/tree-scalar-evolution.cc +++ b/gcc/tree-scalar-evolution.cc @@ -1293,8 +1293,15 @@ scev_dfs::follow_ssa_edge_expr (gimple *at_stmt, tree expr, gcond * get_loop_exit_condition (const class loop *loop) { + return get_edge_condition (single_exit (loop)); +} + +/* If the statement just before the EXIT_EDGE contains a condition then + return the condition, otherwise NULL. */ + +gcond * +get_edge_condition (edge exit_edge){ gcond *res = NULL; - edge exit_edge = single_exit (loop); if (dump_file && (dump_flags & TDF_SCEV)) fprintf (dump_file, "(get_loop_exit_condition \n "); diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc index ebe93832b1e89120eab2fdac0fc30fe35c0356a2..fcc950f528b2d1e044be12424c2df11f692ee8ba 100644 --- a/gcc/tree-vect-data-refs.cc +++ b/gcc/tree-vect-data-refs.cc @@ -2070,7 +2070,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo) /* Check if we can possibly peel the loop. */ if (!vect_can_advance_ivs_p (loop_vinfo) - || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)) + || !slpeel_can_duplicate_loop_p (loop_vinfo, + LOOP_VINFO_IV_EXIT (loop_vinfo)) || loop->inner) do_peeling = false; diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc index 20f570e4a0d64610d7b63fe492eba5254ab5dc2c..299dfb75e3372b6a91637101b4bab0e82eb560ad 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -904,7 +904,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop, add_header_seq (loop, header_seq); /* Get a boolean result that tells us whether to iterate. */ - edge exit_edge = single_exit (loop); + edge exit_edge = LOOP_VINFO_IV_EXIT (loop_vinfo); gcond *cond_stmt; if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) && !LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo)) @@ -935,7 +935,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop, if (final_iv) { gassign *assign = gimple_build_assign (final_iv, orig_niters); - gsi_insert_on_edge_immediate (single_exit (loop), assign); + gsi_insert_on_edge_immediate (exit_edge, assign); } return cond_stmt; @@ -1183,7 +1183,8 @@ vect_set_loop_condition_partial_vectors_avx512 (class loop *loop, loop handles exactly VF scalars per iteration. 
*/ static gcond * -vect_set_loop_condition_normal (class loop *loop, tree niters, tree step, +vect_set_loop_condition_normal (loop_vec_info loop_vinfo, + class loop *loop, tree niters, tree step, tree final_iv, bool niters_maybe_zero, gimple_stmt_iterator loop_cond_gsi) { @@ -1191,13 +1192,13 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step, gcond *cond_stmt; gcond *orig_cond; edge pe = loop_preheader_edge (loop); - edge exit_edge = single_exit (loop); + edge exit_edge = loop->vec_loop_iv; gimple_stmt_iterator incr_gsi; bool insert_after; enum tree_code code; tree niters_type = TREE_TYPE (niters); - orig_cond = get_loop_exit_condition (loop); + orig_cond = get_edge_condition (exit_edge); gcc_assert (orig_cond); loop_cond_gsi = gsi_for_stmt (orig_cond); @@ -1305,7 +1306,7 @@ vect_set_loop_condition_normal (class loop *loop, tree niters, tree step, if (final_iv) { gassign *assign; - edge exit = single_exit (loop); + edge exit = LOOP_VINFO_IV_EXIT (loop_vinfo); gcc_assert (single_pred_p (exit->dest)); tree phi_dest = integer_zerop (init) ? final_iv : copy_ssa_name (indx_after_incr); @@ -1353,7 +1354,7 @@ vect_set_loop_condition (class loop *loop, loop_vec_info loop_vinfo, bool niters_maybe_zero) { gcond *cond_stmt; - gcond *orig_cond = get_loop_exit_condition (loop); + gcond *orig_cond = get_edge_condition (loop->vec_loop_iv); gimple_stmt_iterator loop_cond_gsi = gsi_for_stmt (orig_cond); if (loop_vinfo && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo)) @@ -1370,7 +1371,8 @@ vect_set_loop_condition (class loop *loop, loop_vec_info loop_vinfo, loop_cond_gsi); } else - cond_stmt = vect_set_loop_condition_normal (loop, niters, step, final_iv, + cond_stmt = vect_set_loop_condition_normal (loop_vinfo, loop, niters, + step, final_iv, niters_maybe_zero, loop_cond_gsi); @@ -1439,6 +1441,69 @@ slpeel_duplicate_current_defs_from_edges (edge from, edge to) get_current_def (PHI_ARG_DEF_FROM_EDGE (from_phi, from))); } +/* When copies of the same loop are created the copies won't have any SCEV + information and so we can't determine what their exits are. However since + they are copies of an original loop the exits should be the same. + + I don't really like this, and think we need a different way, but I don't + know what. So sending this up so Richi can comment. */ + +void +vec_init_exit_info (class loop *loop) +{ + if (loop->vec_loop_iv) + return; + + auto_vec exits = get_loop_exit_edges (loop); + if (exits.is_empty ()) + return; + + if ((loop->vec_loop_iv = single_exit (loop))) + return; + + loop->vec_loop_alt_exits.create (exits.length () - 1); + + /* The main IV is to be determined by the block that's the first reachable + block from the latch. We cannot rely on the order the loop analysis + returns and we don't have any SCEV analysis on the loop. */ + auto_vec workset; + workset.safe_push (loop_latch_edge (loop)); + hash_set visited; + + while (!workset.is_empty ()) + { + edge e = workset.pop (); + if (visited.contains (e)) + continue; + + bool found_p = false; + for (edge ex : e->src->succs) + { + if (exits.contains (ex)) + { + found_p = true; + e = ex; + break; + } + } + + if (found_p) + { + loop->vec_loop_iv = e; + for (edge ex : exits) + if (e != ex) + loop->vec_loop_alt_exits.safe_push (ex); + return; + } + else + { + for (edge ex : e->src->preds) + workset.safe_insert (0, ex); + } + visited.add (e); + } + gcc_unreachable (); +} /* Given LOOP this function generates a new copy of it and puts it on E which is either the entry or exit of LOOP. 
If SCALAR_LOOP is @@ -1458,13 +1523,15 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge exit, new_exit; bool duplicate_outer_loop = false; - exit = single_exit (loop); + exit = loop->vec_loop_iv; at_exit = (e == exit); if (!at_exit && e != loop_preheader_edge (loop)) return NULL; if (scalar_loop == NULL) scalar_loop = loop; + else + vec_init_exit_info (scalar_loop); bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1); pbbs = bbs + 1; @@ -1490,13 +1557,17 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, bbs[0] = preheader; new_bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1); - exit = single_exit (scalar_loop); + exit = scalar_loop->vec_loop_iv; copy_bbs (bbs, scalar_loop->num_nodes + 1, new_bbs, &exit, 1, &new_exit, NULL, at_exit ? loop->latch : e->src, true); - exit = single_exit (loop); + exit = loop->vec_loop_iv; basic_block new_preheader = new_bbs[0]; + /* Record the new loop exit information. new_loop doesn't have SCEV data and + so we must initialize the exit information. */ + vec_init_exit_info (new_loop); + /* Before installing PHI arguments make sure that the edges into them match that of the scalar loop we analyzed. This makes sure the SLP tree matches up between the main vectorized @@ -1537,7 +1608,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, but LOOP will not. slpeel_update_phi_nodes_for_guard{1,2} expects the LOOP SSA_NAMEs (on the exit edge and edge from latch to header) to have current_def set, so copy them over. */ - slpeel_duplicate_current_defs_from_edges (single_exit (scalar_loop), + slpeel_duplicate_current_defs_from_edges (scalar_loop->vec_loop_iv, exit); slpeel_duplicate_current_defs_from_edges (EDGE_SUCC (scalar_loop->latch, 0), @@ -1696,11 +1767,12 @@ slpeel_add_loop_guard (basic_block guard_bb, tree cond, */ bool -slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e) +slpeel_can_duplicate_loop_p (const loop_vec_info loop_vinfo, const_edge e) { - edge exit_e = single_exit (loop); + class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); + edge exit_e = LOOP_VINFO_IV_EXIT (loop_vinfo); edge entry_e = loop_preheader_edge (loop); - gcond *orig_cond = get_loop_exit_condition (loop); + gcond *orig_cond = get_edge_condition (exit_e); gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src); unsigned int num_bb = loop->inner? 5 : 2; @@ -1709,7 +1781,7 @@ slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e) if (!loop_outer (loop) || loop->num_nodes != num_bb || !empty_block_p (loop->latch) - || !single_exit (loop) + || !LOOP_VINFO_IV_EXIT (loop_vinfo) /* Verify that new loop exit condition can be trivially modified. */ || (!orig_cond || orig_cond != gsi_stmt (loop_exit_gsi)) || (e != exit_e && e != entry_e)) @@ -1722,7 +1794,7 @@ slpeel_can_duplicate_loop_p (const class loop *loop, const_edge e) return ret; } -/* Function vect_get_loop_location. +/* Function find_loop_location. Extract the location of the loop in the source code. If the loop is not well formed for vectorization, an estimated @@ -1739,11 +1811,19 @@ find_loop_location (class loop *loop) if (!loop) return dump_user_location_t (); - stmt = get_loop_exit_condition (loop); + if (loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS)) + { + /* We only care about the loop location, so use any exit with location + information. 
*/ + for (edge e : get_loop_exit_edges (loop)) + { + stmt = get_edge_condition (e); - if (stmt - && LOCATION_LOCUS (gimple_location (stmt)) > BUILTINS_LOCATION) - return stmt; + if (stmt + && LOCATION_LOCUS (gimple_location (stmt)) > BUILTINS_LOCATION) + return stmt; + } + } /* If we got here the loop is probably not "well formed", try to estimate the loop location */ @@ -1962,7 +2042,8 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, gphi_iterator gsi, gsi1; class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); basic_block update_bb = update_e->dest; - basic_block exit_bb = single_exit (loop)->dest; + + basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest; /* Make sure there exists a single-predecessor exit bb: */ gcc_assert (single_pred_p (exit_bb)); @@ -2529,10 +2610,9 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo, { /* We should be using a step_vector of VF if VF is variable. */ int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant (); - class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); tree type = TREE_TYPE (niters_vector); tree log_vf = build_int_cst (type, exact_log2 (vf)); - basic_block exit_bb = single_exit (loop)->dest; + basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest; gcc_assert (niters_vector_mult_vf_ptr != NULL); tree niters_vector_mult_vf = fold_build2 (LSHIFT_EXPR, type, @@ -2559,7 +2639,7 @@ find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED, gphi *lcssa_phi) { gphi_iterator gsi; - edge e = single_exit (loop); + edge e = loop->vec_loop_iv; gcc_assert (single_pred_p (e->dest)); for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi)) @@ -3328,8 +3408,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, if (epilog_peeling) { - e = single_exit (loop); - gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e)); + e = LOOP_VINFO_IV_EXIT (loop_vinfo); + gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo, e)); /* Peel epilog and put it on exit edge of loop. If we are vectorizing said epilog then we should use a copy of the main loop as a starting @@ -3419,8 +3499,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, { guard_cond = fold_build2 (EQ_EXPR, boolean_type_node, niters, niters_vector_mult_vf); - guard_bb = single_exit (loop)->dest; - guard_to = split_edge (single_exit (epilog)); + guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest; + guard_to = split_edge (epilog->vec_loop_iv); guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to, skip_vector ? anchor : guard_bb, prob_epilog.invert (), @@ -3428,7 +3508,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, if (vect_epilogues) epilogue_vinfo->skip_this_loop_edge = guard_e; slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e, - single_exit (epilog)); + epilog->vec_loop_iv); /* Only need to handle basic block before epilog loop if it's not the guard_bb, which is the case when skip_vector is true. 
*/ if (guard_bb != bb_before_epilog) diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 0a03f56aae7b51fb4c5ce0e49d96888bae634ef7..0bca5932d237cf1cfbbb48271db3f4430672b5dc 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -1641,6 +1641,13 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info) { DUMP_VECT_SCOPE ("vect_analyze_loop_form"); + vec_init_exit_info (loop); + if (!loop->vec_loop_iv) + return opt_result::failure_at (vect_location, + "not vectorized:" + " could not determine main exit from" + " loop with multiple exits.\n"); + /* Different restrictions apply when we are considering an inner-most loop, vs. an outer (nested) loop. (FORNOW. May want to relax some of these restrictions in the future). */ @@ -3025,9 +3032,8 @@ start_over: if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n"); if (!vect_can_advance_ivs_p (loop_vinfo) - || !slpeel_can_duplicate_loop_p (LOOP_VINFO_LOOP (loop_vinfo), - single_exit (LOOP_VINFO_LOOP - (loop_vinfo)))) + || !slpeel_can_duplicate_loop_p (loop_vinfo, + LOOP_VINFO_IV_EXIT (loop_vinfo))) { ok = opt_result::failure_at (vect_location, "not vectorized: can't create required " @@ -5964,7 +5970,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, Store them in NEW_PHIS. */ if (double_reduc) loop = outer_loop; - exit_bb = single_exit (loop)->dest; + exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest; exit_gsi = gsi_after_labels (exit_bb); reduc_inputs.create (slp_node ? vec_num : ncopies); for (unsigned i = 0; i < vec_num; i++) @@ -5980,7 +5986,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, phi = create_phi_node (new_def, exit_bb); if (j) def = gimple_get_lhs (STMT_VINFO_VEC_STMTS (rdef_info)[j]); - SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, def); + SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, def); new_def = gimple_convert (&stmts, vectype, new_def); reduc_inputs.quick_push (new_def); } @@ -10301,12 +10307,12 @@ vectorizable_live_operation (vec_info *vinfo, lhs' = new_tree; */ class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); - basic_block exit_bb = single_exit (loop)->dest; + basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest; gcc_assert (single_pred_p (exit_bb)); tree vec_lhs_phi = copy_ssa_name (vec_lhs); gimple *phi = create_phi_node (vec_lhs_phi, exit_bb); - SET_PHI_ARG_DEF (phi, single_exit (loop)->dest_idx, vec_lhs); + SET_PHI_ARG_DEF (phi, LOOP_VINFO_IV_EXIT (loop_vinfo)->dest_idx, vec_lhs); gimple_seq stmts = NULL; tree new_tree; @@ -10829,7 +10835,8 @@ scale_profile_for_vect_loop (class loop *loop, unsigned vf) scale_loop_frequencies (loop, p); } - edge exit_e = single_exit (loop); + edge exit_e = loop->vec_loop_iv; + exit_e->probability = profile_probability::always () / (new_est_niter + 1); edge exit_l = single_pred_edge (loop->latch); @@ -11177,7 +11184,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call) /* Make sure there exists a single-predecessor exit bb. Do this before versioning. */ - edge e = single_exit (loop); + edge e = LOOP_VINFO_IV_EXIT (loop_vinfo); if (! single_pred_p (e->dest)) { split_loop_exit_edge (e, true); diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index a36974c2c0d2103b0a2d0397d06ab84dace08129..bd5eceb5da7a45ef036cd14609ebe091799320bf 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -917,6 +917,8 @@ public: /* Access Functions. 
*/ #define LOOP_VINFO_LOOP(L) (L)->loop +#define LOOP_VINFO_IV_EXIT(L) (L)->loop->vec_loop_iv +#define LOOP_VINFO_ALT_EXITS(L) (L)->loop->vec_loop_alt_exits #define LOOP_VINFO_BBS(L) (L)->bbs #define LOOP_VINFO_NITERSM1(L) (L)->num_itersm1 #define LOOP_VINFO_NITERS(L) (L)->num_iters @@ -2162,6 +2164,7 @@ extern void vect_prepare_for_masked_peels (loop_vec_info); extern dump_user_location_t find_loop_location (class loop *); extern bool vect_can_advance_ivs_p (loop_vec_info); extern void vect_update_inits_of_drs (loop_vec_info, tree, tree_code); +extern void vec_init_exit_info (class loop *); /* In tree-vect-stmts.cc. */ extern tree get_related_vectype_for_scalar_type (machine_mode, tree, From patchwork Wed Jun 28 13:45:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tamar Christina X-Patchwork-Id: 113896 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp8939313vqr; Wed, 28 Jun 2023 06:48:13 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7fgm+hhRWuesL8RAJutrPmPtN7OZtK8nEvoNXet43+HwEhwC3FVKm3Yppv3qcSXi9vvEK5 X-Received: by 2002:a17:907:36c5:b0:978:acec:36b1 with SMTP id bj5-20020a17090736c500b00978acec36b1mr28968741ejc.17.1687960092764; Wed, 28 Jun 2023 06:48:12 -0700 (PDT) Received: from sourceware.org (ip-8-43-85-97.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id b11-20020a170906d10b00b00982818cbb54si5653248ejz.593.2023.06.28.06.48.12 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 28 Jun 2023 06:48:12 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=kkg6tqRj; arc=fail (signature failed); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A20863864C6A for ; Wed, 28 Jun 2023 13:46:34 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A20863864C6A DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1687959994; bh=JRtp83SrSr9NKESgMZiVBn/UYyIox+rUfgMfey6VGoE=; h=Date:To:Cc:Subject:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=kkg6tqRjOvptVb/aPgZKJO4qbJoqxc3xLZhYiFbx2ulV2+8QcsM0CtbBiYJz+FBpY mIX8OoGqvf0m+74bsbHVOTPq9vp4uvKcONUfJXb4QdlTB6vhaiZB/WSDLN7LoaW9aI AopZVuTwOsFB1bnXecy4aeBVCr07edaErIbL0YBU= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR04-DB3-obe.outbound.protection.outlook.com (mail-db3eur04on2082.outbound.protection.outlook.com [40.107.6.82]) by sourceware.org (Postfix) with ESMTPS id 40BC43857438 for ; Wed, 28 Jun 2023 13:45:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 40BC43857438 Received: from AS9PR06CA0662.eurprd06.prod.outlook.com (2603:10a6:20b:49c::7) by AS8PR08MB10314.eurprd08.prod.outlook.com (2603:10a6:20b:5c0::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.26; Wed, 28 Jun 2023 13:45:42 +0000 Received: from AM7EUR03FT045.eop-EUR03.prod.protection.outlook.com 
Date: Wed, 28 Jun 2023 14:45:30 +0100
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de, jlaw@ventanamicro.com
Subject: [PATCH 8/19]middle-end: updated niters analysis to handle multiple exits.
MIME-Version: 1.0

Hi All,

For early break vectorization we have to update niters analysis to record and
analyze all exits of the loop, and so all conds.

The niters of the loop is still determined by the main/natural exit of the loop
as this is the O(n) bounds.  For now we don't do much with the secondary conds,
but their assumptions can be used to generate versioning checks later.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-loop.cc (vect_get_loop_niters): Analyze all exits and
	return all gconds.
	(vect_analyze_loop_form): Update code checking for conds.
	(vect_create_loop_vinfo): Handle having multiple conds.
	(vect_analyze_loop): Release extra loop conds structures.
	* tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS, LOOP_VINFO_LOOP_IV_COND):
	New.
	(struct vect_loop_form_info): Add conds, loop_iv_cond.

--- inline copy of patch --
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 55e69a7ca0b24e0872477141db6f74dbf90b7981..9065811b3b9c2a550baf44768603172b9e26b94b 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -849,80 +849,106 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo)
   in NUMBER_OF_ITERATIONSM1.
Place the condition under which the niter information holds in ASSUMPTIONS. - Return the loop exit condition. */ + Return the loop exit conditions. */ -static gcond * +static vec vect_get_loop_niters (class loop *loop, tree *assumptions, tree *number_of_iterations, tree *number_of_iterationsm1) { - edge exit = single_exit (loop); + auto_vec exits = get_loop_exit_edges (loop); + vec conds; + conds.create (exits.length ()); class tree_niter_desc niter_desc; tree niter_assumptions, niter, may_be_zero; - gcond *cond = get_loop_exit_condition (loop); *assumptions = boolean_true_node; *number_of_iterationsm1 = chrec_dont_know; *number_of_iterations = chrec_dont_know; + DUMP_VECT_SCOPE ("get_loop_niters"); - if (!exit) - return cond; + if (exits.is_empty ()) + return conds; - may_be_zero = NULL_TREE; - if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL) - || chrec_contains_undetermined (niter_desc.niter)) - return cond; + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, "Loop has %d exits.\n", + exits.length ()); - niter_assumptions = niter_desc.assumptions; - may_be_zero = niter_desc.may_be_zero; - niter = niter_desc.niter; + edge exit; + unsigned int i; + FOR_EACH_VEC_ELT (exits, i, exit) + { + gcond *cond = get_edge_condition (exit); + if (cond) + conds.safe_push (cond); - if (may_be_zero && integer_zerop (may_be_zero)) - may_be_zero = NULL_TREE; + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, "Analyzing exit %d...\n", i); - if (may_be_zero) - { - if (COMPARISON_CLASS_P (may_be_zero)) + may_be_zero = NULL_TREE; + if (!number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL) + || chrec_contains_undetermined (niter_desc.niter)) + continue; + + niter_assumptions = niter_desc.assumptions; + may_be_zero = niter_desc.may_be_zero; + niter = niter_desc.niter; + + if (may_be_zero && integer_zerop (may_be_zero)) + may_be_zero = NULL_TREE; + + if (may_be_zero) { - /* Try to combine may_be_zero with assumptions, this can simplify - computation of niter expression. */ - if (niter_assumptions && !integer_nonzerop (niter_assumptions)) - niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, - niter_assumptions, - fold_build1 (TRUTH_NOT_EXPR, - boolean_type_node, - may_be_zero)); + if (COMPARISON_CLASS_P (may_be_zero)) + { + /* Try to combine may_be_zero with assumptions, this can simplify + computation of niter expression. */ + if (niter_assumptions && !integer_nonzerop (niter_assumptions)) + niter_assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node, + niter_assumptions, + fold_build1 (TRUTH_NOT_EXPR, + boolean_type_node, + may_be_zero)); + else + niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero, + build_int_cst (TREE_TYPE (niter), 0), + rewrite_to_non_trapping_overflow (niter)); + + may_be_zero = NULL_TREE; + } + else if (integer_nonzerop (may_be_zero) && exit == loop->vec_loop_iv) + { + *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0); + *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1); + continue; + } else - niter = fold_build3 (COND_EXPR, TREE_TYPE (niter), may_be_zero, - build_int_cst (TREE_TYPE (niter), 0), - rewrite_to_non_trapping_overflow (niter)); + continue; + } - may_be_zero = NULL_TREE; - } - else if (integer_nonzerop (may_be_zero)) + /* Loop assumptions are based off the normal exit. 
*/ + if (exit == loop->vec_loop_iv) { - *number_of_iterationsm1 = build_int_cst (TREE_TYPE (niter), 0); - *number_of_iterations = build_int_cst (TREE_TYPE (niter), 1); - return cond; + *assumptions = niter_assumptions; + *number_of_iterationsm1 = niter; + + /* We want the number of loop header executions which is the number + of latch executions plus one. + ??? For UINT_MAX latch executions this number overflows to zero + for loops like do { n++; } while (n != 0); */ + if (niter && !chrec_contains_undetermined (niter)) + niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), + unshare_expr (niter), + build_int_cst (TREE_TYPE (niter), 1)); + *number_of_iterations = niter; } - else - return cond; } - *assumptions = niter_assumptions; - *number_of_iterationsm1 = niter; - - /* We want the number of loop header executions which is the number - of latch executions plus one. - ??? For UINT_MAX latch executions this number overflows to zero - for loops like do { n++; } while (n != 0); */ - if (niter && !chrec_contains_undetermined (niter)) - niter = fold_build2 (PLUS_EXPR, TREE_TYPE (niter), unshare_expr (niter), - build_int_cst (TREE_TYPE (niter), 1)); - *number_of_iterations = niter; + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, "All loop exits successfully analyzed.\n"); - return cond; + return conds; } /* Function bb_in_loop_p @@ -1768,15 +1794,26 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info) "not vectorized:" " abnormal loop exit edge.\n"); - info->loop_cond + info->conds = vect_get_loop_niters (loop, &info->assumptions, &info->number_of_iterations, &info->number_of_iterationsm1); - if (!info->loop_cond) + + if (info->conds.is_empty ()) return opt_result::failure_at (vect_location, "not vectorized: complicated exit condition.\n"); + /* Determine what the primary and alternate exit conds are. 
*/ + info->alt_loop_conds.create (info->conds.length () - 1); + for (gcond *cond : info->conds) + { + if (loop->vec_loop_iv->src != gimple_bb (cond)) + info->alt_loop_conds.quick_push (cond); + else + info->loop_cond = cond; + } + if (integer_zerop (info->assumptions) || !info->number_of_iterations || chrec_contains_undetermined (info->number_of_iterations)) @@ -1821,8 +1858,14 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared, if (!integer_onep (info->assumptions) && !main_loop_info) LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions; - stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info->loop_cond); - STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type; + for (gcond *cond : info->alt_loop_conds) + { + stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (cond); + STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type; + } + LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds); + LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond; + if (info->inner_loop_cond) { stmt_vec_info inner_loop_cond_info @@ -3520,6 +3563,9 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared) "***** Choosing vector mode %s\n", GET_MODE_NAME (first_loop_vinfo->vector_mode)); + loop_form_info.conds.release (); + loop_form_info.alt_loop_conds.release (); + /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is enabled, SIMDUID is not set, it is the innermost loop and we have either already found the loop's SIMDLEN or there was no SIMDLEN to @@ -3631,6 +3677,9 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared) (first_loop_vinfo->epilogue_vinfos[0]->vector_mode)); } + loop_form_info.conds.release (); + loop_form_info.alt_loop_conds.release (); + return first_loop_vinfo; } diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index bd5eceb5da7a45ef036cd14609ebe091799320bf..1cc003c12e2447eca878f56cb019236f56e96f85 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -876,6 +876,12 @@ public: we need to peel off iterations at the end to form an epilogue loop. */ bool peeling_for_niter; + /* List of loop additional IV conditionals found in the loop. */ + auto_vec conds; + + /* Main loop IV cond. */ + gcond* loop_iv_cond; + /* True if there are no loop carried data dependencies in the loop. If loop->safelen <= 1, then this is always true, either the loop didn't have any loop carried data dependencies, or the loop is being @@ -966,6 +972,8 @@ public: #define LOOP_VINFO_REDUCTION_CHAINS(L) (L)->reduction_chains #define LOOP_VINFO_PEELING_FOR_GAPS(L) (L)->peeling_for_gaps #define LOOP_VINFO_PEELING_FOR_NITER(L) (L)->peeling_for_niter +#define LOOP_VINFO_LOOP_CONDS(L) (L)->conds +#define LOOP_VINFO_LOOP_IV_COND(L) (L)->loop_iv_cond #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies #define LOOP_VINFO_SCALAR_LOOP(L) (L)->scalar_loop #define LOOP_VINFO_SCALAR_LOOP_SCALING(L) (L)->scalar_loop_scaling @@ -2353,7 +2361,9 @@ struct vect_loop_form_info tree number_of_iterations; tree number_of_iterationsm1; tree assumptions; + vec conds; gcond *loop_cond; + vec alt_loop_conds; gcond *inner_loop_cond; }; extern opt_result vect_analyze_loop_form (class loop *, vect_loop_form_info *); --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -849,80 +849,106 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo) in NUMBER_OF_ITERATIONSM1. Place the condition under which the niter information holds in ASSUMPTIONS. - Return the loop exit condition. 
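To make the new bookkeeping concrete, a hypothetical consumer could walk the conditions the analysis now records: the gcond of the main counted exit, which the niters above are based on, through LOOP_VINFO_LOOP_IV_COND, and the additional early-exit gconds through LOOP_VINFO_LOOP_CONDS. The helper below is an editorial sketch only, not part of the patch:

  /* Editorial sketch, not part of the patch: dump the exit conditions
     recorded by the updated niters analysis.  */
  static void
  dump_recorded_loop_conds (loop_vec_info loop_vinfo)
  {
    if (gcond *iv_cond = LOOP_VINFO_LOOP_IV_COND (loop_vinfo))
      dump_printf_loc (MSG_NOTE, vect_location,
                       "main IV exit condition: %G", (gimple *) iv_cond);
    for (gcond *cond : LOOP_VINFO_LOOP_CONDS (loop_vinfo))
      dump_printf_loc (MSG_NOTE, vect_location,
                       "alternate exit condition: %G", (gimple *) cond);
  }
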
If loop->safelen <= 1, then this is always true, either the loop didn't have any loop carried data dependencies, or the loop is being @@ -966,6 +972,8 @@ public: #define LOOP_VINFO_REDUCTION_CHAINS(L) (L)->reduction_chains #define LOOP_VINFO_PEELING_FOR_GAPS(L) (L)->peeling_for_gaps #define LOOP_VINFO_PEELING_FOR_NITER(L) (L)->peeling_for_niter +#define LOOP_VINFO_LOOP_CONDS(L) (L)->conds +#define LOOP_VINFO_LOOP_IV_COND(L) (L)->loop_iv_cond #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies #define LOOP_VINFO_SCALAR_LOOP(L) (L)->scalar_loop #define LOOP_VINFO_SCALAR_LOOP_SCALING(L) (L)->scalar_loop_scaling @@ -2353,7 +2361,9 @@ struct vect_loop_form_info tree number_of_iterations; tree number_of_iterationsm1; tree assumptions; + vec conds; gcond *loop_cond; + vec alt_loop_conds; gcond *inner_loop_cond; }; extern opt_result vect_analyze_loop_form (class loop *, vect_loop_form_info *);

From patchwork Wed Jun 28 13:45:51 2023
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 113898
Date: Wed, 28 Jun 2023 14:45:51 +0100
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com
Subject: [PATCH 9/19]AArch64 middle-end: refactor vectorizable_comparison to make the main body re-usable.
From: Tamar Christina

Hi All,

Vectorization of a gcond starts off essentially the same as vectorizing a comparison, with the only difference being how the operands are extracted. This refactors vectorizable_comparison such that we now have a generic function that can be used from vectorizable_early_break. The refactoring splits the gassign checks and the actual validation/codegen off to a helper function.

No change in functionality expected.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_comparison): Refactor, splitting body
	to ...
	(vectorizable_comparison_1): ...This.

--- inline copy of patch --
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index ae24f3e66e63d9bd9763284a47fb2c911335c4c1..f3e33cd4ed125b9564ca81acd197693fc3457c31 100644
--- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -11332,21 +11332,22 @@ vectorizable_condition (vec_info *vinfo, /* vectorizable_comparison. - Check if STMT_INFO is comparison expression that can be vectorized. +/* Helper of vectorizable_comparison. + + Check if STMT_INFO is comparison expression CODE that can be vectorized.
If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized comparison, put it in VEC_STMT, and insert it at GSI. Return true if STMT_INFO is vectorizable in this way. */ static bool -vectorizable_comparison (vec_info *vinfo, - stmt_vec_info stmt_info, gimple_stmt_iterator *gsi, - gimple **vec_stmt, - slp_tree slp_node, stmt_vector_for_cost *cost_vec) +vectorizable_comparison_1 (vec_info *vinfo, tree vectype, + stmt_vec_info stmt_info, tree_code code, + gimple_stmt_iterator *gsi, gimple **vec_stmt, + slp_tree slp_node, stmt_vector_for_cost *cost_vec) { tree lhs, rhs1, rhs2; tree vectype1 = NULL_TREE, vectype2 = NULL_TREE; - tree vectype = STMT_VINFO_VECTYPE (stmt_info); tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE; tree new_temp; loop_vec_info loop_vinfo = dyn_cast (vinfo); @@ -11354,7 +11355,7 @@ vectorizable_comparison (vec_info *vinfo, int ndts = 2; poly_uint64 nunits; int ncopies; - enum tree_code code, bitop1 = NOP_EXPR, bitop2 = NOP_EXPR; + enum tree_code bitop1 = NOP_EXPR, bitop2 = NOP_EXPR; int i; bb_vec_info bb_vinfo = dyn_cast (vinfo); vec vec_oprnds0 = vNULL; @@ -11377,14 +11378,6 @@ vectorizable_comparison (vec_info *vinfo, ncopies = vect_get_num_copies (loop_vinfo, vectype); gcc_assert (ncopies >= 1); - if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def) - return false; - - gassign *stmt = dyn_cast (stmt_info->stmt); - if (!stmt) - return false; - - code = gimple_assign_rhs_code (stmt); if (TREE_CODE_CLASS (code) != tcc_comparison) return false; @@ -11499,7 +11492,6 @@ vectorizable_comparison (vec_info *vinfo, return false; } - STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type; vect_model_simple_cost (vinfo, stmt_info, ncopies * (1 + (bitop2 != NOP_EXPR)), dts, ndts, slp_node, cost_vec); @@ -11565,6 +11557,44 @@ vectorizable_comparison (vec_info *vinfo, return true; } +/* vectorizable_comparison. + + Check if STMT_INFO is comparison expression that can be vectorized. + If VEC_STMT is also passed, vectorize STMT_INFO: create a vectorized + comparison, put it in VEC_STMT, and insert it at GSI. + + Return true if STMT_INFO is vectorizable in this way. */ + +static bool +vectorizable_comparison (vec_info *vinfo, + stmt_vec_info stmt_info, gimple_stmt_iterator *gsi, + gimple **vec_stmt, + slp_tree slp_node, stmt_vector_for_cost *cost_vec) +{ + bb_vec_info bb_vinfo = dyn_cast (vinfo); + + if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) + return false; + + if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def) + return false; + + gassign *stmt = dyn_cast (stmt_info->stmt); + if (!stmt) + return false; + + enum tree_code code = gimple_assign_rhs_code (stmt); + tree vectype = STMT_VINFO_VECTYPE (stmt_info); + if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi, + vec_stmt, slp_node, cost_vec)) + return false; + + if (!vec_stmt) + STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type; + + return true; +} + /* If SLP_NODE is nonnull, return true if vectorizable_live_operation can handle all live statements in the node. Otherwise return true if STMT_INFO is not live or if vectorizable_live_operation can handle it. 
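To make the split concrete, here is a small C sketch (illustrative only, not taken from the patch) of the two GIMPLE forms a comparison can take. When its result is stored it reaches the vectorizer as a gassign such as _1 = a_2 > b_3, which is what vectorizable_comparison handles; when it controls a branch it is a gcond such as if (a_2 > b_3). Only the operand extraction differs, which is why the validation and codegen can live in the shared vectorizable_comparison_1 helper.

  /* Comparison producing a value: gimplified to a gassign like _1 = a_2 > b_3.  */
  _Bool cmp_as_value (int a, int b) { return a > b; }

  /* Comparison controlling a branch: gimplified to a gcond like if (a_2 > b_3).  */
  int cmp_as_branch (int a, int b) { if (a > b) return 1; return 0; }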
From patchwork Wed Jun 28 13:46:16 2023
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 113903
Date: Wed, 28 Jun 2023 14:46:16 +0100
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de, jlaw@ventanamicro.com
Subject: [PATCH 10/19]middle-end: implement vectorizable_early_break.
From: Tamar Christina

Hi All,

This implements vectorizable_early_exit, which is used as the codegen part of vectorizing a gcond. For the most part it shares the majority of the code with vectorizable_comparison, with the addition that it needs to be able to reduce multiple resulting statements into a single one for use in the gcond, and also needs to be able to perform masking on the comparisons.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
	lhs.
	(vectorizable_early_exit): New.
	(vect_analyze_stmt, vect_transform_stmt): Use it.
	(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.

--- inline copy of patch --
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index f3e33cd4ed125b9564ca81acd197693fc3457c31..87c4353fa5180fcb7f60b192897456cf24f3fdbe 100644
--- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -11330,8 +11330,186 @@ vectorizable_condition (vec_info *vinfo, return true; } -/* vectorizable_comparison.
+static bool +vectorizable_comparison_1 (vec_info *, tree, stmt_vec_info, tree_code, + gimple_stmt_iterator *, gimple **, slp_tree, + stmt_vector_for_cost *); + +/* Check to see if the current early break given in STMT_INFO is valid for + vectorization. */ + +static bool +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info, + gimple_stmt_iterator *gsi, gimple **vec_stmt, + slp_tree slp_node, stmt_vector_for_cost *cost_vec) +{ + loop_vec_info loop_vinfo = dyn_cast (vinfo); + if (!loop_vinfo + || !is_a (STMT_VINFO_STMT (stmt_info))) + return false; + + if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_early_exit_def) + return false; + + if (!STMT_VINFO_RELEVANT_P (stmt_info)) + return false; + + gimple_match_op op; + if (!gimple_extract_op (stmt_info->stmt, &op)) + gcc_unreachable (); + gcc_assert (op.code.is_tree_code ()); + auto code = tree_code (op.code); + + tree vectype_out = STMT_VINFO_VECTYPE (stmt_info); + gcc_assert (vectype_out); + + stmt_vec_info operand0_info + = loop_vinfo->lookup_stmt (SSA_NAME_DEF_STMT (op.ops[0])); + if (!operand0_info) + return false; + /* If we're in a pattern get the type of the original statement. */ + if (STMT_VINFO_IN_PATTERN_P (operand0_info)) + operand0_info = STMT_VINFO_RELATED_STMT (operand0_info); + tree vectype_op = STMT_VINFO_VECTYPE (operand0_info); + + tree truth_type = truth_type_for (vectype_op); + machine_mode mode = TYPE_MODE (truth_type); + int ncopies; + + if (slp_node) + ncopies = 1; + else + ncopies = vect_get_num_copies (loop_vinfo, truth_type); + + vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo); + bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo); + + /* Analyze only. */ + if (!vec_stmt) + { + if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "can't vectorize early exit because the " + "target doesn't support flag setting vector " + "comparisons.\n"); + return false; + } + + if (!expand_vec_cmp_expr_p (vectype_op, truth_type, NE_EXPR)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "can't vectorize early exit because the " + "target does not support boolean vector " + "comparisons for type %T.\n", truth_type); + return false; + } + + if (ncopies > 1 + && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "can't vectorize early exit because the " + "target does not support boolean vector OR for " + "type %T.\n", truth_type); + return false; + } + + if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi, + vec_stmt, slp_node, cost_vec)) + return false; + if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)) + vect_record_loop_mask (loop_vinfo, masks, ncopies, truth_type, NULL); + + return true; + } + + /* Tranform. */ + + tree new_temp = NULL_TREE; + gimple *new_stmt = NULL; + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n"); + + if (!vectorizable_comparison_1 (vinfo, truth_type, stmt_info, code, gsi, + vec_stmt, slp_node, cost_vec)) + gcc_unreachable (); + + gimple *stmt = STMT_VINFO_STMT (stmt_info); + basic_block cond_bb = gimple_bb (stmt); + gimple_stmt_iterator cond_gsi = gsi_last_bb (cond_bb); + + vec stmts; + + if (slp_node) + stmts = SLP_TREE_VEC_STMTS (slp_node); + else + stmts = STMT_VINFO_VEC_STMTS (stmt_info); + + /* Determine if we need to reduce the final value. 
*/ + if (stmts.length () > 1) + { + /* We build the reductions in a way to maintain as much parallelism as + possible. */ + auto_vec workset (stmts.length ()); + workset.splice (stmts); + while (workset.length () > 1) + { + new_temp = make_temp_ssa_name (truth_type, NULL, "vexit_reduc"); + gimple *arg0 = workset.pop (); + gimple *arg1 = workset.pop (); + new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, + gimple_assign_lhs (arg0), + gimple_assign_lhs (arg1)); + vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt, + &cond_gsi); + if (slp_node) + SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt); + else + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt); + workset.quick_insert (0, new_stmt); + } + } + else + new_stmt = stmts[0]; + + gcc_assert (new_stmt); + + tree cond = gimple_assign_lhs (new_stmt); + if (masked_loop_p) + { + tree mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, truth_type, 0); + cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond, + &cond_gsi); + } + + /* Now build the new conditional. Pattern gimple_conds get dropped during + codegen so we must replace the original insn. */ + if (is_pattern_stmt_p (stmt_info)) + stmt = STMT_VINFO_STMT (STMT_VINFO_RELATED_STMT (stmt_info)); + + tree t = fold_build2 (NE_EXPR, boolean_type_node, cond, + build_zero_cst (truth_type)); + t = canonicalize_cond_expr_cond (t); + gimple_cond_set_condition_from_tree ((gcond*)stmt, t); + update_stmt (stmt); + + if (slp_node) + SLP_TREE_VEC_STMTS (slp_node).quick_push (stmt); + else + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (stmt); + + + if (!slp_node) + *vec_stmt = stmt; + + return true; +} /* Helper of vectorizable_comparison. Check if STMT_INFO is comparison expression CODE that can be vectorized. @@ -11501,8 +11679,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype, /* Transform. */ /* Handle def. 
*/ - lhs = gimple_assign_lhs (stmt); - mask = vect_create_destination_var (lhs, mask_type); + lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info)); + if (lhs) + mask = vect_create_destination_var (lhs, mask_type); vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies, rhs1, &vec_oprnds0, vectype, @@ -11516,7 +11695,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype, gimple *new_stmt; vec_rhs2 = vec_oprnds1[i]; - new_temp = make_ssa_name (mask); + if (lhs) + new_temp = make_ssa_name (mask); + else + new_temp = make_temp_ssa_name (mask_type, NULL, "cmp"); if (bitop1 == NOP_EXPR) { new_stmt = gimple_build_assign (new_temp, code, @@ -11816,7 +11998,9 @@ vect_analyze_stmt (vec_info *vinfo, || vectorizable_lc_phi (as_a (vinfo), stmt_info, NULL, node) || vectorizable_recurr (as_a (vinfo), - stmt_info, NULL, node, cost_vec)); + stmt_info, NULL, node, cost_vec) + || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node, + cost_vec)); else { if (bb_vinfo) @@ -11839,7 +12023,10 @@ vect_analyze_stmt (vec_info *vinfo, NULL, NULL, node, cost_vec) || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node, cost_vec) - || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)); + || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec) + || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node, + cost_vec)); + } if (node) @@ -11997,6 +12184,12 @@ vect_transform_stmt (vec_info *vinfo, gcc_assert (done); break; + case loop_exit_ctrl_vec_info_type: + done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt, + slp_node, NULL); + gcc_assert (done); + break; + default: if (!STMT_VINFO_LIVE_P (stmt_info)) { @@ -12395,6 +12588,9 @@ vect_is_simple_use (tree operand, vec_info *vinfo, enum vect_def_type *dt, case vect_first_order_recurrence: dump_printf (MSG_NOTE, "first order recurrence\n"); break; + case vect_early_exit_def: + dump_printf (MSG_NOTE, "early exit\n"); + break; case vect_unknown_def_type: dump_printf (MSG_NOTE, "unknown\n"); break; @@ -12511,6 +12707,14 @@ vect_is_simple_use (vec_info *vinfo, stmt_vec_info stmt, slp_tree slp_node, else *op = gimple_op (ass, operand + 1); } + else if (gcond *cond = dyn_cast (stmt->stmt)) + { + gimple_match_op m_op; + if (!gimple_extract_op (cond, &m_op)) + return false; + gcc_assert (m_op.code.is_tree_code ()); + *op = m_op.ops[operand]; + } else if (gcall *call = dyn_cast (stmt->stmt)) *op = gimple_call_arg (call, operand); else @@ -13121,6 +13325,8 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info, *nunits_vectype_out = NULL_TREE; if (gimple_get_lhs (stmt) == NULL_TREE + /* Allow vector conditionals through here. */ + && !is_ctrl_stmt (stmt) /* MASK_STORE has no lhs, but is ok. 
*/ && !gimple_call_internal_p (stmt, IFN_MASK_STORE)) { @@ -13137,7 +13343,7 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info, } return opt_result::failure_at (stmt, - "not vectorized: irregular stmt.%G", stmt); + "not vectorized: irregular stmt: %G", stmt); } tree vectype; @@ -13166,6 +13372,14 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info, scalar_type = TREE_TYPE (DR_REF (dr)); else if (gimple_call_internal_p (stmt, IFN_MASK_STORE)) scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3)); + else if (is_ctrl_stmt (stmt)) + { + gcond *cond = dyn_cast (stmt); + if (!cond) + return opt_result::failure_at (stmt, "not vectorized: unsupported" + " control flow statement.\n"); + scalar_type = TREE_TYPE (gimple_cond_rhs (stmt)); + } else scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
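As a rough model of what the generated code checks at runtime (a conceptual C sketch, not the emitted GIMPLE; the names and the lane count are invented for illustration): every vector copy of the gcond's comparison yields a lane mask, the masks are combined with a BIT_IOR_EXPR reduction, and a single scalar branch tests whether any lane requests the exit. When the loop is fully masked, the combined value is additionally ANDed with the loop mask so inactive lanes cannot trigger the exit.

  /* Conceptual model of the early-break check for one vector iteration,
     assuming VF * ncopies == 8 lanes.  Names are illustrative only.  */
  static int
  any_lane_exits (const int *a, int i, int x)
  {
    int any_exit = 0;
    for (int lane = 0; lane < 8; lane++)
      any_exit |= (a[i + lane] * 2 != x);  /* per-lane compare, OR-reduced */
    return any_exit;                       /* feeds one scalar conditional branch */
  }

If any_lane_exits returns non-zero the vector loop leaves through the early-exit edge; otherwise it continues with the next full vector iteration.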
From patchwork Wed Jun 28 13:46:38 2023
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 113901
Date: Wed, 28 Jun 2023 14:46:38 +0100
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de, jlaw@ventanamicro.com
Subject: [PATCH 11/19]middle-end: implement code motion for early break.
From: Tamar Christina
Reply-To: Tamar Christina

Hi All,

When performing early break vectorization we need to be sure that the vector
operations are safe to perform.  A simple example is e.g.

 for (int i = 0; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i]*2 != x)
     break;
   vect_a[i] = x;
 }

where the store to vect_b is not allowed to be executed unconditionally, since
if we exit through the early break it wouldn't have been done for the full VF
iteration.

Effectively the code motion determines:
  - whether it is safe/possible to vectorize the function,
  - what updates to the VUSES should be performed if we do,
  - which statements need to be moved,
  - which statements can't be moved:
    * values that are live must be reachable through all exits,
    * values that aren't single use and are shared by the use/def chain
      of the cond,
  - the final insertion point of the instructions.  In the case we have
    multiple early exit statements this should be the one closest to the
    loop latch itself.

After motion the loop above is:

 for (int i = 0; i < N; i++)
 {
   ...
   y = x + i;
   if (vect_a[i]*2 != x)
     break;
   vect_b[i] = y;
   vect_a[i] = x;
 }

The operation is split into two parts: during data ref analysis we determine
the validity of the operation and generate a worklist of actions to perform if
we vectorize.  After peeling and just before statement transformation we replay
this worklist, which moves the statements and updates bookkeeping only in the
main loop that is to be vectorized.  This includes updating the USES in the
exit blocks.
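For reference, a standalone scalar testcase of this shape would look something
like the following (hypothetical, not taken from the patch or the testsuite;
the names vect_a, vect_b and N just mirror the example above):

  #include <stdio.h>

  #define N 803
  unsigned vect_a[N];
  unsigned vect_b[N];

  /* Returns the index at which the loop broke out, or N if it ran to
     completion.  The store to vect_b[i] happens before the early exit
     test, so a vectorized version must not let it take effect for lanes
     past the exit -- which is what the code motion described above
     arranges.  */
  unsigned
  test (unsigned x)
  {
    unsigned i;
    for (i = 0; i < N; i++)
      {
        vect_b[i] = x + i;
        if (vect_a[i] * 2 != x)
          break;
        vect_a[i] = x;
      }
    return i;
  }

  int
  main (void)
  {
    printf ("%u\n", test (0));
    return 0;
  }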
Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-vect-data-refs.cc (validate_early_exit_stmts): New.
	(vect_analyze_data_ref_dependences): Use it.
	* tree-vect-loop.cc (move_early_exit_stmts): New.
	(vect_transform_loop): Use it.
	* tree-vectorizer.h (LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS,
	LOOP_VINFO_EARLY_BRK_DEST_BB, LOOP_VINFO_EARLY_BRK_VUSES): New.

--- inline copy of patch --
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc index fcc950f528b2d1e044be12424c2df11f692ee8ba..240bd7a86233f6b907816f812681e4cd778ecaae 100644 --- a/gcc/tree-vect-data-refs.cc +++ b/gcc/tree-vect-data-refs.cc @@ -568,6 +568,278 @@ vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr, return opt_result::success (); } +/* This function tries to validate whether an early break vectorization + is possible for the current instruction sequence.  Returns True if + possible, otherwise False. + + Requirements: + - Any memory access must be to a fixed size buffer. + - There must not be any loads and stores to the same object. + - Multiple loads are allowed as long as they don't alias. + + NOTE: + This implementation is very conservative.  Any overlapping loads/stores + that take place before the early break statement get rejected aside from + WAR dependencies. + + i.e.: + + a[i] = 8 + c = a[i] + if (b[i]) + ... + + is not allowed, but + + c = a[i] + a[i] = 8 + if (b[i]) + ... + + is, which is the common case. + + Arguments: + - LOOP_VINFO: loop information for the current loop. + - CHAIN: Currently detected sequence of instructions that need to be moved + if we are to vectorize this early break. + - FIXED: Sequences of SSA_NAMEs that must not be moved, they are reachable from + one or more cond conditions.  If this set overlaps with CHAIN then FIXED + takes precedence.  This deals with non-single use cases. + - LOADS: List of all loads found during traversal. + - BASES: List of all load data references found during traversal. + - GSTMT: Current position to inspect for validity.  The sequence + will be moved upwards from this point. + - REACHING_VUSE: The dominating VUSE found so far. + - CURRENT_VDEF: The last VDEF we've seen.  These are updated in + pre-order and updated in post-order after moving the + instruction. */ + +static bool +validate_early_exit_stmts (loop_vec_info loop_vinfo, hash_set *chain, + hash_set *fixed, vec *loads, + vec *bases, tree *reaching_vuse, + tree *current_vdef, gimple_stmt_iterator *gstmt, + hash_map *renames) +{ + if (gsi_end_p (*gstmt)) + return true; + + gimple *stmt = gsi_stmt (*gstmt); + if (gimple_has_ops (stmt)) + { + tree dest = NULL_TREE; + /* Try to find the SSA_NAME being defined.  For statements with an LHS + use the LHS, if not, assume that the first argument of a call is the + value being defined.  e.g. MASKED_LOAD etc. */ + if (gimple_has_lhs (stmt)) + { + if (is_gimple_assign (stmt)) + dest = gimple_assign_lhs (stmt); + else if (const gcall *call = dyn_cast (stmt)) + dest = gimple_call_lhs (call); + } + else if (const gcall *call = dyn_cast (stmt)) + dest = gimple_arg (call, 0); + else if (const gcond *cond = dyn_cast (stmt)) + { + /* Operands of conds are ones we can't move. 
*/ + fixed->add (gimple_cond_lhs (cond)); + fixed->add (gimple_cond_rhs (cond)); + } + + bool move = false; + + stmt_vec_info stmt_vinfo = loop_vinfo->lookup_stmt (stmt); + if (!stmt_vinfo) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "early breaks only supported. Unknown" + " statement: %G", stmt); + return false; + } + + auto dr_ref = STMT_VINFO_DATA_REF (stmt_vinfo); + if (dr_ref) + { + /* We currenly only support statically allocated objects due to + not having first-faulting loads support or peeling for alignment + support. Compute the isize of the referenced object (it could be + dynamically allocated). */ + tree obj = DR_BASE_ADDRESS (dr_ref); + if (!obj || TREE_CODE (obj) != ADDR_EXPR) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "early breaks only supported on statically" + " allocated objects.\n"); + return false; + } + + tree refop = TREE_OPERAND (obj, 0); + tree refbase = get_base_address (refop); + if (!refbase || !DECL_P (refbase) || !DECL_SIZE (refbase) + || TREE_CODE (DECL_SIZE (refbase)) != INTEGER_CST) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "early breaks only supported on statically" + " allocated objects.\n"); + return false; + } + + if (DR_IS_READ (dr_ref)) + { + loads->safe_push (dest); + bases->safe_push (dr_ref); + } + else if (DR_IS_WRITE (dr_ref)) + { + for (auto dr : bases) + if (same_data_refs_base_objects (dr, dr_ref)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, + vect_location, + "early breaks only supported," + " overlapping loads and stores found" + " before the break statement.\n"); + return false; + } + /* Any writes starts a new chain. */ + move = true; + } + } + + /* If a statement if live and escapes the loop through usage in the loop + epilogue then we can't move it since we need to maintain its + reachability through all exits. */ + bool skip = false; + if (STMT_VINFO_LIVE_P (stmt_vinfo) + && !(dr_ref && DR_IS_WRITE (dr_ref))) + { + imm_use_iterator imm_iter; + use_operand_p use_p; + FOR_EACH_IMM_USE_FAST (use_p, imm_iter, dest) + { + basic_block bb = gimple_bb (USE_STMT (use_p)); + skip = bb == LOOP_VINFO_IV_EXIT (loop_vinfo)->dest; + if (skip) + break; + } + } + + /* If we found the defining statement of a something that's part of the + chain then expand the chain with the new SSA_VARs being used. 
*/ + if (!skip && (chain->contains (dest) || move)) + { + move = true; + for (unsigned x = 0; x < gimple_num_args (stmt); x++) + { + tree var = gimple_arg (stmt, x); + if (TREE_CODE (var) == SSA_NAME) + { + if (fixed->contains (dest)) + { + move = false; + fixed->add (var); + } + else + chain->add (var); + } + else + { + use_operand_p use_p; + ssa_op_iter iter; + FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE) + { + tree op = USE_FROM_PTR (use_p); + gcc_assert (TREE_CODE (op) == SSA_NAME); + if (fixed->contains (dest)) + { + move = false; + fixed->add (op); + } + else + chain->add (op); + } + } + } + + if (dump_enabled_p ()) + { + if (move) + dump_printf_loc (MSG_NOTE, vect_location, + "found chain %G", stmt); + else + dump_printf_loc (MSG_NOTE, vect_location, + "ignored chain %G, not single use", stmt); + } + } + + if (move) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "==> recording stmt %G", stmt); + + for (tree ref : loads) + if (stmt_may_clobber_ref_p (stmt, ref, true)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "early breaks not supported as memory used" + " may alias.\n"); + return false; + } + + /* This statement is to be moved. */ + LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo).safe_push (stmt); + + /* If we've moved a VDEF, extract the defining MEM and update + usages of it. */ + tree vdef; + if ((vdef = gimple_vdef (stmt))) + { + *current_vdef = vdef; + *reaching_vuse = gimple_vuse (stmt); + } + } + } + + gsi_prev (gstmt); + + if (!validate_early_exit_stmts (loop_vinfo, chain, fixed, loads, bases, + reaching_vuse, current_vdef, gstmt, renames)) + return false; + + if (gimple_vuse (stmt) + && reaching_vuse && *reaching_vuse + && gimple_vuse (stmt) == *current_vdef) + { + tree new_vuse = *reaching_vuse; + tree *renamed = renames->get (new_vuse); + if (renamed) + new_vuse = *renamed; + LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).safe_push ({stmt, new_vuse}); + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "current_use: %T, new_use: %T, mem_ref: %G", + *current_vdef, new_vuse, stmt); + + if (!renamed) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "stored: %T -> %T\n", *current_vdef, new_vuse); + + renames->put (*current_vdef, new_vuse); + } + } + + return true; +} + /* Function vect_analyze_data_ref_dependences. Examine all the data references in the loop, and make sure there do not @@ -612,6 +884,84 @@ vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo, return res; } + /* If we have early break statements in the loop, check to see if they + are of a form we can vectorizer. */ + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) + { + hash_set chain, fixed; + auto_vec loads; + auto_vec bases; + hash_map renames; + basic_block dest_bb = NULL; + tree vdef = NULL; + tree vuse = NULL; + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "loop contains multiple exits, analyzing" + " statement dependencies.\n"); + + for (gcond *c : LOOP_VINFO_LOOP_CONDS (loop_vinfo)) + { + stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (c); + if (STMT_VINFO_TYPE (loop_cond_info) != loop_exit_ctrl_vec_info_type) + continue; + + gimple *stmt = STMT_VINFO_STMT (loop_cond_info); + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); + + /* Initiaze the vuse chain with the one at the early break. 
*/ + if (!vuse) + vuse = gimple_vuse (c); + + if (!validate_early_exit_stmts (loop_vinfo, &chain, &fixed, &loads, + &bases, &vuse, &vdef, &gsi, &renames)) + return opt_result::failure_at (stmt, + "can't safely apply code motion to " + "dependencies of %G to vectorize " + "the early exit.\n", stmt); + + /* Save destination as we go, BB are visited in order and the last one + is where statements should be moved to. */ + if (!dest_bb) + dest_bb = gimple_bb (c); + else + { + basic_block curr_bb = gimple_bb (c); + if (dominated_by_p (CDI_DOMINATORS, curr_bb, dest_bb)) + dest_bb = curr_bb; + } + } + + dest_bb = FALLTHRU_EDGE (dest_bb)->dest; + gcc_assert (dest_bb); + LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo) = dest_bb; + + /* Do some renaming to update the uses chain. */ + for (unsigned i = 0; i < LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo).length (); i++) + { + auto g = LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo)[i]; + tree *tmp = renames.get (g.second); + if (tmp) + LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo)[i] + = std::make_pair (g.first, *tmp); + } + + /* TODO: Remove? It's useful debug statement but may be too much. */ + for (auto g : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "overrode use: %T, mem_ref: %G", + g.second, g.first); + } + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "recorded statements to be moved to BB %d\n", + LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo)->index); + } + return opt_result::success (); } diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 9065811b3b9c2a550baf44768603172b9e26b94b..b4a98de80aa39057fc9b17977dd0e347b4f0fb5d 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -11192,6 +11192,45 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance) epilogue_vinfo->shared->save_datarefs (); } +/* When vectorizing early break statements instructions that happen before + the early break in the current BB need to be moved to after the early + break. This function deals with that and assumes that any validity + checks has already been performed. + + While moving the instructions if it encounters a VUSE or VDEF it then + corrects the VUSES as it moves the statements along. GDEST is the location + in which to insert the new statements. */ + +static void +move_early_exit_stmts (loop_vec_info loop_vinfo) +{ + /* Move all stmts that need moving. */ + basic_block dest_bb = LOOP_VINFO_EARLY_BRK_DEST_BB (loop_vinfo); + gimple_stmt_iterator dest_gsi = gsi_start_bb (dest_bb); + + for (gimple *stmt : LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS (loop_vinfo)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, "moving stmt %G", stmt); + + gimple_stmt_iterator stmt_gsi = gsi_for_stmt (stmt); + gsi_move_before (&stmt_gsi, &dest_gsi); + gsi_prev (&dest_gsi); + update_stmt (stmt); + } + + /* Update all the stmts with their new reaching VUSES. */ + for (auto p : LOOP_VINFO_EARLY_BRK_VUSES (loop_vinfo)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "updating vuse %G", p.first); + unlink_stmt_vdef (p.first); + gimple_set_vuse (p.first, p.second); + update_stmt (p.first); + } +} + /* Function vect_transform_loop. The analysis phase has determined that the loop is vectorizable. 
@@ -11330,6 +11369,11 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call) vect_schedule_slp (loop_vinfo, LOOP_VINFO_SLP_INSTANCES (loop_vinfo)); } + /* Handle any code motion that we need to for early-break vectorization after + we've done peeling but just before we start vectorizing. */ + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) + move_early_exit_stmts (loop_vinfo); + /* FORNOW: the vectorizer supports only loops which body consist of one basic block (header + empty latch). When the vectorizer will support more involved loop forms, the order by which the BBs are diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 1cc003c12e2447eca878f56cb019236f56e96f85..ec65b65b5910e9cbad0a8c7e83c950b6168b98bf 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -919,6 +919,19 @@ public: analysis. */ vec<_loop_vec_info *> epilogue_vinfos; + /* Used to store the list of statements needing to be moved if doing early + break vectorization as they would violate the scalar loop semantics if + vectorized in their current location. */ + auto_vec early_break_conflict; + + /* The final basic block where to move statements to. In the case of + multiple exits this could be pretty far away. */ + basic_block early_break_dest_bb; + + /* Statements whose VUSES need updating if early break vectorization is to + happen. */ + auto_vec> early_break_vuses; + } *loop_vec_info; /* Access Functions. */ @@ -972,6 +985,9 @@ public: #define LOOP_VINFO_REDUCTION_CHAINS(L) (L)->reduction_chains #define LOOP_VINFO_PEELING_FOR_GAPS(L) (L)->peeling_for_gaps #define LOOP_VINFO_PEELING_FOR_NITER(L) (L)->peeling_for_niter +#define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict +#define LOOP_VINFO_EARLY_BRK_DEST_BB(L) (L)->early_break_dest_bb +#define LOOP_VINFO_EARLY_BRK_VUSES(L) (L)->early_break_vuses #define LOOP_VINFO_LOOP_CONDS(L) (L)->conds #define LOOP_VINFO_LOOP_IV_COND(L) (L)->loop_iv_cond #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies
From patchwork Wed Jun 28 13:47:03 2023
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 113904
Date: Wed, 28 Jun 2023 14:47:03 +0100
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de, jlaw@ventanamicro.com
Subject: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.
From: Tamar Christina
Reply-To: Tamar Christina

Hi All,

This patch updates the peeling code to maintain LCSSA during peeling.
The rewrite also naturally takes into account multiple exits, and so it didn't
make sense to split them off.

For the purposes of peeling the only change for multiple exits is that the
secondary exits are all wired to the start of the new loop preheader when doing
epilogue peeling.  When doing prologue peeling the CFG is kept intact.

For both epilogue and prologue peeling we wire through between the two loops
any PHI nodes that escape the first loop into the second loop if flow_loops is
specified.  The reason for this conditionality is that
slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in three ways:
  - prologue peeling
  - epilogue peeling
  - loop distribution

For the last case the loops should remain independent, and so they are not
connected.  Because of this propagation of only the used PHI nodes,
get_current_def can be used to easily find the previous definitions.  However,
live statements that are not used inside the loop itself are not propagated
(since if unused, the moment we add the guard in between the two loops the
value across the bypass edge can be wrong if the loop has been peeled).  This
is dealt with easily enough in find_guard_arg.
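A hypothetical scalar example (not taken from the patch or the testsuite) of
the "live but not used inside the loop" situation mentioned above: `last'
below is defined in the loop but only consumed after it, so after peeling its
value has to reach the final use through the exit/guard PHI arguments that
find_guard_arg resolves.

  /* Return the last element inspected, or -1 for an empty array.  */
  int
  last_inspected (const int *a, const int *b, int n)
  {
    int last = -1;
    for (int i = 0; i < n; i++)
      {
        last = a[i];          /* Live only outside the loop.  */
        if (a[i] != b[i])
          break;              /* Early exit.  */
      }
    return last;
  }

  int
  main (void)
  {
    int a[] = { 1, 2, 3 }, b[] = { 1, 5, 3 };
    return last_inspected (a, b, 3) == 2 ? 0 : 1;
  }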
For multiple exits, while we are in LCSSA form and have a correct DOM tree, the
moment we add the guard block we will change the dominators again.  To deal
with this, slpeel_tree_duplicate_loop_to_edge_cfg can optionally return the
blocks to update, without having to recompute the list of blocks to update
again (a short sketch of the updated interface follows the ChangeLog below).

With multiple exits and epilogue peeling we will also temporarily have an
incorrect VUSE chain for the secondary exits, as it anticipates the final
result after the VDEFs have been moved.  This will be corrected once the code
motion is applied.

Lastly, by doing things this way we can remove the helper functions that
previously did lock-step iterations to update things as they went along.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* tree-loop-distribution.cc (copy_loop_before): Pass flow_loops = false.
	* tree-ssa-loop-niter.cc (loop_only_exit_p): Fix bug when exit==null.
	* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional
	assert.
	(vect_set_loop_condition_normal): Skip modifying loop IV for multiple
	exits.
	(slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit peeling.
	(slpeel_can_duplicate_loop_p): Likewise.
	(vect_update_ivs_after_vectorizer): Don't enter this...
	(vect_update_ivs_after_early_break): ...but instead enter here.
	(find_guard_arg): Update for new peeling code.
	(slpeel_update_phi_nodes_for_loops): Remove.
	(slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0 checks.
	(slpeel_update_phi_nodes_for_lcssa): Remove.
	(vect_do_peeling): Fix VF for multiple exits and force epilogue.
	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
	non_break_control_flow and early_breaks.
	(vect_need_peeling_or_partial_vectors_p): Force partial vector if
	multiple exits and VLA.
	(vect_analyze_loop_form): Support inner loop multiple exits.
	(vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
	(vect_create_epilog_for_reduction): Update live phi nodes.
	(vectorizable_live_operation): Ignore live operations in vector loop
	when multiple exits.
	(vect_transform_loop): Force unrolling for VF loops and multiple exits.
	* tree-vect-stmts.cc (vect_stmt_relevant_p): Analyze ctrl statements.
	(vect_mark_stmts_to_be_vectorized): Check for non-exit control flow and
	analyze gcond params.
	(vect_analyze_stmt): Support gcond.
	* tree-vectorizer.cc (pass_vectorize::execute): Support multiple exits
	in RPO pass.
	* tree-vectorizer.h (enum vect_def_type): Add vect_early_exit_def.
	(LOOP_VINFO_EARLY_BREAKS, LOOP_VINFO_GENERAL_CTR_FLOW): New.
	(loop_vec_info_for_loop): Change to const and static.
	(is_loop_header_bb_p): Drop assert.
	(slpeel_can_duplicate_loop_p): Update prototype.
	(class loop): Add early_breaks and non_break_control_flow.
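As a rough sketch of the updated slpeel_tree_duplicate_loop_to_edge_cfg
interface described above (illustrative only; the caller context and the
vector element type are assumptions on my part, the real call sites are in the
patch below), an epilogue-peeling caller might do:

  /* Sketch: peel LOOP at its main exit edge E.  flow_loops = true keeps the
     two loops connected as in the vectorizer; UPDATED_DOMS collects the
     blocks whose dominators were recomputed during peeling so the caller can
     fix them up again after inserting the guard block.  LOOP, SCALAR_LOOP
     and E are assumed to already exist in the caller.  */
  auto_vec<basic_block> updated_doms;
  class loop *epilog
    = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e,
					      true /* flow_loops */,
					      &updated_doms);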
--- inline copy of patch -- diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc index 97879498db46dd3c34181ae9aa6e5476004dd5b5..d790ce5fffab3aa3dfc40d833a968314a4442b9e 100644 --- diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc index 97879498db46dd3c34181ae9aa6e5476004dd5b5..d790ce5fffab3aa3dfc40d833a968314a4442b9e 100644 --- a/gcc/tree-loop-distribution.cc +++ b/gcc/tree-loop-distribution.cc @@ -948,7 +948,7 @@ copy_loop_before (class loop *loop, bool redirect_lc_phi_defs) edge preheader = loop_preheader_edge (loop); initialize_original_copy_tables (); - res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader); + res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader, false); gcc_assert (res != NULL); /* When a not last partition is supposed to keep the LC PHIs computed diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc index 5d398b67e68c7076760854119590f18b19c622b6..79686f6c4945b7139ba377300430c04b7aeefe6c 100644 --- a/gcc/tree-ssa-loop-niter.cc +++ b/gcc/tree-ssa-loop-niter.cc @@ -3072,7 +3072,12 @@ loop_only_exit_p (const class loop *loop, basic_block *body, const_edge exit) gimple_stmt_iterator bsi; unsigned i; - if (exit != single_exit (loop)) + /* We need to check for alternative exits since exit can be NULL. */ + auto exits = get_loop_exit_edges (loop); + if (exits.length () != 1) + return false; + + if (exit != exits[0]) return false; for (i = 0; i < loop->num_nodes; i++) diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc index 6b93fb3f9af8f2bbdf5dec28f0009177aa5171ab..550d7f40002cf0b58f8a927cb150edd7c2aa9999 100644 --- a/gcc/tree-vect-loop-manip.cc +++ b/gcc/tree-vect-loop-manip.cc @@ -252,6 +252,9 @@ adjust_phi_and_debug_stmts (gimple *update_phi, edge e, tree new_def) { tree orig_def = PHI_ARG_DEF_FROM_EDGE (update_phi, e); + gcc_assert (TREE_CODE (orig_def) != SSA_NAME + || orig_def != new_def); + SET_PHI_ARG_DEF (update_phi, e->dest_idx, new_def); if (MAY_HAVE_DEBUG_BIND_STMTS) @@ -1292,7 +1295,8 @@ vect_set_loop_condition_normal (loop_vec_info loop_vinfo, gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT); /* Record the number of latch iterations. */ - if (limit == niters) + if (limit == niters + || LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) /* Case A: the loop iterates NITERS times. Subtract one to get the latch count. */ loop->nb_iterations = fold_build2 (MINUS_EXPR, niters_type, niters, @@ -1303,7 +1307,13 @@ vect_set_loop_condition_normal (loop_vec_info loop_vinfo, loop->nb_iterations = fold_build2 (TRUNC_DIV_EXPR, niters_type, limit, step); - if (final_iv) + /* For multiple exits we've already maintained LCSSA form and handled + the scalar iteration update in the code that deals with the merge + block and its updated guard. I could move that code here instead + of in vect_update_ivs_after_early_break but I have to still deal + with the updates to the counter `i`. So for now I'll keep them + together. */ + if (final_iv && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) { gassign *assign; edge exit = LOOP_VINFO_IV_EXIT (loop_vinfo); @@ -1509,11 +1519,19 @@ vec_init_exit_info (class loop *loop) on E which is either the entry or exit of LOOP. If SCALAR_LOOP is non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the basic blocks from SCALAR_LOOP instead of LOOP, but to either the - entry or exit of LOOP. */ + entry or exit of LOOP. If FLOW_LOOPS then connect LOOP to SCALAR_LOOP as a + continuation. 
This is correct for cases where one loop continues from the + other like in the vectorizer, but not true for uses in e.g. loop distribution + where the loop is duplicated and then modified. + + If UPDATED_DOMS is not NULL it is update with the list of basic blocks whoms + dominators were updated during the peeling. */ class loop * slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, - class loop *scalar_loop, edge e) + class loop *scalar_loop, edge e, + bool flow_loops, + vec *updated_doms) { class loop *new_loop; basic_block *new_bbs, *bbs, *pbbs; @@ -1602,6 +1620,19 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, for (unsigned i = (at_exit ? 0 : 1); i < scalar_loop->num_nodes + 1; i++) rename_variables_in_bb (new_bbs[i], duplicate_outer_loop); + /* Rename the exit uses. */ + for (edge exit : get_loop_exit_edges (new_loop)) + for (auto gsi = gsi_start_phis (exit->dest); + !gsi_end_p (gsi); gsi_next (&gsi)) + { + tree orig_def = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), exit); + rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), exit)); + if (MAY_HAVE_DEBUG_BIND_STMTS) + adjust_debug_stmts (orig_def, PHI_RESULT (gsi.phi ()), exit->dest); + } + + /* This condition happens when the loop has been versioned. e.g. due to ifcvt + versioning the loop. */ if (scalar_loop != loop) { /* If we copied from SCALAR_LOOP rather than LOOP, SSA_NAMEs from @@ -1616,28 +1647,106 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, EDGE_SUCC (loop->latch, 0)); } + vec alt_exits = loop->vec_loop_alt_exits; + bool multiple_exits_p = !alt_exits.is_empty (); + auto_vec doms; + class loop *update_loop = NULL; + if (at_exit) /* Add the loop copy at exit. */ { - if (scalar_loop != loop) + if (scalar_loop != loop && new_exit->dest != exit_dest) { - gphi_iterator gsi; new_exit = redirect_edge_and_branch (new_exit, exit_dest); + flush_pending_stmts (new_exit); + } - for (gsi = gsi_start_phis (exit_dest); !gsi_end_p (gsi); - gsi_next (&gsi)) + auto loop_exits = get_loop_exit_edges (loop); + for (edge exit : loop_exits) + redirect_edge_and_branch (exit, new_preheader); + + + /* Copy the current loop LC PHI nodes between the original loop exit + block and the new loop header. This allows us to later split the + preheader block and still find the right LC nodes. */ + edge latch_new = single_succ_edge (new_preheader); + edge latch_old = loop_latch_edge (loop); + hash_set lcssa_vars; + for (auto gsi_from = gsi_start_phis (latch_old->dest), + gsi_to = gsi_start_phis (latch_new->dest); + flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to); + gsi_next (&gsi_from), gsi_next (&gsi_to)) + { + gimple *from_phi = gsi_stmt (gsi_from); + gimple *to_phi = gsi_stmt (gsi_to); + tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, latch_old); + /* In all cases, even in early break situations we're only + interested in the number of fully executed loop iters. As such + we discard any partially done iteration. So we simply propagate + the phi nodes from the latch to the merge block. */ + tree new_res = copy_ssa_name (gimple_phi_result (from_phi)); + gphi *lcssa_phi = create_phi_node (new_res, e->dest); + + lcssa_vars.add (new_arg); + + /* Main loop exit should use the final iter value. */ + add_phi_arg (lcssa_phi, new_arg, loop->vec_loop_iv, UNKNOWN_LOCATION); + + /* All other exits use the previous iters. 
*/ + for (edge e : alt_exits) + add_phi_arg (lcssa_phi, gimple_phi_result (from_phi), e, + UNKNOWN_LOCATION); + + adjust_phi_and_debug_stmts (to_phi, latch_new, new_res); + } + + /* Copy over any live SSA vars that may not have been materialized in the + loops themselves but would be in the exit block. However when the live + value is not used inside the loop then we don't need to do this, if we do + then when we split the guard block the branch edge can end up containing the + wrong reference, particularly if it shares an edge with something that has + bypassed the loop. This is not something peeling can check so we need to + anticipate the usage of the live variable here. */ + auto exit_map = redirect_edge_var_map_vector (exit); + if (exit_map) + for (auto vm : exit_map) + { + if (lcssa_vars.contains (vm.def) + || TREE_CODE (vm.def) != SSA_NAME) + continue; + + imm_use_iterator imm_iter; + use_operand_p use_p; + bool use_in_loop = false; + + FOR_EACH_IMM_USE_FAST (use_p, imm_iter, vm.def) { - gphi *phi = gsi.phi (); - tree orig_arg = PHI_ARG_DEF_FROM_EDGE (phi, e); - location_t orig_locus - = gimple_phi_arg_location_from_edge (phi, e); + basic_block bb = gimple_bb (USE_STMT (use_p)); + if (flow_bb_inside_loop_p (loop, bb) + && !gimple_vuse (USE_STMT (use_p))) + { + use_in_loop = true; + break; + } + } - add_phi_arg (phi, orig_arg, new_exit, orig_locus); + if (!use_in_loop) + { + /* Do a final check to see if it's perhaps defined in the loop. This + mirrors the relevancy analysis's used_outside_scope. */ + gimple *stmt = SSA_NAME_DEF_STMT (vm.def); + if (!stmt || !flow_bb_inside_loop_p (loop, gimple_bb (stmt))) + continue; } + + tree new_res = copy_ssa_name (vm.result); + gphi *lcssa_phi = create_phi_node (new_res, e->dest); + for (edge exit : loop_exits) + add_phi_arg (lcssa_phi, vm.def, exit, vm.locus); } - redirect_edge_and_branch_force (e, new_preheader); - flush_pending_stmts (e); + set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src); - if (was_imm_dom || duplicate_outer_loop) + + if ((was_imm_dom || duplicate_outer_loop) && !multiple_exits_p) set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src); /* And remove the non-necessary forwarder again. Keep the other @@ -1647,9 +1756,42 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, delete_basic_block (preheader); set_immediate_dominator (CDI_DOMINATORS, scalar_loop->header, loop_preheader_edge (scalar_loop)->src); + + /* Finally after wiring the new epilogue we need to update its main exit + to the original function exit we recorded. Other exits are already + correct. */ + if (multiple_exits_p) + { + for (edge e : get_loop_exit_edges (loop)) + doms.safe_push (e->dest); + update_loop = new_loop; + doms.safe_push (exit_dest); + + /* Likely a fall-through edge, so update if needed. */ + if (single_succ_p (exit_dest)) + doms.safe_push (single_succ (exit_dest)); + } } else /* Add the copy at entry. */ { + /* Copy the current loop LC PHI nodes between the original loop exit + block and the new loop header. This allows us to later split the + preheader block and still find the right LC nodes. 
*/ + edge old_latch_loop = loop_latch_edge (loop); + edge old_latch_init = loop_preheader_edge (loop); + edge new_latch_loop = loop_latch_edge (new_loop); + edge new_latch_init = loop_preheader_edge (new_loop); + for (auto gsi_from = gsi_start_phis (new_latch_init->dest), + gsi_to = gsi_start_phis (old_latch_loop->dest); + flow_loops && !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to); + gsi_next (&gsi_from), gsi_next (&gsi_to)) + { + gimple *from_phi = gsi_stmt (gsi_from); + gimple *to_phi = gsi_stmt (gsi_to); + tree new_arg = PHI_ARG_DEF_FROM_EDGE (from_phi, new_latch_loop); + adjust_phi_and_debug_stmts (to_phi, old_latch_init, new_arg); + } + if (scalar_loop != loop) { /* Remove the non-necessary forwarder of scalar_loop again. */ @@ -1677,31 +1819,36 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, delete_basic_block (new_preheader); set_immediate_dominator (CDI_DOMINATORS, new_loop->header, loop_preheader_edge (new_loop)->src); + + if (multiple_exits_p) + update_loop = loop; } - if (scalar_loop != loop) + if (multiple_exits_p) { - /* Update new_loop->header PHIs, so that on the preheader - edge they are the ones from loop rather than scalar_loop. */ - gphi_iterator gsi_orig, gsi_new; - edge orig_e = loop_preheader_edge (loop); - edge new_e = loop_preheader_edge (new_loop); - - for (gsi_orig = gsi_start_phis (loop->header), - gsi_new = gsi_start_phis (new_loop->header); - !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_new); - gsi_next (&gsi_orig), gsi_next (&gsi_new)) + for (edge e : get_loop_exit_edges (update_loop)) { - gphi *orig_phi = gsi_orig.phi (); - gphi *new_phi = gsi_new.phi (); - tree orig_arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, orig_e); - location_t orig_locus - = gimple_phi_arg_location_from_edge (orig_phi, orig_e); - - add_phi_arg (new_phi, orig_arg, new_e, orig_locus); + edge ex; + edge_iterator ei; + FOR_EACH_EDGE (ex, ei, e->dest->succs) + { + /* Find the first non-fallthrough block as fall-throughs can't + dominate other blocks. */ + while ((ex->flags & EDGE_FALLTHRU) + && single_succ_p (ex->dest)) + { + doms.safe_push (ex->dest); + ex = single_succ_edge (ex->dest); + } + doms.safe_push (ex->dest); + } + doms.safe_push (e->dest); } - } + iterate_fix_dominators (CDI_DOMINATORS, doms, false); + if (updated_doms) + updated_doms->safe_splice (doms); + } free (new_bbs); free (bbs); @@ -1777,6 +1924,9 @@ slpeel_can_duplicate_loop_p (const loop_vec_info loop_vinfo, const_edge e) gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src); unsigned int num_bb = loop->inner? 5 : 2; + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) + num_bb += LOOP_VINFO_ALT_EXITS (loop_vinfo).length (); + /* All loops have an outer scope; the only case loop->outer is NULL is for the function itself. */ if (!loop_outer (loop) @@ -2044,6 +2194,11 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); basic_block update_bb = update_e->dest; + /* For early exits we'll update the IVs in + vect_update_ivs_after_early_break. */ + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) + return; + basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest; /* Make sure there exists a single-predecessor exit bb: */ @@ -2131,6 +2286,208 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo, /* Fix phi expressions in the successor bb. */ adjust_phi_and_debug_stmts (phi1, update_e, ni_name); } + return; +} + +/* Function vect_update_ivs_after_early_break. 
+ + "Advance" the induction variables of LOOP to the value they should take + after the execution of LOOP. This is currently necessary because the + vectorizer does not handle induction variables that are used after the + loop. Such a situation occurs when the last iterations of LOOP are + peeled, because of the early exit. With an early exit we always peel the + loop. + + Input: + - LOOP_VINFO - a loop info structure for the loop that is going to be + vectorized. The last few iterations of LOOP were peeled. + - LOOP - a loop that is going to be vectorized. The last few iterations + of LOOP were peeled. + - VF - The loop vectorization factor. + - NITERS_ORIG - the number of iterations that LOOP executes (before it is + vectorized). i.e, the number of times the ivs should be + bumped. + - NITERS_VECTOR - The number of iterations that the vector LOOP executes. + - UPDATE_E - a successor edge of LOOP->exit that is on the (only) path + coming out from LOOP on which there are uses of the LOOP ivs + (this is the path from LOOP->exit to epilog_loop->preheader). + + The new definitions of the ivs are placed in LOOP->exit. + The phi args associated with the edge UPDATE_E in the bb + UPDATE_E->dest are updated accordingly. + + Output: + - If available, the LCSSA phi node for the loop IV temp. + + Assumption 1: Like the rest of the vectorizer, this function assumes + a single loop exit that has a single predecessor. + + Assumption 2: The phi nodes in the LOOP header and in update_bb are + organized in the same order. + + Assumption 3: The access function of the ivs is simple enough (see + vect_can_advance_ivs_p). This assumption will be relaxed in the future. + + Assumption 4: Exactly one of the successors of LOOP exit-bb is on a path + coming out of LOOP on which the ivs of LOOP are used (this is the path + that leads to the epilog loop; other paths skip the epilog loop). This + path starts with the edge UPDATE_E, and its destination (denoted update_bb) + needs to have its phis updated. + */ + +static tree +vect_update_ivs_after_early_break (loop_vec_info loop_vinfo, class loop * epilog, + poly_int64 vf, tree niters_orig, + tree niters_vector, edge update_e) +{ + if (!LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) + return NULL; + + gphi_iterator gsi, gsi1; + tree ni_name, ivtmp = NULL; + basic_block update_bb = update_e->dest; + vec alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo); + edge loop_iv = LOOP_VINFO_IV_EXIT (loop_vinfo); + basic_block exit_bb = loop_iv->dest; + class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); + gcond *cond = LOOP_VINFO_LOOP_IV_COND (loop_vinfo); + + gcc_assert (cond); + + for (gsi = gsi_start_phis (loop->header), gsi1 = gsi_start_phis (update_bb); + !gsi_end_p (gsi) && !gsi_end_p (gsi1); + gsi_next (&gsi), gsi_next (&gsi1)) + { + tree init_expr, final_expr, step_expr; + tree type; + tree var, ni, off; + gimple_stmt_iterator last_gsi; + + gphi *phi = gsi1.phi (); + tree phi_ssa = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (epilog)); + gphi *phi1 = dyn_cast (SSA_NAME_DEF_STMT (phi_ssa)); + if (!phi1) + continue; + stmt_vec_info phi_info = loop_vinfo->lookup_stmt (gsi.phi ()); + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "vect_update_ivs_after_early_break: phi: %G", + (gimple *)phi); + + /* Skip reduction and virtual phis. */ + if (!iv_phi_p (phi_info)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "reduc or virtual phi. 
skip.\n"); + continue; + } + + /* For multiple exits where we handle early exits we need to carry on + with the previous IV as loop iteration was not done because we exited + early. As such just grab the original IV. */ + phi_ssa = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_latch_edge (loop)); + if (gimple_cond_lhs (cond) != phi_ssa + && gimple_cond_rhs (cond) != phi_ssa) + { + type = TREE_TYPE (gimple_phi_result (phi)); + step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info); + step_expr = unshare_expr (step_expr); + + /* We previously generated the new merged phi in the same BB as the + guard. So use that to perform the scaling on rather than the + normal loop phi which don't take the early breaks into account. */ + final_expr = gimple_phi_result (phi1); + init_expr = PHI_ARG_DEF_FROM_EDGE (gsi.phi (), loop_preheader_edge (loop)); + + tree stype = TREE_TYPE (step_expr); + /* For early break the final loop IV is: + init + (final - init) * vf which takes into account peeling + values and non-single steps. */ + off = fold_build2 (MINUS_EXPR, stype, + fold_convert (stype, final_expr), + fold_convert (stype, init_expr)); + /* Now adjust for VF to get the final iteration value. */ + off = fold_build2 (MULT_EXPR, stype, off, build_int_cst (stype, vf)); + + /* Adjust the value with the offset. */ + if (POINTER_TYPE_P (type)) + ni = fold_build_pointer_plus (init_expr, off); + else + ni = fold_convert (type, + fold_build2 (PLUS_EXPR, stype, + fold_convert (stype, init_expr), + off)); + var = create_tmp_var (type, "tmp"); + + last_gsi = gsi_last_bb (exit_bb); + gimple_seq new_stmts = NULL; + ni_name = force_gimple_operand (ni, &new_stmts, false, var); + /* Exit_bb shouldn't be empty. */ + if (!gsi_end_p (last_gsi)) + gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT); + else + gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT); + + /* Fix phi expressions in the successor bb. */ + adjust_phi_and_debug_stmts (phi, update_e, ni_name); + } + else + { + type = TREE_TYPE (gimple_phi_result (phi)); + step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (phi_info); + step_expr = unshare_expr (step_expr); + + /* We previously generated the new merged phi in the same BB as the + guard. So use that to perform the scaling on rather than the + normal loop phi which don't take the early breaks into account. */ + init_expr = PHI_ARG_DEF_FROM_EDGE (phi1, loop_preheader_edge (loop)); + tree stype = TREE_TYPE (step_expr); + + if (vf.is_constant ()) + { + ni = fold_build2 (MULT_EXPR, stype, + fold_convert (stype, + niters_vector), + build_int_cst (stype, vf)); + + ni = fold_build2 (MINUS_EXPR, stype, + fold_convert (stype, + niters_orig), + fold_convert (stype, ni)); + } + else + /* If the loop's VF isn't constant then the loop must have been + masked, so at the end of the loop we know we have finished + the entire loop and found nothing. */ + ni = build_zero_cst (stype); + + ni = fold_convert (type, ni); + /* We don't support variable n in this version yet. */ + gcc_assert (TREE_CODE (ni) == INTEGER_CST); + + var = create_tmp_var (type, "tmp"); + + last_gsi = gsi_last_bb (exit_bb); + gimple_seq new_stmts = NULL; + ni_name = force_gimple_operand (ni, &new_stmts, false, var); + /* Exit_bb shouldn't be empty. 
*/ + if (!gsi_end_p (last_gsi)) + gsi_insert_seq_after (&last_gsi, new_stmts, GSI_SAME_STMT); + else + gsi_insert_seq_before (&last_gsi, new_stmts, GSI_SAME_STMT); + + adjust_phi_and_debug_stmts (phi1, loop_iv, ni_name); + + for (edge exit : alt_exits) + adjust_phi_and_debug_stmts (phi1, exit, + build_int_cst (TREE_TYPE (step_expr), + vf)); + ivtmp = gimple_phi_result (phi1); + } + } + + return ivtmp; } /* Return a gimple value containing the misalignment (measured in vector @@ -2632,137 +2989,34 @@ vect_gen_vector_loop_niters_mult_vf (loop_vec_info loop_vinfo, /* LCSSA_PHI is a lcssa phi of EPILOG loop which is copied from LOOP, this function searches for the corresponding lcssa phi node in exit - bb of LOOP. If it is found, return the phi result; otherwise return - NULL. */ + bb of LOOP following the LCSSA_EDGE to the exit node. If it is found, + return the phi result; otherwise return NULL. */ static tree find_guard_arg (class loop *loop, class loop *epilog ATTRIBUTE_UNUSED, - gphi *lcssa_phi) + gphi *lcssa_phi, int lcssa_edge = 0) { gphi_iterator gsi; edge e = loop->vec_loop_iv; - gcc_assert (single_pred_p (e->dest)); for (gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi); gsi_next (&gsi)) { gphi *phi = gsi.phi (); - if (operand_equal_p (PHI_ARG_DEF (phi, 0), - PHI_ARG_DEF (lcssa_phi, 0), 0)) - return PHI_RESULT (phi); - } - return NULL_TREE; -} - -/* Function slpeel_tree_duplicate_loop_to_edge_cfg duplciates FIRST/SECOND - from SECOND/FIRST and puts it at the original loop's preheader/exit - edge, the two loops are arranged as below: - - preheader_a: - first_loop: - header_a: - i_1 = PHI; - ... - i_2 = i_1 + 1; - if (cond_a) - goto latch_a; - else - goto between_bb; - latch_a: - goto header_a; - - between_bb: - ;; i_x = PHI; ;; LCSSA phi node to be created for FIRST, - - second_loop: - header_b: - i_3 = PHI; ;; Use of i_0 to be replaced with i_x, - or with i_2 if no LCSSA phi is created - under condition of CREATE_LCSSA_FOR_IV_PHIS. - ... - i_4 = i_3 + 1; - if (cond_b) - goto latch_b; - else - goto exit_bb; - latch_b: - goto header_b; - - exit_bb: - - This function creates loop closed SSA for the first loop; update the - second loop's PHI nodes by replacing argument on incoming edge with the - result of newly created lcssa PHI nodes. IF CREATE_LCSSA_FOR_IV_PHIS - is false, Loop closed ssa phis will only be created for non-iv phis for - the first loop. - - This function assumes exit bb of the first loop is preheader bb of the - second loop, i.e, between_bb in the example code. With PHIs updated, - the second loop will execute rest iterations of the first. */ - -static void -slpeel_update_phi_nodes_for_loops (loop_vec_info loop_vinfo, - class loop *first, class loop *second, - bool create_lcssa_for_iv_phis) -{ - gphi_iterator gsi_update, gsi_orig; - class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); - - edge first_latch_e = EDGE_SUCC (first->latch, 0); - edge second_preheader_e = loop_preheader_edge (second); - basic_block between_bb = single_exit (first)->dest; - - gcc_assert (between_bb == second_preheader_e->src); - gcc_assert (single_pred_p (between_bb) && single_succ_p (between_bb)); - /* Either the first loop or the second is the loop to be vectorized. 
*/ - gcc_assert (loop == first || loop == second); - - for (gsi_orig = gsi_start_phis (first->header), - gsi_update = gsi_start_phis (second->header); - !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update); - gsi_next (&gsi_orig), gsi_next (&gsi_update)) - { - gphi *orig_phi = gsi_orig.phi (); - gphi *update_phi = gsi_update.phi (); - - tree arg = PHI_ARG_DEF_FROM_EDGE (orig_phi, first_latch_e); - /* Generate lcssa PHI node for the first loop. */ - gphi *vect_phi = (loop == first) ? orig_phi : update_phi; - stmt_vec_info vect_phi_info = loop_vinfo->lookup_stmt (vect_phi); - if (create_lcssa_for_iv_phis || !iv_phi_p (vect_phi_info)) + /* Nested loops with multiple exits can have different no# phi node + arguments between the main loop and epilog as epilog falls to the + second loop. */ + if (gimple_phi_num_args (phi) > e->dest_idx) { - tree new_res = copy_ssa_name (PHI_RESULT (orig_phi)); - gphi *lcssa_phi = create_phi_node (new_res, between_bb); - add_phi_arg (lcssa_phi, arg, single_exit (first), UNKNOWN_LOCATION); - arg = new_res; - } - - /* Update PHI node in the second loop by replacing arg on the loop's - incoming edge. */ - adjust_phi_and_debug_stmts (update_phi, second_preheader_e, arg); - } - - /* For epilogue peeling we have to make sure to copy all LC PHIs - for correct vectorization of live stmts. */ - if (loop == first) - { - basic_block orig_exit = single_exit (second)->dest; - for (gsi_orig = gsi_start_phis (orig_exit); - !gsi_end_p (gsi_orig); gsi_next (&gsi_orig)) - { - gphi *orig_phi = gsi_orig.phi (); - tree orig_arg = PHI_ARG_DEF (orig_phi, 0); - if (TREE_CODE (orig_arg) != SSA_NAME || virtual_operand_p (orig_arg)) - continue; - - /* Already created in the above loop. */ - if (find_guard_arg (first, second, orig_phi)) + tree var = PHI_ARG_DEF (phi, e->dest_idx); + if (TREE_CODE (var) != SSA_NAME) continue; - tree new_res = copy_ssa_name (orig_arg); - gphi *lcphi = create_phi_node (new_res, between_bb); - add_phi_arg (lcphi, orig_arg, single_exit (first), UNKNOWN_LOCATION); + if (operand_equal_p (get_current_def (var), + PHI_ARG_DEF (lcssa_phi, lcssa_edge), 0)) + return PHI_RESULT (phi); } } + return NULL_TREE; } /* Function slpeel_add_loop_guard adds guard skipping from the beginning @@ -2910,13 +3164,11 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog, gcc_assert (single_succ_p (merge_bb)); edge e = single_succ_edge (merge_bb); basic_block exit_bb = e->dest; - gcc_assert (single_pred_p (exit_bb)); - gcc_assert (single_pred (exit_bb) == single_exit (epilog)->dest); for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi)) { gphi *update_phi = gsi.phi (); - tree old_arg = PHI_ARG_DEF (update_phi, 0); + tree old_arg = PHI_ARG_DEF (update_phi, e->dest_idx); tree merge_arg = NULL_TREE; @@ -2928,7 +3180,7 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog, if (!merge_arg) merge_arg = old_arg; - tree guard_arg = find_guard_arg (loop, epilog, update_phi); + tree guard_arg = find_guard_arg (loop, epilog, update_phi, e->dest_idx); /* If the var is live after loop but not a reduction, we simply use the old arg. */ if (!guard_arg) @@ -2948,21 +3200,6 @@ slpeel_update_phi_nodes_for_guard2 (class loop *loop, class loop *epilog, } } -/* EPILOG loop is duplicated from the original loop for vectorizing, - the arg of its loop closed ssa PHI needs to be updated. 
*/ - -static void -slpeel_update_phi_nodes_for_lcssa (class loop *epilog) -{ - gphi_iterator gsi; - basic_block exit_bb = single_exit (epilog)->dest; - - gcc_assert (single_pred_p (exit_bb)); - edge e = EDGE_PRED (exit_bb, 0); - for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi); gsi_next (&gsi)) - rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), e)); -} - /* EPILOGUE_VINFO is an epilogue loop that we now know would need to iterate exactly CONST_NITERS times. Make a final decision about whether the epilogue loop should be used, returning true if so. */ @@ -3138,6 +3375,14 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, bound_epilog += vf - 1; if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)) bound_epilog += 1; + /* For early breaks the scalar loop needs to execute at most VF times + to find the element that caused the break. */ + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) + { + bound_epilog = vf; + /* Force a scalar epilogue as we can't vectorize the index finding. */ + vect_epilogues = false; + } bool epilog_peeling = maybe_ne (bound_epilog, 0U); poly_uint64 bound_scalar = bound_epilog; @@ -3297,16 +3542,24 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, bound_prolog + bound_epilog) : (!LOOP_REQUIRES_VERSIONING (loop_vinfo) || vect_epilogues)); + + /* We only support early break vectorization on known bounds at this time. + This means that if the vector loop can't be entered then we won't generate + it at all. So for now force skip_vector off because the additional control + flow messes with the BB exits and we've already analyzed them. */ + skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo); + /* Epilog loop must be executed if the number of iterations for epilog loop is known at compile time, otherwise we need to add a check at the end of vector loop and skip to the end of epilog loop. */ bool skip_epilog = (prolog_peeling < 0 || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) || !vf.is_constant ()); - /* PEELING_FOR_GAPS is special because epilog loop must be executed. */ - if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)) + /* PEELING_FOR_GAPS and peeling for early breaks are special because epilog + loop must be executed. */ + if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) + || LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) skip_epilog = false; - class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo); auto_vec original_counts; basic_block *original_bbs = NULL; @@ -3344,13 +3597,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, if (prolog_peeling) { e = loop_preheader_edge (loop); - gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e)); - + gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo, e)); /* Peel prolog and put it on preheader edge of loop. */ - prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e); + prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e, + true); gcc_assert (prolog); prolog->force_vectorize = false; - slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true); + first_loop = prolog; reset_original_copy_tables (); @@ -3420,11 +3673,12 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, as the transformations mentioned above make less or no sense when not vectorizing. */ epilog = vect_epilogues ? 
get_loop_copy (loop) : scalar_loop; - epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e); + auto_vec doms; + epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e, true, + &doms); gcc_assert (epilog); epilog->force_vectorize = false; - slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false); /* Scalar version loop may be preferred. In this case, add guard and skip to epilog. Note this only happens when the number of @@ -3496,6 +3750,54 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf, update_e); + /* For early breaks we must create a guard to check how many iterations + of the scalar loop are yet to be performed. */ + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) + { + tree ivtmp = + vect_update_ivs_after_early_break (loop_vinfo, epilog, vf, niters, + *niters_vector, update_e); + + gcc_assert (ivtmp); + tree guard_cond = fold_build2 (EQ_EXPR, boolean_type_node, + fold_convert (TREE_TYPE (niters), + ivtmp), + build_zero_cst (TREE_TYPE (niters))); + basic_block guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest; + + /* If we had a fallthrough edge, the guard will the threaded through + and so we may need to find the actual final edge. */ + edge final_edge = epilog->vec_loop_iv; + /* slpeel_update_phi_nodes_for_guard2 expects an empty block in + between the guard and the exit edge. It only adds new nodes and + doesn't update existing one in the current scheme. */ + basic_block guard_to = split_edge (final_edge); + edge guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to, + guard_bb, prob_epilog.invert (), + irred_flag); + doms.safe_push (guard_bb); + + iterate_fix_dominators (CDI_DOMINATORS, doms, false); + + /* We must update all the edges from the new guard_bb. */ + slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e, + final_edge); + + /* If the loop was versioned we'll have an intermediate BB between + the guard and the exit. This intermediate block is required + because in the current scheme of things the guard block phi + updating can only maintain LCSSA by creating new blocks. In this + case we just need to update the uses in this block as well. 
*/ + if (loop != scalar_loop) + { + for (gphi_iterator gsi = gsi_start_phis (guard_to); + !gsi_end_p (gsi); gsi_next (&gsi)) + rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), guard_e)); + } + + flush_pending_stmts (guard_e); + } + if (skip_epilog) { guard_cond = fold_build2 (EQ_EXPR, boolean_type_node, @@ -3520,8 +3822,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, } scale_loop_profile (epilog, prob_epilog, 0); } - else - slpeel_update_phi_nodes_for_lcssa (epilog); unsigned HOST_WIDE_INT bound; if (bound_scalar.is_constant (&bound)) diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index b4a98de80aa39057fc9b17977dd0e347b4f0fb5d..ab9a2048186f461f5ec49f21421958e7ee25eada 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -1007,6 +1007,8 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared) partial_load_store_bias (0), peeling_for_gaps (false), peeling_for_niter (false), + early_breaks (false), + non_break_control_flow (false), no_data_dependencies (false), has_mask_store (false), scalar_loop_scaling (profile_probability::uninitialized ()), @@ -1199,6 +1201,14 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo) th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo)); + /* When we have multiple exits and VF is unknown, we must require partial + vectors because the loop bounds is not a minimum but a maximum. That is to + say we cannot unpredicate the main loop unless we peel or use partial + vectors in the epilogue. */ + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo) + && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()) + return true; + if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0) { @@ -1652,12 +1662,12 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo) loop_vinfo->scalar_costs->finish_cost (nullptr); } - /* Function vect_analyze_loop_form. Verify that certain CFG restrictions hold, including: - the loop has a pre-header - - the loop has a single entry and exit + - the loop has a single entry + - nested loops can have only a single exit. - the loop exit condition is simple enough - the number of iterations can be analyzed, i.e, a countable loop. The niter could be analyzed under some assumptions. */ @@ -1693,11 +1703,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info) | (exit-bb) */ - if (loop->num_nodes != 2) - return opt_result::failure_at (vect_location, - "not vectorized:" - " control flow in loop.\n"); - if (empty_block_p (loop->header)) return opt_result::failure_at (vect_location, "not vectorized: empty loop.\n"); @@ -1768,11 +1773,13 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info) dump_printf_loc (MSG_NOTE, vect_location, "Considering outer-loop vectorization.\n"); info->inner_loop_cond = inner.loop_cond; + + if (!single_exit (loop)) + return opt_result::failure_at (vect_location, + "not vectorized: multiple exits.\n"); + } - if (!single_exit (loop)) - return opt_result::failure_at (vect_location, - "not vectorized: multiple exits.\n"); if (EDGE_COUNT (loop->header->preds) != 2) return opt_result::failure_at (vect_location, "not vectorized:" @@ -1788,11 +1795,36 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info) "not vectorized: latch block not empty.\n"); /* Make sure the exit is not abnormal. 
*/ - edge e = single_exit (loop); - if (e->flags & EDGE_ABNORMAL) - return opt_result::failure_at (vect_location, - "not vectorized:" - " abnormal loop exit edge.\n"); + auto_vec exits = get_loop_exit_edges (loop); + edge nexit = loop->vec_loop_iv; + for (edge e : exits) + { + if (e->flags & EDGE_ABNORMAL) + return opt_result::failure_at (vect_location, + "not vectorized:" + " abnormal loop exit edge.\n"); + /* Early break BB must be after the main exit BB. In theory we should + be able to vectorize the inverse order, but the current flow in the + the vectorizer always assumes you update successor PHI nodes, not + preds. */ + if (e != nexit && !dominated_by_p (CDI_DOMINATORS, nexit->src, e->src)) + return opt_result::failure_at (vect_location, + "not vectorized:" + " abnormal loop exit edge order.\n"); + } + + /* We currently only support early exit loops with known bounds. */ + if (exits.length () > 1) + { + class tree_niter_desc niter; + if (!number_of_iterations_exit_assumptions (loop, nexit, &niter, NULL) + || chrec_contains_undetermined (niter.niter) + || !evolution_function_is_constant_p (niter.niter)) + return opt_result::failure_at (vect_location, + "not vectorized:" + " early breaks only supported on loops" + " with known iteration bounds.\n"); + } info->conds = vect_get_loop_niters (loop, &info->assumptions, @@ -1866,6 +1898,10 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared, LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds); LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond; + /* Check to see if we're vectorizing multiple exits. */ + LOOP_VINFO_EARLY_BREAKS (loop_vinfo) + = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty (); + if (info->inner_loop_cond) { stmt_vec_info inner_loop_cond_info @@ -3070,7 +3106,8 @@ start_over: /* If an epilogue loop is required make sure we can create one. */ if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) - || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) + || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) + || LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) { if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n"); @@ -5797,7 +5834,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, basic_block exit_bb; tree scalar_dest; tree scalar_type; - gimple *new_phi = NULL, *phi; + gimple *new_phi = NULL, *phi = NULL; gimple_stmt_iterator exit_gsi; tree new_temp = NULL_TREE, new_name, new_scalar_dest; gimple *epilog_stmt = NULL; @@ -6039,6 +6076,33 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, new_def = gimple_convert (&stmts, vectype, new_def); reduc_inputs.quick_push (new_def); } + + /* Update the other exits. */ + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) + { + vec alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo); + gphi_iterator gsi, gsi1; + for (edge exit : alt_exits) + { + /* Find the phi node to propaget into the exit block for each + exit edge. */ + for (gsi = gsi_start_phis (exit_bb), + gsi1 = gsi_start_phis (exit->src); + !gsi_end_p (gsi) && !gsi_end_p (gsi1); + gsi_next (&gsi), gsi_next (&gsi1)) + { + /* There really should be a function to just get the number + of phis inside a bb. 
*/ + if (phi && phi == gsi.phi ()) + { + gphi *phi1 = gsi1.phi (); + SET_PHI_ARG_DEF (phi, exit->dest_idx, + PHI_RESULT (phi1)); + break; + } + } + } + } gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT); } @@ -10355,6 +10419,13 @@ vectorizable_live_operation (vec_info *vinfo, new_tree = lane_extract ; lhs' = new_tree; */ + /* When vectorizing an early break, any live statement that is used + outside of the loop are dead. The loop will never get to them. + We could change the liveness value during analysis instead but since + the below code is invalid anyway just ignore it during codegen. */ + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) + return true; + class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest; gcc_assert (single_pred_p (exit_bb)); @@ -11277,7 +11348,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call) /* Make sure there exists a single-predecessor exit bb. Do this before versioning. */ edge e = LOOP_VINFO_IV_EXIT (loop_vinfo); - if (! single_pred_p (e->dest)) + if (e && ! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) { split_loop_exit_edge (e, true); if (dump_enabled_p ()) @@ -11303,7 +11374,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call) if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo)) { e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo)); - if (! single_pred_p (e->dest)) + if (e && ! single_pred_p (e->dest)) { split_loop_exit_edge (e, true); if (dump_enabled_p ()) @@ -11641,7 +11712,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call) /* Loops vectorized with a variable factor won't benefit from unrolling/peeling. */ - if (!vf.is_constant ()) + if (!vf.is_constant () + && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) { loop->unroll = 1; if (dump_enabled_p ()) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 87c4353fa5180fcb7f60b192897456cf24f3fdbe..03524e8500ee06df42f82afe78ee2a7c627be45b 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -344,9 +344,34 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, *live_p = false; /* cond stmt other than loop exit cond. */ - if (is_ctrl_stmt (stmt_info->stmt) - && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type) - *relevant = vect_used_in_scope; + if (is_ctrl_stmt (stmt_info->stmt)) + { + /* Ideally EDGE_LOOP_EXIT would have been set on the exit edge, but + it looks like loop_manip doesn't do that.. So we have to do it + the hard way. */ + basic_block bb = gimple_bb (stmt_info->stmt); + bool exit_bb = false, early_exit = false; + edge_iterator ei; + edge e; + FOR_EACH_EDGE (e, ei, bb->succs) + if (!flow_bb_inside_loop_p (loop, e->dest)) + { + exit_bb = true; + early_exit = loop->vec_loop_iv->src != bb; + break; + } + + /* We should have processed any exit edge, so an edge not an early + break must be a loop IV edge. We need to distinguish between the + two as we don't want to generate code for the main loop IV. */ + if (exit_bb) + { + if (early_exit) + *relevant = vect_used_in_scope; + } + else if (bb->loop_father == loop) + LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo) = true; + } /* changing memory. 
*/ if (gimple_code (stmt_info->stmt) != GIMPLE_PHI) @@ -359,6 +384,11 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, *relevant = vect_used_in_scope; } + auto_vec exits = get_loop_exit_edges (loop); + auto_bitmap exit_bbs; + for (edge exit : exits) + bitmap_set_bit (exit_bbs, exit->dest->index); + /* uses outside the loop. */ FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter, SSA_OP_DEF) { @@ -377,7 +407,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, /* We expect all such uses to be in the loop exit phis (because of loop closed form) */ gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI); - gcc_assert (bb == single_exit (loop)->dest); + gcc_assert (bitmap_bit_p (exit_bbs, bb->index)); *live_p = true; } @@ -683,6 +713,13 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal) } } + /* Ideally this should be in vect_analyze_loop_form but we haven't seen all + the conds yet at that point and there's no quick way to retrieve them. */ + if (LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo)) + return opt_result::failure_at (vect_location, + "not vectorized:" + " unsupported control flow in loop.\n"); + /* 2. Process_worklist */ while (worklist.length () > 0) { @@ -778,6 +815,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal) return res; } } + } + else if (gcond *cond = dyn_cast (stmt_vinfo->stmt)) + { + enum tree_code rhs_code = gimple_cond_code (cond); + gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison); + opt_result res + = process_use (stmt_vinfo, gimple_cond_lhs (cond), + loop_vinfo, relevant, &worklist, false); + if (!res) + return res; + res = process_use (stmt_vinfo, gimple_cond_rhs (cond), + loop_vinfo, relevant, &worklist, false); + if (!res) + return res; } else if (gcall *call = dyn_cast (stmt_vinfo->stmt)) { @@ -11919,11 +11970,15 @@ vect_analyze_stmt (vec_info *vinfo, node_instance, cost_vec); if (!res) return res; - } + } + + if (is_ctrl_stmt (stmt_info->stmt)) + STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def; switch (STMT_VINFO_DEF_TYPE (stmt_info)) { case vect_internal_def: + case vect_early_exit_def: break; case vect_reduction_def: @@ -11956,6 +12011,7 @@ vect_analyze_stmt (vec_info *vinfo, { gcall *call = dyn_cast (stmt_info->stmt); gcc_assert (STMT_VINFO_VECTYPE (stmt_info) + || gimple_code (stmt_info->stmt) == GIMPLE_COND || (call && gimple_call_lhs (call) == NULL_TREE)); *need_to_vectorize = true; } diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index ec65b65b5910e9cbad0a8c7e83c950b6168b98bf..24a0567a2f23f1b3d8b340baff61d18da8e242dd 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -63,6 +63,7 @@ enum vect_def_type { vect_internal_def, vect_induction_def, vect_reduction_def, + vect_early_exit_def, vect_double_reduction_def, vect_nested_cycle, vect_first_order_recurrence, @@ -876,6 +877,13 @@ public: we need to peel off iterations at the end to form an epilogue loop. */ bool peeling_for_niter; + /* When the loop has early breaks that we can vectorize we need to peel + the loop for the break finding loop. */ + bool early_breaks; + + /* When the loop has a non-early break control flow inside. */ + bool non_break_control_flow; + /* List of loop additional IV conditionals found in the loop. 
*/ auto_vec conds; @@ -985,9 +993,11 @@ public: #define LOOP_VINFO_REDUCTION_CHAINS(L) (L)->reduction_chains #define LOOP_VINFO_PEELING_FOR_GAPS(L) (L)->peeling_for_gaps #define LOOP_VINFO_PEELING_FOR_NITER(L) (L)->peeling_for_niter +#define LOOP_VINFO_EARLY_BREAKS(L) (L)->early_breaks #define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict #define LOOP_VINFO_EARLY_BRK_DEST_BB(L) (L)->early_break_dest_bb #define LOOP_VINFO_EARLY_BRK_VUSES(L) (L)->early_break_vuses +#define LOOP_VINFO_GENERAL_CTR_FLOW(L) (L)->non_break_control_flow #define LOOP_VINFO_LOOP_CONDS(L) (L)->conds #define LOOP_VINFO_LOOP_IV_COND(L) (L)->loop_iv_cond #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies @@ -1038,8 +1048,8 @@ public: stack. */ typedef opt_pointer_wrapper opt_loop_vec_info; -inline loop_vec_info -loop_vec_info_for_loop (class loop *loop) +static inline loop_vec_info +loop_vec_info_for_loop (const class loop *loop) { return (loop_vec_info) loop->aux; } @@ -1789,7 +1799,7 @@ is_loop_header_bb_p (basic_block bb) { if (bb == (bb->loop_father)->header) return true; - gcc_checking_assert (EDGE_COUNT (bb->preds) == 1); + return false; } @@ -2176,9 +2186,10 @@ class auto_purge_vect_location in tree-vect-loop-manip.cc. */ extern void vect_set_loop_condition (class loop *, loop_vec_info, tree, tree, tree, bool); -extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge); +extern bool slpeel_can_duplicate_loop_p (const loop_vec_info, const_edge); class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, - class loop *, edge); + class loop *, edge, bool, + vec * = NULL); class loop *vect_loop_versioning (loop_vec_info, gimple *); extern class loop *vect_do_peeling (loop_vec_info, tree, tree, tree *, tree *, tree *, int, bool, bool, diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc index a048e9d89178a37455bd7b83ab0f2a238a4ce69e..0dc5479dc92058b6c70c67f29f5dc9a8d72235f4 100644 --- a/gcc/tree-vectorizer.cc +++ b/gcc/tree-vectorizer.cc @@ -1379,7 +1379,9 @@ pass_vectorize::execute (function *fun) predicates that need to be shared for optimal predicate usage. However reassoc will re-order them and prevent CSE from working as it should. CSE only the loop body, not the entry. */ - bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index); + auto_vec exits = get_loop_exit_edges (loop); + for (edge exit : exits) + bitmap_set_bit (exit_bbs, exit->dest->index); edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0); do_rpo_vn (fun, entry, exit_bbs); --- a/gcc/tree-loop-distribution.cc +++ b/gcc/tree-loop-distribution.cc @@ -948,7 +948,7 @@ copy_loop_before (class loop *loop, bool redirect_lc_phi_defs) edge preheader = loop_preheader_edge (loop); initialize_original_copy_tables (); - res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader); + res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader, false); gcc_assert (res != NULL); /* When a not last partition is supposed to keep the LC PHIs computed diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc index 5d398b67e68c7076760854119590f18b19c622b6..79686f6c4945b7139ba377300430c04b7aeefe6c 100644 --- a/gcc/tree-ssa-loop-niter.cc +++ b/gcc/tree-ssa-loop-niter.cc @@ -3072,7 +3072,12 @@ loop_only_exit_p (const class loop *loop, basic_block *body, const_edge exit) gimple_stmt_iterator bsi; unsigned i; - if (exit != single_exit (loop)) + /* We need to check for alternative exits since exit can be NULL. 
*/ + auto exits = get_loop_exit_edges (loop); + if (exits.length () != 1) + return false; + + if (exit != exits[0]) return false; for (i = 0; i < loop->num_nodes; i++)
*/ + vect_epilogues = false; + } bool epilog_peeling = maybe_ne (bound_epilog, 0U); poly_uint64 bound_scalar = bound_epilog; @@ -3297,16 +3542,24 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, bound_prolog + bound_epilog) : (!LOOP_REQUIRES_VERSIONING (loop_vinfo) || vect_epilogues)); + + /* We only support early break vectorization on known bounds at this time. + This means that if the vector loop can't be entered then we won't generate + it at all. So for now force skip_vector off because the additional control + flow messes with the BB exits and we've already analyzed them. */ + skip_vector = skip_vector && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo); + /* Epilog loop must be executed if the number of iterations for epilog loop is known at compile time, otherwise we need to add a check at the end of vector loop and skip to the end of epilog loop. */ bool skip_epilog = (prolog_peeling < 0 || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) || !vf.is_constant ()); - /* PEELING_FOR_GAPS is special because epilog loop must be executed. */ - if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)) + /* PEELING_FOR_GAPS and peeling for early breaks are special because epilog + loop must be executed. */ + if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) + || LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) skip_epilog = false; - class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo); auto_vec original_counts; basic_block *original_bbs = NULL; @@ -3344,13 +3597,13 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, if (prolog_peeling) { e = loop_preheader_edge (loop); - gcc_checking_assert (slpeel_can_duplicate_loop_p (loop, e)); - + gcc_checking_assert (slpeel_can_duplicate_loop_p (loop_vinfo, e)); /* Peel prolog and put it on preheader edge of loop. */ - prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e); + prolog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, scalar_loop, e, + true); gcc_assert (prolog); prolog->force_vectorize = false; - slpeel_update_phi_nodes_for_loops (loop_vinfo, prolog, loop, true); + first_loop = prolog; reset_original_copy_tables (); @@ -3420,11 +3673,12 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, as the transformations mentioned above make less or no sense when not vectorizing. */ epilog = vect_epilogues ? get_loop_copy (loop) : scalar_loop; - epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e); + auto_vec doms; + epilog = slpeel_tree_duplicate_loop_to_edge_cfg (loop, epilog, e, true, + &doms); gcc_assert (epilog); epilog->force_vectorize = false; - slpeel_update_phi_nodes_for_loops (loop_vinfo, loop, epilog, false); /* Scalar version loop may be preferred. In this case, add guard and skip to epilog. Note this only happens when the number of @@ -3496,6 +3750,54 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, vect_update_ivs_after_vectorizer (loop_vinfo, niters_vector_mult_vf, update_e); + /* For early breaks we must create a guard to check how many iterations + of the scalar loop are yet to be performed. 
*/ + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) + { + tree ivtmp = + vect_update_ivs_after_early_break (loop_vinfo, epilog, vf, niters, + *niters_vector, update_e); + + gcc_assert (ivtmp); + tree guard_cond = fold_build2 (EQ_EXPR, boolean_type_node, + fold_convert (TREE_TYPE (niters), + ivtmp), + build_zero_cst (TREE_TYPE (niters))); + basic_block guard_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest; + + /* If we had a fallthrough edge, the guard will the threaded through + and so we may need to find the actual final edge. */ + edge final_edge = epilog->vec_loop_iv; + /* slpeel_update_phi_nodes_for_guard2 expects an empty block in + between the guard and the exit edge. It only adds new nodes and + doesn't update existing one in the current scheme. */ + basic_block guard_to = split_edge (final_edge); + edge guard_e = slpeel_add_loop_guard (guard_bb, guard_cond, guard_to, + guard_bb, prob_epilog.invert (), + irred_flag); + doms.safe_push (guard_bb); + + iterate_fix_dominators (CDI_DOMINATORS, doms, false); + + /* We must update all the edges from the new guard_bb. */ + slpeel_update_phi_nodes_for_guard2 (loop, epilog, guard_e, + final_edge); + + /* If the loop was versioned we'll have an intermediate BB between + the guard and the exit. This intermediate block is required + because in the current scheme of things the guard block phi + updating can only maintain LCSSA by creating new blocks. In this + case we just need to update the uses in this block as well. */ + if (loop != scalar_loop) + { + for (gphi_iterator gsi = gsi_start_phis (guard_to); + !gsi_end_p (gsi); gsi_next (&gsi)) + rename_use_op (PHI_ARG_DEF_PTR_FROM_EDGE (gsi.phi (), guard_e)); + } + + flush_pending_stmts (guard_e); + } + if (skip_epilog) { guard_cond = fold_build2 (EQ_EXPR, boolean_type_node, @@ -3520,8 +3822,6 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, } scale_loop_profile (epilog, prob_epilog, 0); } - else - slpeel_update_phi_nodes_for_lcssa (epilog); unsigned HOST_WIDE_INT bound; if (bound_scalar.is_constant (&bound)) diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index b4a98de80aa39057fc9b17977dd0e347b4f0fb5d..ab9a2048186f461f5ec49f21421958e7ee25eada 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -1007,6 +1007,8 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared) partial_load_store_bias (0), peeling_for_gaps (false), peeling_for_niter (false), + early_breaks (false), + non_break_control_flow (false), no_data_dependencies (false), has_mask_store (false), scalar_loop_scaling (profile_probability::uninitialized ()), @@ -1199,6 +1201,14 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo) th = LOOP_VINFO_COST_MODEL_THRESHOLD (LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo)); + /* When we have multiple exits and VF is unknown, we must require partial + vectors because the loop bounds is not a minimum but a maximum. That is to + say we cannot unpredicate the main loop unless we peel or use partial + vectors in the epilogue. */ + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo) + && !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()) + return true; + if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0) { @@ -1652,12 +1662,12 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info loop_vinfo) loop_vinfo->scalar_costs->finish_cost (nullptr); } - /* Function vect_analyze_loop_form. 
Verify that certain CFG restrictions hold, including: - the loop has a pre-header - - the loop has a single entry and exit + - the loop has a single entry + - nested loops can have only a single exit. - the loop exit condition is simple enough - the number of iterations can be analyzed, i.e, a countable loop. The niter could be analyzed under some assumptions. */ @@ -1693,11 +1703,6 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info) | (exit-bb) */ - if (loop->num_nodes != 2) - return opt_result::failure_at (vect_location, - "not vectorized:" - " control flow in loop.\n"); - if (empty_block_p (loop->header)) return opt_result::failure_at (vect_location, "not vectorized: empty loop.\n"); @@ -1768,11 +1773,13 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info) dump_printf_loc (MSG_NOTE, vect_location, "Considering outer-loop vectorization.\n"); info->inner_loop_cond = inner.loop_cond; + + if (!single_exit (loop)) + return opt_result::failure_at (vect_location, + "not vectorized: multiple exits.\n"); + } - if (!single_exit (loop)) - return opt_result::failure_at (vect_location, - "not vectorized: multiple exits.\n"); if (EDGE_COUNT (loop->header->preds) != 2) return opt_result::failure_at (vect_location, "not vectorized:" @@ -1788,11 +1795,36 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info) "not vectorized: latch block not empty.\n"); /* Make sure the exit is not abnormal. */ - edge e = single_exit (loop); - if (e->flags & EDGE_ABNORMAL) - return opt_result::failure_at (vect_location, - "not vectorized:" - " abnormal loop exit edge.\n"); + auto_vec exits = get_loop_exit_edges (loop); + edge nexit = loop->vec_loop_iv; + for (edge e : exits) + { + if (e->flags & EDGE_ABNORMAL) + return opt_result::failure_at (vect_location, + "not vectorized:" + " abnormal loop exit edge.\n"); + /* Early break BB must be after the main exit BB. In theory we should + be able to vectorize the inverse order, but the current flow in the + the vectorizer always assumes you update successor PHI nodes, not + preds. */ + if (e != nexit && !dominated_by_p (CDI_DOMINATORS, nexit->src, e->src)) + return opt_result::failure_at (vect_location, + "not vectorized:" + " abnormal loop exit edge order.\n"); + } + + /* We currently only support early exit loops with known bounds. */ + if (exits.length () > 1) + { + class tree_niter_desc niter; + if (!number_of_iterations_exit_assumptions (loop, nexit, &niter, NULL) + || chrec_contains_undetermined (niter.niter) + || !evolution_function_is_constant_p (niter.niter)) + return opt_result::failure_at (vect_location, + "not vectorized:" + " early breaks only supported on loops" + " with known iteration bounds.\n"); + } info->conds = vect_get_loop_niters (loop, &info->assumptions, @@ -1866,6 +1898,10 @@ vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared, LOOP_VINFO_LOOP_CONDS (loop_vinfo).safe_splice (info->alt_loop_conds); LOOP_VINFO_LOOP_IV_COND (loop_vinfo) = info->loop_cond; + /* Check to see if we're vectorizing multiple exits. */ + LOOP_VINFO_EARLY_BREAKS (loop_vinfo) + = !LOOP_VINFO_LOOP_CONDS (loop_vinfo).is_empty (); + if (info->inner_loop_cond) { stmt_vec_info inner_loop_cond_info @@ -3070,7 +3106,8 @@ start_over: /* If an epilogue loop is required make sure we can create one. 
*/ if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) - || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) + || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) + || LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) { if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "epilog loop required\n"); @@ -5797,7 +5834,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, basic_block exit_bb; tree scalar_dest; tree scalar_type; - gimple *new_phi = NULL, *phi; + gimple *new_phi = NULL, *phi = NULL; gimple_stmt_iterator exit_gsi; tree new_temp = NULL_TREE, new_name, new_scalar_dest; gimple *epilog_stmt = NULL; @@ -6039,6 +6076,33 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo, new_def = gimple_convert (&stmts, vectype, new_def); reduc_inputs.quick_push (new_def); } + + /* Update the other exits. */ + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) + { + vec alt_exits = LOOP_VINFO_ALT_EXITS (loop_vinfo); + gphi_iterator gsi, gsi1; + for (edge exit : alt_exits) + { + /* Find the phi node to propaget into the exit block for each + exit edge. */ + for (gsi = gsi_start_phis (exit_bb), + gsi1 = gsi_start_phis (exit->src); + !gsi_end_p (gsi) && !gsi_end_p (gsi1); + gsi_next (&gsi), gsi_next (&gsi1)) + { + /* There really should be a function to just get the number + of phis inside a bb. */ + if (phi && phi == gsi.phi ()) + { + gphi *phi1 = gsi1.phi (); + SET_PHI_ARG_DEF (phi, exit->dest_idx, + PHI_RESULT (phi1)); + break; + } + } + } + } gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT); } @@ -10355,6 +10419,13 @@ vectorizable_live_operation (vec_info *vinfo, new_tree = lane_extract ; lhs' = new_tree; */ + /* When vectorizing an early break, any live statement that is used + outside of the loop are dead. The loop will never get to them. + We could change the liveness value during analysis instead but since + the below code is invalid anyway just ignore it during codegen. */ + if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) + return true; + class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); basic_block exit_bb = LOOP_VINFO_IV_EXIT (loop_vinfo)->dest; gcc_assert (single_pred_p (exit_bb)); @@ -11277,7 +11348,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call) /* Make sure there exists a single-predecessor exit bb. Do this before versioning. */ edge e = LOOP_VINFO_IV_EXIT (loop_vinfo); - if (! single_pred_p (e->dest)) + if (e && ! single_pred_p (e->dest) && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) { split_loop_exit_edge (e, true); if (dump_enabled_p ()) @@ -11303,7 +11374,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call) if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo)) { e = single_exit (LOOP_VINFO_SCALAR_LOOP (loop_vinfo)); - if (! single_pred_p (e->dest)) + if (e && ! single_pred_p (e->dest)) { split_loop_exit_edge (e, true); if (dump_enabled_p ()) @@ -11641,7 +11712,8 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call) /* Loops vectorized with a variable factor won't benefit from unrolling/peeling. */ - if (!vf.is_constant ()) + if (!vf.is_constant () + && !LOOP_VINFO_EARLY_BREAKS (loop_vinfo)) { loop->unroll = 1; if (dump_enabled_p ()) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 87c4353fa5180fcb7f60b192897456cf24f3fdbe..03524e8500ee06df42f82afe78ee2a7c627be45b 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -344,9 +344,34 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, *live_p = false; /* cond stmt other than loop exit cond. 
*/ - if (is_ctrl_stmt (stmt_info->stmt) - && STMT_VINFO_TYPE (stmt_info) != loop_exit_ctrl_vec_info_type) - *relevant = vect_used_in_scope; + if (is_ctrl_stmt (stmt_info->stmt)) + { + /* Ideally EDGE_LOOP_EXIT would have been set on the exit edge, but + it looks like loop_manip doesn't do that.. So we have to do it + the hard way. */ + basic_block bb = gimple_bb (stmt_info->stmt); + bool exit_bb = false, early_exit = false; + edge_iterator ei; + edge e; + FOR_EACH_EDGE (e, ei, bb->succs) + if (!flow_bb_inside_loop_p (loop, e->dest)) + { + exit_bb = true; + early_exit = loop->vec_loop_iv->src != bb; + break; + } + + /* We should have processed any exit edge, so an edge not an early + break must be a loop IV edge. We need to distinguish between the + two as we don't want to generate code for the main loop IV. */ + if (exit_bb) + { + if (early_exit) + *relevant = vect_used_in_scope; + } + else if (bb->loop_father == loop) + LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo) = true; + } /* changing memory. */ if (gimple_code (stmt_info->stmt) != GIMPLE_PHI) @@ -359,6 +384,11 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, *relevant = vect_used_in_scope; } + auto_vec exits = get_loop_exit_edges (loop); + auto_bitmap exit_bbs; + for (edge exit : exits) + bitmap_set_bit (exit_bbs, exit->dest->index); + /* uses outside the loop. */ FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt_info->stmt, op_iter, SSA_OP_DEF) { @@ -377,7 +407,7 @@ vect_stmt_relevant_p (stmt_vec_info stmt_info, loop_vec_info loop_vinfo, /* We expect all such uses to be in the loop exit phis (because of loop closed form) */ gcc_assert (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI); - gcc_assert (bb == single_exit (loop)->dest); + gcc_assert (bitmap_bit_p (exit_bbs, bb->index)); *live_p = true; } @@ -683,6 +713,13 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal) } } + /* Ideally this should be in vect_analyze_loop_form but we haven't seen all + the conds yet at that point and there's no quick way to retrieve them. */ + if (LOOP_VINFO_GENERAL_CTR_FLOW (loop_vinfo)) + return opt_result::failure_at (vect_location, + "not vectorized:" + " unsupported control flow in loop.\n"); + /* 2. 
Process_worklist */ while (worklist.length () > 0) { @@ -778,6 +815,20 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal) return res; } } + } + else if (gcond *cond = dyn_cast (stmt_vinfo->stmt)) + { + enum tree_code rhs_code = gimple_cond_code (cond); + gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison); + opt_result res + = process_use (stmt_vinfo, gimple_cond_lhs (cond), + loop_vinfo, relevant, &worklist, false); + if (!res) + return res; + res = process_use (stmt_vinfo, gimple_cond_rhs (cond), + loop_vinfo, relevant, &worklist, false); + if (!res) + return res; } else if (gcall *call = dyn_cast (stmt_vinfo->stmt)) { @@ -11919,11 +11970,15 @@ vect_analyze_stmt (vec_info *vinfo, node_instance, cost_vec); if (!res) return res; - } + } + + if (is_ctrl_stmt (stmt_info->stmt)) + STMT_VINFO_DEF_TYPE (stmt_info) = vect_early_exit_def; switch (STMT_VINFO_DEF_TYPE (stmt_info)) { case vect_internal_def: + case vect_early_exit_def: break; case vect_reduction_def: @@ -11956,6 +12011,7 @@ vect_analyze_stmt (vec_info *vinfo, { gcall *call = dyn_cast (stmt_info->stmt); gcc_assert (STMT_VINFO_VECTYPE (stmt_info) + || gimple_code (stmt_info->stmt) == GIMPLE_COND || (call && gimple_call_lhs (call) == NULL_TREE)); *need_to_vectorize = true; } diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index ec65b65b5910e9cbad0a8c7e83c950b6168b98bf..24a0567a2f23f1b3d8b340baff61d18da8e242dd 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -63,6 +63,7 @@ enum vect_def_type { vect_internal_def, vect_induction_def, vect_reduction_def, + vect_early_exit_def, vect_double_reduction_def, vect_nested_cycle, vect_first_order_recurrence, @@ -876,6 +877,13 @@ public: we need to peel off iterations at the end to form an epilogue loop. */ bool peeling_for_niter; + /* When the loop has early breaks that we can vectorize we need to peel + the loop for the break finding loop. */ + bool early_breaks; + + /* When the loop has a non-early break control flow inside. */ + bool non_break_control_flow; + /* List of loop additional IV conditionals found in the loop. */ auto_vec conds; @@ -985,9 +993,11 @@ public: #define LOOP_VINFO_REDUCTION_CHAINS(L) (L)->reduction_chains #define LOOP_VINFO_PEELING_FOR_GAPS(L) (L)->peeling_for_gaps #define LOOP_VINFO_PEELING_FOR_NITER(L) (L)->peeling_for_niter +#define LOOP_VINFO_EARLY_BREAKS(L) (L)->early_breaks #define LOOP_VINFO_EARLY_BRK_CONFLICT_STMTS(L) (L)->early_break_conflict #define LOOP_VINFO_EARLY_BRK_DEST_BB(L) (L)->early_break_dest_bb #define LOOP_VINFO_EARLY_BRK_VUSES(L) (L)->early_break_vuses +#define LOOP_VINFO_GENERAL_CTR_FLOW(L) (L)->non_break_control_flow #define LOOP_VINFO_LOOP_CONDS(L) (L)->conds #define LOOP_VINFO_LOOP_IV_COND(L) (L)->loop_iv_cond #define LOOP_VINFO_NO_DATA_DEPENDENCIES(L) (L)->no_data_dependencies @@ -1038,8 +1048,8 @@ public: stack. */ typedef opt_pointer_wrapper opt_loop_vec_info; -inline loop_vec_info -loop_vec_info_for_loop (class loop *loop) +static inline loop_vec_info +loop_vec_info_for_loop (const class loop *loop) { return (loop_vec_info) loop->aux; } @@ -1789,7 +1799,7 @@ is_loop_header_bb_p (basic_block bb) { if (bb == (bb->loop_father)->header) return true; - gcc_checking_assert (EDGE_COUNT (bb->preds) == 1); + return false; } @@ -2176,9 +2186,10 @@ class auto_purge_vect_location in tree-vect-loop-manip.cc. 
*/ extern void vect_set_loop_condition (class loop *, loop_vec_info, tree, tree, tree, bool); -extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge); +extern bool slpeel_can_duplicate_loop_p (const loop_vec_info, const_edge); class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *, - class loop *, edge); + class loop *, edge, bool, + vec * = NULL); class loop *vect_loop_versioning (loop_vec_info, gimple *); extern class loop *vect_do_peeling (loop_vec_info, tree, tree, tree *, tree *, tree *, int, bool, bool, diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc index a048e9d89178a37455bd7b83ab0f2a238a4ce69e..0dc5479dc92058b6c70c67f29f5dc9a8d72235f4 100644 --- a/gcc/tree-vectorizer.cc +++ b/gcc/tree-vectorizer.cc @@ -1379,7 +1379,9 @@ pass_vectorize::execute (function *fun) predicates that need to be shared for optimal predicate usage. However reassoc will re-order them and prevent CSE from working as it should. CSE only the loop body, not the entry. */ - bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index); + auto_vec exits = get_loop_exit_edges (loop); + for (edge exit : exits) + bitmap_set_bit (exit_bbs, exit->dest->index); edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0); do_rpo_vn (fun, entry, exit_bbs);
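To make the loop shape being handled concrete: the hunks above key everything off LOOP_VINFO_EARLY_BREAKS, force a scalar epilogue, and cap that epilogue at VF iterations. Below is a minimal sketch of such a two-exit loop; it is not taken from the patch, the names and the bound are invented, and a vector factor of 4 is only assumed for the sake of the comments.

/* Sketch only: a counted loop with an additional early exit, the case
   the early-break support above handles.  */
#define N 1024
int a[N];

int
first_match (int x)
{
  for (int i = 0; i < N; i++)
    if (a[i] == x)	/* Early break: a second exit out of the loop.  */
      return i;
  return -1;
}

/* Assuming a vector factor of 4, the vector loop compares four elements
   of a[] per iteration.  When the early exit fires it only knows that a
   match lies somewhere in the last vector, so the peeling code above
   forces a scalar epilogue (vect_epilogues = false) and bounds it to VF
   iterations (bound_epilog = vf), enough to re-scan those elements and
   find the exact index.  */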
From patchwork Wed Jun 28 13:47:23 2023 X-Patchwork-Submitter: Tamar Christina X-Patchwork-Id: 113906
Date: Wed, 28 Jun 2023 14:47:23 +0100
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de, jlaw@ventanamicro.com
Subject: [PATCH 13/19]middle-end testsuite: un-xfail TSVC loops that check for exit control flow vectorization
From: Tamar Christina
Hi All, I didn't want these to get lost in the noise of updates. The following three tests now correctly work for targets that have an implementation of cbranch for vectors so XFAILs are conditionally removed gated on vect_early_break support. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/testsuite/ChangeLog: * gcc.dg/vect/tsvc/vect-tsvc-s332.c: Remove xfail when early break supported. * gcc.dg/vect/tsvc/vect-tsvc-s481.c: Likewise. * gcc.dg/vect/tsvc/vect-tsvc-s482.c: Likewise. --- inline copy of patch -- diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c index 3fd490b3797d9f033c8804b813ee6e222aa45a3b..f3227bf064856c800d3152e62d2c4921bbe0d062 100644 --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s332.c @@ -49,4 +49,4 @@ int main (int argc, char **argv) return 0; } -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! vect_early_break } } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c index bf98e173d2e6315ffc45477642eab7f9441c4376..441fdb2a41969c7beaf90714474802a87c0e6d04 100644 --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s481.c @@ -39,4 +39,4 @@ int main (int argc, char **argv) return 0; } -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! vect_early_break} } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c index c4e26806292af03d59d5b9dc13777ba36831c7fc..5f2d2bf96c5bfc77e7c788ceb3f6d6beb677a367 100644 --- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c +++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s482.c @@ -37,4 +37,4 @@ int main (int argc, char **argv) return 0; } -/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */ +/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! vect_early_break } } } } */
From patchwork Wed Jun 28 13:47:44 2023 X-Patchwork-Submitter: Tamar Christina X-Patchwork-Id: 113908
Date: Wed, 28 Jun 2023 14:47:44 +0100
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de, jlaw@ventanamicro.com
Subject: [PATCH 14/19]middle-end testsuite: Add new tests for early break vectorization.
From: Tamar Christina
Hi All, This adds new tests to check for all the early break functionality. It includes a number of codegen and runtime tests checking the values at different needles in the array. They also check the values on different array sizes and peeling positions, datatypes, VL, ncopies and every other variant I could think of. Additionally it also contains reduced cases from issues found running over various codebases. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Also regtested with: -march=armv8.3-a+sve -march=armv8.3-a+nosve -march=armv9-a Ok for master? Thanks, Tamar gcc/ChangeLog: * doc/sourcebuild.texi: Document vect_early_break. gcc/testsuite/ChangeLog: * lib/target-supports.exp (vect_early_break): New. * gcc.dg/vect/vect-early-break-run_1.c: New test. * gcc.dg/vect/vect-early-break-run_10.c: New test. * gcc.dg/vect/vect-early-break-run_2.c: New test. * gcc.dg/vect/vect-early-break-run_3.c: New test. * gcc.dg/vect/vect-early-break-run_4.c: New test. * gcc.dg/vect/vect-early-break-run_5.c: New test. * gcc.dg/vect/vect-early-break-run_6.c: New test. * gcc.dg/vect/vect-early-break-run_7.c: New test. * gcc.dg/vect/vect-early-break-run_8.c: New test. * gcc.dg/vect/vect-early-break-run_9.c: New test. * gcc.dg/vect/vect-early-break-template_1.c: New test. * gcc.dg/vect/vect-early-break-template_2.c: New test.
* gcc.dg/vect/vect-early-break_1.c: New test. * gcc.dg/vect/vect-early-break_10.c: New test. * gcc.dg/vect/vect-early-break_11.c: New test. * gcc.dg/vect/vect-early-break_12.c: New test. * gcc.dg/vect/vect-early-break_13.c: New test. * gcc.dg/vect/vect-early-break_14.c: New test. * gcc.dg/vect/vect-early-break_15.c: New test. * gcc.dg/vect/vect-early-break_16.c: New test. * gcc.dg/vect/vect-early-break_17.c: New test. * gcc.dg/vect/vect-early-break_18.c: New test. * gcc.dg/vect/vect-early-break_19.c: New test. * gcc.dg/vect/vect-early-break_2.c: New test. * gcc.dg/vect/vect-early-break_20.c: New test. * gcc.dg/vect/vect-early-break_21.c: New test. * gcc.dg/vect/vect-early-break_22.c: New test. * gcc.dg/vect/vect-early-break_23.c: New test. * gcc.dg/vect/vect-early-break_24.c: New test. * gcc.dg/vect/vect-early-break_25.c: New test. * gcc.dg/vect/vect-early-break_26.c: New test. * gcc.dg/vect/vect-early-break_27.c: New test. * gcc.dg/vect/vect-early-break_28.c: New test. * gcc.dg/vect/vect-early-break_29.c: New test. * gcc.dg/vect/vect-early-break_3.c: New test. * gcc.dg/vect/vect-early-break_30.c: New test. * gcc.dg/vect/vect-early-break_31.c: New test. * gcc.dg/vect/vect-early-break_32.c: New test. * gcc.dg/vect/vect-early-break_33.c: New test. * gcc.dg/vect/vect-early-break_34.c: New test. * gcc.dg/vect/vect-early-break_35.c: New test. * gcc.dg/vect/vect-early-break_36.c: New test. * gcc.dg/vect/vect-early-break_37.c: New test. * gcc.dg/vect/vect-early-break_38.c: New test. * gcc.dg/vect/vect-early-break_39.c: New test. * gcc.dg/vect/vect-early-break_4.c: New test. * gcc.dg/vect/vect-early-break_40.c: New test. * gcc.dg/vect/vect-early-break_41.c: New test. * gcc.dg/vect/vect-early-break_42.c: New test. * gcc.dg/vect/vect-early-break_43.c: New test. * gcc.dg/vect/vect-early-break_44.c: New test. * gcc.dg/vect/vect-early-break_45.c: New test. * gcc.dg/vect/vect-early-break_46.c: New test. * gcc.dg/vect/vect-early-break_47.c: New test. * gcc.dg/vect/vect-early-break_48.c: New test. * gcc.dg/vect/vect-early-break_49.c: New test. * gcc.dg/vect/vect-early-break_5.c: New test. * gcc.dg/vect/vect-early-break_50.c: New test. * gcc.dg/vect/vect-early-break_51.c: New test. * gcc.dg/vect/vect-early-break_52.c: New test. * gcc.dg/vect/vect-early-break_53.c: New test. * gcc.dg/vect/vect-early-break_54.c: New test. * gcc.dg/vect/vect-early-break_55.c: New test. * gcc.dg/vect/vect-early-break_56.c: New test. * gcc.dg/vect/vect-early-break_57.c: New test. * gcc.dg/vect/vect-early-break_58.c: New test. * gcc.dg/vect/vect-early-break_59.c: New test. * gcc.dg/vect/vect-early-break_6.c: New test. * gcc.dg/vect/vect-early-break_60.c: New test. * gcc.dg/vect/vect-early-break_61.c: New test. * gcc.dg/vect/vect-early-break_62.c: New test. * gcc.dg/vect/vect-early-break_63.c: New test. * gcc.dg/vect/vect-early-break_64.c: New test. * gcc.dg/vect/vect-early-break_65.c: New test. * gcc.dg/vect/vect-early-break_66.c: New test. * gcc.dg/vect/vect-early-break_7.c: New test. * gcc.dg/vect/vect-early-break_8.c: New test. * gcc.dg/vect/vect-early-break_9.c: New test. 
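The run_*.c files in the patch below each define only N (the array size) and P (the position of the needle) and then include one of the two shared template files; the template sources themselves are not part of this excerpt, so what follows is only a rough, hedged sketch of the kind of search kernel and runtime check such a template is assumed to contain (all identifiers are invented).

/* Rough sketch of an early-break test template driven by the N and P
   macros set by the including vect-early-break-run_*.c file.
   Illustrative only; the real vect-early-break-template_*.c files may
   differ.  */
#include <stdlib.h>

#ifndef N
#define N 803
#endif
#ifndef P
#define P 0
#endif

unsigned vect_a[N];

__attribute__ ((noipa)) unsigned
find_needle (unsigned needle)
{
  for (unsigned i = 0; i < N; i++)
    if (vect_a[i] == needle)	/* The early break being vectorized.  */
      return i;
  return N;
}

int
main (void)
{
  for (unsigned i = 0; i < N; i++)
    vect_a[i] = i;
  if (find_needle (P) != P)	/* The needle sits at position P.  */
    abort ();
  return 0;
}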
--- inline copy of patch -- diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index 526020c751150cd74f766eb83eaf61de6f4374cf..090ceebd7befb3ace9b0d498b74a4e3474990b91 100644 --- diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index 526020c751150cd74f766eb83eaf61de6f4374cf..090ceebd7befb3ace9b0d498b74a4e3474990b91 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -1636,6 +1636,10 @@ Target supports hardware vectors of @code{float} when @option{-funsafe-math-optimizations} is not in effect. This implies @code{vect_float}. +@item vect_early_break +Target supports hardware vectorization of loops with early breaks. +This requires an implementation of the cbranch optab for vectors. + @item vect_int Target supports hardware vectors of @code{int}. diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c new file mode 100644 index 0000000000000000000000000000000000000000..2495b36a72eae94cb7abc4a0d17a5c979fd78083 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 803 +#define P 0 +#include "vect-early-break-template_1.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c new file mode 100644 index 0000000000000000000000000000000000000000..9bcd7f7e57ef9a1d4649d18569b3406050e54603 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 800 +#define P 799 +#include "vect-early-break-template_2.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c new file mode 100644 index 0000000000000000000000000000000000000000..63f63101a467909f328be7f3acbc5bcb721967ff --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 803 +#define P 802 +#include "vect-early-break-template_1.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c new file mode 100644 index 0000000000000000000000000000000000000000..626b95e9b8517081d41d794e9e0264d6301c8589 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 803 +#define P 5 +#include "vect-early-break-template_1.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c new file mode 100644 index 
0000000000000000000000000000000000000000..7e0e6426120551152a7bd800c15d9ed6ab15bada --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 803 +#define P 278 +#include "vect-early-break-template_1.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c new file mode 100644 index 0000000000000000000000000000000000000000..242cf486f9c40055df0aef5fd238d1aff7a7c7da --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 800 +#define P 799 +#include "vect-early-break-template_1.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c new file mode 100644 index 0000000000000000000000000000000000000000..9fe7136b7213a463ca6573c60476b7c8f531ddcb --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 803 +#define P 0 +#include "vect-early-break-template_2.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c new file mode 100644 index 0000000000000000000000000000000000000000..02f93d77dba31b938f6fd9e8c7f5e4acde4aeec9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 803 +#define P 802 +#include "vect-early-break-template_2.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c new file mode 100644 index 0000000000000000000000000000000000000000..a614925465606b54c638221ffb95a5e8d3bee797 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 803 +#define P 5 +#include "vect-early-break-template_2.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c new file mode 100644 index 0000000000000000000000000000000000000000..94e2b9c301456eda8f9ad7eaa67604563f0afee7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 803 +#define 
P 278 +#include "vect-early-break-template_2.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c new file mode 100644 index 0000000000000000000000000000000000000000..af70a8e2a5a9dc9756edb5580f2de02ddcc95de9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c @@ -0,0 +1,47 @@ +#ifndef N +#define N 803 +#endif + +#ifndef P +#define P 0 +#endif + +unsigned vect_a[N] = {0}; +unsigned vect_b[N] = {0}; + +__attribute__((noipa, noinline)) +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + break; + vect_a[i] = x; + + } + return ret; +} + +extern void abort (); + +int main () +{ + + int x = 1; + int idx = P; + vect_a[idx] = x + 1; + + test4(x); + + if (vect_b[idx] != (x + idx)) + abort (); + + if (vect_a[idx] != x + 1) + abort (); + + if (idx > 0 && vect_a[idx-1] != x) + abort (); + +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c new file mode 100644 index 0000000000000000000000000000000000000000..d0f924d904437e71567d27cc1f1089e5607dca0d --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c @@ -0,0 +1,50 @@ +#ifndef N +#define N 803 +#endif + +#ifndef P +#define P 0 +#endif + +unsigned vect_a[N] = {0}; +unsigned vect_b[N] = {0}; + +__attribute__((noipa, noinline)) +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return i; + vect_a[i] = x; + + } + return ret; +} + +extern void abort (); + +int main () +{ + + int x = 1; + int idx = P; + vect_a[idx] = x + 1; + + unsigned res = test4(x); + + if (res != idx) + abort (); + + if (vect_b[idx] != (x + idx)) + abort (); + + if (vect_a[idx] != x + 1) + abort (); + + if (idx > 0 && vect_a[idx-1] != x) + abort (); + +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c new file mode 100644 index 0000000000000000000000000000000000000000..51e7d6489b99c25b9b4b3d1c839f98562b6d4dd7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + break; + vect_a[i] = x; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c new file mode 100644 index 0000000000000000000000000000000000000000..9e4ad1763202dfdab3ed7961ead5114fcc61a11b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x,int y, int z) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if 
(vect_a[i] > x) + break; + vect_a[i] = x; + } + + ret = x + y * z; + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c new file mode 100644 index 0000000000000000000000000000000000000000..a613dd9909fb09278dd92a81a24ef854994a9890 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x, int y) +{ + unsigned ret = 0; +for (int o = 0; o < y; o++) +{ + ret += o; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + break; + vect_a[i] = x; + + } +} + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c new file mode 100644 index 0000000000000000000000000000000000000000..cc10f3238f1cb8e1307e024a3ebcb5c25a39d1b2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x, int y) +{ + unsigned ret = 0; +for (int o = 0; o < y; o++) +{ + ret += o; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return vect_a[i]; + vect_a[i] = x; + + } +} + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c new file mode 100644 index 0000000000000000000000000000000000000000..6967b7395ed7c19e38a436d6edcfe7c1580c7113 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return vect_a[i] * x; + vect_a[i] = x; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c new file mode 100644 index 0000000000000000000000000000000000000000..03cce5cf6cadecb520b46be666bf608e3bc6a511 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#define N 803 +unsigned vect_a[N]; +unsigned vect_b[N]; + +int test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return i; + vect_a[i] += x * vect_b[i]; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c new file mode 100644 index 
0000000000000000000000000000000000000000..dec6872e1115ff66695f5a500ffa7ca01c0f8d3a --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#define N 803 +unsigned vect_a[N]; +unsigned vect_b[N]; + +int test4(unsigned x) +{ + int ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return i; + vect_a[i] += x * vect_b[i]; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c new file mode 100644 index 0000000000000000000000000000000000000000..30812d12a39bd94b4b8a3aade6512b162697d659 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#define N 1024 +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return vect_a[i]; + vect_a[i] = x; + ret += vect_a[i] + vect_b[i]; + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c new file mode 100644 index 0000000000000000000000000000000000000000..510227a18435a8e47c5a754580180c6d340c0823 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#define N 1024 +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return vect_a[i]; + vect_a[i] = x; + ret = vect_a[i] + vect_b[i]; + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c new file mode 100644 index 0000000000000000000000000000000000000000..1372f79242b250cabbab29757b62cbc28a9064a8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i+=2) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + break; + vect_a[i] = x; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c new file mode 100644 index 0000000000000000000000000000000000000000..677487f7da496a8f467d8c529575d47ff22c6a31 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { 
dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x, unsigned step) +{ + unsigned ret = 0; + for (int i = 0; i < N; i+=step) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + break; + vect_a[i] = x; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c new file mode 100644 index 0000000000000000000000000000000000000000..7268f6ae2485d0274fd85ea53cc1e44ef4b84d5c --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#include + +#define N 1024 +complex double vect_a[N]; +complex double vect_b[N]; + +complex double test4(complex double x) +{ + complex double ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] += x + i; + if (vect_a[i] == x) + return i; + vect_a[i] += x * vect_b[i]; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c new file mode 100644 index 0000000000000000000000000000000000000000..ed41377d1c979bf14e0a4e80401831c09ffa463f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#include + +#ifndef N +#define N 803 +#endif +unsigned vect_b[N]; +struct testStruct { + long e; + long f; + bool a : 1; + bool b : 1; + int c : 14; + int d; +}; +struct testStruct vect_a[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i].a > x) + return true; + vect_a[i].e = x; + } + return ret; +} + diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c new file mode 100644 index 0000000000000000000000000000000000000000..6415e4951cb9ef70e56b7cfb1db3d3151368666d --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#include + +#ifndef N +#define N 803 +#endif +unsigned vect_b[N]; +struct testStruct { + long e; + long f; + bool a : 1; + bool b : 1; + int c : 14; + int d; +}; +struct testStruct vect_a[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i].a) + return true; + vect_a[i].e = x; + } + return ret; +} + diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c new file mode 100644 index 0000000000000000000000000000000000000000..2ca189899fb6bd6dfdf63de7729f54e3bee06ba0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c @@ -0,0 +1,45 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_perm } */ +/* { dg-require-effective-target vect_early_break } */ + 
+#include "tree-vect.h" + +void __attribute__((noipa)) +foo (int * __restrict__ a, short * __restrict__ b, int * __restrict__ c) +{ + int t1 = *c; + int t2 = *c; + for (int i = 0; i < 64; i+=2) + { + b[i] = a[i] - t1; + t1 = a[i]; + b[i+1] = a[i+1] - t2; + t2 = a[i+1]; + } +} + +int a[64]; +short b[64]; + +int +main () +{ + check_vect (); + for (int i = 0; i < 64; ++i) + { + a[i] = i; + __asm__ volatile ("" ::: "memory"); + } + int c = 7; + foo (a, b, &c); + for (int i = 2; i < 64; i+=2) + if (b[i] != a[i] - a[i-2] + || b[i+1] != a[i+1] - a[i-1]) + abort (); + if (b[0] != -7 || b[1] != -6) + abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c new file mode 100644 index 0000000000000000000000000000000000000000..f3298656d5d67fd137c4029a96a2f9c1bae344ce --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c @@ -0,0 +1,61 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#define N 200 +#define M 4 + +typedef signed char sc; +typedef unsigned char uc; +typedef signed short ss; +typedef unsigned short us; +typedef int si; +typedef unsigned int ui; +typedef signed long long sll; +typedef unsigned long long ull; + +#define FOR_EACH_TYPE(M) \ + M (sc) M (uc) \ + M (ss) M (us) \ + M (si) M (ui) \ + M (sll) M (ull) \ + M (float) M (double) + +#define TEST_VALUE(I) ((I) * 17 / 2) + +#define ADD_TEST(TYPE) \ + void __attribute__((noinline, noclone)) \ + test_##TYPE (TYPE *a, TYPE *b) \ + { \ + for (int i = 0; i < N; i += 2) \ + { \ + a[i + 0] = b[i + 0] + 2; \ + a[i + 1] = b[i + 1] + 3; \ + } \ + } + +#define DO_TEST(TYPE) \ + for (int j = 1; j < M; ++j) \ + { \ + TYPE a[N + M]; \ + for (int i = 0; i < N + M; ++i) \ + a[i] = TEST_VALUE (i); \ + test_##TYPE (a + j, a); \ + for (int i = 0; i < N; i += 2) \ + if (a[i + j] != (TYPE) (a[i] + 2) \ + || a[i + j + 1] != (TYPE) (a[i + 1] + 3)) \ + __builtin_abort (); \ + } + +FOR_EACH_TYPE (ADD_TEST) + +int +main (void) +{ + FOR_EACH_TYPE (DO_TEST) + return 0; +} + +/* { dg-final { scan-tree-dump {flags: [^\n]*ARBITRARY\n} "vect" { target vect_int } } } */ +/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */ +/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c new file mode 100644 index 0000000000000000000000000000000000000000..7b4b2ffb9b75db6d5ca7e313d1f18d9b51f5b566 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c @@ -0,0 +1,46 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_double } */ +/* { dg-require-effective-target vect_early_break } */ + +#include "tree-vect.h" + +extern void abort (void); +void __attribute__((noinline,noclone)) +foo (double *b, double *d, double *f) +{ + int i; + for (i = 0; i < 1024; i++) + { + d[2*i] = 2. * d[2*i]; + d[2*i+1] = 4. * d[2*i+1]; + b[i] = d[2*i] - 1.; + f[i] = d[2*i+1] + 2.; + } +} +int main() +{ + double b[1024], d[2*1024], f[1024]; + int i; + + check_vect (); + + for (i = 0; i < 2*1024; i++) + d[i] = 1.; + foo (b, d, f); + for (i = 0; i < 1024; i+= 2) + { + if (d[2*i] != 2.) + abort (); + if (d[2*i+1] != 4.) + abort (); + } + for (i = 0; i < 1024; i++) + { + if (b[i] != 1.) + abort (); + if (f[i] != 6.) 
+ abort (); + } + return 0; +} + diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c new file mode 100644 index 0000000000000000000000000000000000000000..8db9b60128b9e21529ae73ea1902afb8fa327112 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* Disabling epilogues until we find a better way to deal with scans. */ +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */ +/* { dg-require-effective-target vect_int } */ + +#include "vect-peel-1-src.c" + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 14 "vect" { target { { vect_element_align } && { vect_aligned_arrays } } xfail { ! vect_unaligned_possible } } } } */ +/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail vect_element_align_preferred } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c new file mode 100644 index 0000000000000000000000000000000000000000..5905847cc0b6b393dde728a9f4ecb44c8ab42da5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c @@ -0,0 +1,44 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_perm } */ + +#include "tree-vect.h" + +void __attribute__((noipa)) +foo (int * __restrict__ a, int * __restrict__ b, int * __restrict__ c) +{ + int t1 = *c; + int t2 = *c; + for (int i = 0; i < 64; i+=2) + { + b[i] = a[i] - t1; + t1 = a[i]; + b[i+1] = a[i+1] - t2; + t2 = a[i+1]; + } +} + +int a[64], b[64]; + +int +main () +{ + check_vect (); + for (int i = 0; i < 64; ++i) + { + a[i] = i; + __asm__ volatile ("" ::: "memory"); + } + int c = 7; + foo (a, b, &c); + for (int i = 2; i < 64; i+=2) + if (b[i] != a[i] - a[i-2] + || b[i+1] != a[i+1] - a[i-1]) + abort (); + if (b[0] != -7 || b[1] != -6) + abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c new file mode 100644 index 0000000000000000000000000000000000000000..d0cfbb01667fa016d72828d098aeaa252c2c9318 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +void abort (); +int a[128]; + +int main () +{ + int i; + for (i = 1; i < 128; i++) + if (a[i] != i%4 + 1) + abort (); + if (a[0] != 5) + abort (); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c new file mode 100644 index 0000000000000000000000000000000000000000..a5eae81f3f5f5b7d92082f1588c6453a71e205cc --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +void abort (); +int a[128]; +int main () +{ + int i; + for (i = 1; i < 128; i++) + if (a[i] != i%4 + 1) + abort (); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" 
"vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c new file mode 100644 index 0000000000000000000000000000000000000000..75d87e99e939fab61f751be025ca0398fa5bd078 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +int in[100]; +int out[100 * 2]; + +int main (void) +{ + if (out[0] != in[100 - 1]) + for (int i = 1; i <= 100; ++i) + if (out[i] != 2) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c new file mode 100644 index 0000000000000000000000000000000000000000..3c6d28bd2d6e6e794146baf89e43c3b70293b7d9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ + +unsigned test4(char x, char *vect, int n) +{ + unsigned ret = 0; + for (int i = 0; i < n; i++) + { + if (vect[i] > x) + return 1; + + vect[i] = x; + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c new file mode 100644 index 0000000000000000000000000000000000000000..e09d883db84685679e73867d83aba9900563983d --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +int x[100]; +int choose1(int); +int choose2(); +void consume(int); +void f() { + for (int i = 0; i < 100; ++i) { + if (x[i] == 11) { + if (choose1(i)) + goto A; + else + goto B; + } + } + if (choose2()) + goto B; +A: + for (int i = 0; i < 100; ++i) + consume(i); +B: + for (int i = 0; i < 100; ++i) + consume(i * i); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c new file mode 100644 index 0000000000000000000000000000000000000000..6001523162d24d140af73143435f25bcd3a217c8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 1025 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return vect_a[i]; + vect_a[i] = x; + ret += vect_a[i] + vect_b[i]; + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c new file mode 100644 index 0000000000000000000000000000000000000000..73abddc267a0170c2d97a7e7c680525721455f22 --- /dev/null +++ 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 1024 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return vect_a[i]; + vect_a[i] = x; + ret = vect_a[i] + vect_b[i]; + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c new file mode 100644 index 0000000000000000000000000000000000000000..29b37f70939af7fa9409edd3a1e29f718c959706 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a2[N]; +unsigned vect_a1[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x, int z) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a1[i]*2 > x) + { + for (int y = 0; y < z; y++) + vect_a2 [y] *= vect_a1[i]; + break; + } + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 2 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c new file mode 100644 index 0000000000000000000000000000000000000000..2c48e3cee33fc37f45ef59c2bbaff7bc5a76b460 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 803 +#endif + +unsigned vect_a[N] __attribute__ ((aligned (4)));; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + + for (int i = 1; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i]*2 > x) + break; + vect_a[i] = x; + + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c new file mode 100644 index 0000000000000000000000000000000000000000..3442484a81161f9bd09e30bc268fbcf66a899902 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a2[N]; +unsigned vect_a1[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a1[i]*2 > x) + break; + vect_a1[i] = x; + if (vect_a2[i]*4 > x) + break; + vect_a2[i] = x*x; + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c new file mode 100644 index 0000000000000000000000000000000000000000..027766c51f508eab157db365a1653f3e92dcac10 --- /dev/null +++ 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a2[N]; +unsigned vect_a1[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a1[i]*2 > x) + break; + vect_a1[i] = x; + if (vect_a2[i]*4 > x) + return i; + vect_a2[i] = x*x; + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c new file mode 100644 index 0000000000000000000000000000000000000000..8d363120898232bb1402b9cf7b4b83b38a10505b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 4 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i]*2 != x) + break; + vect_a[i] = x; + + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c new file mode 100644 index 0000000000000000000000000000000000000000..226d55d7194ca3f676ab52976fea25b7e335bbec --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i+=2) + { + vect_b[i] = x + i; + if (vect_a[i]*2 > x) + break; + vect_a[i] = x; + + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c new file mode 100644 index 0000000000000000000000000000000000000000..554e6ec84318c600c87982ad6ef0f90e8b47af01 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x, unsigned n) +{ + unsigned ret = 0; + for (int i = 0; i < N; i+= (N % 4)) + { + vect_b[i] = x + i; + if (vect_a[i]*2 > x) + break; + vect_a[i] = x; + + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c new file mode 100644 index 0000000000000000000000000000000000000000..216c56faf330449bf1969b7e51ff1e94270dc861 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump-not 
"LOOP VECTORIZED" "vect" } } */ + +#define N 1024 +unsigned vect[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + if (i > 16 && vect[i] > x) + break; + + vect[i] = x; + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c new file mode 100644 index 0000000000000000000000000000000000000000..f2ae372cd96e74cc06254937c2b8fa69ecdedf09 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i*=3) + { + vect_b[i] = x + i; + if (vect_a[i]*2 > x) + break; + vect_a[i] = x; + + } + return ret; +} + +/* SCEV can't currently analyze this loop bounds. */ +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c new file mode 100644 index 0000000000000000000000000000000000000000..6ad9b3f17ddb953bfbf614e9331fa81f565b262f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; +#pragma GCC novector +#pragma GCC unroll 4 + for (int i = 0; i < N; i++) + { + vect_b[i] += vect_a[i] + x; + } + return ret; +} + +/* novector should have blocked vectorization. 
*/ +/* { dg-final { scan-tree-dump-not "vectorized \d loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c new file mode 100644 index 0000000000000000000000000000000000000000..88652f01595cb49a8736a1da6563507b607aae8f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 800 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i]*2 > x) + break; + vect_a[i] = x; + + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c new file mode 100644 index 0000000000000000000000000000000000000000..8e3aab6e04222db8860c111af0e7977fce128dd4 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 802 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i+=2) + { + vect_b[i] = x + i; + vect_b[i+1] = x + i + 1; + if (vect_a[i]*2 > x) + break; + if (vect_a[i+1]*2 > x) + break; + vect_a[i] = x; + vect_a[i+1] = x; + + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c new file mode 100644 index 0000000000000000000000000000000000000000..8e3aab6e04222db8860c111af0e7977fce128dd4 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 802 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i+=2) + { + vect_b[i] = x + i; + vect_b[i+1] = x + i + 1; + if (vect_a[i]*2 > x) + break; + if (vect_a[i+1]*2 > x) + break; + vect_a[i] = x; + vect_a[i+1] = x; + + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c new file mode 100644 index 0000000000000000000000000000000000000000..cf1cb903b31d5fb5527bc6216c0cb9047357da96 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i]*2 > x) + break; + vect_a[i] = x; + + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c new file mode 100644 index 0000000000000000000000000000000000000000..356d971e3a1f69f5c190b49d1d108e6be8766b39 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_float } */ + +#include + +#define N 1024 +complex double vect_a[N]; +complex double vect_b[N]; + +complex double test4(complex double x) +{ + complex double ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] += x + i; + if (vect_a[i] == x) + return i; + vect_a[i] += x * vect_b[i]; + + } + return ret; +} + +/* At -O2 we can't currently vectorize this because of the libcalls not being + lowered. */ +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c new file mode 100644 index 0000000000000000000000000000000000000000..d1cca4a33a25fbf6b631d46ce3dcd3608cffa046 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_float } */ + +void abort (); + +float results1[16] = {192.00,240.00,288.00,336.00,384.00,432.00,480.00,528.00,0.00}; +float results2[16] = {0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,54.00,120.00,198.00,288.00,390.00,504.00,630.00}; +float a[16] = {0}; +float e[16] = {0}; +float b[16] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45}; +int main1 () +{ + int i; + for (i=0; i<16; i++) + { + if (a[i] != results1[i] || e[i] != results2[i]) + abort(); + } + + if (a[i+3] != b[i-1]) + abort (); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c new file mode 100644 index 0000000000000000000000000000000000000000..77043182860321a9e265a89ad8f29ec7946b17e8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +int main (void) +{ + signed char a[50], b[50], c[50]; + for (int i = 0; i < 50; ++i) + if (a[i] != ((((signed int) -1 < 0 ? -126 : 4) + ((signed int) -1 < 0 ? 
-101 : 26) + i * 9 + 0) >> 1)) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c new file mode 100644 index 0000000000000000000000000000000000000000..bc9e5bf899a54c5b2ef67e0193d56b243ec5f043 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +void abort(); +struct foostr { + _Complex short f1; + _Complex short f2; +}; +struct foostr a[16] __attribute__ ((__aligned__(16))) = {}; +struct foostr c[16] __attribute__ ((__aligned__(16))); +struct foostr res[16] = {}; +void +foo (void) +{ + int i; + for (i = 0; i < 16; i++) + { + if (c[i].f1 != res[i].f1) + abort (); + if (c[i].f2 != res[i].f2) + abort (); + } +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c new file mode 100644 index 0000000000000000000000000000000000000000..4a36d6979db1fd1f97ba2a290f78ac3b84f6de24 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#define N 1024 +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return vect_a[i]; + vect_a[i] = x; + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c new file mode 100644 index 0000000000000000000000000000000000000000..e2ac8283091597f6f4776560c86f89d1f98b58ee --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_float } */ + +extern void abort(); +float a[1024], b[1024], c[1024], d[1024]; +_Bool k[1024]; + +int main () +{ + int i; + for (i = 0; i < 1024; i++) + if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0)) + abort (); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c new file mode 100644 index 0000000000000000000000000000000000000000..af036079457a7f5e50eae5a9ad4c952f33e62f87 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +int x_in[32]; +int x_out_a[32], x_out_b[32]; +int c[16] = {3,2,1,10,1,42,3,4,50,9,32,8,11,10,1,2}; +int a[16 +1] = {0,16,32,48,64,128,256,512,0,16,32,48,64,128,256,512,1024}; +int b[16 +1] = {17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1}; + +void foo () +{ + int j, i, x; + int curr_a, flag, next_a, curr_b, next_b; + { + for (i = 0; i < 16; i++) + { + next_b = b[i+1]; + curr_b = flag ? 
next_b : curr_b; + } + x_out_b[j] = curr_b; + } +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c new file mode 100644 index 0000000000000000000000000000000000000000..85cdfe0938e4093c7725e7f397accf26198f6a53 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +void abort(); +int main1 (short X) +{ + unsigned char a[128]; + unsigned short b[128]; + unsigned int c[128]; + short myX = X; + int i; + for (i = 0; i < 128; i++) + { + if (a[i] != (unsigned char)myX || b[i] != myX || c[i] != (unsigned int)myX++) + abort (); + } +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c new file mode 100644 index 0000000000000000000000000000000000000000..f066ddcfe458ca04bb1336f832121c91d7a3e80e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +void abort (); +int a[64], b[64]; +int main () +{ + int c = 7; + for (int i = 1; i < 64; ++i) + if (b[i] != a[i] - a[i-1]) + abort (); + if (b[0] != -7) + abort (); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c new file mode 100644 index 0000000000000000000000000000000000000000..9d0dd8dc5fccb05aeabcbce4014c4994bafdfb05 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + unsigned tmp[N]; + for (int i = 0; i < N; i++) + { + tmp[i] = x + i; + vect_b[i] = tmp[i]; + if (vect_a[i] > x) + break; + vect_a[i] = x; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c new file mode 100644 index 0000000000000000000000000000000000000000..073cbdf614f81525975dbd188632582218e60e9e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + volatile unsigned tmp = x + i; + vect_b[i] = tmp; + if (vect_a[i] > x) + break; + vect_a[i] = x; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c new file mode 100644 index 
0000000000000000000000000000000000000000..9086e885f56974d17f8cdf2dce4c6a44e580d74b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c @@ -0,0 +1,101 @@ +/* Disabling epilogues until we find a better way to deal with scans. */ +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-add-options bind_pic_locally } */ +/* { dg-require-effective-target vect_early_break } */ + +#include +#include "tree-vect.h" + +#define N 32 + +unsigned short sa[N]; +unsigned short sc[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, + 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}; +unsigned short sb[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, + 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}; +unsigned int ia[N]; +unsigned int ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45, + 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}; +unsigned int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45, + 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}; + +/* Current peeling-for-alignment scheme will consider the 'sa[i+7]' + access for peeling, and therefore will examine the option of + using a peeling factor = VF-7%VF. This will result in a peeling factor 1, + which will also align the access to 'ia[i+3]', and the loop could be + vectorized on all targets that support unaligned loads. + Without cost model on targets that support misaligned stores, no peeling + will be applied since we want to keep the four loads aligned. */ + +__attribute__ ((noinline)) +int main1 () +{ + int i; + int n = N - 7; + + /* Multiple types with different sizes, used in independent + copmutations. Vectorizable. */ + for (i = 0; i < n; i++) + { + sa[i+7] = sb[i] + sc[i]; + ia[i+3] = ib[i] + ic[i]; + } + + /* check results: */ + for (i = 0; i < n; i++) + { + if (sa[i+7] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i]) + abort (); + } + + return 0; +} + +/* Current peeling-for-alignment scheme will consider the 'ia[i+3]' + access for peeling, and therefore will examine the option of + using a peeling factor = VF-3%VF. This will result in a peeling factor + 1 if VF=4,2. This will not align the access to 'sa[i+3]', for which we + need to peel 5,1 iterations for VF=4,2 respectively, so the loop can not + be vectorized. However, 'ia[i+3]' also gets aligned if we peel 5 + iterations, so the loop is vectorizable on all targets that support + unaligned loads. + Without cost model on targets that support misaligned stores, no peeling + will be applied since we want to keep the four loads aligned. */ + +__attribute__ ((noinline)) +int main2 () +{ + int i; + int n = N-3; + + /* Multiple types with different sizes, used in independent + copmutations. Vectorizable. */ + for (i = 0; i < n; i++) + { + ia[i+3] = ib[i] + ic[i]; + sa[i+3] = sb[i] + sc[i]; + } + + /* check results: */ + for (i = 0; i < n; i++) + { + if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i]) + abort (); + } + + return 0; +} + +int main (void) +{ + check_vect (); + + main1 (); + main2 (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 2 "vect" { xfail { vect_early_break && { ! 
vect_hw_misalign } } } } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c new file mode 100644 index 0000000000000000000000000000000000000000..9c7c3df59ffbaaf23292107f982fd7af31741ada --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ + +void abort (); + +unsigned short sa[32]; +unsigned short sc[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, + 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}; +unsigned short sb[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, + 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}; +unsigned int ia[32]; +unsigned int ic[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45, + 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}; +unsigned int ib[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45, + 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}; + +int main2 (int n) +{ + int i; + for (i = 0; i < n; i++) + { + if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i]) + abort (); + } +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c new file mode 100644 index 0000000000000000000000000000000000000000..84ea627b4927609079297f11674bdb4c6b301140 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_float } */ + +extern void abort(); +float a[1024], b[1024], c[1024], d[1024]; +_Bool k[1024]; + +int main () +{ + int i; + for (i = 0; i < 1024; i++) + if (k[i] != ((i % 3) == 0)) + abort (); +} + +/* Pattern didn't match inside gcond. 
*/ +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c new file mode 100644 index 0000000000000000000000000000000000000000..193f14e8a4d90793f65a5902eabb8d06496bd6e1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_float } */ + +extern void abort(); +float a[1024], b[1024], c[1024], d[1024]; +_Bool k[1024]; + +int main () +{ + int i; + for (i = 0; i < 1024; i++) + if (k[i] != (i == 0)) + abort (); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c new file mode 100644 index 0000000000000000000000000000000000000000..63ff6662f5c2c93201897e43680daa580ed53867 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#define N 1024 +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < (N/2); i+=2) + { + vect_b[i] = x + i; + vect_b[i+1] = x + i+1; + if (vect_a[i] > x || vect_a[i+1] > x) + break; + vect_a[i] += x * vect_b[i]; + vect_a[i+1] += x * vect_b[i+1]; + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c new file mode 100644 index 0000000000000000000000000000000000000000..4c523d4e714ba67e84b213c2aaf3a56231f8b7e3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_float } */ + +extern void abort(); +float a[1024], b[1024], c[1024], d[1024]; +_Bool k[1024]; + +int main () +{ + char i; + for (i = 0; i < 1024; i++) + if (k[i] != (i == 0)) + abort (); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c new file mode 100644 index 0000000000000000000000000000000000000000..a0c34f71e3bbd3516247a8e026fe513c25413252 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_float } */ + +typedef float real_t; +__attribute__((aligned(64))) real_t a[32000], b[32000], c[32000]; +real_t s482() +{ + for (int nl = 0; nl < 10000; nl++) { + for (int i = 0; i < 32000; i++) { + a[i] += b[i] * c[i]; + if (c[i] > b[i]) break; + } + } +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c new file mode 100644 index 0000000000000000000000000000000000000000..9b94772934f75e685d71a41f3a0336fbfb7320d5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { 
dg-require-effective-target vect_int } */ + +int a, b; +int e() { + int d, c; + d = 0; + for (; d < b; d++) + a = 0; + d = 0; + for (; d < b; d++) + if (d) + c++; + for (;;) + if (c) + break; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c new file mode 100644 index 0000000000000000000000000000000000000000..11f7fb8547b351734a964175380d1ada696011ae --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c @@ -0,0 +1,28 @@ +/* Disabling epilogues until we find a better way to deal with scans. */ +/* { dg-do compile } */ +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */ +/* { dg-require-effective-target vect_long } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-additional-options "-fno-tree-scev-cprop" } */ + +/* Statement used outside the loop. + NOTE: SCEV disabled to ensure the live operation is not removed before + vectorization. */ +__attribute__ ((noinline)) int +liveloop (int start, int n, int *x, int *y) +{ + int i = start; + int j; + int ret; + + for (j = 0; j < n; ++j) + { + i += 1; + x[j] = i; + ret = y[j]; + } + return ret; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c new file mode 100644 index 0000000000000000000000000000000000000000..32b9c087feba1780223e3aee8a2636c99990408c --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "-fdump-tree-vect-all" } */ + +int d(unsigned); + +void a() { + char b[8]; + unsigned c = 0; + while (c < 7 && b[c]) + ++c; + if (d(c)) + return; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_partial_vectors } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c new file mode 100644 index 0000000000000000000000000000000000000000..577c4e96ba91d4dd4aa448233c632de508286eb9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-options "-Ofast -fno-vect-cost-model -fdump-tree-vect-details" } */ + +enum a { b }; + +struct { + enum a c; +} d[10], *e; + +void f() { + int g; + for (g = 0, e = d; g < sizeof(1); g++, e++) + if (e->c) + return; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c new file mode 100644 index 0000000000000000000000000000000000000000..b56a4f755f89225cedd8c156cc7385fe5e07eee5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +int a[0]; +int b; + +void g(); + +void f() { + int d, e; + for (; e; e++) { + int c; + switch (b) + case '9': { + for (; d < 1; d++) + if (a[d]) + c = 1; + break; + case '<': + g(); + c = 0; + } + while (c) + ; + } +} diff --git 
a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c new file mode 100644 index 0000000000000000000000000000000000000000..10fd8b42952c42f3d3a014da103931ca394423d5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#include + +#define N 1024 +complex double vect_a[N]; +complex double vect_b[N]; + +complex double test4(complex double x) +{ + complex double ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] += x + i; + if (vect_a[i] == x) + break; + vect_a[i] += x * vect_b[i]; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c new file mode 100644 index 0000000000000000000000000000000000000000..ae706b2952cfcecf20546a67a735b8d902cbb607 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#include + +#define N 1024 +char vect_a[N]; +char vect_b[N]; + +char test4(char x, char * restrict res) +{ + char ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] += x + i; + if (vect_a[i] > x) + break; + vect_a[i] += x * vect_b[i]; + res[i] *= vect_b[i]; + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c new file mode 100644 index 0000000000000000000000000000000000000000..350f02f3c7caef457adbe1be802bba51cd818393 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_a[i] = x + i; + if (vect_a[i] > x) + break; + vect_a[i] = x; + + } + return ret; +} diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index d79ad4be10502969209f9b13bd65ab142b92e644..5516188dc0aa86d161d67dea5a7769e3c3d72f85 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -3775,6 +3775,17 @@ proc check_effective_target_vect_int { } { }}] } +# Return 1 if the target supports hardware vectorization of early breaks, +# 0 otherwise. +# +# This won't change for different subtargets so cache the result. + +proc check_effective_target_vect_early_break { } { + return [check_cached_effective_target_indexed vect_early_break { + expr { + [istarget aarch64*-*-*] + }}] +} # Return 1 if the target supports hardware vectorization of complex additions of # byte, 0 otherwise. # --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -1636,6 +1636,10 @@ Target supports hardware vectors of @code{float} when @option{-funsafe-math-optimizations} is not in effect. This implies @code{vect_float}. +@item vect_early_break +Target supports hardware vectorization of loops with early breaks. 
+This requires an implementation of the cbranch optab for vectors. + @item vect_int Target supports hardware vectors of @code{int}. diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c new file mode 100644 index 0000000000000000000000000000000000000000..2495b36a72eae94cb7abc4a0d17a5c979fd78083 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_1.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 803 +#define P 0 +#include "vect-early-break-template_1.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c new file mode 100644 index 0000000000000000000000000000000000000000..9bcd7f7e57ef9a1d4649d18569b3406050e54603 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_10.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 800 +#define P 799 +#include "vect-early-break-template_2.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c new file mode 100644 index 0000000000000000000000000000000000000000..63f63101a467909f328be7f3acbc5bcb721967ff --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_2.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 803 +#define P 802 +#include "vect-early-break-template_1.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c new file mode 100644 index 0000000000000000000000000000000000000000..626b95e9b8517081d41d794e9e0264d6301c8589 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_3.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 803 +#define P 5 +#include "vect-early-break-template_1.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c new file mode 100644 index 0000000000000000000000000000000000000000..7e0e6426120551152a7bd800c15d9ed6ab15bada --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_4.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 803 +#define P 278 +#include "vect-early-break-template_1.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c new file mode 100644 index 0000000000000000000000000000000000000000..242cf486f9c40055df0aef5fd238d1aff7a7c7da --- /dev/null 
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_5.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 800 +#define P 799 +#include "vect-early-break-template_1.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c new file mode 100644 index 0000000000000000000000000000000000000000..9fe7136b7213a463ca6573c60476b7c8f531ddcb --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_6.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 803 +#define P 0 +#include "vect-early-break-template_2.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c new file mode 100644 index 0000000000000000000000000000000000000000..02f93d77dba31b938f6fd9e8c7f5e4acde4aeec9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_7.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 803 +#define P 802 +#include "vect-early-break-template_2.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c new file mode 100644 index 0000000000000000000000000000000000000000..a614925465606b54c638221ffb95a5e8d3bee797 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_8.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 803 +#define P 5 +#include "vect-early-break-template_2.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c new file mode 100644 index 0000000000000000000000000000000000000000..94e2b9c301456eda8f9ad7eaa67604563f0afee7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-run_9.c @@ -0,0 +1,11 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast -save-temps" } */ + +#define N 803 +#define P 278 +#include "vect-early-break-template_2.c" + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c new file mode 100644 index 0000000000000000000000000000000000000000..af70a8e2a5a9dc9756edb5580f2de02ddcc95de9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_1.c @@ -0,0 +1,47 @@ +#ifndef N +#define N 803 +#endif + +#ifndef P +#define P 0 +#endif + +unsigned vect_a[N] = {0}; +unsigned vect_b[N] = {0}; + +__attribute__((noipa, noinline)) +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + 
break; + vect_a[i] = x; + + } + return ret; +} + +extern void abort (); + +int main () +{ + + int x = 1; + int idx = P; + vect_a[idx] = x + 1; + + test4(x); + + if (vect_b[idx] != (x + idx)) + abort (); + + if (vect_a[idx] != x + 1) + abort (); + + if (idx > 0 && vect_a[idx-1] != x) + abort (); + +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c new file mode 100644 index 0000000000000000000000000000000000000000..d0f924d904437e71567d27cc1f1089e5607dca0d --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break-template_2.c @@ -0,0 +1,50 @@ +#ifndef N +#define N 803 +#endif + +#ifndef P +#define P 0 +#endif + +unsigned vect_a[N] = {0}; +unsigned vect_b[N] = {0}; + +__attribute__((noipa, noinline)) +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return i; + vect_a[i] = x; + + } + return ret; +} + +extern void abort (); + +int main () +{ + + int x = 1; + int idx = P; + vect_a[idx] = x + 1; + + unsigned res = test4(x); + + if (res != idx) + abort (); + + if (vect_b[idx] != (x + idx)) + abort (); + + if (vect_a[idx] != x + 1) + abort (); + + if (idx > 0 && vect_a[idx-1] != x) + abort (); + +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c new file mode 100644 index 0000000000000000000000000000000000000000..51e7d6489b99c25b9b4b3d1c839f98562b6d4dd7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_1.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + break; + vect_a[i] = x; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c new file mode 100644 index 0000000000000000000000000000000000000000..9e4ad1763202dfdab3ed7961ead5114fcc61a11b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_10.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x,int y, int z) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + break; + vect_a[i] = x; + } + + ret = x + y * z; + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c new file mode 100644 index 0000000000000000000000000000000000000000..a613dd9909fb09278dd92a81a24ef854994a9890 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_11.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x, 
int y) +{ + unsigned ret = 0; +for (int o = 0; o < y; o++) +{ + ret += o; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + break; + vect_a[i] = x; + + } +} + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c new file mode 100644 index 0000000000000000000000000000000000000000..cc10f3238f1cb8e1307e024a3ebcb5c25a39d1b2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_12.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x, int y) +{ + unsigned ret = 0; +for (int o = 0; o < y; o++) +{ + ret += o; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return vect_a[i]; + vect_a[i] = x; + + } +} + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c new file mode 100644 index 0000000000000000000000000000000000000000..6967b7395ed7c19e38a436d6edcfe7c1580c7113 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_13.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return vect_a[i] * x; + vect_a[i] = x; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c new file mode 100644 index 0000000000000000000000000000000000000000..03cce5cf6cadecb520b46be666bf608e3bc6a511 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_14.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#define N 803 +unsigned vect_a[N]; +unsigned vect_b[N]; + +int test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return i; + vect_a[i] += x * vect_b[i]; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c new file mode 100644 index 0000000000000000000000000000000000000000..dec6872e1115ff66695f5a500ffa7ca01c0f8d3a --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_15.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#define N 803 +unsigned vect_a[N]; +unsigned vect_b[N]; + +int test4(unsigned x) +{ + int ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return i; + vect_a[i] += x * vect_b[i]; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c new file mode 100644 index 0000000000000000000000000000000000000000..30812d12a39bd94b4b8a3aade6512b162697d659 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_16.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#define N 1024 +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return vect_a[i]; + vect_a[i] = x; + ret += vect_a[i] + vect_b[i]; + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c new file mode 100644 index 0000000000000000000000000000000000000000..510227a18435a8e47c5a754580180c6d340c0823 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_17.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#define N 1024 +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return vect_a[i]; + vect_a[i] = x; + ret = vect_a[i] + vect_b[i]; + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c new file mode 100644 index 0000000000000000000000000000000000000000..1372f79242b250cabbab29757b62cbc28a9064a8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_18.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i+=2) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + break; + vect_a[i] = x; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c new file mode 100644 index 0000000000000000000000000000000000000000..677487f7da496a8f467d8c529575d47ff22c6a31 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_19.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x, unsigned step) +{ + unsigned ret = 0; + for (int i = 0; i < N; i+=step) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + break; + vect_a[i] = x; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c new file mode 100644 index 0000000000000000000000000000000000000000..7268f6ae2485d0274fd85ea53cc1e44ef4b84d5c --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_2.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { 
dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#include + +#define N 1024 +complex double vect_a[N]; +complex double vect_b[N]; + +complex double test4(complex double x) +{ + complex double ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] += x + i; + if (vect_a[i] == x) + return i; + vect_a[i] += x * vect_b[i]; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c new file mode 100644 index 0000000000000000000000000000000000000000..ed41377d1c979bf14e0a4e80401831c09ffa463f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_20.c @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#include + +#ifndef N +#define N 803 +#endif +unsigned vect_b[N]; +struct testStruct { + long e; + long f; + bool a : 1; + bool b : 1; + int c : 14; + int d; +}; +struct testStruct vect_a[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i].a > x) + return true; + vect_a[i].e = x; + } + return ret; +} + diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c new file mode 100644 index 0000000000000000000000000000000000000000..6415e4951cb9ef70e56b7cfb1db3d3151368666d --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_21.c @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#include + +#ifndef N +#define N 803 +#endif +unsigned vect_b[N]; +struct testStruct { + long e; + long f; + bool a : 1; + bool b : 1; + int c : 14; + int d; +}; +struct testStruct vect_a[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i].a) + return true; + vect_a[i].e = x; + } + return ret; +} + diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c new file mode 100644 index 0000000000000000000000000000000000000000..2ca189899fb6bd6dfdf63de7729f54e3bee06ba0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_22.c @@ -0,0 +1,45 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_perm } */ +/* { dg-require-effective-target vect_early_break } */ + +#include "tree-vect.h" + +void __attribute__((noipa)) +foo (int * __restrict__ a, short * __restrict__ b, int * __restrict__ c) +{ + int t1 = *c; + int t2 = *c; + for (int i = 0; i < 64; i+=2) + { + b[i] = a[i] - t1; + t1 = a[i]; + b[i+1] = a[i+1] - t2; + t2 = a[i+1]; + } +} + +int a[64]; +short b[64]; + +int +main () +{ + check_vect (); + for (int i = 0; i < 64; ++i) + { + a[i] = i; + __asm__ volatile ("" ::: "memory"); + } + int c = 7; + foo (a, b, &c); + for (int i = 2; i < 64; i+=2) + if (b[i] != a[i] - a[i-2] + || b[i+1] != a[i+1] - a[i-1]) + abort (); + if (b[0] != -7 || b[1] != -6) + abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */ 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c new file mode 100644 index 0000000000000000000000000000000000000000..f3298656d5d67fd137c4029a96a2f9c1bae344ce --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_23.c @@ -0,0 +1,61 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#define N 200 +#define M 4 + +typedef signed char sc; +typedef unsigned char uc; +typedef signed short ss; +typedef unsigned short us; +typedef int si; +typedef unsigned int ui; +typedef signed long long sll; +typedef unsigned long long ull; + +#define FOR_EACH_TYPE(M) \ + M (sc) M (uc) \ + M (ss) M (us) \ + M (si) M (ui) \ + M (sll) M (ull) \ + M (float) M (double) + +#define TEST_VALUE(I) ((I) * 17 / 2) + +#define ADD_TEST(TYPE) \ + void __attribute__((noinline, noclone)) \ + test_##TYPE (TYPE *a, TYPE *b) \ + { \ + for (int i = 0; i < N; i += 2) \ + { \ + a[i + 0] = b[i + 0] + 2; \ + a[i + 1] = b[i + 1] + 3; \ + } \ + } + +#define DO_TEST(TYPE) \ + for (int j = 1; j < M; ++j) \ + { \ + TYPE a[N + M]; \ + for (int i = 0; i < N + M; ++i) \ + a[i] = TEST_VALUE (i); \ + test_##TYPE (a + j, a); \ + for (int i = 0; i < N; i += 2) \ + if (a[i + j] != (TYPE) (a[i] + 2) \ + || a[i + j + 1] != (TYPE) (a[i + 1] + 3)) \ + __builtin_abort (); \ + } + +FOR_EACH_TYPE (ADD_TEST) + +int +main (void) +{ + FOR_EACH_TYPE (DO_TEST) + return 0; +} + +/* { dg-final { scan-tree-dump {flags: [^\n]*ARBITRARY\n} "vect" { target vect_int } } } */ +/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */ +/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c new file mode 100644 index 0000000000000000000000000000000000000000..7b4b2ffb9b75db6d5ca7e313d1f18d9b51f5b566 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_24.c @@ -0,0 +1,46 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_double } */ +/* { dg-require-effective-target vect_early_break } */ + +#include "tree-vect.h" + +extern void abort (void); +void __attribute__((noinline,noclone)) +foo (double *b, double *d, double *f) +{ + int i; + for (i = 0; i < 1024; i++) + { + d[2*i] = 2. * d[2*i]; + d[2*i+1] = 4. * d[2*i+1]; + b[i] = d[2*i] - 1.; + f[i] = d[2*i+1] + 2.; + } +} +int main() +{ + double b[1024], d[2*1024], f[1024]; + int i; + + check_vect (); + + for (i = 0; i < 2*1024; i++) + d[i] = 1.; + foo (b, d, f); + for (i = 0; i < 1024; i+= 2) + { + if (d[2*i] != 2.) + abort (); + if (d[2*i+1] != 4.) + abort (); + } + for (i = 0; i < 1024; i++) + { + if (b[i] != 1.) + abort (); + if (f[i] != 6.) + abort (); + } + return 0; +} + diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c new file mode 100644 index 0000000000000000000000000000000000000000..8db9b60128b9e21529ae73ea1902afb8fa327112 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_25.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* Disabling epilogues until we find a better way to deal with scans. 
*/ +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */ +/* { dg-require-effective-target vect_int } */ + +#include "vect-peel-1-src.c" + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 14 "vect" { target { { vect_element_align } && { vect_aligned_arrays } } xfail { ! vect_unaligned_possible } } } } */ +/* { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail vect_element_align_preferred } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c new file mode 100644 index 0000000000000000000000000000000000000000..5905847cc0b6b393dde728a9f4ecb44c8ab42da5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_26.c @@ -0,0 +1,44 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_perm } */ + +#include "tree-vect.h" + +void __attribute__((noipa)) +foo (int * __restrict__ a, int * __restrict__ b, int * __restrict__ c) +{ + int t1 = *c; + int t2 = *c; + for (int i = 0; i < 64; i+=2) + { + b[i] = a[i] - t1; + t1 = a[i]; + b[i+1] = a[i+1] - t2; + t2 = a[i+1]; + } +} + +int a[64], b[64]; + +int +main () +{ + check_vect (); + for (int i = 0; i < 64; ++i) + { + a[i] = i; + __asm__ volatile ("" ::: "memory"); + } + int c = 7; + foo (a, b, &c); + for (int i = 2; i < 64; i+=2) + if (b[i] != a[i] - a[i-2] + || b[i+1] != a[i+1] - a[i-1]) + abort (); + if (b[0] != -7 || b[1] != -6) + abort (); + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 2 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c new file mode 100644 index 0000000000000000000000000000000000000000..d0cfbb01667fa016d72828d098aeaa252c2c9318 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_27.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +void abort (); +int a[128]; + +int main () +{ + int i; + for (i = 1; i < 128; i++) + if (a[i] != i%4 + 1) + abort (); + if (a[0] != 5) + abort (); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c new file mode 100644 index 0000000000000000000000000000000000000000..a5eae81f3f5f5b7d92082f1588c6453a71e205cc --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_28.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +void abort (); +int a[128]; +int main () +{ + int i; + for (i = 1; i < 128; i++) + if (a[i] != i%4 + 1) + abort (); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c new file mode 100644 index 0000000000000000000000000000000000000000..75d87e99e939fab61f751be025ca0398fa5bd078 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_29.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +int in[100]; 
+int out[100 * 2]; + +int main (void) +{ + if (out[0] != in[100 - 1]) + for (int i = 1; i <= 100; ++i) + if (out[i] != 2) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c new file mode 100644 index 0000000000000000000000000000000000000000..3c6d28bd2d6e6e794146baf89e43c3b70293b7d9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_3.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ + +unsigned test4(char x, char *vect, int n) +{ + unsigned ret = 0; + for (int i = 0; i < n; i++) + { + if (vect[i] > x) + return 1; + + vect[i] = x; + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c new file mode 100644 index 0000000000000000000000000000000000000000..e09d883db84685679e73867d83aba9900563983d --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_30.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +int x[100]; +int choose1(int); +int choose2(); +void consume(int); +void f() { + for (int i = 0; i < 100; ++i) { + if (x[i] == 11) { + if (choose1(i)) + goto A; + else + goto B; + } + } + if (choose2()) + goto B; +A: + for (int i = 0; i < 100; ++i) + consume(i); +B: + for (int i = 0; i < 100; ++i) + consume(i * i); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c new file mode 100644 index 0000000000000000000000000000000000000000..6001523162d24d140af73143435f25bcd3a217c8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_31.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 1025 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return vect_a[i]; + vect_a[i] = x; + ret += vect_a[i] + vect_b[i]; + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c new file mode 100644 index 0000000000000000000000000000000000000000..73abddc267a0170c2d97a7e7c680525721455f22 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_32.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 1024 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return 
vect_a[i]; + vect_a[i] = x; + ret = vect_a[i] + vect_b[i]; + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c new file mode 100644 index 0000000000000000000000000000000000000000..29b37f70939af7fa9409edd3a1e29f718c959706 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_33.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a2[N]; +unsigned vect_a1[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x, int z) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a1[i]*2 > x) + { + for (int y = 0; y < z; y++) + vect_a2 [y] *= vect_a1[i]; + break; + } + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 2 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c new file mode 100644 index 0000000000000000000000000000000000000000..2c48e3cee33fc37f45ef59c2bbaff7bc5a76b460 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_34.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 803 +#endif + +unsigned vect_a[N] __attribute__ ((aligned (4)));; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + + for (int i = 1; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i]*2 > x) + break; + vect_a[i] = x; + + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c new file mode 100644 index 0000000000000000000000000000000000000000..3442484a81161f9bd09e30bc268fbcf66a899902 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_35.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a2[N]; +unsigned vect_a1[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a1[i]*2 > x) + break; + vect_a1[i] = x; + if (vect_a2[i]*4 > x) + break; + vect_a2[i] = x*x; + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c new file mode 100644 index 0000000000000000000000000000000000000000..027766c51f508eab157db365a1653f3e92dcac10 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_36.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a2[N]; +unsigned vect_a1[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a1[i]*2 > x) + break; + vect_a1[i] = x; + if (vect_a2[i]*4 > x) + return i; + vect_a2[i] = x*x; + } + return ret; 
+} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c new file mode 100644 index 0000000000000000000000000000000000000000..8d363120898232bb1402b9cf7b4b83b38a10505b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_37.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 4 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i]*2 != x) + break; + vect_a[i] = x; + + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c new file mode 100644 index 0000000000000000000000000000000000000000..226d55d7194ca3f676ab52976fea25b7e335bbec --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_38.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i+=2) + { + vect_b[i] = x + i; + if (vect_a[i]*2 > x) + break; + vect_a[i] = x; + + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c new file mode 100644 index 0000000000000000000000000000000000000000..554e6ec84318c600c87982ad6ef0f90e8b47af01 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_39.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x, unsigned n) +{ + unsigned ret = 0; + for (int i = 0; i < N; i+= (N % 4)) + { + vect_b[i] = x + i; + if (vect_a[i]*2 > x) + break; + vect_a[i] = x; + + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c new file mode 100644 index 0000000000000000000000000000000000000000..216c56faf330449bf1969b7e51ff1e94270dc861 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_4.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ + +#define N 1024 +unsigned vect[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + if (i > 16 && vect[i] > x) + break; + + vect[i] = x; + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c new file mode 100644 index 0000000000000000000000000000000000000000..f2ae372cd96e74cc06254937c2b8fa69ecdedf09 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_40.c @@ -0,0 
+1,26 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i*=3) + { + vect_b[i] = x + i; + if (vect_a[i]*2 > x) + break; + vect_a[i] = x; + + } + return ret; +} + +/* SCEV can't currently analyze this loop bounds. */ +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c new file mode 100644 index 0000000000000000000000000000000000000000..6ad9b3f17ddb953bfbf614e9331fa81f565b262f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_41.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; +#pragma GCC novector +#pragma GCC unroll 4 + for (int i = 0; i < N; i++) + { + vect_b[i] += vect_a[i] + x; + } + return ret; +} + +/* novector should have blocked vectorization. */ +/* { dg-final { scan-tree-dump-not "vectorized \d loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c new file mode 100644 index 0000000000000000000000000000000000000000..88652f01595cb49a8736a1da6563507b607aae8f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_42.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 800 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i]*2 > x) + break; + vect_a[i] = x; + + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c new file mode 100644 index 0000000000000000000000000000000000000000..8e3aab6e04222db8860c111af0e7977fce128dd4 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_43.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 802 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i+=2) + { + vect_b[i] = x + i; + vect_b[i+1] = x + i + 1; + if (vect_a[i]*2 > x) + break; + if (vect_a[i+1]*2 > x) + break; + vect_a[i] = x; + vect_a[i+1] = x; + + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c new file mode 100644 index 0000000000000000000000000000000000000000..8e3aab6e04222db8860c111af0e7977fce128dd4 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_44.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } 
*/ + +#ifndef N +#define N 802 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i+=2) + { + vect_b[i] = x + i; + vect_b[i+1] = x + i + 1; + if (vect_a[i]*2 > x) + break; + if (vect_a[i+1]*2 > x) + break; + vect_a[i] = x; + vect_a[i+1] = x; + + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c new file mode 100644 index 0000000000000000000000000000000000000000..cf1cb903b31d5fb5527bc6216c0cb9047357da96 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_45.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i]*2 > x) + break; + vect_a[i] = x; + + } + return ret; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c new file mode 100644 index 0000000000000000000000000000000000000000..356d971e3a1f69f5c190b49d1d108e6be8766b39 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_46.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_float } */ + +#include + +#define N 1024 +complex double vect_a[N]; +complex double vect_b[N]; + +complex double test4(complex double x) +{ + complex double ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] += x + i; + if (vect_a[i] == x) + return i; + vect_a[i] += x * vect_b[i]; + + } + return ret; +} + +/* At -O2 we can't currently vectorize this because of the libcalls not being + lowered. 
*/ +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c new file mode 100644 index 0000000000000000000000000000000000000000..d1cca4a33a25fbf6b631d46ce3dcd3608cffa046 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_47.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_float } */ + +void abort (); + +float results1[16] = {192.00,240.00,288.00,336.00,384.00,432.00,480.00,528.00,0.00}; +float results2[16] = {0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,54.00,120.00,198.00,288.00,390.00,504.00,630.00}; +float a[16] = {0}; +float e[16] = {0}; +float b[16] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45}; +int main1 () +{ + int i; + for (i=0; i<16; i++) + { + if (a[i] != results1[i] || e[i] != results2[i]) + abort(); + } + + if (a[i+3] != b[i-1]) + abort (); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c new file mode 100644 index 0000000000000000000000000000000000000000..77043182860321a9e265a89ad8f29ec7946b17e8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_48.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +int main (void) +{ + signed char a[50], b[50], c[50]; + for (int i = 0; i < 50; ++i) + if (a[i] != ((((signed int) -1 < 0 ? -126 : 4) + ((signed int) -1 < 0 ? -101 : 26) + i * 9 + 0) >> 1)) + __builtin_abort (); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c new file mode 100644 index 0000000000000000000000000000000000000000..bc9e5bf899a54c5b2ef67e0193d56b243ec5f043 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_49.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +void abort(); +struct foostr { + _Complex short f1; + _Complex short f2; +}; +struct foostr a[16] __attribute__ ((__aligned__(16))) = {}; +struct foostr c[16] __attribute__ ((__aligned__(16))); +struct foostr res[16] = {}; +void +foo (void) +{ + int i; + for (i = 0; i < 16; i++) + { + if (c[i].f1 != res[i].f1) + abort (); + if (c[i].f2 != res[i].f2) + abort (); + } +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c new file mode 100644 index 0000000000000000000000000000000000000000..4a36d6979db1fd1f97ba2a290f78ac3b84f6de24 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_5.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#define N 1024 +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] = x + i; + if (vect_a[i] > x) + return vect_a[i]; + vect_a[i] = x; + } + return ret; +} diff --git 
a/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c new file mode 100644 index 0000000000000000000000000000000000000000..e2ac8283091597f6f4776560c86f89d1f98b58ee --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_50.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_float } */ + +extern void abort(); +float a[1024], b[1024], c[1024], d[1024]; +_Bool k[1024]; + +int main () +{ + int i; + for (i = 0; i < 1024; i++) + if (k[i] != ((i % 3) == 0 && ((i / 9) % 3) == 0)) + abort (); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c new file mode 100644 index 0000000000000000000000000000000000000000..af036079457a7f5e50eae5a9ad4c952f33e62f87 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_51.c @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +int x_in[32]; +int x_out_a[32], x_out_b[32]; +int c[16] = {3,2,1,10,1,42,3,4,50,9,32,8,11,10,1,2}; +int a[16 +1] = {0,16,32,48,64,128,256,512,0,16,32,48,64,128,256,512,1024}; +int b[16 +1] = {17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1}; + +void foo () +{ + int j, i, x; + int curr_a, flag, next_a, curr_b, next_b; + { + for (i = 0; i < 16; i++) + { + next_b = b[i+1]; + curr_b = flag ? next_b : curr_b; + } + x_out_b[j] = curr_b; + } +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c new file mode 100644 index 0000000000000000000000000000000000000000..85cdfe0938e4093c7725e7f397accf26198f6a53 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_52.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +void abort(); +int main1 (short X) +{ + unsigned char a[128]; + unsigned short b[128]; + unsigned int c[128]; + short myX = X; + int i; + for (i = 0; i < 128; i++) + { + if (a[i] != (unsigned char)myX || b[i] != myX || c[i] != (unsigned int)myX++) + abort (); + } +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c new file mode 100644 index 0000000000000000000000000000000000000000..f066ddcfe458ca04bb1336f832121c91d7a3e80e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_53.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +void abort (); +int a[64], b[64]; +int main () +{ + int c = 7; + for (int i = 1; i < 64; ++i) + if (b[i] != a[i] - a[i-1]) + abort (); + if (b[0] != -7) + abort (); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c new file mode 100644 index 0000000000000000000000000000000000000000..9d0dd8dc5fccb05aeabcbce4014c4994bafdfb05 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_54.c @@ -0,0 +1,29 @@ +/* { 
dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + unsigned tmp[N]; + for (int i = 0; i < N; i++) + { + tmp[i] = x + i; + vect_b[i] = tmp[i]; + if (vect_a[i] > x) + break; + vect_a[i] = x; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c new file mode 100644 index 0000000000000000000000000000000000000000..073cbdf614f81525975dbd188632582218e60e9e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_55.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + volatile unsigned tmp = x + i; + vect_b[i] = tmp; + if (vect_a[i] > x) + break; + vect_a[i] = x; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c new file mode 100644 index 0000000000000000000000000000000000000000..9086e885f56974d17f8cdf2dce4c6a44e580d74b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_56.c @@ -0,0 +1,101 @@ +/* Disabling epilogues until we find a better way to deal with scans. */ +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-add-options bind_pic_locally } */ +/* { dg-require-effective-target vect_early_break } */ + +#include +#include "tree-vect.h" + +#define N 32 + +unsigned short sa[N]; +unsigned short sc[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, + 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}; +unsigned short sb[N] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, + 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}; +unsigned int ia[N]; +unsigned int ic[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45, + 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}; +unsigned int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45, + 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}; + +/* Current peeling-for-alignment scheme will consider the 'sa[i+7]' + access for peeling, and therefore will examine the option of + using a peeling factor = VF-7%VF. This will result in a peeling factor 1, + which will also align the access to 'ia[i+3]', and the loop could be + vectorized on all targets that support unaligned loads. + Without cost model on targets that support misaligned stores, no peeling + will be applied since we want to keep the four loads aligned. */ + +__attribute__ ((noinline)) +int main1 () +{ + int i; + int n = N - 7; + + /* Multiple types with different sizes, used in independent + copmutations. Vectorizable. 
*/ + for (i = 0; i < n; i++) + { + sa[i+7] = sb[i] + sc[i]; + ia[i+3] = ib[i] + ic[i]; + } + + /* check results: */ + for (i = 0; i < n; i++) + { + if (sa[i+7] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i]) + abort (); + } + + return 0; +} + +/* Current peeling-for-alignment scheme will consider the 'ia[i+3]' + access for peeling, and therefore will examine the option of + using a peeling factor = VF-3%VF. This will result in a peeling factor + 1 if VF=4,2. This will not align the access to 'sa[i+3]', for which we + need to peel 5,1 iterations for VF=4,2 respectively, so the loop can not + be vectorized. However, 'ia[i+3]' also gets aligned if we peel 5 + iterations, so the loop is vectorizable on all targets that support + unaligned loads. + Without cost model on targets that support misaligned stores, no peeling + will be applied since we want to keep the four loads aligned. */ + +__attribute__ ((noinline)) +int main2 () +{ + int i; + int n = N-3; + + /* Multiple types with different sizes, used in independent + copmutations. Vectorizable. */ + for (i = 0; i < n; i++) + { + ia[i+3] = ib[i] + ic[i]; + sa[i+3] = sb[i] + sc[i]; + } + + /* check results: */ + for (i = 0; i < n; i++) + { + if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i]) + abort (); + } + + return 0; +} + +int main (void) +{ + check_vect (); + + main1 (); + main2 (); + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 2 "vect" { xfail { vect_early_break && { ! vect_hw_misalign } } } } } */ + diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c new file mode 100644 index 0000000000000000000000000000000000000000..9c7c3df59ffbaaf23292107f982fd7af31741ada --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_57.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ + +void abort (); + +unsigned short sa[32]; +unsigned short sc[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, + 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}; +unsigned short sb[32] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, + 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}; +unsigned int ia[32]; +unsigned int ic[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45, + 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}; +unsigned int ib[32] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45, + 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}; + +int main2 (int n) +{ + int i; + for (i = 0; i < n; i++) + { + if (sa[i+3] != sb[i] + sc[i] || ia[i+3] != ib[i] + ic[i]) + abort (); + } +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c new file mode 100644 index 0000000000000000000000000000000000000000..84ea627b4927609079297f11674bdb4c6b301140 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_58.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_float } */ + +extern void abort(); +float a[1024], b[1024], c[1024], d[1024]; +_Bool k[1024]; + +int main () +{ + int i; + for (i = 0; i < 1024; i++) + if (k[i] != ((i % 3) == 0)) + abort (); +} + +/* Pattern didn't match inside gcond. 
*/ +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c new file mode 100644 index 0000000000000000000000000000000000000000..193f14e8a4d90793f65a5902eabb8d06496bd6e1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_59.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_float } */ + +extern void abort(); +float a[1024], b[1024], c[1024], d[1024]; +_Bool k[1024]; + +int main () +{ + int i; + for (i = 0; i < 1024; i++) + if (k[i] != (i == 0)) + abort (); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c new file mode 100644 index 0000000000000000000000000000000000000000..63ff6662f5c2c93201897e43680daa580ed53867 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_6.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#define N 1024 +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < (N/2); i+=2) + { + vect_b[i] = x + i; + vect_b[i+1] = x + i+1; + if (vect_a[i] > x || vect_a[i+1] > x) + break; + vect_a[i] += x * vect_b[i]; + vect_a[i+1] += x * vect_b[i+1]; + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c new file mode 100644 index 0000000000000000000000000000000000000000..4c523d4e714ba67e84b213c2aaf3a56231f8b7e3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_60.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_float } */ + +extern void abort(); +float a[1024], b[1024], c[1024], d[1024]; +_Bool k[1024]; + +int main () +{ + char i; + for (i = 0; i < 1024; i++) + if (k[i] != (i == 0)) + abort (); +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { xfail *-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c new file mode 100644 index 0000000000000000000000000000000000000000..a0c34f71e3bbd3516247a8e026fe513c25413252 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_61.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_float } */ + +typedef float real_t; +__attribute__((aligned(64))) real_t a[32000], b[32000], c[32000]; +real_t s482() +{ + for (int nl = 0; nl < 10000; nl++) { + for (int i = 0; i < 32000; i++) { + a[i] += b[i] * c[i]; + if (c[i] > b[i]) break; + } + } +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c new file mode 100644 index 0000000000000000000000000000000000000000..9b94772934f75e685d71a41f3a0336fbfb7320d5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_62.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { 
dg-require-effective-target vect_int } */ + +int a, b; +int e() { + int d, c; + d = 0; + for (; d < b; d++) + a = 0; + d = 0; + for (; d < b; d++) + if (d) + c++; + for (;;) + if (c) + break; +} + +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c new file mode 100644 index 0000000000000000000000000000000000000000..11f7fb8547b351734a964175380d1ada696011ae --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_63.c @@ -0,0 +1,28 @@ +/* Disabling epilogues until we find a better way to deal with scans. */ +/* { dg-do compile } */ +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */ +/* { dg-require-effective-target vect_long } */ +/* { dg-require-effective-target vect_shift } */ +/* { dg-additional-options "-fno-tree-scev-cprop" } */ + +/* Statement used outside the loop. + NOTE: SCEV disabled to ensure the live operation is not removed before + vectorization. */ +__attribute__ ((noinline)) int +liveloop (int start, int n, int *x, int *y) +{ + int i = start; + int j; + int ret; + + for (j = 0; j < n; ++j) + { + i += 1; + x[j] = i; + ret = y[j]; + } + return ret; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "vec_stmt_relevant_p: stmt live but not relevant" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c new file mode 100644 index 0000000000000000000000000000000000000000..32b9c087feba1780223e3aee8a2636c99990408c --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_64.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "-fdump-tree-vect-all" } */ + +int d(unsigned); + +void a() { + char b[8]; + unsigned c = 0; + while (c < 7 && b[c]) + ++c; + if (d(c)) + return; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_partial_vectors } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c new file mode 100644 index 0000000000000000000000000000000000000000..577c4e96ba91d4dd4aa448233c632de508286eb9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_65.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-options "-Ofast -fno-vect-cost-model -fdump-tree-vect-details" } */ + +enum a { b }; + +struct { + enum a c; +} d[10], *e; + +void f() { + int g; + for (g = 0, e = d; g < sizeof(1); g++, e++) + if (e->c) + return; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c new file mode 100644 index 0000000000000000000000000000000000000000..b56a4f755f89225cedd8c156cc7385fe5e07eee5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_66.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +int a[0]; +int b; + +void g(); + +void f() { + int d, e; + for (; e; e++) { + int c; + switch (b) + case '9': { + for (; d < 1; d++) + if (a[d]) + c = 1; + break; + case '<': + g(); + c = 0; + } + while (c) + ; + } +} diff --git 
a/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c new file mode 100644 index 0000000000000000000000000000000000000000..10fd8b42952c42f3d3a014da103931ca394423d5 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_7.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#include + +#define N 1024 +complex double vect_a[N]; +complex double vect_b[N]; + +complex double test4(complex double x) +{ + complex double ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] += x + i; + if (vect_a[i] == x) + break; + vect_a[i] += x * vect_b[i]; + + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c new file mode 100644 index 0000000000000000000000000000000000000000..ae706b2952cfcecf20546a67a735b8d902cbb607 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_8.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */ + +#include + +#define N 1024 +char vect_a[N]; +char vect_b[N]; + +char test4(char x, char * restrict res) +{ + char ret = 0; + for (int i = 0; i < N; i++) + { + vect_b[i] += x + i; + if (vect_a[i] > x) + break; + vect_a[i] += x * vect_b[i]; + res[i] *= vect_b[i]; + } + return ret; +} diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c new file mode 100644 index 0000000000000000000000000000000000000000..350f02f3c7caef457adbe1be802bba51cd818393 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_9.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_early_break } */ +/* { dg-require-effective-target vect_int } */ + +/* { dg-additional-options "-Ofast" } */ + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ + +#ifndef N +#define N 803 +#endif +unsigned vect_a[N]; +unsigned vect_b[N]; + +unsigned test4(unsigned x) +{ + unsigned ret = 0; + for (int i = 0; i < N; i++) + { + vect_a[i] = x + i; + if (vect_a[i] > x) + break; + vect_a[i] = x; + + } + return ret; +} diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index d79ad4be10502969209f9b13bd65ab142b92e644..5516188dc0aa86d161d67dea5a7769e3c3d72f85 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -3775,6 +3775,17 @@ proc check_effective_target_vect_int { } { }}] } +# Return 1 if the target supports hardware vectorization of early breaks, +# 0 otherwise. +# +# This won't change for different subtargets so cache the result. + +proc check_effective_target_vect_early_break { } { + return [check_cached_effective_target_indexed vect_early_break { + expr { + [istarget aarch64*-*-*] + }}] +} # Return 1 if the target supports hardware vectorization of complex additions of # byte, 0 otherwise. 
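Editorial note, not part of the submitted patch: the vect_early_break effective target added above currently only returns 1 for aarch64*-*-*. The vect-early-break_*.c tests it gates all share the same basic shape, sketched below with illustrative names only. The point is that the exit condition is data dependent, so the vectorizer has to evaluate it for a whole vector of elements and branch out of the loop before committing the stores of the breaking iteration.

#define N 803
unsigned vect_a[N], vect_b[N];

unsigned
early_break_kernel (unsigned x)
{
  for (int i = 0; i < N; i++)
    {
      vect_b[i] = x + i;   /* still executed on the iteration that breaks     */
      if (vect_a[i] > x)   /* data-dependent exit: vectorizing it needs a
                              vector compare plus an "any lane set" branch     */
        break;
      vect_a[i] = x;       /* skipped from the breaking iteration onwards      */
    }
  return 0;
}
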
# From patchwork Wed Jun 28 13:48:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tamar Christina X-Patchwork-Id: 113911 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp8943689vqr; Wed, 28 Jun 2023 06:55:50 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ51NBYUeSWm+v8SzhangagW3DYYD+kqRw5DkBwV4kGrarcxvdnUcb/h4x118TvKMwKIDNyd X-Received: by 2002:aa7:de06:0:b0:51d:d41b:26d4 with SMTP id h6-20020aa7de06000000b0051dd41b26d4mr487381edv.39.1687960550174; Wed, 28 Jun 2023 06:55:50 -0700 (PDT) Received: from sourceware.org (ip-8-43-85-97.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id n14-20020aa7d04e000000b005164220ba75si5254023edo.41.2023.06.28.06.55.49 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 28 Jun 2023 06:55:50 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=gWsiAbvM; arc=fail (signature failed); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 66C3C38319CD for ; Wed, 28 Jun 2023 13:51:50 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 66C3C38319CD DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1687960310; bh=rA1tT9dzOqEFbBF1yolFgDs6NQL5g44v6fbA2ZuswCU=; h=Date:To:Cc:Subject:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=gWsiAbvMY+voII1ouqAf66f85PVmhowTK/GVEm/jBgSd6zyrFx8+iPkqZ5uvljKFp AB8PnOYf3BFvm0ldYGjtSjo4xPZJ+CXsrd7Y2Z054iHcGpkOblO3M2fGcXtwOUxALV uEDGQcA1IkcSTlDH415SVlj/G3nbXIZJNsCeZuJU= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR04-HE1-obe.outbound.protection.outlook.com (mail-he1eur04on2073.outbound.protection.outlook.com [40.107.7.73]) by sourceware.org (Postfix) with ESMTPS id 6CD89385660A for ; Wed, 28 Jun 2023 13:48:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6CD89385660A Received: from AS9PR06CA0360.eurprd06.prod.outlook.com (2603:10a6:20b:466::16) by PAWPR08MB9124.eurprd08.prod.outlook.com (2603:10a6:102:330::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.26; Wed, 28 Jun 2023 13:48:20 +0000 Received: from AM7EUR03FT010.eop-EUR03.prod.protection.outlook.com (2603:10a6:20b:466:cafe::1e) by AS9PR06CA0360.outlook.office365.com (2603:10a6:20b:466::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.34 via Frontend Transport; Wed, 28 Jun 2023 13:48:20 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 
64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by AM7EUR03FT010.mail.protection.outlook.com (100.127.141.22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6544.18 via Frontend Transport; Wed, 28 Jun 2023 13:48:20 +0000 Received: ("Tessian outbound c08fa2e31830:v142"); Wed, 28 Jun 2023 13:48:20 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 068e4d210c05b97c X-CR-MTA-TID: 64aa7808 Received: from aa16e7ce21e0.2 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 585A6D15-EE0F-4B03-A6E3-E890F2D8294A.1; Wed, 28 Jun 2023 13:48:14 +0000 Received: from EUR04-DB3-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id aa16e7ce21e0.2 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Wed, 28 Jun 2023 13:48:14 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=JflZhMrLw2wYG3oZtUkVxze1y0LS92Usq7WsBR7svWOWiDtU8rPRd67FvVmn2bt1g7yPFllOOcLOIMxTY7JNZbu/h1wjriejL+XubxWFTSmM6z5y03kkW0z/p8f8c5iYLYzV0N7ILyrszR1QaHzQcjE6sVT7NWyCDgmHyTr4S6gj0iZcsej+5FGhKBXoG02MpZGWIuxe0zn93h43QEcK80lOJykLnqsXRdBVW/cdCEqa9Izz9mjHzZ7D3paW9nGgPX6uP+ZJgoWoUdOlIAyWJClcqyxZ8DLYtlxjwvDVwiK7uiHAIPJc7trSt1VsNX0l54DksccdVkad4E1ZVgo+zg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=rA1tT9dzOqEFbBF1yolFgDs6NQL5g44v6fbA2ZuswCU=; b=e6xUl1I4iYamUvUtXB6UUOsRx20lb5QlA8z+KYuGWf1z+a26zN/SYrbC+6EkdfOfy8/dIE5K2g81ht14/0K3KHl7ChK5oHp7siV8ELH2C+IwFWbpNBGQpU0m2YJGfA67xBdhkToJIzCv2NUfR9BDQOewpuVX69yimgcTdDGw4pd1SvdEtLUwyVq1f8y03Z09bADj//otqvXbJQLOleWgyLVSEuT3N/SNzeMfPk+x6rXQ+ov2Sko8+m0uA7nvpqM8JASGr2Gh1E2nsFFGvWc3hSOnsZkFDxfAauGQyhc+tlz7X5cgXE3+E+wWhiebyy8+0pD8H6b/RCa/H1Xe8fWEdQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none Authentication-Results-Original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) by AM7PR08MB5398.eurprd08.prod.outlook.com (2603:10a6:20b:103::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.26; Wed, 28 Jun 2023 13:48:12 +0000 Received: from VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::2301:1cde:cfe7:eaf0]) by VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::2301:1cde:cfe7:eaf0%6]) with mapi id 15.20.6521.026; Wed, 28 Jun 2023 13:48:12 +0000 Date: Wed, 28 Jun 2023 14:48:10 +0100 To: gcc-patches@gcc.gnu.org Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com Subject: [PATCH 15/19]AArch64: Add implementation for vector cbranch for Advanced SIMD Message-ID: Content-Disposition: inline In-Reply-To: X-ClientProxiedBy: LO4P123CA0351.GBRP123.PROD.OUTLOOK.COM (2603:10a6:600:18d::14) To VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) MIME-Version: 1.0 X-MS-TrafficTypeDiagnostic: VI1PR08MB5325:EE_|AM7PR08MB5398:EE_|AM7EUR03FT010:EE_|PAWPR08MB9124:EE_ X-MS-Office365-Filtering-Correlation-Id: c2b2a0df-e764-4ae6-7a79-08db77de55ac x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; 
X-Microsoft-Antispam-Message-Info-Original: ewpUcndxfaKB0wLlckpIGXiSnk7RuOG6uLZA09MglTrZKVz86/HNpzOVJeK2J/VXHcqoM+/fy98s/Bz5upJz+mRGIlaKm9h4sqau6WHEVFuu/sePzC+1zgcaKKGlH5EhM6mJHCT2A1jdze4Cv4yXeN+ht/Mbb6Ja6hhop+HuI46VsVosTzmPHiUqhctsuQiRGfXK9a/QVWMrhaenN1EiotE2olBpHJNON/Ercm84fXoJWZfPYGQ09RTfJ4MEWv0Ep/K8UJQ3tTuhRHjPVBZ9V2Z79oP549p4rIIk0kOWWlNseSVCFFsGahsfaUDy+v9fB5zEyok2OYcaJ7U35xIiR2hvIvfuy4+QK1u0a1srU4c8yG90+DYQhVvF5MX6x9q0N1tLE5xkdT6gIfGKA/HjwkwInRlg2iADbMWdEN/Zc9IqucTtTlE6fnYW//s/T58bxTCvvj6ln5hQFN/TYIQfAr2znGf6AmhPmh/G6dMiUIFG2Wvy+MNr2X+xBuMvLyEf4xSR3sb0N1MSJ3pNzqNQoIjWXynt2qEXjvcLu0KzGDaUqX8Ol0DZKyiHwAQGaFRWThtzYHrnIicN+jOMuzXxD4Si7arzd+XAeZU0GPf7pkE= X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:VI1PR08MB5325.eurprd08.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230028)(4636009)(136003)(376002)(346002)(39860400002)(366004)(396003)(451199021)(6486002)(4743002)(33964004)(26005)(2616005)(478600001)(186003)(2906002)(6506007)(44144004)(5660300002)(44832011)(36756003)(316002)(235185007)(8676002)(66946007)(41300700001)(38100700002)(4326008)(86362001)(6916009)(66476007)(66556008)(8936002)(6512007)(84970400001)(2700100001); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM7PR08MB5398 Original-Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: AM7EUR03FT010.eop-EUR03.prod.protection.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: dd828ea5-e9a6-486c-5efa-08db77de50a6 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: F5S9StlQiViI9rZtAXzEl/IJvIP/O2JKxOM0Fs4dc4qPQvubXKDFxlQmL3ukKkkXED0hRjuaSmKuailYZU6IYGDk7lrXtkDdjUD4t37nzMDS5s+ms+z0j4bPVoRI30WWGcaKSRGTw8L9WWUw0Wa+1N/TwjilJ2totGzVDbOXKAKKxMW5NttU2D0ZyVbfZgV+3mm1vzVYJ8EvuOJU5MghH1Prw6t00tkrv+WxTb/pE5FtcN1Ve7yVuoAFaABYo0txvsVU2CqOCZHx5zPXkJf/7oIaVftLTCrWZoixTRexZPVP2+FDhN3Bd1RjAKEA62Zjrooz7hbSptq5RjKnhwgcB+BU2xlg+q4es+h4XUWgbXHdFonUUKGeArLUI5/4eM1X/dODZQ3qGoaWK676nBULGJ/0QOwfTrR3RVGqt/4CP4g5FzODJ7IkgD86z6FakiQiEmiUTzDndvPu32J/SQ+qWBsnMSyGkwhWhrm2bCX7nrH/QUHFXH9K+frtvdmU3kk4I9Mzscpc87Lfby6AQKmD3X1ozoxzGh1SDNxu1YxxAAlGdt+YbCCzUCpjIEomtCPlhiySkGNWhf2EqWqeOiZQacBNmE9GOWeDEkYee5MGnkZ2E1m+FoFzCJq90snUsj4/44P/CC5nKSwdOhKlGDRBcohz44LKA3/9p4S9KTCrUyelzt8esQGtWytfqAE64IRqvZuV/Gvj2Cb5q/05RyeZGB176eWsaSIHNAAYpHZ9woYk09B4hy/waZ/XGCPHLLwiFpxMH0NDP3duc4Rnh8ePVQ== X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230028)(4636009)(39860400002)(396003)(136003)(376002)(346002)(451199021)(40470700004)(46966006)(36840700001)(84970400001)(36756003)(82310400005)(36860700001)(6506007)(70206006)(81166007)(82740400003)(40460700003)(5660300002)(44832011)(356005)(86362001)(316002)(6916009)(4326008)(41300700001)(8936002)(40480700001)(235185007)(70586007)(8676002)(6486002)(47076005)(6512007)(33964004)(2906002)(44144004)(186003)(26005)(336012)(4743002)(478600001)(2616005)(2700100001); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 28 Jun 2023 13:48:20.6087 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: c2b2a0df-e764-4ae6-7a79-08db77de55ac X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: 
TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: AM7EUR03FT010.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: PAWPR08MB9124 X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, FORGED_SPF_HELO, GIT_PATCH_0, KAM_DMARC_NONE, KAM_LOTSOFHASH, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tamar Christina via Gcc-patches From: Tamar Christina Reply-To: Tamar Christina Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769954921760512341?= X-GMAIL-MSGID: =?utf-8?q?1769954921760512341?= Hi All, This adds an implementation for conditional branch optab for AArch64. For e.g. void f1 () { for (int i = 0; i < N; i++) { b[i] += a[i]; if (a[i] > 0) break; } } For 128-bit vectors we generate: cmgt v1.4s, v1.4s, #0 umaxp v1.4s, v1.4s, v1.4s fmov x3, d1 cbnz x3, .L8 and of 64-bit vector we can omit the compression: cmgt v1.2s, v1.2s, #0 fmov x2, d1 cbz x2, .L13 Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/aarch64/aarch64-simd.md (cbranch4): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vect-early-break-cbranch.c: New test. --- inline copy of patch -- diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 90118c6348e9614bef580d1dc94c0c1841dd5204..cd5ec35c3f53028f14828bd70a92924f62524c15 100644 --- diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 90118c6348e9614bef580d1dc94c0c1841dd5204..cd5ec35c3f53028f14828bd70a92924f62524c15 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3830,6 +3830,46 @@ (define_expand "vcond_mask_" DONE; }) +;; Patterns comparing two vectors and conditionally jump + +(define_expand "cbranch4" + [(set (pc) + (if_then_else + (match_operator 0 "aarch64_equality_operator" + [(match_operand:VDQ_I 1 "register_operand") + (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")]) + (label_ref (match_operand 3 "")) + (pc)))] + "TARGET_SIMD" +{ + auto code = GET_CODE (operands[0]); + rtx tmp = operands[1]; + + /* If comparing against a non-zero vector we have to do a comparison first + so we can have a != 0 comparison with the result. */ + if (operands[2] != CONST0_RTX (mode)) + emit_insn (gen_vec_cmp (tmp, operands[0], operands[1], + operands[2])); + + /* For 64-bit vectors we need no reductions. */ + if (known_eq (128, GET_MODE_BITSIZE (mode))) + { + /* Always reduce using a V4SI. 
*/ + rtx reduc = gen_lowpart (V4SImode, tmp); + rtx res = gen_reg_rtx (V4SImode); + emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc)); + emit_move_insn (tmp, gen_lowpart (mode, res)); + } + + rtx val = gen_reg_rtx (DImode); + emit_move_insn (val, gen_lowpart (DImode, tmp)); + + rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx); + rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx); + emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3])); + DONE; +}) + ;; Patterns comparing two vectors to produce a mask. (define_expand "vec_cmp" diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c new file mode 100644 index 0000000000000000000000000000000000000000..c0363c3787270507d7902bb2ac0e39faef63a852 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c @@ -0,0 +1,124 @@ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */ + +#pragma GCC target "+nosve" + +#define N 640 +int a[N] = {0}; +int b[N] = {0}; + + +/* +** f1: +** ... +** cmgt v[0-9]+.4s, v[0-9]+.4s, #0 +** umaxp v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s +** fmov x[0-9]+, d[0-9]+ +** cbnz x[0-9]+, \.L[0-9]+ +** ... +*/ +void f1 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] > 0) + break; + } +} + +/* +** f2: +** ... +** cmge v[0-9]+.4s, v[0-9]+.4s, #0 +** umaxp v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s +** fmov x[0-9]+, d[0-9]+ +** cbnz x[0-9]+, \.L[0-9]+ +** ... +*/ +void f2 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] >= 0) + break; + } +} + +/* +** f3: +** ... +** cmeq v[0-9]+.4s, v[0-9]+.4s, #0 +** umaxp v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s +** fmov x[0-9]+, d[0-9]+ +** cbnz x[0-9]+, \.L[0-9]+ +** ... +*/ +void f3 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] == 0) + break; + } +} + +/* +** f4: +** ... +** cmtst v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s +** umaxp v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s +** fmov x[0-9]+, d[0-9]+ +** cbnz x[0-9]+, \.L[0-9]+ +** ... +*/ +void f4 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] != 0) + break; + } +} + +/* +** f5: +** ... +** cmlt v[0-9]+.4s, v[0-9]+.4s, #0 +** umaxp v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s +** fmov x[0-9]+, d[0-9]+ +** cbnz x[0-9]+, \.L[0-9]+ +** ... +*/ +void f5 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] < 0) + break; + } +} + +/* +** f6: +** ... +** cmle v[0-9]+.4s, v[0-9]+.4s, #0 +** umaxp v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s +** fmov x[0-9]+, d[0-9]+ +** cbnz x[0-9]+, \.L[0-9]+ +** ... +*/ +void f6 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] <= 0) + break; + } +} --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3830,6 +3830,46 @@ (define_expand "vcond_mask_" DONE; }) +;; Patterns comparing two vectors and conditionally jump + +(define_expand "cbranch4" + [(set (pc) + (if_then_else + (match_operator 0 "aarch64_equality_operator" + [(match_operand:VDQ_I 1 "register_operand") + (match_operand:VDQ_I 2 "aarch64_simd_reg_or_zero")]) + (label_ref (match_operand 3 "")) + (pc)))] + "TARGET_SIMD" +{ + auto code = GET_CODE (operands[0]); + rtx tmp = operands[1]; + + /* If comparing against a non-zero vector we have to do a comparison first + so we can have a != 0 comparison with the result. 
*/ + if (operands[2] != CONST0_RTX (mode)) + emit_insn (gen_vec_cmp (tmp, operands[0], operands[1], + operands[2])); + + /* For 64-bit vectors we need no reductions. */ + if (known_eq (128, GET_MODE_BITSIZE (mode))) + { + /* Always reduce using a V4SI. */ + rtx reduc = gen_lowpart (V4SImode, tmp); + rtx res = gen_reg_rtx (V4SImode); + emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc)); + emit_move_insn (tmp, gen_lowpart (mode, res)); + } + + rtx val = gen_reg_rtx (DImode); + emit_move_insn (val, gen_lowpart (DImode, tmp)); + + rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx); + rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx); + emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3])); + DONE; +}) + ;; Patterns comparing two vectors to produce a mask. (define_expand "vec_cmp" diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c new file mode 100644 index 0000000000000000000000000000000000000000..c0363c3787270507d7902bb2ac0e39faef63a852 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch.c @@ -0,0 +1,124 @@ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */ + +#pragma GCC target "+nosve" + +#define N 640 +int a[N] = {0}; +int b[N] = {0}; + + +/* +** f1: +** ... +** cmgt v[0-9]+.4s, v[0-9]+.4s, #0 +** umaxp v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s +** fmov x[0-9]+, d[0-9]+ +** cbnz x[0-9]+, \.L[0-9]+ +** ... +*/ +void f1 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] > 0) + break; + } +} + +/* +** f2: +** ... +** cmge v[0-9]+.4s, v[0-9]+.4s, #0 +** umaxp v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s +** fmov x[0-9]+, d[0-9]+ +** cbnz x[0-9]+, \.L[0-9]+ +** ... +*/ +void f2 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] >= 0) + break; + } +} + +/* +** f3: +** ... +** cmeq v[0-9]+.4s, v[0-9]+.4s, #0 +** umaxp v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s +** fmov x[0-9]+, d[0-9]+ +** cbnz x[0-9]+, \.L[0-9]+ +** ... +*/ +void f3 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] == 0) + break; + } +} + +/* +** f4: +** ... +** cmtst v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s +** umaxp v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s +** fmov x[0-9]+, d[0-9]+ +** cbnz x[0-9]+, \.L[0-9]+ +** ... +*/ +void f4 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] != 0) + break; + } +} + +/* +** f5: +** ... +** cmlt v[0-9]+.4s, v[0-9]+.4s, #0 +** umaxp v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s +** fmov x[0-9]+, d[0-9]+ +** cbnz x[0-9]+, \.L[0-9]+ +** ... +*/ +void f5 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] < 0) + break; + } +} + +/* +** f6: +** ... +** cmle v[0-9]+.4s, v[0-9]+.4s, #0 +** umaxp v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s +** fmov x[0-9]+, d[0-9]+ +** cbnz x[0-9]+, \.L[0-9]+ +** ... 
+*/ +void f6 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] <= 0) + break; + } +} From patchwork Wed Jun 28 13:48:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tamar Christina X-Patchwork-Id: 113912 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp8944092vqr; Wed, 28 Jun 2023 06:56:32 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7sZUNRZNU/niOFuA5qYrPucXmb5z2y3ZqRr+UwFPVD4e/ZxylgJXVvQdJaHc5oO8mlqayO X-Received: by 2002:a17:907:3e16:b0:978:a186:464f with SMTP id hp22-20020a1709073e1600b00978a186464fmr34967064ejc.39.1687960592154; Wed, 28 Jun 2023 06:56:32 -0700 (PDT) Received: from sourceware.org (ip-8-43-85-97.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id y16-20020a1709064b1000b009920ac3783asi2337816eju.538.2023.06.28.06.56.31 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 28 Jun 2023 06:56:32 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=ddypw26w; arc=fail (signature failed); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 90E973857016 for ; Wed, 28 Jun 2023 13:52:49 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 90E973857016 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1687960369; bh=qmXXgwqISllIw+ZdmmdrMmMzSrFzYmmD11M/Fwh+C34=; h=Date:To:Cc:Subject:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=ddypw26wNVmot+PwP03H0aJl0ZOG9uGw1v+6/ATJC2eiQim1/lqDBMkqcYiF9FqyF m6KLQjJsGnNyXFocd7bAg3MqiIBJyXEa2iRAyb5I1Rvjeqc4fpwhyOE5BvP85bLgNK HvJzRD61kAMpNDbspoRYhzBZCyoSahHS4UZHd2uA= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR01-VE1-obe.outbound.protection.outlook.com (mail-ve1eur01on2082.outbound.protection.outlook.com [40.107.14.82]) by sourceware.org (Postfix) with ESMTPS id 7CA29385C6F7 for ; Wed, 28 Jun 2023 13:48:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 7CA29385C6F7 Received: from AM0PR10CA0087.EURPRD10.PROD.OUTLOOK.COM (2603:10a6:208:15::40) by AS8PR08MB7744.eurprd08.prod.outlook.com (2603:10a6:20b:508::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.24; Wed, 28 Jun 2023 13:48:46 +0000 Received: from AM7EUR03FT031.eop-EUR03.prod.protection.outlook.com (2603:10a6:208:15:cafe::29) by AM0PR10CA0087.outlook.office365.com (2603:10a6:208:15::40) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.34 via Frontend Transport; Wed, 28 Jun 2023 13:48:46 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) 
receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by AM7EUR03FT031.mail.protection.outlook.com (100.127.140.84) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6544.18 via Frontend Transport; Wed, 28 Jun 2023 13:48:46 +0000 Received: ("Tessian outbound 52217515e112:v142"); Wed, 28 Jun 2023 13:48:45 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 1ab77b1de9879ba4 X-CR-MTA-TID: 64aa7808 Received: from acbc0cdf4396.2 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 5E1281A8-8CB0-451D-A7FB-44982C5BEC9C.1; Wed, 28 Jun 2023 13:48:39 +0000 Received: from EUR04-DB3-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id acbc0cdf4396.2 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Wed, 28 Jun 2023 13:48:39 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=gqk8KPI/Z4p43a0utj204atgFkFMUaSQYQEXqrDPR7+jhPKsfiGgekPB0ffjG4QfGlkcIa8tyKQArc7E9wY5afv3zPnhpzMrKwr2FpoujQjka0EIs9zxL/lMIO+1MIrg4MTg7FjG7kFu70TQcaYXi1bGFLIFJ9ROy4DyPbkAsRv6pvNd/g+6+rOLK0iBi4xziFVpW2KiY9ghwhmIWXuVDXEjTZJLh0fotZGAM5jgfun2DUEcnD5gjXwnkWxaabhvL8v2hj2I8UTo4+0jIDNAQeZUIQABo9U8ek+yvZ0RSJ/F2VUXso63jHjc9LPidH8OTqOUv4iMsYWKLinlxQzDSA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=qmXXgwqISllIw+ZdmmdrMmMzSrFzYmmD11M/Fwh+C34=; b=nwZTJTo/zguxGvHGwOxP9Q/wQFVjN4lyDj3eeAsPWtus5FvX0OvXZ3MEDE7CYRiZRu0aZTp4jbrtfRJBRbpiiu5ffY0ML1q54OlJON9inPVzZUXWLSssXRYeHVQNjb3meKmNnRvrwqdhwDlgmsJAscqiw0WTgKeeTLGHnLXpfN3CONssusy3H0Gkvd/FySJ9w/3ozpfsIi6ugN0wxmjGXiWGiyft6DiYk3R6Oi8qmXsehZUnGxcx9iNos9S7et1u3wRFHtQb2zD9V68o8FX8fAhjzBrpkJx3IBnuHtylzqMMFeHXqBW8VWnWxRZJc9vRtdUM3LiuwYa7h6R7XGIXyA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none Authentication-Results-Original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) by AM7PR08MB5398.eurprd08.prod.outlook.com (2603:10a6:20b:103::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.26; Wed, 28 Jun 2023 13:48:37 +0000 Received: from VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::2301:1cde:cfe7:eaf0]) by VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::2301:1cde:cfe7:eaf0%6]) with mapi id 15.20.6521.026; Wed, 28 Jun 2023 13:48:37 +0000 Date: Wed, 28 Jun 2023 14:48:30 +0100 To: gcc-patches@gcc.gnu.org Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com Subject: [PATCH 16/19]AArch64 Add optimization for vector != cbranch fed into compare with 0 for Advanced SIMD Message-ID: Content-Disposition: inline In-Reply-To: X-ClientProxiedBy: SA1P222CA0067.NAMP222.PROD.OUTLOOK.COM (2603:10b6:806:2c1::17) To VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) MIME-Version: 1.0 X-MS-TrafficTypeDiagnostic: VI1PR08MB5325:EE_|AM7PR08MB5398:EE_|AM7EUR03FT031:EE_|AS8PR08MB7744:EE_ X-MS-Office365-Filtering-Correlation-Id: 9d16ae57-6802-475b-edce-08db77de64d2 
X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: AM7EUR03FT031.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS8PR08MB7744 X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, FORGED_SPF_HELO, GIT_PATCH_0, KAM_DMARC_NONE, KAM_LOTSOFHASH, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tamar Christina via Gcc-patches From: Tamar Christina Reply-To: Tamar Christina Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769954965738488837?= X-GMAIL-MSGID: =?utf-8?q?1769954965738488837?= Hi All, Advanced SIMD lacks a cmpeq for vectors, and unlike compare to 0 we can't rewrite to a cmtst. This operation is however fairly common, especially now that we support early break vectorization. As such this adds a pattern to recognize the negated any comparison and transform it to an all. i.e. any(~x) => all(x) and invert the branches. For e.g. void f1 (int x) { for (int i = 0; i < N; i++) { b[i] += a[i]; if (a[i] != x) break; } } We currently generate: cmeq v31.4s, v30.4s, v29.4s not v31.16b, v31.16b umaxp v31.4s, v31.4s, v31.4s fmov x5, d31 cbnz x5, .L2 and after this patch: cmeq v31.4s, v30.4s, v29.4s uminp v31.4s, v31.4s, v31.4s fmov x5, d31 cbz x5, .L2 Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/aarch64/aarch64-simd.md (*cbranchnev4si): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vect-early-break-cbranch_2.c: New test. --- inline copy of patch -- diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index cd5ec35c3f53028f14828bd70a92924f62524c15..b1a2c617d7d4106ab725d53a5d0b5c2fb61a0c78 100644 --- diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index cd5ec35c3f53028f14828bd70a92924f62524c15..b1a2c617d7d4106ab725d53a5d0b5c2fb61a0c78 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3870,6 +3870,37 @@ (define_expand "cbranch4" DONE; }) +;; Avdanced SIMD lacks a vector != comparison, but this is a quite common +;; operation. To not pay the penalty for inverting == we can map our any +;; comparisons to all i.e. any(~x) => all(x). 
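Editorial note, not part of the patch: the comment above is De Morgan applied to the lane-wise mask. "Some lane of ~x is set" is the same as "not every lane of x is all-ones", so the NOT can be dropped by switching the umaxp/cbnz reduction to uminp/cbz and inverting the branch. A rough C sketch of the two forms using Advanced SIMD intrinsics follows; the function names are illustrative, the patch itself matches the RTL pattern rather than rewriting intrinsics, and the compiler uses the pairwise umaxp/uminp where this sketch uses the umaxv/uminv across-lane reductions.

#include <arm_neon.h>

/* Before: materialise the inverted mask, then ask "is any lane set?".
   Corresponds to the cmeq + not + umaxp + fmov + cbnz sequence above.  */
static inline int
any_ne_before (int32x4_t a, int32x4_t b)
{
  uint32x4_t ne = vmvnq_u32 (vceqq_s32 (a, b));
  return vmaxvq_u32 (ne) != 0;
}

/* After: keep the == mask and ask "is some lane clear?" instead.
   Corresponds to cmeq + uminp + fmov + cbz, saving the NOT.  */
static inline int
any_ne_after (int32x4_t a, int32x4_t b)
{
  uint32x4_t eq = vceqq_s32 (a, b);
  return vminvq_u32 (eq) == 0;
}
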
+(define_insn_and_split "*cbranchnev4si" + [(set (pc) + (if_then_else + (ne (subreg:DI + (unspec:V4SI + [(not:V4SI (match_operand:V4SI 0 "register_operand" "w")) + (not:V4SI (match_dup 0))] + UNSPEC_UMAXV) 0) + (const_int 0)) + (label_ref (match_operand 1 "")) + (pc))) + (clobber (match_scratch:DI 2 "=w"))] + "TARGET_SIMD" + "#" + "&& true" + [(set (match_dup 2) + (unspec:V4SI [(match_dup 0) (match_dup 0)] UNSPEC_UMINV)) + (set (pc) + (if_then_else + (eq (subreg:DI (match_dup 2) 0) + (const_int 0)) + (label_ref (match_dup 1)) + (pc)))] +{ + if (can_create_pseudo_p ()) + operands[2] = gen_reg_rtx (V4SImode); +}) + ;; Patterns comparing two vectors to produce a mask. (define_expand "vec_cmp" diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c new file mode 100644 index 0000000000000000000000000000000000000000..e81027bb50138be627f4dfdffb1557893a5a7723 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */ + +#pragma GCC target "+nosve" + +#define N 640 +int a[N] = {0}; +int b[N] = {0}; + + +/* +** f1: +** ... + cmeq v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s + uminp v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s + fmov x[0-9]+, d[0-9]+ + cbz x[0-9]+, \.L[0-9]+ +** ... +*/ +void f1 (int x) +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] != x) + break; + } +} --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3870,6 +3870,37 @@ (define_expand "cbranch4" DONE; }) +;; Avdanced SIMD lacks a vector != comparison, but this is a quite common +;; operation. To not pay the penalty for inverting == we can map our any +;; comparisons to all i.e. any(~x) => all(x). +(define_insn_and_split "*cbranchnev4si" + [(set (pc) + (if_then_else + (ne (subreg:DI + (unspec:V4SI + [(not:V4SI (match_operand:V4SI 0 "register_operand" "w")) + (not:V4SI (match_dup 0))] + UNSPEC_UMAXV) 0) + (const_int 0)) + (label_ref (match_operand 1 "")) + (pc))) + (clobber (match_scratch:DI 2 "=w"))] + "TARGET_SIMD" + "#" + "&& true" + [(set (match_dup 2) + (unspec:V4SI [(match_dup 0) (match_dup 0)] UNSPEC_UMINV)) + (set (pc) + (if_then_else + (eq (subreg:DI (match_dup 2) 0) + (const_int 0)) + (label_ref (match_dup 1)) + (pc)))] +{ + if (can_create_pseudo_p ()) + operands[2] = gen_reg_rtx (V4SImode); +}) + ;; Patterns comparing two vectors to produce a mask. (define_expand "vec_cmp" diff --git a/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c new file mode 100644 index 0000000000000000000000000000000000000000..e81027bb50138be627f4dfdffb1557893a5a7723 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/vect-early-break-cbranch_2.c @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */ + +#pragma GCC target "+nosve" + +#define N 640 +int a[N] = {0}; +int b[N] = {0}; + + +/* +** f1: +** ... + cmeq v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s + uminp v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s + fmov x[0-9]+, d[0-9]+ + cbz x[0-9]+, \.L[0-9]+ +** ... 
+*/ +void f1 (int x) +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] != x) + break; + } +} From patchwork Wed Jun 28 13:48:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tamar Christina X-Patchwork-Id: 113907 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp8942274vqr; Wed, 28 Jun 2023 06:53:19 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4o76mX8lLTazeWY8U73o8BcIF8WE0pje9pmul9qdEBPPm56LZL5PDwg6QCM2Z89UGcMk2p X-Received: by 2002:a17:907:5c8:b0:974:7713:293f with SMTP id wg8-20020a17090705c800b009747713293fmr32724140ejb.41.1687960398932; Wed, 28 Jun 2023 06:53:18 -0700 (PDT) Received: from sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id lh17-20020a170906f8d100b009929ca58b4fsi153941ejb.62.2023.06.28.06.53.18 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 28 Jun 2023 06:53:18 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=K3bgm1r3; arc=fail (signature failed); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 0F04F3882032 for ; Wed, 28 Jun 2023 13:50:29 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0F04F3882032 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1687960229; bh=qjKrTBvzvHR3uUOHBpjwDwpsGTEkrxCum1GT8OOrXWc=; h=Date:To:Cc:Subject:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=K3bgm1r3UYub9ShRLXg1zSMPMEa5/IeQHfIf2mQomX+2+lqvA2Njm7/O7buTePuX3 Lr/ueLRiHftl6D4mbe40vVAf0xTJFj/2TM12C9hiH3dMxAJZ4R5nvvw2y7cdEBlw6V HP5AmhLG485CykhxrMAqGfS9HDGkPG3lul1Xg6wo= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR04-VI1-obe.outbound.protection.outlook.com (mail-vi1eur04on2055.outbound.protection.outlook.com [40.107.8.55]) by sourceware.org (Postfix) with ESMTPS id 2A65A3853D13 for ; Wed, 28 Jun 2023 13:49:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2A65A3853D13 Received: from DUZPR01CA0058.eurprd01.prod.exchangelabs.com (2603:10a6:10:469::12) by PAWPR08MB10281.eurprd08.prod.outlook.com (2603:10a6:102:367::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.26; Wed, 28 Jun 2023 13:49:15 +0000 Received: from DBAEUR03FT053.eop-EUR03.prod.protection.outlook.com (2603:10a6:10:469:cafe::26) by DUZPR01CA0058.outlook.office365.com (2603:10a6:10:469::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.34 via Frontend Transport; Wed, 28 Jun 2023 13:49:15 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com 
designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by DBAEUR03FT053.mail.protection.outlook.com (100.127.142.121) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6544.20 via Frontend Transport; Wed, 28 Jun 2023 13:49:15 +0000 Received: ("Tessian outbound c08fa2e31830:v142"); Wed, 28 Jun 2023 13:49:15 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 6d865725d07fa32d X-CR-MTA-TID: 64aa7808 Received: from 4056ee67cb68.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 89EC7A12-0A1B-4B6F-94F8-5D095F259C69.1; Wed, 28 Jun 2023 13:49:04 +0000 Received: from EUR04-DB3-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 4056ee67cb68.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Wed, 28 Jun 2023 13:49:04 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=IKzDOpoZPvc2nRZy+XbKqfuHg8wudEUc2MyU9eZVhF3y+dzRsr9sejXTgfYCamG7L0Nh1/9l48oocyfms5lKgCqBq22+wHvtgLpBu29h5g3FGKFIRQUREbuQf0BaThaekpDP+Cr2ZXpsDx2v5WW1Gk6ozw5fIIOAhsa+mENnkcXvEHpKuY1voLMZZifnPaUhJha5bOlL3X8AS7yLL7wqZUgWFQtZEPZChIIqQ5Dk9oG2GXmdS34kfDu+OaVaN2/RCkoXNmnc7HnqHxYtpcaRDTCa4hTcfpGRve2cne8fuWnZ0CDXuyXXlgpKdC28nCrr6IjdD1bH7cfUxB9g94okjw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=qjKrTBvzvHR3uUOHBpjwDwpsGTEkrxCum1GT8OOrXWc=; b=glfotqLkk4xBhsBNLxV1WxTMOzu3SuEZwzkWx1toWDKP5aGLN3jpEDDlwlXThXmCkjGBYukL9A10lnJbgdsl20lIPle8DPQgcat1NGKwBXMTTYC7DTqSoJ3iEM5X+7YoJ1NcL/79QPt4RzfFN+ipl9CfEHJqtq97v8AVFT/XNm3BumKf0uSUxc9Ggpokd7sq7mpQImCJrDKd68cNCTy24+C36VzjDALS7n+i0yuIoohyRfclTIjK6EbwjW3DMybWkdtnyIyul00dcqDhNnZwI6RT11v/8zN8Gsa8JES9YOvRB1H4tml6kgeK/cNm82hrFZZhrCCA0hjE/aAFXsiMSA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none Authentication-Results-Original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) by AM7PR08MB5398.eurprd08.prod.outlook.com (2603:10a6:20b:103::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.26; Wed, 28 Jun 2023 13:49:02 +0000 Received: from VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::2301:1cde:cfe7:eaf0]) by VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::2301:1cde:cfe7:eaf0%6]) with mapi id 15.20.6521.026; Wed, 28 Jun 2023 13:49:02 +0000 Date: Wed, 28 Jun 2023 14:48:58 +0100 To: gcc-patches@gcc.gnu.org Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com Subject: [PATCH 17/19]AArch64 Add optimization for vector cbranch combining SVE and Advanced SIMD Message-ID: Content-Disposition: inline In-Reply-To: X-ClientProxiedBy: LO3P123CA0002.GBRP123.PROD.OUTLOOK.COM (2603:10a6:600:ba::7) To VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) MIME-Version: 1.0 X-MS-TrafficTypeDiagnostic: VI1PR08MB5325:EE_|AM7PR08MB5398:EE_|DBAEUR03FT053:EE_|PAWPR08MB10281:EE_ X-MS-Office365-Filtering-Correlation-Id: 
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 28 Jun 2023 13:49:15.4286 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 565ff38a-a4c4-4ca0-c8c0-08db77de764f X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: DBAEUR03FT053.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: PAWPR08MB10281 X-Spam-Status: No, score=-11.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, FORGED_SPF_HELO, GIT_PATCH_0, KAM_ASCII_DIVIDERS, KAM_DMARC_NONE, KAM_LOTSOFHASH, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tamar Christina via Gcc-patches From: Tamar Christina Reply-To: Tamar Christina Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769954763500712423?= X-GMAIL-MSGID: =?utf-8?q?1769954763500712423?= Hi All, Advanced SIMD lacks flag setting vector comparisons which SVE adds. Since machines with SVE also support Advanced SIMD we can use the SVE comparisons to perform the operation in cases where SVE codegen is allowed, but the vectorizer has decided to generate Advanced SIMD because of loop costing. e.g. for void f1 (int x) { for (int i = 0; i < N; i++) { b[i] += a[i]; if (a[i] != x) break; } } We currently generate: cmeq v31.4s, v31.4s, v28.4s uminp v31.4s, v31.4s, v31.4s fmov x5, d31 cbz x5, .L2 and after this patch: ptrue p7.b, vl16 ... cmpne p15.s, p7/z, z31.s, z28.s b.any .L2 Because we need to lift the predicate creation to outside of the loop we need to expand the predicate early, however in the cbranch expansion we don't see the outer compare which we need to consume. For this reason the expansion is two fold, when expanding the cbranch we emit an SVE predicated comparison and later on during combine we match the SVE and NEON comparison while also consuming the ptest. Unfortunately *aarch64_pred_cmpne_neon_ptest is needed because for some reason combine destroys the NOT and transforms it into a plus and -1. For the straight SVE ones, we seem to fail to eliminate the ptest in these cases but that's a separate optimization Test show that I'm missing a few, but before I write the patterns for them, are these OK? Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/aarch64/aarch64-simd.md (cbranch4): Update with SVE. * config/aarch64/aarch64-sve.md (*aarch64_pred_cmp_neon_ptest, *aarch64_pred_cmpeq_neon_ptest, *aarch64_pred_cmpne_neon_ptest): New. (aarch64_ptest): Rename to... (@aarch64_ptest): ... This. * genemit.cc: Include rtx-vector-builder.h. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/vect-early-break-cbranch_1.c: New test. * gcc.target/aarch64/sve/vect-early-break-cbranch_2.c: New test. 
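One note on the remark above that combine destroys the NOT and transforms it into a plus and -1: this looks like the usual two's-complement rewrite (not (neg x)) -> (plus x -1), and it is why the extra *aarch64_pred_cmpne_neon_ptest pattern below matches a plus of a -1 constant rather than a not. Since the backend describes a NEON compare as a neg of a 0/1 RTL comparison, ~(-(a == b)) and (a == b) - 1 denote the same 0/-1 mask. A small standalone C check of the identity (an illustration only, not part of the patch):

/* ~(-x) == x - 1 holds for any two's-complement x (excluding INT_MIN,
   whose negation overflows), so a (not (neg (eq ...))) form and a
   (plus (eq ...) (const_int -1)) form describe the same value.  */
#include <assert.h>
#include <stdint.h>

int main (void)
{
  int32_t samples[] = { 0, 1, -1, 2, -2, 123456, -123456,
                        INT32_MAX, INT32_MIN + 1 };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      int32_t x = samples[i];
      assert (~(-x) == x - 1);
    }
  return 0;
}
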
--- inline copy of patch -- diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index b1a2c617d7d4106ab725d53a5d0b5c2fb61a0c78..75cb5d6f7f92b70fed8762fe64e23f0c05a99c99 100644 --- diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index b1a2c617d7d4106ab725d53a5d0b5c2fb61a0c78..75cb5d6f7f92b70fed8762fe64e23f0c05a99c99 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3843,31 +3843,59 @@ (define_expand "cbranch4" "TARGET_SIMD" { auto code = GET_CODE (operands[0]); - rtx tmp = operands[1]; - /* If comparing against a non-zero vector we have to do a comparison first - so we can have a != 0 comparison with the result. */ - if (operands[2] != CONST0_RTX (mode)) - emit_insn (gen_vec_cmp (tmp, operands[0], operands[1], - operands[2])); - - /* For 64-bit vectors we need no reductions. */ - if (known_eq (128, GET_MODE_BITSIZE (mode))) + /* If SVE is available, lets borrow some instructions. We will optimize + these further later in combine. */ + if (TARGET_SVE) { - /* Always reduce using a V4SI. */ - rtx reduc = gen_lowpart (V4SImode, tmp); - rtx res = gen_reg_rtx (V4SImode); - emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc)); - emit_move_insn (tmp, gen_lowpart (mode, res)); + machine_mode full_mode = aarch64_full_sve_mode (mode).require (); + rtx in1 = lowpart_subreg (full_mode, operands[1], mode); + rtx in2 = lowpart_subreg (full_mode, operands[2], mode); + + machine_mode pred_mode = aarch64_sve_pred_mode (full_mode); + rtx_vector_builder builder (VNx16BImode, 16, 2); + for (unsigned int i = 0; i < 16; ++i) + builder.quick_push (CONST1_RTX (BImode)); + for (unsigned int i = 0; i < 16; ++i) + builder.quick_push (CONST0_RTX (BImode)); + rtx ptrue = force_reg (VNx16BImode, builder.build ()); + rtx cast_ptrue = gen_lowpart (pred_mode, ptrue); + rtx ptrue_flag = gen_int_mode (SVE_KNOWN_PTRUE, SImode); + + rtx tmp = gen_reg_rtx (pred_mode); + aarch64_expand_sve_vec_cmp_int (tmp, code, in1, in2); + emit_insn (gen_aarch64_ptest (pred_mode, ptrue, cast_ptrue, ptrue_flag, tmp)); + operands[1] = gen_rtx_REG (CC_NZCmode, CC_REGNUM); + operands[2] = const0_rtx; } + else + { + rtx tmp = operands[1]; - rtx val = gen_reg_rtx (DImode); - emit_move_insn (val, gen_lowpart (DImode, tmp)); + /* If comparing against a non-zero vector we have to do a comparison first + so we can have a != 0 comparison with the result. */ + if (operands[2] != CONST0_RTX (mode)) + emit_insn (gen_vec_cmp (tmp, operands[0], operands[1], + operands[2])); - rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx); - rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx); - emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3])); - DONE; + /* For 64-bit vectors we need no reductions. */ + if (known_eq (128, GET_MODE_BITSIZE (mode))) + { + /* Always reduce using a V4SI. 
*/ + rtx reduc = gen_lowpart (V4SImode, tmp); + rtx res = gen_reg_rtx (V4SImode); + emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc)); + emit_move_insn (tmp, gen_lowpart (mode, res)); + } + + rtx val = gen_reg_rtx (DImode); + emit_move_insn (val, gen_lowpart (DImode, tmp)); + + rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx); + rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx); + emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3])); + DONE; + } }) ;; Avdanced SIMD lacks a vector != comparison, but this is a quite common diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md index da5534c3e32b3a8819c57a26582cfa5e22e63753..0e10e497e073ee7cfa4025d9adb19076c1615e87 100644 --- a/gcc/config/aarch64/aarch64-sve.md +++ b/gcc/config/aarch64/aarch64-sve.md @@ -8059,6 +8059,105 @@ (define_insn "*aarch64_pred_cmp_wide_ptest" "cmp\t%0., %1/z, %2., %3.d" ) +;; Predicated integer comparisons over Advanced SIMD arguments in which only +;; the flags result is interesting. +(define_insn "*aarch64_pred_cmp_neon_ptest" + [(set (reg:CC_NZC CC_REGNUM) + (unspec:CC_NZC + [(match_operand:VNx16BI 1 "register_operand" "Upl") + (match_operand 4) + (match_operand:SI 5 "aarch64_sve_ptrue_flag") + (unspec:VNx4BI + [(match_operand:VNx4BI 6 "register_operand" "Upl") + (match_operand:SI 7 "aarch64_sve_ptrue_flag") + (EQL:VNx4BI + (subreg:SVE_FULL_BHSI + (neg: + (UCOMPARISONS: + (match_operand: 2 "register_operand" "w") + (match_operand: 3 "aarch64_simd_reg_or_zero" "w"))) 0) + (match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))] + UNSPEC_PRED_Z)] + UNSPEC_PTEST)) + (clobber (match_scratch:VNx4BI 0 "=Upa"))] + "TARGET_SVE + && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" +{ + operands[2] = lowpart_subreg (mode, operands[2], mode); + operands[3] = lowpart_subreg (mode, operands[3], mode); + if (EQ == ) + std::swap (operands[2], operands[3]); + + return "cmp\t%0., %1/z, %2., %3."; +} +) + +;; Predicated integer comparisons over Advanced SIMD arguments in which only +;; the flags result is interesting. 
+(define_insn "*aarch64_pred_cmpeq_neon_ptest" + [(set (reg:CC_NZC CC_REGNUM) + (unspec:CC_NZC + [(match_operand:VNx16BI 1 "register_operand" "Upl") + (match_operand 4) + (match_operand:SI 5 "aarch64_sve_ptrue_flag") + (unspec:VNx4BI + [(match_operand:VNx4BI 6 "register_operand" "Upl") + (match_operand:SI 7 "aarch64_sve_ptrue_flag") + (EQL:VNx4BI + (subreg:SVE_FULL_BHSI + (neg: + (eq: + (match_operand: 2 "register_operand" "w") + (match_operand: 3 "aarch64_simd_reg_or_zero" "w"))) 0) + (match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))] + UNSPEC_PRED_Z)] + UNSPEC_PTEST)) + (clobber (match_scratch:VNx4BI 0 "=Upa"))] + "TARGET_SVE + && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" +{ + operands[2] = lowpart_subreg (mode, operands[2], mode); + operands[3] = lowpart_subreg (mode, operands[3], mode); + if (EQ == ) + std::swap (operands[2], operands[3]); + + return "cmpeq\t%0., %1/z, %2., %3."; +} +) + +;; Same as the above but version for == and != +(define_insn "*aarch64_pred_cmpne_neon_ptest" + [(set (reg:CC_NZC CC_REGNUM) + (unspec:CC_NZC + [(match_operand:VNx16BI 1 "register_operand" "Upl") + (match_operand 4) + (match_operand:SI 5 "aarch64_sve_ptrue_flag") + (unspec:VNx4BI + [(match_operand:VNx4BI 6 "register_operand" "Upl") + (match_operand:SI 7 "aarch64_sve_ptrue_flag") + (EQL:VNx4BI + (subreg:SVE_FULL_BHSI + (plus: + (eq: + (match_operand: 2 "register_operand" "w") + (match_operand: 3 "aarch64_simd_reg_or_zero" "w")) + (match_operand: 9 "aarch64_simd_imm_minus_one" "i")) 0) + (match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))] + UNSPEC_PRED_Z)] + UNSPEC_PTEST)) + (clobber (match_scratch:VNx4BI 0 "=Upa"))] + "TARGET_SVE + && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" +{ + operands[2] = lowpart_subreg (mode, operands[2], mode); + operands[3] = lowpart_subreg (mode, operands[3], mode); + if (EQ == ) + std::swap (operands[2], operands[3]); + + return "cmpne\t%0., %1/z, %2., %3."; +} +) + ;; ------------------------------------------------------------------------- ;; ---- [INT] While tests ;; ------------------------------------------------------------------------- @@ -8537,7 +8636,7 @@ (define_expand "cbranch4" ) ;; See "Description of UNSPEC_PTEST" above for details. -(define_insn "aarch64_ptest" +(define_insn "@aarch64_ptest" [(set (reg:CC_NZC CC_REGNUM) (unspec:CC_NZC [(match_operand:VNx16BI 0 "register_operand" "Upa") (match_operand 1) diff --git a/gcc/genemit.cc b/gcc/genemit.cc index 1ce0564076d8b0d39542f49dd51e5df01cc83c35..73309ca00ec0aa3cd76c85e04535bac44cb2f354 100644 --- a/gcc/genemit.cc +++ b/gcc/genemit.cc @@ -906,6 +906,7 @@ from the machine description file `md'. */\n\n"); printf ("#include \"tm-constrs.h\"\n"); printf ("#include \"ggc.h\"\n"); printf ("#include \"target.h\"\n\n"); + printf ("#include \"rtx-vector-builder.h\"\n\n"); /* Read the machine description. */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c new file mode 100644 index 0000000000000000000000000000000000000000..c281cfccbe12f0ac8c01ede563dbe325237902c9 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c @@ -0,0 +1,117 @@ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */ + +#define N 640 +int a[N] = {0}; +int b[N] = {0}; + + +/* +** f1: +** ... 
+** cmpgt p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** ptest p[0-9]+, p[0-9]+.b +** b.any \.L[0-9]+ +** ... +*/ +void f1 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] > 0) + break; + } +} + +/* +** f2: +** ... +** cmpge p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** ptest p[0-9]+, p[0-9]+.b +** b.any \.L[0-9]+ +** ... +*/ +void f2 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] >= 0) + break; + } +} + +/* +** f3: +** ... +** cmpeq p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** ptest p[0-9]+, p[0-9]+.b +** b.any \.L[0-9]+ +** ... +*/ +void f3 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] == 0) + break; + } +} + +/* +** f4: +** ... +** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** ptest p[0-9]+, p[0-9]+.b +** b.any \.L[0-9]+ +** ... +*/ +void f4 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] != 0) + break; + } +} + +/* +** f5: +** ... +** cmplt p[0-9]+.s, p7/z, z[0-9]+.s, #0 +** ptest p[0-9]+, p[0-9]+.b +** b.any .L[0-9]+ +** ... +*/ +void f5 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] < 0) + break; + } +} + +/* +** f6: +** ... +** cmple p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** ptest p[0-9]+, p[0-9]+.b +** b.any \.L[0-9]+ +** ... +*/ +void f6 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] <= 0) + break; + } +} + diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c new file mode 100644 index 0000000000000000000000000000000000000000..f1ca3eafc5ae33393a7df9b5e40fa3420a79bfc2 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c @@ -0,0 +1,114 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 --param=aarch64-autovec-preference=1" } */ +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */ + +#define N 640 +int a[N] = {0}; +int b[N] = {0}; + + +/* +** f1: +** ... +** cmgt v[0-9]+.4s, v[0-9]+.4s, #0 +** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** b.any \.L[0-9]+ +** ... +*/ +void f1 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] > 0) + break; + } +} + +/* +** f2: +** ... +** cmge v[0-9]+.4s, v[0-9]+.4s, #0 +** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** b.any \.L[0-9]+ +** ... +*/ +void f2 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] >= 0) + break; + } +} + +/* +** f3: +** ... +** cmpeq p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, z[0-9]+.s +** b.any \.L[0-9]+ +** ... +*/ +void f3 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] == 0) + break; + } +} + +/* +** f4: +** ... +** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, z[0-9]+.s +** b.any \.L[0-9]+ +** ... +*/ +void f4 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] != 0) + break; + } +} + +/* +** f5: +** ... +** cmlt v[0-9]+.4s, v[0-9]+.4s, #0 +** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** b.any \.L[0-9]+ +** ... +*/ +void f5 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] < 0) + break; + } +} + +/* +** f6: +** ... +** cmle v[0-9]+.4s, v[0-9]+.4s, #0 +** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** b.any \.L[0-9]+ +** ... 
+*/ +void f6 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] <= 0) + break; + } +} --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3843,31 +3843,59 @@ (define_expand "cbranch4" "TARGET_SIMD" { auto code = GET_CODE (operands[0]); - rtx tmp = operands[1]; - /* If comparing against a non-zero vector we have to do a comparison first - so we can have a != 0 comparison with the result. */ - if (operands[2] != CONST0_RTX (mode)) - emit_insn (gen_vec_cmp (tmp, operands[0], operands[1], - operands[2])); - - /* For 64-bit vectors we need no reductions. */ - if (known_eq (128, GET_MODE_BITSIZE (mode))) + /* If SVE is available, lets borrow some instructions. We will optimize + these further later in combine. */ + if (TARGET_SVE) { - /* Always reduce using a V4SI. */ - rtx reduc = gen_lowpart (V4SImode, tmp); - rtx res = gen_reg_rtx (V4SImode); - emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc)); - emit_move_insn (tmp, gen_lowpart (mode, res)); + machine_mode full_mode = aarch64_full_sve_mode (mode).require (); + rtx in1 = lowpart_subreg (full_mode, operands[1], mode); + rtx in2 = lowpart_subreg (full_mode, operands[2], mode); + + machine_mode pred_mode = aarch64_sve_pred_mode (full_mode); + rtx_vector_builder builder (VNx16BImode, 16, 2); + for (unsigned int i = 0; i < 16; ++i) + builder.quick_push (CONST1_RTX (BImode)); + for (unsigned int i = 0; i < 16; ++i) + builder.quick_push (CONST0_RTX (BImode)); + rtx ptrue = force_reg (VNx16BImode, builder.build ()); + rtx cast_ptrue = gen_lowpart (pred_mode, ptrue); + rtx ptrue_flag = gen_int_mode (SVE_KNOWN_PTRUE, SImode); + + rtx tmp = gen_reg_rtx (pred_mode); + aarch64_expand_sve_vec_cmp_int (tmp, code, in1, in2); + emit_insn (gen_aarch64_ptest (pred_mode, ptrue, cast_ptrue, ptrue_flag, tmp)); + operands[1] = gen_rtx_REG (CC_NZCmode, CC_REGNUM); + operands[2] = const0_rtx; } + else + { + rtx tmp = operands[1]; - rtx val = gen_reg_rtx (DImode); - emit_move_insn (val, gen_lowpart (DImode, tmp)); + /* If comparing against a non-zero vector we have to do a comparison first + so we can have a != 0 comparison with the result. */ + if (operands[2] != CONST0_RTX (mode)) + emit_insn (gen_vec_cmp (tmp, operands[0], operands[1], + operands[2])); - rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx); - rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx); - emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3])); - DONE; + /* For 64-bit vectors we need no reductions. */ + if (known_eq (128, GET_MODE_BITSIZE (mode))) + { + /* Always reduce using a V4SI. 
*/ + rtx reduc = gen_lowpart (V4SImode, tmp); + rtx res = gen_reg_rtx (V4SImode); + emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc)); + emit_move_insn (tmp, gen_lowpart (mode, res)); + } + + rtx val = gen_reg_rtx (DImode); + emit_move_insn (val, gen_lowpart (DImode, tmp)); + + rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx); + rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx); + emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3])); + DONE; + } }) ;; Avdanced SIMD lacks a vector != comparison, but this is a quite common diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md index da5534c3e32b3a8819c57a26582cfa5e22e63753..0e10e497e073ee7cfa4025d9adb19076c1615e87 100644 --- a/gcc/config/aarch64/aarch64-sve.md +++ b/gcc/config/aarch64/aarch64-sve.md @@ -8059,6 +8059,105 @@ (define_insn "*aarch64_pred_cmp_wide_ptest" "cmp\t%0., %1/z, %2., %3.d" ) +;; Predicated integer comparisons over Advanced SIMD arguments in which only +;; the flags result is interesting. +(define_insn "*aarch64_pred_cmp_neon_ptest" + [(set (reg:CC_NZC CC_REGNUM) + (unspec:CC_NZC + [(match_operand:VNx16BI 1 "register_operand" "Upl") + (match_operand 4) + (match_operand:SI 5 "aarch64_sve_ptrue_flag") + (unspec:VNx4BI + [(match_operand:VNx4BI 6 "register_operand" "Upl") + (match_operand:SI 7 "aarch64_sve_ptrue_flag") + (EQL:VNx4BI + (subreg:SVE_FULL_BHSI + (neg: + (UCOMPARISONS: + (match_operand: 2 "register_operand" "w") + (match_operand: 3 "aarch64_simd_reg_or_zero" "w"))) 0) + (match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))] + UNSPEC_PRED_Z)] + UNSPEC_PTEST)) + (clobber (match_scratch:VNx4BI 0 "=Upa"))] + "TARGET_SVE + && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" +{ + operands[2] = lowpart_subreg (mode, operands[2], mode); + operands[3] = lowpart_subreg (mode, operands[3], mode); + if (EQ == ) + std::swap (operands[2], operands[3]); + + return "cmp\t%0., %1/z, %2., %3."; +} +) + +;; Predicated integer comparisons over Advanced SIMD arguments in which only +;; the flags result is interesting. 
+(define_insn "*aarch64_pred_cmpeq_neon_ptest" + [(set (reg:CC_NZC CC_REGNUM) + (unspec:CC_NZC + [(match_operand:VNx16BI 1 "register_operand" "Upl") + (match_operand 4) + (match_operand:SI 5 "aarch64_sve_ptrue_flag") + (unspec:VNx4BI + [(match_operand:VNx4BI 6 "register_operand" "Upl") + (match_operand:SI 7 "aarch64_sve_ptrue_flag") + (EQL:VNx4BI + (subreg:SVE_FULL_BHSI + (neg: + (eq: + (match_operand: 2 "register_operand" "w") + (match_operand: 3 "aarch64_simd_reg_or_zero" "w"))) 0) + (match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))] + UNSPEC_PRED_Z)] + UNSPEC_PTEST)) + (clobber (match_scratch:VNx4BI 0 "=Upa"))] + "TARGET_SVE + && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" +{ + operands[2] = lowpart_subreg (mode, operands[2], mode); + operands[3] = lowpart_subreg (mode, operands[3], mode); + if (EQ == ) + std::swap (operands[2], operands[3]); + + return "cmpeq\t%0., %1/z, %2., %3."; +} +) + +;; Same as the above but version for == and != +(define_insn "*aarch64_pred_cmpne_neon_ptest" + [(set (reg:CC_NZC CC_REGNUM) + (unspec:CC_NZC + [(match_operand:VNx16BI 1 "register_operand" "Upl") + (match_operand 4) + (match_operand:SI 5 "aarch64_sve_ptrue_flag") + (unspec:VNx4BI + [(match_operand:VNx4BI 6 "register_operand" "Upl") + (match_operand:SI 7 "aarch64_sve_ptrue_flag") + (EQL:VNx4BI + (subreg:SVE_FULL_BHSI + (plus: + (eq: + (match_operand: 2 "register_operand" "w") + (match_operand: 3 "aarch64_simd_reg_or_zero" "w")) + (match_operand: 9 "aarch64_simd_imm_minus_one" "i")) 0) + (match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))] + UNSPEC_PRED_Z)] + UNSPEC_PTEST)) + (clobber (match_scratch:VNx4BI 0 "=Upa"))] + "TARGET_SVE + && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" +{ + operands[2] = lowpart_subreg (mode, operands[2], mode); + operands[3] = lowpart_subreg (mode, operands[3], mode); + if (EQ == ) + std::swap (operands[2], operands[3]); + + return "cmpne\t%0., %1/z, %2., %3."; +} +) + ;; ------------------------------------------------------------------------- ;; ---- [INT] While tests ;; ------------------------------------------------------------------------- @@ -8537,7 +8636,7 @@ (define_expand "cbranch4" ) ;; See "Description of UNSPEC_PTEST" above for details. -(define_insn "aarch64_ptest" +(define_insn "@aarch64_ptest" [(set (reg:CC_NZC CC_REGNUM) (unspec:CC_NZC [(match_operand:VNx16BI 0 "register_operand" "Upa") (match_operand 1) diff --git a/gcc/genemit.cc b/gcc/genemit.cc index 1ce0564076d8b0d39542f49dd51e5df01cc83c35..73309ca00ec0aa3cd76c85e04535bac44cb2f354 100644 --- a/gcc/genemit.cc +++ b/gcc/genemit.cc @@ -906,6 +906,7 @@ from the machine description file `md'. */\n\n"); printf ("#include \"tm-constrs.h\"\n"); printf ("#include \"ggc.h\"\n"); printf ("#include \"target.h\"\n\n"); + printf ("#include \"rtx-vector-builder.h\"\n\n"); /* Read the machine description. */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c new file mode 100644 index 0000000000000000000000000000000000000000..c281cfccbe12f0ac8c01ede563dbe325237902c9 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c @@ -0,0 +1,117 @@ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */ + +#define N 640 +int a[N] = {0}; +int b[N] = {0}; + + +/* +** f1: +** ... 
+** cmpgt p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** ptest p[0-9]+, p[0-9]+.b +** b.any \.L[0-9]+ +** ... +*/ +void f1 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] > 0) + break; + } +} + +/* +** f2: +** ... +** cmpge p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** ptest p[0-9]+, p[0-9]+.b +** b.any \.L[0-9]+ +** ... +*/ +void f2 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] >= 0) + break; + } +} + +/* +** f3: +** ... +** cmpeq p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** ptest p[0-9]+, p[0-9]+.b +** b.any \.L[0-9]+ +** ... +*/ +void f3 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] == 0) + break; + } +} + +/* +** f4: +** ... +** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** ptest p[0-9]+, p[0-9]+.b +** b.any \.L[0-9]+ +** ... +*/ +void f4 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] != 0) + break; + } +} + +/* +** f5: +** ... +** cmplt p[0-9]+.s, p7/z, z[0-9]+.s, #0 +** ptest p[0-9]+, p[0-9]+.b +** b.any .L[0-9]+ +** ... +*/ +void f5 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] < 0) + break; + } +} + +/* +** f6: +** ... +** cmple p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** ptest p[0-9]+, p[0-9]+.b +** b.any \.L[0-9]+ +** ... +*/ +void f6 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] <= 0) + break; + } +} + diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c new file mode 100644 index 0000000000000000000000000000000000000000..f1ca3eafc5ae33393a7df9b5e40fa3420a79bfc2 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c @@ -0,0 +1,114 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 --param=aarch64-autovec-preference=1" } */ +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */ + +#define N 640 +int a[N] = {0}; +int b[N] = {0}; + + +/* +** f1: +** ... +** cmgt v[0-9]+.4s, v[0-9]+.4s, #0 +** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** b.any \.L[0-9]+ +** ... +*/ +void f1 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] > 0) + break; + } +} + +/* +** f2: +** ... +** cmge v[0-9]+.4s, v[0-9]+.4s, #0 +** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** b.any \.L[0-9]+ +** ... +*/ +void f2 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] >= 0) + break; + } +} + +/* +** f3: +** ... +** cmpeq p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, z[0-9]+.s +** b.any \.L[0-9]+ +** ... +*/ +void f3 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] == 0) + break; + } +} + +/* +** f4: +** ... +** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, z[0-9]+.s +** b.any \.L[0-9]+ +** ... +*/ +void f4 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] != 0) + break; + } +} + +/* +** f5: +** ... +** cmlt v[0-9]+.4s, v[0-9]+.4s, #0 +** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** b.any \.L[0-9]+ +** ... +*/ +void f5 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] < 0) + break; + } +} + +/* +** f6: +** ... +** cmle v[0-9]+.4s, v[0-9]+.4s, #0 +** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** b.any \.L[0-9]+ +** ... 
+*/ +void f6 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] <= 0) + break; + } +} From patchwork Wed Jun 28 13:49:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tamar Christina X-Patchwork-Id: 113910 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp8943486vqr; Wed, 28 Jun 2023 06:55:31 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5KoL0QqZQzohq6M+gxyrCIbC5lWG+9g41ErwpOJQP4JsINoe1bUdzwdf+Dt6fC7fKlbZYK X-Received: by 2002:a50:ee0c:0:b0:51d:d01c:a2c4 with SMTP id g12-20020a50ee0c000000b0051dd01ca2c4mr1407660eds.7.1687960530932; Wed, 28 Jun 2023 06:55:30 -0700 (PDT) Received: from sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id d22-20020a50fb16000000b0051d98308c3csi3937404edq.470.2023.06.28.06.55.30 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 28 Jun 2023 06:55:30 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=vgwBhLUl; arc=fail (signature failed); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id F276F3836E92 for ; Wed, 28 Jun 2023 13:51:35 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org F276F3836E92 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1687960296; bh=t/0jiBZAURIFHKhI9Fu5xPkLfIBPgoCKcK7OvJCikss=; h=Date:To:Cc:Subject:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=vgwBhLUlnFE2k2SduNFp4ncMlR3FGJMWqi07VlV1WziHsdk+HSB8plT8kZZL7foJN HLQ3nSsRbEd32HBVVmxXJ/CtRUJTZWWjG9Zw31liHH372yaEfjmfhHhq6A2b/DVDRs uf4icpp9OLZo04OqLGY4aEt5yXE14NwJ0SBLJMQ0= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR01-DB5-obe.outbound.protection.outlook.com (mail-db5eur01on2053.outbound.protection.outlook.com [40.107.15.53]) by sourceware.org (Postfix) with ESMTPS id CA992385E45D for ; Wed, 28 Jun 2023 13:49:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CA992385E45D Received: from DB7PR05CA0071.eurprd05.prod.outlook.com (2603:10a6:10:2e::48) by AS8PR08MB9623.eurprd08.prod.outlook.com (2603:10a6:20b:618::22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.24; Wed, 28 Jun 2023 13:49:30 +0000 Received: from DBAEUR03FT016.eop-EUR03.prod.protection.outlook.com (2603:10a6:10:2e:cafe::1) by DB7PR05CA0071.outlook.office365.com (2603:10a6:10:2e::48) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.36 via Frontend Transport; Wed, 28 Jun 2023 13:49:30 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 
63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by DBAEUR03FT016.mail.protection.outlook.com (100.127.142.204) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6544.18 via Frontend Transport; Wed, 28 Jun 2023 13:49:30 +0000 Received: ("Tessian outbound 7c913606c6e6:v142"); Wed, 28 Jun 2023 13:49:30 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 3be5384758081997 X-CR-MTA-TID: 64aa7808 Received: from ff8427258574.2 by 64aa7808-outbound-1.mta.getcheckrecipient.com id F2893187-3CE2-4F5B-ACC5-4FA477106442.1; Wed, 28 Jun 2023 13:49:24 +0000 Received: from EUR05-DB8-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id ff8427258574.2 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Wed, 28 Jun 2023 13:49:24 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=jUSkxIglCsRdo8mt2cIvOVf05V5yGlKuMIVf9OT5P0sSZV2JO+lXw2N2Qbi5SJD5qeEnnV+FUS3CN93Php4v9veWWvotMqJHEK2FWl2zjt6j/4NmTEfNZotd91c379Ne/Czk+n3fNc91jB/O8T8DwFhOUVrrrfLwY7g/zi4RuS3zFnkyWVj2p6ytKNsEm5H3/ulpdYiFsv0dp+nGZRdt+WR7GIP4OO4QZcGVum3Sj9XhV3Q/9y/BFOMd0GwB+PEH8AabAIIpzEKvtaCn/F/DCDwAeG4aLJISN33FQaC+Lex6wGLpuitmlSm54pAJ8wzsMzWCyZJpbNx1axIVqcKfBA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=t/0jiBZAURIFHKhI9Fu5xPkLfIBPgoCKcK7OvJCikss=; b=a1+7JaKbL+FmnfTPIdP5v8Ep7DpjCx4etktccTRHz1VrHmQMHCBUMo5o2RCvmFvLYVUuqLwdz7LGTq/S7hE1NF5zVax4qU7UXtNs0KizkOm9OHfwMb5S24IpLMmdy4gvYKBUmMRUWj0sCEZHDi4qixExZblORTDmRzXqkDgbojyJELkzrOH4kaiVNjpENMNU9ka/RRGAEWW8Seq78oOjE74+4fi9oRXjGTLqR3gXk82e2A3BTiVXAtMk7jI85p4PUv4YFV8G0ugBXCPfe9+itRGRMhtKXiTwTur2DSdWMLgc6HZD5xEvlzjnHHVLUcau/ByfgifG/V48UbsLYTOdhQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none Authentication-Results-Original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; Received: from VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) by DB5PR08MB9970.eurprd08.prod.outlook.com (2603:10a6:10:489::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.23; Wed, 28 Jun 2023 13:49:22 +0000 Received: from VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::2301:1cde:cfe7:eaf0]) by VI1PR08MB5325.eurprd08.prod.outlook.com ([fe80::2301:1cde:cfe7:eaf0%6]) with mapi id 15.20.6521.026; Wed, 28 Jun 2023 13:49:22 +0000 Date: Wed, 28 Jun 2023 14:49:19 +0100 To: gcc-patches@gcc.gnu.org Cc: nd@arm.com, Ramana.Radhakrishnan@arm.com, Richard.Earnshaw@arm.com, nickc@redhat.com, Kyrylo.Tkachov@arm.com Subject: [PATCH 18/19]Arm: Add Advanced SIMD cbranch implementation Message-ID: Content-Disposition: inline In-Reply-To: X-ClientProxiedBy: LO4P123CA0039.GBRP123.PROD.OUTLOOK.COM (2603:10a6:600:152::8) To VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) MIME-Version: 1.0 X-MS-TrafficTypeDiagnostic: VI1PR08MB5325:EE_|DB5PR08MB9970:EE_|DBAEUR03FT016:EE_|AS8PR08MB9623:EE_ X-MS-Office365-Filtering-Correlation-Id: 23f7a48b-5d30-406b-b32a-08db77de7f61 X-LD-Processed: 
X-MS-Exchange-CrossTenant-Network-Message-Id: 23f7a48b-5d30-406b-b32a-08db77de7f61 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: DBAEUR03FT016.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS8PR08MB9623 X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, FORGED_SPF_HELO, GIT_PATCH_0, KAM_DMARC_NONE, KAM_LOTSOFHASH, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tamar Christina via Gcc-patches From: Tamar Christina Reply-To: Tamar Christina Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1769954901362996134?= X-GMAIL-MSGID: =?utf-8?q?1769954901362996134?= Hi All, This adds an implementation for conditional branch optab for AArch32. For e.g. void f1 () { for (int i = 0; i < N; i++) { b[i] += a[i]; if (a[i] > 0) break; } } For 128-bit vectors we generate: vcgt.s32 q8, q9, #0 vpmax.u32 d7, d16, d17 vpmax.u32 d7, d7, d7 vmov r3, s14 @ int cmp r3, #0 and of 64-bit vector we can omit one vpmax as we still need to compress to 32-bits. Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/arm/neon.md (cbranch4): New. gcc/testsuite/ChangeLog: * lib/target-supports.exp (vect_early_break): Add AArch32. * gcc.target/arm/vect-early-break-cbranch.c: New test. --- inline copy of patch -- diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index d213369ffc38fb88ad0357d848cc7da5af73bab7..130efbc37cfe3128533599dfadc344d2243dcb63 100644 --- diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index d213369ffc38fb88ad0357d848cc7da5af73bab7..130efbc37cfe3128533599dfadc344d2243dcb63 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -408,6 +408,45 @@ (define_insn "vec_extract" [(set_attr "type" "neon_store1_one_lane,neon_to_gp")] ) +;; Patterns comparing two vectors and conditionally jump. +;; Avdanced SIMD lacks a vector != comparison, but this is a quite common +;; operation. To not pay the penalty for inverting == we can map our any +;; comparisons to all i.e. any(~x) => all(x). +;; +;; However unlike the AArch64 version, we can't optimize this further as the +;; chain is too long for combine due to these being unspecs so it doesn't fold +;; the operation to something simpler. +(define_expand "cbranch4" + [(set (pc) (if_then_else + (match_operator 0 "expandable_comparison_operator" + [(match_operand:VDQI 1 "register_operand") + (match_operand:VDQI 2 "zero_operand")]) + (label_ref (match_operand 3 "" "")) + (pc)))] + "TARGET_NEON" +{ + rtx mask = operands[1]; + + /* For 128-bit vectors we need an additional reductions. 
*/ + if (known_eq (128, GET_MODE_BITSIZE (mode))) + { + /* Always reduce using a V4SI. */ + mask = gen_reg_rtx (V2SImode); + rtx low = gen_reg_rtx (V2SImode); + rtx high = gen_reg_rtx (V2SImode); + emit_insn (gen_neon_vget_lowv4si (low, operands[1])); + emit_insn (gen_neon_vget_highv4si (high, operands[1])); + emit_insn (gen_neon_vpumaxv2si (mask, low, high)); + } + + emit_insn (gen_neon_vpumaxv2si (mask, mask, mask)); + + rtx val = gen_reg_rtx (SImode); + emit_move_insn (val, gen_lowpart (SImode, mask)); + emit_jump_insn (gen_cbranch_cc (operands[0], val, const0_rtx, operands[3])); + DONE; +}) + ;; This pattern is renamed from "vec_extract" to ;; "neon_vec_extract" and this pattern is called ;; by define_expand in vec-common.md file. diff --git a/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c new file mode 100644 index 0000000000000000000000000000000000000000..2c05aa10d26ed4ac9785672e6e3b4355cef046dc --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c @@ -0,0 +1,136 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-require-effective-target arm32 } */ +/* { dg-options "-O3 -march=armv8-a+simd -mfpu=auto -mfloat-abi=hard" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#define N 640 +int a[N] = {0}; +int b[N] = {0}; + +/* f1: +** ... +** vcgt.s32 q[0-9]+, q[0-9]+, #0 +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vmov r[0-9]+, s[0-9]+ @ int +** cmp r[0-9]+, #0 +** bne \.L[0-9]+ +** ... +*/ +void f1 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] > 0) + break; + } +} + +/* +** f2: +** ... +** vcge.s32 q[0-9]+, q[0-9]+, #0 +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vmov r[0-9]+, s[0-9]+ @ int +** cmp r[0-9]+, #0 +** bne \.L[0-9]+ +** ... +*/ +void f2 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] >= 0) + break; + } +} + +/* +** f3: +** ... +** vceq.i32 q[0-9]+, q[0-9]+, #0 +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vmov r[0-9]+, s[0-9]+ @ int +** cmp r[0-9]+, #0 +** bne \.L[0-9]+ +** ... +*/ +void f3 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] == 0) + break; + } +} + +/* +** f4: +** ... +** vceq.i32 q[0-9]+, q[0-9]+, #0 +** vmvn q[0-9]+, q[0-9]+ +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vmov r[0-9]+, s[0-9]+ @ int +** cmp r[0-9]+, #0 +** bne \.L[0-9]+ +** ... +*/ +void f4 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] != 0) + break; + } +} + +/* +** f5: +** ... +** vclt.s32 q[0-9]+, q[0-9]+, #0 +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vmov r[0-9]+, s[0-9]+ @ int +** cmp r[0-9]+, #0 +** bne \.L[0-9]+ +** ... +*/ +void f5 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] < 0) + break; + } +} + +/* +** f6: +** ... +** vcle.s32 q[0-9]+, q[0-9]+, #0 +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vmov r[0-9]+, s[0-9]+ @ int +** cmp r[0-9]+, #0 +** bne \.L[0-9]+ +** ... 
+*/ +void f6 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] <= 0) + break; + } +} + diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 5516188dc0aa86d161d67dea5a7769e3c3d72f85..8f58671e6cfd3546c6a98e40341fe31c6492594b 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -3784,6 +3784,7 @@ proc check_effective_target_vect_early_break { } { return [check_cached_effective_target_indexed vect_early_break { expr { [istarget aarch64*-*-*] + || [check_effective_target_arm_neon_ok] }}] } # Return 1 if the target supports hardware vectorization of complex additions of --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -408,6 +408,45 @@ (define_insn "vec_extract" [(set_attr "type" "neon_store1_one_lane,neon_to_gp")] ) +;; Patterns comparing two vectors and conditionally jump. +;; Avdanced SIMD lacks a vector != comparison, but this is a quite common +;; operation. To not pay the penalty for inverting == we can map our any +;; comparisons to all i.e. any(~x) => all(x). +;; +;; However unlike the AArch64 version, we can't optimize this further as the +;; chain is too long for combine due to these being unspecs so it doesn't fold +;; the operation to something simpler. +(define_expand "cbranch4" + [(set (pc) (if_then_else + (match_operator 0 "expandable_comparison_operator" + [(match_operand:VDQI 1 "register_operand") + (match_operand:VDQI 2 "zero_operand")]) + (label_ref (match_operand 3 "" "")) + (pc)))] + "TARGET_NEON" +{ + rtx mask = operands[1]; + + /* For 128-bit vectors we need an additional reductions. */ + if (known_eq (128, GET_MODE_BITSIZE (mode))) + { + /* Always reduce using a V4SI. */ + mask = gen_reg_rtx (V2SImode); + rtx low = gen_reg_rtx (V2SImode); + rtx high = gen_reg_rtx (V2SImode); + emit_insn (gen_neon_vget_lowv4si (low, operands[1])); + emit_insn (gen_neon_vget_highv4si (high, operands[1])); + emit_insn (gen_neon_vpumaxv2si (mask, low, high)); + } + + emit_insn (gen_neon_vpumaxv2si (mask, mask, mask)); + + rtx val = gen_reg_rtx (SImode); + emit_move_insn (val, gen_lowpart (SImode, mask)); + emit_jump_insn (gen_cbranch_cc (operands[0], val, const0_rtx, operands[3])); + DONE; +}) + ;; This pattern is renamed from "vec_extract" to ;; "neon_vec_extract" and this pattern is called ;; by define_expand in vec-common.md file. diff --git a/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c new file mode 100644 index 0000000000000000000000000000000000000000..2c05aa10d26ed4ac9785672e6e3b4355cef046dc --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c @@ -0,0 +1,136 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-require-effective-target arm32 } */ +/* { dg-options "-O3 -march=armv8-a+simd -mfpu=auto -mfloat-abi=hard" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#define N 640 +int a[N] = {0}; +int b[N] = {0}; + +/* f1: +** ... +** vcgt.s32 q[0-9]+, q[0-9]+, #0 +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vmov r[0-9]+, s[0-9]+ @ int +** cmp r[0-9]+, #0 +** bne \.L[0-9]+ +** ... +*/ +void f1 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] > 0) + break; + } +} + +/* +** f2: +** ... +** vcge.s32 q[0-9]+, q[0-9]+, #0 +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vmov r[0-9]+, s[0-9]+ @ int +** cmp r[0-9]+, #0 +** bne \.L[0-9]+ +** ... 
From patchwork Wed Jun 28 13:50:03 2023
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 113909
Date: Wed, 28 Jun 2023 14:50:03 +0100
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Ramana.Radhakrishnan@arm.com, Richard.Earnshaw@arm.com, nickc@redhat.com, Kyrylo.Tkachov@arm.com
Subject: [PATCH 19/19]Arm: Add MVE cbranch implementation
Hi All,

This adds an implementation of the conditional branch optab for MVE.

Unfortunately MVE has rather limited operations on VPT.P0: we are missing the ability to do P0 comparisons and logical OR on P0.  For that reason we can only support cbranch against 0, since when comparing against a zero predicate we don't need to do an actual comparison, we only have to check that any bit is set within P0.

Because we can only do P0 comparisons with 0, the costing of the comparison was reduced so that the compiler does not try to push the 0 into a register thinking it's too expensive.  For the cbranch implementation to be safe we must see the constant 0 vector.

The lack of a logical OR on P0 is something we can't really work around.  This means MVE can't support cases where the sizes of the operands in the comparison don't match, i.e. when one operand has been unpacked.

For e.g.

void f1 ()
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
	break;
    }
}

For 128-bit vectors we generate:

	vcmp.s32	gt, q3, q1
	vmrs	r3, p0	@ movhi
	cbnz	r3, .L2

MVE does not have 64-bit vector comparisons, so that is also not supported.

Bootstrapped arm-none-linux-gnueabihf and regtested with -march=armv8.1-m.main+mve -mfpu=auto and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/arm/arm.cc (arm_rtx_costs_internal): Update costs for pred 0
	compares.
	* config/arm/mve.md (cbranch<mode>4): New.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp (vect_early_break): Add MVE.
	* gcc.target/arm/mve/vect-early-break-cbranch.c: New test.
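A minimal C sketch (not part of the patch) of what the emitted sequence computes for f1 above.  It assumes the standard MVE ACLE intrinsics from arm_mve.h and compilation with -march=armv8.1-m.main+mve -mfloat-abi=hard; the helper name is made up and the comments only roughly map to the generated instructions:

#include <arm_mve.h>
#include <stdint.h>

/* Conceptual view of the cbranch expansion: compare the vector against
   zero, read the VPR.P0 predicate back as a 16-bit integer through a
   core register, and branch if any of its bits are set.  */
static inline int
any_lane_gt_zero (int32x4_t v)
{
  mve_pred16_t p = vcmpgtq_n_s32 (v, 0);	/* vcmp.s32 gt, ...   */
  uint32_t r = p;				/* vmrs rN, p0        */
  return r != 0;				/* cmp / cbnz         */
}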
--- inline copy of patch -- diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc index 38f0839de1c75547c259ac3d655fcfc14e7208a2..15e65c15cb3cb6f70161787e84b255a24eb51e32 100644 --- diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc index 38f0839de1c75547c259ac3d655fcfc14e7208a2..15e65c15cb3cb6f70161787e84b255a24eb51e32 100644 --- a/gcc/config/arm/arm.cc +++ b/gcc/config/arm/arm.cc @@ -11883,6 +11883,15 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code, || TARGET_HAVE_MVE) && simd_immediate_valid_for_move (x, mode, NULL, NULL)) *cost = COSTS_N_INSNS (1); + else if (TARGET_HAVE_MVE + && outer_code == COMPARE + && VALID_MVE_PRED_MODE (mode)) + /* MVE allows very limited instructions on VPT.P0, however comparisons + to 0 do not require us to materialze this constant or require a + predicate comparison as we can go through SImode. For that reason + allow P0 CMP 0 as a cheap operation such that the 0 isn't forced to + registers as we can't compare two predicates. */ + *cost = COSTS_N_INSNS (1); else *cost = COSTS_N_INSNS (4); return true; diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md index 74909ce47e132c22a94f7d9cd3a0921b38e33051..95d40770ecc25f9eb251eba38306dd43cbebfb3f 100644 --- a/gcc/config/arm/mve.md +++ b/gcc/config/arm/mve.md @@ -6880,6 +6880,21 @@ (define_expand "vcond_mask_" DONE; }) +(define_expand "cbranch4" + [(set (pc) (if_then_else + (match_operator 0 "expandable_comparison_operator" + [(match_operand:MVE_7 1 "register_operand") + (match_operand:MVE_7 2 "zero_operand")]) + (label_ref (match_operand 3 "" "")) + (pc)))] + "TARGET_HAVE_MVE" +{ + rtx val = gen_reg_rtx (SImode); + emit_move_insn (val, gen_lowpart (SImode, operands[1])); + emit_jump_insn (gen_cbranchsi4 (operands[0], val, const0_rtx, operands[3])); + DONE; +}) + ;; Reinterpret operand 1 in operand 0's mode, without changing its contents. (define_expand "@arm_mve_reinterpret" [(set (match_operand:MVE_vecs 0 "register_operand") diff --git a/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c new file mode 100644 index 0000000000000000000000000000000000000000..c3b8506dca0b2b044e6869a6c8259d663c1ff930 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c @@ -0,0 +1,117 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ +/* { dg-add-options arm_v8_1m_mve } */ +/* { dg-options "-O3" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#define N 640 +int a[N] = {0}; +int b[N] = {0}; + +/* +** f1: +** ... +** vcmp.s32 gt, q[0-9]+, q[0-9]+ +** vmrs r[0-9]+, p0 @ movhi +** cbnz r[0-9]+, \.L[0-9]+ +** ... +*/ +void f1 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] > 0) + break; + } +} + +/* +** f2: +** ... +** vcmp.s32 ge, q[0-9]+, q[0-9]+ +** vmrs r[0-9]+, p0 @ movhi +** cbnz r[0-9]+, \.L[0-9]+ +** ... +*/ +void f2 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] >= 0) + break; + } +} + +/* +** f3: +** ... +** vcmp.i32 eq, q[0-9]+, q[0-9]+ +** vmrs r[0-9]+, p0 @ movhi +** cbnz r[0-9]+, \.L[0-9]+ +** ... +*/ +void f3 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] == 0) + break; + } +} + +/* +** f4: +** ... +** vcmp.i32 ne, q[0-9]+, q[0-9]+ +** vmrs r[0-9]+, p0 @ movhi +** cbnz r[0-9]+, \.L[0-9]+ +** ... +*/ +void f4 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] != 0) + break; + } +} + +/* +** f5: +** ... 
+** vcmp.s32 lt, q[0-9]+, q[0-9]+ +** vmrs r[0-9]+, p0 @ movhi +** cbnz r[0-9]+, \.L[0-9]+ +** ... +*/ +void f5 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] < 0) + break; + } +} + +/* +** f6: +** ... +** vcmp.s32 le, q[0-9]+, q[0-9]+ +** vmrs r[0-9]+, p0 @ movhi +** cbnz r[0-9]+, \.L[0-9]+ +** ... +*/ +void f6 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] <= 0) + break; + } +} diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 8f58671e6cfd3546c6a98e40341fe31c6492594b..1eef764542a782786e27ed935a06243e319ae3fc 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -3785,6 +3785,8 @@ proc check_effective_target_vect_early_break { } { expr { [istarget aarch64*-*-*] || [check_effective_target_arm_neon_ok] + || ([check_effective_target_arm_v8_1m_mve_fp_ok] + && [check_effective_target_arm_little_endian]) }}] } # Return 1 if the target supports hardware vectorization of complex additions of
From patchwork Mon Nov 6 07:42:45 2023
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 161911
Date: Mon, 6 Nov 2023 07:42:45 +0000
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Ramana.Radhakrishnan@arm.com, Richard.Earnshaw@arm.com, nickc@redhat.com, Kyrylo.Tkachov@arm.com
Subject: [PATCH 20/21]Arm: Add Advanced SIMD cbranch implementation
Hi All,

This adds an implementation of the conditional branch optab for AArch32.

For e.g.

void f1 ()
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
	break;
    }
}

For 128-bit vectors we generate:

	vcgt.s32	q8, q9, #0
	vpmax.u32	d7, d16, d17
	vpmax.u32	d7, d7, d7
	vmov	r3, s14	@ int
	cmp	r3, #0

and for 64-bit vectors we can omit one vpmax, as we still need to compress down to 32 bits.

Bootstrapped and regtested on arm-none-linux-gnueabihf and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/arm/neon.md (cbranch<mode>4): New.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp (vect_early_break): Add AArch32.
	* gcc.target/arm/vect-early-break-cbranch.c: New test.

--- inline copy of patch --
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index d213369ffc38fb88ad0357d848cc7da5af73bab7..130efbc37cfe3128533599dfadc344d2243dcb63 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -408,6 +408,45 @@ (define_insn "vec_extract" [(set_attr "type" "neon_store1_one_lane,neon_to_gp")] ) +;; Patterns comparing two vectors and conditionally jump. +;; Avdanced SIMD lacks a vector != comparison, but this is a quite common +;; operation. To not pay the penalty for inverting == we can map our any +;; comparisons to all i.e. any(~x) => all(x). +;; +;; However unlike the AArch64 version, we can't optimize this further as the +;; chain is too long for combine due to these being unspecs so it doesn't fold +;; the operation to something simpler. +(define_expand "cbranch4" + [(set (pc) (if_then_else + (match_operator 0 "expandable_comparison_operator" + [(match_operand:VDQI 1 "register_operand") + (match_operand:VDQI 2 "zero_operand")]) + (label_ref (match_operand 3 "" "")) + (pc)))] + "TARGET_NEON" +{ + rtx mask = operands[1]; + + /* For 128-bit vectors we need an additional reductions. */ + if (known_eq (128, GET_MODE_BITSIZE (mode))) + { + /* Always reduce using a V4SI.
*/ + mask = gen_reg_rtx (V2SImode); + rtx low = gen_reg_rtx (V2SImode); + rtx high = gen_reg_rtx (V2SImode); + emit_insn (gen_neon_vget_lowv4si (low, operands[1])); + emit_insn (gen_neon_vget_highv4si (high, operands[1])); + emit_insn (gen_neon_vpumaxv2si (mask, low, high)); + } + + emit_insn (gen_neon_vpumaxv2si (mask, mask, mask)); + + rtx val = gen_reg_rtx (SImode); + emit_move_insn (val, gen_lowpart (SImode, mask)); + emit_jump_insn (gen_cbranch_cc (operands[0], val, const0_rtx, operands[3])); + DONE; +}) + ;; This pattern is renamed from "vec_extract" to ;; "neon_vec_extract" and this pattern is called ;; by define_expand in vec-common.md file. diff --git a/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c new file mode 100644 index 0000000000000000000000000000000000000000..2c05aa10d26ed4ac9785672e6e3b4355cef046dc --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c @@ -0,0 +1,136 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-require-effective-target arm32 } */ +/* { dg-options "-O3 -march=armv8-a+simd -mfpu=auto -mfloat-abi=hard" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#define N 640 +int a[N] = {0}; +int b[N] = {0}; + +/* f1: +** ... +** vcgt.s32 q[0-9]+, q[0-9]+, #0 +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vmov r[0-9]+, s[0-9]+ @ int +** cmp r[0-9]+, #0 +** bne \.L[0-9]+ +** ... +*/ +void f1 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] > 0) + break; + } +} + +/* +** f2: +** ... +** vcge.s32 q[0-9]+, q[0-9]+, #0 +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vmov r[0-9]+, s[0-9]+ @ int +** cmp r[0-9]+, #0 +** bne \.L[0-9]+ +** ... +*/ +void f2 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] >= 0) + break; + } +} + +/* +** f3: +** ... +** vceq.i32 q[0-9]+, q[0-9]+, #0 +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vmov r[0-9]+, s[0-9]+ @ int +** cmp r[0-9]+, #0 +** bne \.L[0-9]+ +** ... +*/ +void f3 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] == 0) + break; + } +} + +/* +** f4: +** ... +** vceq.i32 q[0-9]+, q[0-9]+, #0 +** vmvn q[0-9]+, q[0-9]+ +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vmov r[0-9]+, s[0-9]+ @ int +** cmp r[0-9]+, #0 +** bne \.L[0-9]+ +** ... +*/ +void f4 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] != 0) + break; + } +} + +/* +** f5: +** ... +** vclt.s32 q[0-9]+, q[0-9]+, #0 +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vmov r[0-9]+, s[0-9]+ @ int +** cmp r[0-9]+, #0 +** bne \.L[0-9]+ +** ... +*/ +void f5 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] < 0) + break; + } +} + +/* +** f6: +** ... +** vcle.s32 q[0-9]+, q[0-9]+, #0 +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+ +** vmov r[0-9]+, s[0-9]+ @ int +** cmp r[0-9]+, #0 +** bne \.L[0-9]+ +** ... 
+*/ +void f6 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] <= 0) + break; + } +} + diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 5516188dc0aa86d161d67dea5a7769e3c3d72f85..8f58671e6cfd3546c6a98e40341fe31c6492594b 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -3784,6 +3784,7 @@ proc check_effective_target_vect_early_break { } { return [check_cached_effective_target_indexed vect_early_break { expr { [istarget aarch64*-*-*] + || [check_effective_target_arm_neon_ok] }}] } # Return 1 if the target supports hardware vectorization of complex additions of
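To illustrate the reduction strategy the cbranch expander in this patch uses, here is a rough sketch with Advanced SIMD intrinsics from arm_neon.h (not part of the patch; the helper name is made up).  Testing "any lane set" on a 128-bit mask folds the mask with pairwise unsigned max until one 32-bit lane remains, then branches on that scalar:

#include <arm_neon.h>

/* Rough C equivalent of the expansion for a 128-bit mask: two pairwise
   maxima reduce the mask to a single lane, which is then moved to a
   core register and compared against zero.  */
static inline int
any_lane_set (uint32x4_t mask)
{
  uint32x2_t m = vpmax_u32 (vget_low_u32 (mask), vget_high_u32 (mask));
  m = vpmax_u32 (m, m);			/* second vpmax.u32        */
  return vget_lane_u32 (m, 0) != 0;	/* vmov + cmp + bne        */
}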
From patchwork Mon Nov 6 07:43:00 2023
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 161855
Date: Mon, 6 Nov 2023 07:43:00 +0000
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Ramana.Radhakrishnan@arm.com, Richard.Earnshaw@arm.com, nickc@redhat.com, Kyrylo.Tkachov@arm.com
Subject: [PATCH 21/21]Arm: Add MVE cbranch implementation
b=KlOL2hoL0UQB7O0ip83EuENQv5xoscVDBqZQ0j+4u3hQaJfsbn9r55ZzzT7KDcmIPl kZy0N4fJmVC3bamNKmxJ5tdtJcI6z886Zhdp/kOPCbumhDTqXOjq+gHACvvXQ2tDqJSr LfDMYEIViC3yC+zjbDRv0NMygNGGEy9eMVL4Q8OfgTFO3oawehxgvbEoFOftP1ACPad1 uz5FMO+2Cx6bKmhAoXpFQ5GRUP0Lw1bOt4EyZyw1J5GKhBqfKWwqlwXJRAykHK2dzXHA 2mJjUiFeKJK9XjiHwy8i9SaaKJhdHJWIaa6ST7Wd4p3txoB3/tr0A+cXhpkuPqaKF/v7 SJEQ== ARC-Authentication-Results: i=4; mx.google.com; dkim=pass header.i=@armh.onmicrosoft.com header.s=selector2-armh-onmicrosoft-com header.b=D+TlOqSE; dkim=pass header.i=@armh.onmicrosoft.com header.s=selector2-armh-onmicrosoft-com header.b=D+TlOqSE; arc=pass (i=3); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from server2.sourceware.org (server2.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id j12-20020a05620a000c00b00779eb01838asi5248889qki.743.2023.11.05.23.44.48 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 05 Nov 2023 23:44:48 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@armh.onmicrosoft.com header.s=selector2-armh-onmicrosoft-com header.b=D+TlOqSE; dkim=pass header.i=@armh.onmicrosoft.com header.s=selector2-armh-onmicrosoft-com header.b=D+TlOqSE; arc=pass (i=3); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id DD3CD3882ADF for ; Mon, 6 Nov 2023 07:43:47 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from EUR04-VI1-obe.outbound.protection.outlook.com (mail-vi1eur04on2044.outbound.protection.outlook.com [40.107.8.44]) by sourceware.org (Postfix) with ESMTPS id 361433875DC7 for ; Mon, 6 Nov 2023 07:43:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 361433875DC7 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 361433875DC7 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=40.107.8.44 ARC-Seal: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1699256601; cv=pass; b=sIk4t/dUeFJUop9QjBOkbXUP+JvhNL8zhpn72JMjSBDvNp8m0yiLaspDbe5qa7YbJgfm0cXSJPXpmUxqKbyLUFY26EEmepq1j6rfUkmRJC1olAqkCYgVk48jcTXD5k+N0X3eji9ySIBDG7W8YZtcnumy64rZaogez3h0VzydODM= ARC-Message-Signature: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1699256601; c=relaxed/simple; bh=kukGwO0TtrBwTAY2QA0MtGtlMXBaXA7gyxIMC9lChFo=; h=DKIM-Signature:DKIM-Signature:Date:From:To:Subject:Message-ID: MIME-Version; b=hh6f++/zqMOMIIK9bs7dZ+bAML501b0tr4Xh9YHzyczMe7OveBqyDPuA5/i2HNzJlYqkcVZAnKBZ3oORAa4U6unv1gh7uZimvOkqOVrpxUezjbe+v2MpRSs1DuYBvnX2imNsS/RQC45kgoCC094yJkjc9Ftqw4UVJCAdaW4ZUTk= ARC-Authentication-Results: i=3; server2.sourceware.org ARC-Seal: i=2; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=pass; 
b=IrZP1ZdzGC2gB0RYBFihk3IWwm/d/oI9QJM3eGepwLiTrAOBRHxESi767XOYwBlpWb/6nHdCmY0st4ThKl5GdWPJ9cFTe4JmvdVOSKlaWgiRM/KRqUg2doKSgTTSAQd+LE5z4WiEBWnThb4dwW4/G1+a4QIkniAjPZ7GnhoeN4J7YSi7mhGnAQCHCMdWUSSXuDegrw4GWuxi/DaAywEq1NdU1uWy/T+odfq0lBhTEYCVo8m5OW+9KvAKcS4TgNShQ63NH2MJuH5kVZTDuYZAGj4lfdCPJG16Gd98kf0MlfOHk+Z5kMWe+r8d99MaubUr2z+0BHOTSC9cGaePbAoDvg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=scYkYjbIcQQsQ9Z/654hyB4cwbb/H/5ZmnhslqoLWeo=; b=U+0Cqcw5Da5GXH23ymc8fG9X3S4H4pNrLa0RK4ojhchVUZMS3LyXuKkkVpTe9iSCR7rPXS/+0/BpGV7cw9hg/MIQ3zvJKTB6mgq2bjGp/Tc9r51Qz9NH55vk4VFFKLq2WXR9aSCkVFKZq8PIVU0I/KAmCOH+7OkkcJoLgEoQ4LjdzFzSYW4R6P3GH4Yl/9OWB61WjgWuNFLQTC69Rs3O8DeHBld/pMG4w54AXute6dDHR4YYk59cwH0LAINvz/Wd+b4ArXiCF3qkuaeT4M2I9pquiRwaYDkf8yrW2i5gD4QjPLtqMP1nGdo7iDZQJbc4QcKEqssyC4K7x2rF3tMT6w== ARC-Authentication-Results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 63.35.35.123) smtp.rcpttodomain=gcc.gnu.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dkim=[1,1,header.d=arm.com] dmarc=[1,1,header.from=arm.com]) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector2-armh-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=scYkYjbIcQQsQ9Z/654hyB4cwbb/H/5ZmnhslqoLWeo=; b=D+TlOqSEieOTDzwYZ7Jxtvu1N/oQ0RDYnRvJ/CXrAz3slCWLHp9ADI52M3CtX36jqCqOmyMY4kZnYcnsTpcB7pXjCpRxKCTaxpgPvo3nnmFZ0sh/qnBA5Elk/Zm6oqfmUo65l8yzp8uZ4TPlKXEVUz4PLk4bpBNX+h6uVsv9894= Received: from AS4P251CA0027.EURP251.PROD.OUTLOOK.COM (2603:10a6:20b:5d3::13) by DU0PR08MB7413.eurprd08.prod.outlook.com (2603:10a6:10:351::14) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6954.25; Mon, 6 Nov 2023 07:43:12 +0000 Received: from AM4PEPF00027A5D.eurprd04.prod.outlook.com (2603:10a6:20b:5d3:cafe::d7) by AS4P251CA0027.outlook.office365.com (2603:10a6:20b:5d3::13) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6954.28 via Frontend Transport; Mon, 6 Nov 2023 07:43:12 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by AM4PEPF00027A5D.mail.protection.outlook.com (10.167.16.69) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6977.16 via Frontend Transport; Mon, 6 Nov 2023 07:43:12 +0000 Received: ("Tessian outbound 385ad2f98d71:v228"); Mon, 06 Nov 2023 07:43:12 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: db4265bc5cf5fdb5 X-CR-MTA-TID: 64aa7808 Received: from 350c57be11ea.3 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 92D42E29-1F66-4003-BC8C-69F29C1B793E.1; Mon, 06 Nov 2023 07:43:05 +0000 Received: from EUR05-AM6-obe.outbound.protection.outlook.com by 
Date: Mon, 6 Nov 2023 07:43:00 +0000
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Ramana.Radhakrishnan@arm.com, Richard.Earnshaw@arm.com,
 nickc@redhat.com, Kyrylo.Tkachov@arm.com
Subject: [PATCH 21/21]Arm: Add MVE cbranch implementation
Content-Disposition: inline
MIME-Version: 1.0
Hi All,

This adds an implementation of the conditional branch optab for MVE.

Unfortunately MVE has rather limited operations on VPT.P0: we are missing
the ability to do P0 comparisons and logical OR on P0.

For that reason we can only support cbranch against 0, since to compare
against a zero predicate we don't need an actual comparison; we only have
to check whether any bit is set in P0.

Because we can only do P0 comparisons with 0, the cost of the comparison
was reduced so that the compiler does not try to push the 0 into a register
because it thinks the constant is too expensive.  For the cbranch
implementation to be safe we must see the constant 0 vector.

The lack of logical OR on P0 we cannot really work around.  This means MVE
cannot support cases where the sizes of the operands in the comparison
don't match, i.e. when one operand has been unpacked.

For example:

void f1 ()
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
	break;
    }
}

For 128-bit vectors we generate:

	vcmp.s32	gt, q3, q1
	vmrs	r3, p0	@ movhi
	cbnz	r3, .L2

MVE does not have 64-bit vector comparisons, so those are not supported
either.

Bootstrapped arm-none-linux-gnueabihf and regtested with
-march=armv8.1-m.main+mve -mfpu=auto with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/arm/arm.cc (arm_rtx_costs_internal): Update costs for pred
	0 compares.
	* config/arm/mve.md (cbranch<mode>4): New.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp (vect_early_break): Add MVE.
	* gcc.target/arm/mve/vect-early-break-cbranch.c: New test.
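To make the "check whether any bit is set in P0" idea above concrete, here is
a minimal scalar sketch of the test that the expansion reduces to.  It is an
editorial illustration rather than code from the patch: the helper name,
types and example mask values are assumptions, and it only models the fact
that comparing a predicate against an all-zero predicate collapses to an
integer compare with 0 once the 16-bit mask has been moved to a core
register (the vmrs/cbnz pair in the assembly above).

#include <stdint.h>
#include <stdio.h>

/* Scalar model: MVE's VPR.P0 is a 16-bit lane mask, so "P0 differs from
   the all-zero predicate" is simply "the mask, read as an integer, is
   nonzero".  No predicate-vs-predicate comparison is needed.  */
static int
predicate_has_active_lane (uint16_t p0_mask)	/* hypothetical helper */
{
  return p0_mask != 0;
}

int
main (void)
{
  uint16_t taken = 0x000F;	/* assumed mask with one 32-bit lane active */
  uint16_t not_taken = 0x0000;	/* no lane matched the vector compare */

  printf ("%d %d\n",
	  predicate_has_active_lane (taken),
	  predicate_has_active_lane (not_taken));
  return 0;
}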
--- inline copy of patch --
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 38f0839de1c75547c259ac3d655fcfc14e7208a2..15e65c15cb3cb6f70161787e84b255a24eb51e32 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -11883,6 +11883,15 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 	      || TARGET_HAVE_MVE)
 	  && simd_immediate_valid_for_move (x, mode, NULL, NULL))
 	*cost = COSTS_N_INSNS (1);
+      else if (TARGET_HAVE_MVE
+	       && outer_code == COMPARE
+	       && VALID_MVE_PRED_MODE (mode))
+	/* MVE allows very limited instructions on VPT.P0, however comparisons
+	   to 0 do not require us to materialize this constant or require a
+	   predicate comparison as we can go through SImode.  For that reason
+	   allow P0 CMP 0 as a cheap operation such that the 0 isn't forced to
+	   registers as we can't compare two predicates.  */
+	*cost = COSTS_N_INSNS (1);
       else
 	*cost = COSTS_N_INSNS (4);
       return true;
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 74909ce47e132c22a94f7d9cd3a0921b38e33051..95d40770ecc25f9eb251eba38306dd43cbebfb3f 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -6880,6 +6880,21 @@ (define_expand "vcond_mask_<mode><MVE_vpred>"
   DONE;
 })
 
+(define_expand "cbranch<mode>4"
+  [(set (pc) (if_then_else
+	      (match_operator 0 "expandable_comparison_operator"
+	       [(match_operand:MVE_7 1 "register_operand")
+	        (match_operand:MVE_7 2 "zero_operand")])
+	      (label_ref (match_operand 3 "" ""))
+	      (pc)))]
+  "TARGET_HAVE_MVE"
+{
+  rtx val = gen_reg_rtx (SImode);
+  emit_move_insn (val, gen_lowpart (SImode, operands[1]));
+  emit_jump_insn (gen_cbranchsi4 (operands[0], val, const0_rtx, operands[3]));
+  DONE;
+})
+
 ;; Reinterpret operand 1 in operand 0's mode, without changing its contents.
 (define_expand "@arm_mve_reinterpret<mode>"
   [(set (match_operand:MVE_vecs 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
new file mode 100644
index 0000000000000000000000000000000000000000..c3b8506dca0b2b044e6869a6c8259d663c1ff930
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/vect-early-break-cbranch.c
@@ -0,0 +1,117 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+/*
+** f1:
+**	...
+**	vcmp.s32	gt, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+	break;
+    }
+}
+
+/*
+** f2:
+**	...
+**	vcmp.s32	ge, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+	break;
+    }
+}
+
+/*
+** f3:
+**	...
+**	vcmp.i32	eq, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+	break;
+    }
+}
+
+/*
+** f4:
+**	...
+**	vcmp.i32	ne, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+	break;
+    }
+}
+
+/*
+** f5:
+**	...
+**	vcmp.s32	lt, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+	break;
+    }
+}
+
+/*
+** f6:
+**	...
+**	vcmp.s32	le, q[0-9]+, q[0-9]+
+**	vmrs	r[0-9]+, p0	@ movhi
+**	cbnz	r[0-9]+, \.L[0-9]+
+**	...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+	break;
+    }
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 8f58671e6cfd3546c6a98e40341fe31c6492594b..1eef764542a782786e27ed935a06243e319ae3fc 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3785,6 +3785,8 @@ proc check_effective_target_vect_early_break { } {
 	expr {
 	    [istarget aarch64*-*-*]
 	    || [check_effective_target_arm_neon_ok]
+	    || ([check_effective_target_arm_v8_1m_mve_fp_ok]
+		&& [check_effective_target_arm_little_endian])
 	}}]
 }
 # Return 1 if the target supports hardware vectorization of complex additions of
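For reference, below is a hypothetical example of the unsupported case
mentioned in the cover letter, where the operand sizes in the comparison do
not match because one operand is unpacked.  It is not part of the patch or
the test suite, the array names are invented, and whether this exact loop
trips the limitation depends on the modes the vectorizer chooses; it is only
a sketch of the kind of loop that cannot be handled without a logical OR on
P0.

/* Hypothetical illustration, not from the patch: the early-break compare is
   on 16-bit elements while the accumulation needs them unpacked to 32 bits,
   so the predicate widths of the two statements differ.  Without a logical
   OR on P0 the predicates of the unpacked halves cannot be combined, so MVE
   is expected to leave this loop scalar.  */
#define N 640
short a_narrow[N];
int b_wide[N];

void
f_unpacked (void)
{
  for (int i = 0; i < N; i++)
    {
      b_wide[i] += a_narrow[i];	/* a_narrow[i] is widened (unpacked) here.  */
      if (a_narrow[i] > 0)	/* compare happens on the narrow elements.  */
	break;
    }
}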