From patchwork Wed Jun 28 13:49:19 2023
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 113910
Date: Wed, 28 Jun 2023 14:49:19 +0100
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Ramana.Radhakrishnan@arm.com, Richard.Earnshaw@arm.com,
 nickc@redhat.com, Kyrylo.Tkachov@arm.com
Subject: [PATCH 18/19]Arm: Add Advanced SIMD cbranch implementation
List-Id: Gcc-patches mailing list

Hi All,

This adds an implementation of the conditional branch optab for AArch32.

For example, given

void f1 ()
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
        break;
    }
}

for 128-bit vectors we generate:

        vcgt.s32        q8, q9, #0
        vpmax.u32       d7, d16, d17
        vpmax.u32       d7, d7, d7
        vmov    r3, s14 @ int
        cmp     r3, #0

and for 64-bit vectors we can omit one vpmax, as we still need to compress to
32 bits.

Bootstrapped and regtested on arm-none-linux-gnueabihf with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

        * config/arm/neon.md (cbranch<mode>4): New.

gcc/testsuite/ChangeLog:

        * lib/target-supports.exp (vect_early_break): Add AArch32.
        * gcc.target/arm/vect-early-break-cbranch.c: New test.

--- inline copy of patch --
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index d213369ffc38fb88ad0357d848cc7da5af73bab7..130efbc37cfe3128533599dfadc344d2243dcb63 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -408,6 +408,45 @@ (define_insn "vec_extract"
   [(set_attr "type" "neon_store1_one_lane,neon_to_gp")]
 )
 
+;; Patterns comparing two vectors and conditionally jumping.
+;; Advanced SIMD lacks a vector != comparison, but this is a quite common
+;; operation.  To not pay the penalty for inverting == we can map our any
+;; comparisons to all i.e. any(~x) => all(x).
+;;
+;; However unlike the AArch64 version, we can't optimize this further as the
+;; chain is too long for combine due to these being unspecs so it doesn't fold
+;; the operation to something simpler.
+(define_expand "cbranch<mode>4"
+  [(set (pc) (if_then_else
+              (match_operator 0 "expandable_comparison_operator"
+               [(match_operand:VDQI 1 "register_operand")
+                (match_operand:VDQI 2 "zero_operand")])
+              (label_ref (match_operand 3 "" ""))
+              (pc)))]
+  "TARGET_NEON"
+{
+  rtx mask = operands[1];
+
+  /* For 128-bit vectors we need an additional reduction.  */
+  if (known_eq (128, GET_MODE_BITSIZE (<MODE>mode)))
+    {
+      /* Always reduce using a V4SI.  */
+      mask = gen_reg_rtx (V2SImode);
+      rtx low = gen_reg_rtx (V2SImode);
+      rtx high = gen_reg_rtx (V2SImode);
+      emit_insn (gen_neon_vget_lowv4si (low, operands[1]));
+      emit_insn (gen_neon_vget_highv4si (high, operands[1]));
+      emit_insn (gen_neon_vpumaxv2si (mask, low, high));
+    }
+
+  emit_insn (gen_neon_vpumaxv2si (mask, mask, mask));
+
+  rtx val = gen_reg_rtx (SImode);
+  emit_move_insn (val, gen_lowpart (SImode, mask));
+  emit_jump_insn (gen_cbranch_cc (operands[0], val, const0_rtx, operands[3]));
+  DONE;
+})
+
 ;; This pattern is renamed from "vec_extract" to
 ;; "neon_vec_extract" and this pattern is called
 ;; by define_expand in vec-common.md file.
diff --git a/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
new file mode 100644
index 0000000000000000000000000000000000000000..2c05aa10d26ed4ac9785672e6e3b4355cef046dc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/vect-early-break-cbranch.c
@@ -0,0 +1,136 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-require-effective-target arm32 } */
+/* { dg-options "-O3 -march=armv8-a+simd -mfpu=auto -mfloat-abi=hard" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+/* f1:
+** ...
+** vcgt.s32 q[0-9]+, q[0-9]+, #0
+** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+
+** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+
+** vmov r[0-9]+, s[0-9]+ @ int
+** cmp r[0-9]+, #0
+** bne \.L[0-9]+
+** ...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+        break;
+    }
+}
+
+/*
+** f2:
+** ...
+** vcge.s32 q[0-9]+, q[0-9]+, #0
+** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+
+** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+
+** vmov r[0-9]+, s[0-9]+ @ int
+** cmp r[0-9]+, #0
+** bne \.L[0-9]+
+** ...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+        break;
+    }
+}
+
+/*
+** f3:
+** ...
+** vceq.i32 q[0-9]+, q[0-9]+, #0
+** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+
+** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+
+** vmov r[0-9]+, s[0-9]+ @ int
+** cmp r[0-9]+, #0
+** bne \.L[0-9]+
+** ...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+        break;
+    }
+}
+
+/*
+** f4:
+** ...
+** vceq.i32 q[0-9]+, q[0-9]+, #0
+** vmvn q[0-9]+, q[0-9]+
+** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+
+** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+
+** vmov r[0-9]+, s[0-9]+ @ int
+** cmp r[0-9]+, #0
+** bne \.L[0-9]+
+** ...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+        break;
+    }
+}
+
+/*
+** f5:
+** ...
+** vclt.s32 q[0-9]+, q[0-9]+, #0
+** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+
+** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+
+** vmov r[0-9]+, s[0-9]+ @ int
+** cmp r[0-9]+, #0
+** bne \.L[0-9]+
+** ...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+        break;
+    }
+}
+
+/*
+** f6:
+** ...
+** vcle.s32 q[0-9]+, q[0-9]+, #0
+** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+
+** vpmax.u32 d[0-9]+, d[0-9]+, d[0-9]+
+** vmov r[0-9]+, s[0-9]+ @ int
+** cmp r[0-9]+, #0
+** bne \.L[0-9]+
+** ...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+        break;
+    }
+}
+
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5516188dc0aa86d161d67dea5a7769e3c3d72f85..8f58671e6cfd3546c6a98e40341fe31c6492594b 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3784,6 +3784,7 @@ proc check_effective_target_vect_early_break { } {
     return [check_cached_effective_target_indexed vect_early_break {
       expr {
         [istarget aarch64*-*-*]
+        || [check_effective_target_arm_neon_ok]
         }}]
 }
 # Return 1 if the target supports hardware vectorization of complex additions of
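
For readers following the generated sequence, the reduction described in the
cover text can be modelled in plain C. This is only a sketch under stated
assumptions, not the RTL the expander emits: the helper names cmp_gt_zero,
pairwise_umax and any_lane_set are made up for illustration, the mask is
taken to be four 32-bit lanes as in the V4SI case, and register lane order is
simplified.

```c
#include <assert.h>
#include <stdint.h>

/* Lane-wise "greater than zero": all-ones for true, zero for false,
   mirroring the mask that vcgt.s32 ..., #0 produces.  */
static void
cmp_gt_zero (const int32_t in[4], uint32_t mask[4])
{
  for (int i = 0; i < 4; i++)
    mask[i] = in[i] > 0 ? UINT32_MAX : 0;
}

/* Pairwise unsigned max of two 2-lane vectors, modelling vpmax.u32:
   out[0] = max (a[0], a[1]), out[1] = max (b[0], b[1]).
   Results are computed into temporaries first so that OUT may alias
   A and B, as it does in the second reduction step.  */
static void
pairwise_umax (const uint32_t a[2], const uint32_t b[2], uint32_t out[2])
{
  assert (a && b && out);
  uint32_t lo = a[0] > a[1] ? a[0] : a[1];
  uint32_t hi = b[0] > b[1] ? b[0] : b[1];
  out[0] = lo;
  out[1] = hi;
}

/* Return 1 if any lane of the 128-bit mask is set, 0 otherwise.  */
static int
any_lane_set (const uint32_t mask[4])
{
  uint32_t d[2];
  pairwise_umax (&mask[0], &mask[2], d);  /* first vpmax: 4 lanes -> 2 */
  pairwise_umax (d, d, d);                /* second vpmax: 2 lanes -> 1 */
  return d[0] != 0;                       /* vmov + cmp against #0 */
}
```

With an input of {0, 0, 7, 0}, cmp_gt_zero sets only lane 2 and any_lane_set
returns 1, which corresponds to taking the bne to the early-break path. For a
64-bit vector only the second pairwise_umax step would be needed.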