From patchwork Mon Nov 6 07:42:30 2023
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 161909
Date: Mon, 6 Nov 2023 07:42:30 +0000
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com
Subject: [PATCH 19/21]AArch64: Add optimization for vector cbranch combining SVE and Advanced SIMD
Hi All,

Advanced SIMD lacks flag-setting vector comparisons, which SVE adds.  Since
machines with SVE also support Advanced SIMD, we can use the SVE comparisons
to perform the operation in cases where SVE codegen is allowed but the
vectorizer has decided to generate Advanced SIMD because of loop costing.

For example, for:

void f1 (int x)
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] != x)
        break;
    }
}

we currently generate:

        cmeq    v31.4s, v31.4s, v28.4s
        uminp   v31.4s, v31.4s, v31.4s
        fmov    x5, d31
        cbz     x5, .L2

and after this patch:

        ptrue   p7.b, vl16
        ...
        cmpne   p15.s, p7/z, z31.s, z28.s
        b.any   .L2

Because we need to hoist the predicate creation out of the loop, we need to
expand the predicate early.  However, in the cbranch expansion we don't see
the outer compare, which we need to consume.  For this reason the expansion
is twofold: when expanding the cbranch we emit an SVE predicated comparison,
and later, during combine, we match the SVE and NEON comparisons while also
consuming the ptest.

Unfortunately *aarch64_pred_cmpne_neon_ptest is needed because, for some
reason, combine destroys the NOT and transforms it into a PLUS with -1.

For the straight SVE ones, we seem to fail to eliminate the ptest in these
cases, but that's a separate optimization.

Tests show that I'm missing a few, but before I write the patterns for them,
are these OK?

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

OK for master?

Thanks,
Tamar

gcc/ChangeLog:

        * config/aarch64/aarch64-simd.md (cbranch4): Update with SVE.
        * config/aarch64/aarch64-sve.md
        (*aarch64_pred_cmp_neon_ptest, *aarch64_pred_cmpeq_neon_ptest,
        *aarch64_pred_cmpne_neon_ptest): New.
        (aarch64_ptest): Rename to...
        (@aarch64_ptest): ... This.
        * genemit.cc: Include rtx-vector-builder.h.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/sve/vect-early-break-cbranch_1.c: New test.
        * gcc.target/aarch64/sve/vect-early-break-cbranch_2.c: New test.
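The C sketch below is a rough illustration of why the SVE form is shorter. It models the exit test of f1 for one 128-bit block in both ways. `any_lane_simd` follows the Advanced SIMD fallback in the cbranch expander of the patch below: a pairwise unsigned max over V4SI, then a 64-bit test against zero. `any_lane_sve` stands in for the predicated compare followed by `b.any`. Both helpers are hypothetical scalar models, not GCC code. They assume a 16-byte governing predicate, so exactly four `.s` lanes are active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build f1's exit mask for one 128-bit block: all ones in each lane
   where a[i] != x, zero elsewhere (the shape a vector compare gives).  */
static void
ne_mask (const int32_t a[4], int32_t x, uint32_t mask[4])
{
  for (int i = 0; i < 4; i++)
    mask[i] = (a[i] != x) ? UINT32_MAX : 0;
}

/* Model of the Advanced SIMD fallback: pairwise unsigned max over V4SI
   (UMAXP v, v, v), then move the low 64 bits to a general register and
   compare them with zero.  */
static bool
any_lane_simd (const uint32_t mask[4])
{
  /* Vector compare results are all-zeros or all-ones per lane.  */
  for (int i = 0; i < 4; i++)
    assert (mask[i] == 0 || mask[i] == UINT32_MAX);

  uint32_t lo = mask[0] > mask[1] ? mask[0] : mask[1];  /* result lane 0 */
  uint32_t hi = mask[2] > mask[3] ? mask[2] : mask[3];  /* result lane 1 */
  uint64_t val = ((uint64_t) hi << 32) | lo;            /* low 64 bits */
  return val != 0;
}

/* Model of the SVE path: CMPNE under a 16-byte governing predicate sets
   one predicate bit per active .s lane, and B.ANY is taken when any of
   those bits is set.  */
static bool
any_lane_sve (const int32_t a[4], int32_t x)
{
  bool any = false;
  for (int i = 0; i < 4; i++)
    if (a[i] != x)
      any = true;
  return any;
}
```

Take `a = {3, 3, 7, 3}` and `x = 3`. Both helpers report that a lane differs. On the SVE side the flags come directly from the compare, which is why the separate reduction and the FMOV/CBZ pair disappear in the listing above.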
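The SVE branch of the expander builds its governing predicate with `rtx_vector_builder (VNx16BImode, 16, 2)`, pushing sixteen 1s and then sixteen 0s. Under the usual two-elements-per-pattern reading of that encoding, each pattern's second element repeats for the rest of the vector. The constant therefore has its first 16 byte lanes active and every later lane inactive at any vector length, which matches the `ptrue p7.b, vl16` in the example above. The helpers below decode such an encoding for a chosen vector length in bytes. `decode_elt` and `expand_pred` are hypothetical illustrations, not GCC's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Decode element I of a vector encoded as NPATTERNS interleaved patterns
   with two elements each.  ENCODED holds the values in push order: the
   NPATTERNS first elements, then the NPATTERNS second elements.  The
   second element of each pattern repeats for the rest of the vector.  */
static bool
decode_elt (const bool *encoded, size_t npatterns, size_t i)
{
  assert (npatterns > 0);
  if (i < npatterns)
    return encoded[i];
  return encoded[npatterns + i % npatterns];
}

/* Expand a VNx16BI-style predicate (one bit per byte lane) for a vector
   of NBYTES bytes.  */
static void
expand_pred (const bool *encoded, size_t npatterns, bool *out,
             size_t nbytes)
{
  for (size_t i = 0; i < nbytes; i++)
    out[i] = decode_elt (encoded, npatterns, i);
}
```

Expanding for a 64-byte (512-bit) vector gives 16 active byte lanes followed by 48 inactive ones. When the predicate is viewed as VNx4BI for `.s` elements, only the first four lanes are active, so only the 128 bits that hold the Advanced SIMD operands take part in the comparison.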
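The combine behaviour that makes `*aarch64_pred_cmpne_neon_ptest` necessary is an algebraic rewrite. In two's complement `~y == -y - 1`, so the NOT of the negated equality result, `~(-eq)`, equals `eq - 1`, which is the `(plus (eq ...) -1)` form. The scalar sketch below checks this identity for a single 32-bit lane. The helper names are hypothetical. The comparison result is modelled as 0 or 1, as in the `(neg (eq ...))` operands of the patterns.

```c
#include <assert.h>
#include <stdint.h>

/* Equality mask for one lane: the comparison yields 0 or 1 and NEG
   turns it into 0 or -1 (all ones), as in (neg (eq ...)).  */
static int32_t
eq_mask (int32_t a, int32_t b)
{
  int32_t eq = (a == b);
  return -eq;
}

/* != mask written as NOT of the equality mask.  */
static int32_t
ne_mask_not (int32_t a, int32_t b)
{
  return ~eq_mask (a, b);
}

/* != mask in the form combine produces: (plus (eq ...) -1).  */
static int32_t
ne_mask_plus (int32_t a, int32_t b)
{
  int32_t eq = (a == b);
  return eq + -1;
}

/* Compute the != mask both ways and check that they agree.  */
static int32_t
ne_mask_checked (int32_t a, int32_t b)
{
  int32_t via_not = ne_mask_not (a, b);
  int32_t via_plus = ne_mask_plus (a, b);
  assert (via_not == via_plus);
  return via_plus;
}
```

For equal inputs both forms give 0. For different inputs both give -1 (all ones), which is the `!=` lane mask that `cmpne` computes directly.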
--- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c06dd4fd6f85e07f0d4a77992b2bc06f04a1935b..33799dc35a1b90dd60d7e487ec41c5d84fb215a5 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3883,31 +3883,58 @@ (define_expand "cbranch4"
   "TARGET_SIMD"
 {
   auto code = GET_CODE (operands[0]);
-  rtx tmp = operands[1];
-
-  /* If comparing against a non-zero vector we have to do a comparison first
-     so we can have a != 0 comparison with the result.  */
-  if (operands[2] != CONST0_RTX (mode))
-    emit_insn (gen_vec_cmp (tmp, operands[0], operands[1],
-                            operands[2]));
-
-  /* For 64-bit vectors we need no reductions.  */
-  if (known_eq (128, GET_MODE_BITSIZE (mode)))
+  /* If SVE is available, let's borrow some instructions.  We will optimize
+     these further later in combine.  */
+  if (TARGET_SVE)
     {
-      /* Always reduce using a V4SI.  */
-      rtx reduc = gen_lowpart (V4SImode, tmp);
-      rtx res = gen_reg_rtx (V4SImode);
-      emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
-      emit_move_insn (tmp, gen_lowpart (mode, res));
+      machine_mode full_mode = aarch64_full_sve_mode (mode).require ();
+      rtx in1 = lowpart_subreg (full_mode, operands[1], mode);
+      rtx in2 = lowpart_subreg (full_mode, operands[2], mode);
+
+      machine_mode pred_mode = aarch64_sve_pred_mode (full_mode);
+      rtx_vector_builder builder (VNx16BImode, 16, 2);
+      for (unsigned int i = 0; i < 16; ++i)
+        builder.quick_push (CONST1_RTX (BImode));
+      for (unsigned int i = 0; i < 16; ++i)
+        builder.quick_push (CONST0_RTX (BImode));
+      rtx ptrue = force_reg (VNx16BImode, builder.build ());
+      rtx cast_ptrue = gen_lowpart (pred_mode, ptrue);
+      rtx ptrue_flag = gen_int_mode (SVE_KNOWN_PTRUE, SImode);
+
+      rtx tmp = gen_reg_rtx (pred_mode);
+      aarch64_expand_sve_vec_cmp_int (tmp, reverse_condition (code), in1, in2);
+      emit_insn (gen_aarch64_ptest (pred_mode, ptrue, cast_ptrue, ptrue_flag, tmp));
+      operands[1] = gen_rtx_REG (CC_NZCmode, CC_REGNUM);
+      operands[2] = const0_rtx;
     }
+  else
+    {
+      rtx tmp = operands[1];
 
-  rtx val = gen_reg_rtx (DImode);
-  emit_move_insn (val, gen_lowpart (DImode, tmp));
+      /* If comparing against a non-zero vector we have to do a comparison first
+         so we can have a != 0 comparison with the result.  */
+      if (operands[2] != CONST0_RTX (mode))
+        emit_insn (gen_vec_cmp (tmp, operands[0], operands[1],
+                                operands[2]));
 
-  rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
-  rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
-  emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
-  DONE;
+      /* For 64-bit vectors we need no reductions.  */
+      if (known_eq (128, GET_MODE_BITSIZE (mode)))
+        {
+          /* Always reduce using a V4SI.  */
+          rtx reduc = gen_lowpart (V4SImode, tmp);
+          rtx res = gen_reg_rtx (V4SImode);
+          emit_insn (gen_aarch64_umaxpv4si (res, reduc, reduc));
+          emit_move_insn (tmp, gen_lowpart (mode, res));
+        }
+
+      rtx val = gen_reg_rtx (DImode);
+      emit_move_insn (val, gen_lowpart (DImode, tmp));
+
+      rtx cc_reg = aarch64_gen_compare_reg (code, val, const0_rtx);
+      rtx cmp_rtx = gen_rtx_fmt_ee (code, DImode, cc_reg, const0_rtx);
+      emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[3]));
+      DONE;
+    }
 })
 
 ;; Avdanced SIMD lacks a vector != comparison, but this is a quite common
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 5a652d8536a0ef9461f40da7b22834e683e73ceb..d9cc5c7e5629691e7abba7a18e308d35082e027d 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -8123,6 +8123,105 @@ (define_insn "*aarch64_pred_cmp_wide_ptest"
   "cmp\t%0., %1/z, %2., %3.d"
 )
 
+;; Predicated integer comparisons over Advanced SIMD arguments in which only
+;; the flags result is interesting.
+(define_insn "*aarch64_pred_cmp_neon_ptest"
+  [(set (reg:CC_NZC CC_REGNUM)
+        (unspec:CC_NZC
+          [(match_operand:VNx16BI 1 "register_operand" "Upl")
+           (match_operand 4)
+           (match_operand:SI 5 "aarch64_sve_ptrue_flag")
+           (unspec:VNx4BI
+             [(match_operand:VNx4BI 6 "register_operand" "Upl")
+              (match_operand:SI 7 "aarch64_sve_ptrue_flag")
+              (EQL:VNx4BI
+                (subreg:SVE_FULL_BHSI
+                 (neg:
+                  (UCOMPARISONS:
+                   (match_operand: 2 "register_operand" "w")
+                   (match_operand: 3 "aarch64_simd_reg_or_zero" "w"))) 0)
+                (match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))]
+             UNSPEC_PRED_Z)]
+          UNSPEC_PTEST))
+   (clobber (match_scratch:VNx4BI 0 "=Upa"))]
+  "TARGET_SVE
+   && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
+{
+  operands[2] = lowpart_subreg (mode, operands[2], mode);
+  operands[3] = lowpart_subreg (mode, operands[3], mode);
+  if (EQ == )
+    std::swap (operands[2], operands[3]);
+
+  return "cmp\t%0., %1/z, %2., %3.";
+}
+)
+
+;; Predicated integer comparisons over Advanced SIMD arguments in which only
+;; the flags result is interesting.
+(define_insn "*aarch64_pred_cmpeq_neon_ptest"
+  [(set (reg:CC_NZC CC_REGNUM)
+        (unspec:CC_NZC
+          [(match_operand:VNx16BI 1 "register_operand" "Upl")
+           (match_operand 4)
+           (match_operand:SI 5 "aarch64_sve_ptrue_flag")
+           (unspec:VNx4BI
+             [(match_operand:VNx4BI 6 "register_operand" "Upl")
+              (match_operand:SI 7 "aarch64_sve_ptrue_flag")
+              (EQL:VNx4BI
+                (subreg:SVE_FULL_BHSI
+                 (neg:
+                  (eq:
+                   (match_operand: 2 "register_operand" "w")
+                   (match_operand: 3 "aarch64_simd_reg_or_zero" "w"))) 0)
+                (match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))]
+             UNSPEC_PRED_Z)]
+          UNSPEC_PTEST))
+   (clobber (match_scratch:VNx4BI 0 "=Upa"))]
+  "TARGET_SVE
+   && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
+{
+  operands[2] = lowpart_subreg (mode, operands[2], mode);
+  operands[3] = lowpart_subreg (mode, operands[3], mode);
+  if (EQ == )
+    std::swap (operands[2], operands[3]);
+
+  return "cmpeq\t%0., %1/z, %2., %3.";
+}
+)
+
+;; Same as the above but version for == and !=
+(define_insn "*aarch64_pred_cmpne_neon_ptest"
+  [(set (reg:CC_NZC CC_REGNUM)
+        (unspec:CC_NZC
+          [(match_operand:VNx16BI 1 "register_operand" "Upl")
+           (match_operand 4)
+           (match_operand:SI 5 "aarch64_sve_ptrue_flag")
+           (unspec:VNx4BI
+             [(match_operand:VNx4BI 6 "register_operand" "Upl")
+              (match_operand:SI 7 "aarch64_sve_ptrue_flag")
+              (EQL:VNx4BI
+                (subreg:SVE_FULL_BHSI
+                 (plus:
+                  (eq:
+                   (match_operand: 2 "register_operand" "w")
+                   (match_operand: 3 "aarch64_simd_reg_or_zero" "w"))
+                  (match_operand: 9 "aarch64_simd_imm_minus_one" "i")) 0)
+                (match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))]
+             UNSPEC_PRED_Z)]
+          UNSPEC_PTEST))
+   (clobber (match_scratch:VNx4BI 0 "=Upa"))]
+  "TARGET_SVE
+   && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])"
+{
+  operands[2] = lowpart_subreg (mode, operands[2], mode);
+  operands[3] = lowpart_subreg (mode, operands[3], mode);
+  if (EQ == )
+    std::swap (operands[2], operands[3]);
+
+  return "cmpne\t%0., %1/z, %2., %3.";
+}
+)
+
 ;; -------------------------------------------------------------------------
 ;; ---- [INT] While tests
 ;; -------------------------------------------------------------------------
@@ -8602,7 +8701,7 @@ (define_expand "cbranch4"
 )
 
 ;; See "Description of UNSPEC_PTEST" above for details.
-(define_insn "aarch64_ptest"
+(define_insn "@aarch64_ptest"
   [(set (reg:CC_NZC CC_REGNUM)
         (unspec:CC_NZC [(match_operand:VNx16BI 0 "register_operand" "Upa")
                         (match_operand 1)
diff --git a/gcc/genemit.cc b/gcc/genemit.cc
index 1ce0564076d8b0d39542f49dd51e5df01cc83c35..73309ca00ec0aa3cd76c85e04535bac44cb2f354 100644
--- a/gcc/genemit.cc
+++ b/gcc/genemit.cc
@@ -906,6 +906,7 @@ from the machine description file `md'.  */\n\n");
   printf ("#include \"tm-constrs.h\"\n");
   printf ("#include \"ggc.h\"\n");
   printf ("#include \"target.h\"\n\n");
+  printf ("#include \"rtx-vector-builder.h\"\n\n");
 
   /* Read the machine description.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..c281cfccbe12f0ac8c01ede563dbe325237902c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c
@@ -0,0 +1,117 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+** ...
+** cmpgt p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+** ptest p[0-9]+, p[0-9]+.b
+** b.any \.L[0-9]+
+** ...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+        break;
+    }
+}
+
+/*
+** f2:
+** ...
+** cmpge p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+** ptest p[0-9]+, p[0-9]+.b
+** b.any \.L[0-9]+
+** ...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+        break;
+    }
+}
+
+/*
+** f3:
+** ...
+** cmpeq p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+** ptest p[0-9]+, p[0-9]+.b
+** b.any \.L[0-9]+
+** ...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+        break;
+    }
+}
+
+/*
+** f4:
+** ...
+** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+** ptest p[0-9]+, p[0-9]+.b
+** b.any \.L[0-9]+
+** ...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+        break;
+    }
+}
+
+/*
+** f5:
+** ...
+** cmplt p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+** ptest p[0-9]+, p[0-9]+.b
+** b.any \.L[0-9]+
+** ...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+        break;
+    }
+}
+
+/*
+** f6:
+** ...
+** cmple p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+** ptest p[0-9]+, p[0-9]+.b
+** b.any \.L[0-9]+
+** ...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+        break;
+    }
+}
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..f1ca3eafc5ae33393a7df9b5e40fa3420a79bfc2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c
@@ -0,0 +1,114 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 --param=aarch64-autovec-preference=1" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#define N 640
+int a[N] = {0};
+int b[N] = {0};
+
+
+/*
+** f1:
+** ...
+** cmgt v[0-9]+.4s, v[0-9]+.4s, #0
+** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+** b.any \.L[0-9]+
+** ...
+*/
+void f1 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] > 0)
+        break;
+    }
+}
+
+/*
+** f2:
+** ...
+** cmge v[0-9]+.4s, v[0-9]+.4s, #0
+** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+** b.any \.L[0-9]+
+** ...
+*/
+void f2 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] >= 0)
+        break;
+    }
+}
+
+/*
+** f3:
+** ...
+** cmpeq p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, z[0-9]+.s
+** b.any \.L[0-9]+
+** ...
+*/
+void f3 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] == 0)
+        break;
+    }
+}
+
+/*
+** f4:
+** ...
+** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, z[0-9]+.s
+** b.any \.L[0-9]+
+** ...
+*/
+void f4 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] != 0)
+        break;
+    }
+}
+
+/*
+** f5:
+** ...
+** cmlt v[0-9]+.4s, v[0-9]+.4s, #0
+** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+** b.any \.L[0-9]+
+** ...
+*/
+void f5 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] < 0)
+        break;
+    }
+}
+
+/*
+** f6:
+** ...
+** cmle v[0-9]+.4s, v[0-9]+.4s, #0
+** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0
+** b.any \.L[0-9]+
+** ...
+*/
+void f6 ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      b[i] += a[i];
+      if (a[i] <= 0)
+        break;
+    }
+}
+(define_insn "*aarch64_pred_cmpeq_neon_ptest" + [(set (reg:CC_NZC CC_REGNUM) + (unspec:CC_NZC + [(match_operand:VNx16BI 1 "register_operand" "Upl") + (match_operand 4) + (match_operand:SI 5 "aarch64_sve_ptrue_flag") + (unspec:VNx4BI + [(match_operand:VNx4BI 6 "register_operand" "Upl") + (match_operand:SI 7 "aarch64_sve_ptrue_flag") + (EQL:VNx4BI + (subreg:SVE_FULL_BHSI + (neg: + (eq: + (match_operand: 2 "register_operand" "w") + (match_operand: 3 "aarch64_simd_reg_or_zero" "w"))) 0) + (match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))] + UNSPEC_PRED_Z)] + UNSPEC_PTEST)) + (clobber (match_scratch:VNx4BI 0 "=Upa"))] + "TARGET_SVE + && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" +{ + operands[2] = lowpart_subreg (mode, operands[2], mode); + operands[3] = lowpart_subreg (mode, operands[3], mode); + if (EQ == ) + std::swap (operands[2], operands[3]); + + return "cmpeq\t%0., %1/z, %2., %3."; +} +) + +;; Same as the above, but for !=, which Advanced SIMD represents as (plus (eq ...) -1). +(define_insn "*aarch64_pred_cmpne_neon_ptest" + [(set (reg:CC_NZC CC_REGNUM) + (unspec:CC_NZC + [(match_operand:VNx16BI 1 "register_operand" "Upl") + (match_operand 4) + (match_operand:SI 5 "aarch64_sve_ptrue_flag") + (unspec:VNx4BI + [(match_operand:VNx4BI 6 "register_operand" "Upl") + (match_operand:SI 7 "aarch64_sve_ptrue_flag") + (EQL:VNx4BI + (subreg:SVE_FULL_BHSI + (plus: + (eq: + (match_operand: 2 "register_operand" "w") + (match_operand: 3 "aarch64_simd_reg_or_zero" "w")) + (match_operand: 9 "aarch64_simd_imm_minus_one" "i")) 0) + (match_operand:SVE_FULL_BHSI 8 "aarch64_simd_imm_zero" "Dz"))] + UNSPEC_PRED_Z)] + UNSPEC_PTEST)) + (clobber (match_scratch:VNx4BI 0 "=Upa"))] + "TARGET_SVE + && aarch64_sve_same_pred_for_ptest_p (&operands[4], &operands[6])" +{ + operands[2] = lowpart_subreg (mode, operands[2], mode); + operands[3] = lowpart_subreg (mode, operands[3], mode); + if (EQ == ) + std::swap (operands[2], operands[3]); + + return "cmpne\t%0., %1/z, %2., %3."; +} +) + ;;
------------------------------------------------------------------------- ;; ---- [INT] While tests ;; ------------------------------------------------------------------------- @@ -8602,7 +8701,7 @@ (define_expand "cbranch4" ) ;; See "Description of UNSPEC_PTEST" above for details. -(define_insn "aarch64_ptest" +(define_insn "@aarch64_ptest" [(set (reg:CC_NZC CC_REGNUM) (unspec:CC_NZC [(match_operand:VNx16BI 0 "register_operand" "Upa") (match_operand 1) diff --git a/gcc/genemit.cc b/gcc/genemit.cc index 1ce0564076d8b0d39542f49dd51e5df01cc83c35..73309ca00ec0aa3cd76c85e04535bac44cb2f354 100644 --- a/gcc/genemit.cc +++ b/gcc/genemit.cc @@ -906,6 +906,7 @@ from the machine description file `md'. */\n\n"); printf ("#include \"tm-constrs.h\"\n"); printf ("#include \"ggc.h\"\n"); printf ("#include \"target.h\"\n\n"); + printf ("#include \"rtx-vector-builder.h\"\n\n"); /* Read the machine description. */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c new file mode 100644 index 0000000000000000000000000000000000000000..c281cfccbe12f0ac8c01ede563dbe325237902c9 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_1.c @@ -0,0 +1,117 @@ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */ + +#define N 640 +int a[N] = {0}; +int b[N] = {0}; + + +/* +** f1: +** ... +** cmpgt p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** ptest p[0-9]+, p[0-9]+.b +** b.any \.L[0-9]+ +** ... +*/ +void f1 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] > 0) + break; + } +} + +/* +** f2: +** ... +** cmpge p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** ptest p[0-9]+, p[0-9]+.b +** b.any \.L[0-9]+ +** ... +*/ +void f2 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] >= 0) + break; + } +} + +/* +** f3: +** ... 
+** cmpeq p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** ptest p[0-9]+, p[0-9]+.b +** b.any \.L[0-9]+ +** ... +*/ +void f3 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] == 0) + break; + } +} + +/* +** f4: +** ... +** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** ptest p[0-9]+, p[0-9]+.b +** b.any \.L[0-9]+ +** ... +*/ +void f4 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] != 0) + break; + } +} + +/* +** f5: +** ... +** cmplt p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** ptest p[0-9]+, p[0-9]+.b +** b.any \.L[0-9]+ +** ... +*/ +void f5 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] < 0) + break; + } +} + +/* +** f6: +** ... +** cmple p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** ptest p[0-9]+, p[0-9]+.b +** b.any \.L[0-9]+ +** ... +*/ +void f6 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] <= 0) + break; + } +} + diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c new file mode 100644 index 0000000000000000000000000000000000000000..f1ca3eafc5ae33393a7df9b5e40fa3420a79bfc2 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/vect-early-break-cbranch_2.c @@ -0,0 +1,114 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 --param=aarch64-autovec-preference=1" } */ +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */ + +#define N 640 +int a[N] = {0}; +int b[N] = {0}; + + +/* +** f1: +** ... +** cmgt v[0-9]+.4s, v[0-9]+.4s, #0 +** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** b.any \.L[0-9]+ +** ... +*/ +void f1 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] > 0) + break; + } +} + +/* +** f2: +** ... +** cmge v[0-9]+.4s, v[0-9]+.4s, #0 +** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** b.any \.L[0-9]+ +** ... +*/ +void f2 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] >= 0) + break; + } +} + +/* +** f3: +** ...
+** cmpeq p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, z[0-9]+.s +** b.any \.L[0-9]+ +** ... +*/ +void f3 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] == 0) + break; + } +} + +/* +** f4: +** ... +** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, z[0-9]+.s +** b.any \.L[0-9]+ +** ... +*/ +void f4 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] != 0) + break; + } +} + +/* +** f5: +** ... +** cmlt v[0-9]+.4s, v[0-9]+.4s, #0 +** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** b.any \.L[0-9]+ +** ... +*/ +void f5 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] < 0) + break; + } +} + +/* +** f6: +** ... +** cmle v[0-9]+.4s, v[0-9]+.4s, #0 +** cmpne p[0-9]+.s, p[0-9]+/z, z[0-9]+.s, #0 +** b.any \.L[0-9]+ +** ... +*/ +void f6 () +{ + for (int i = 0; i < N; i++) + { + b[i] += a[i]; + if (a[i] <= 0) + break; + } +}
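The two expansion strategies in cbranch answer the same question, "is any 32-bit lane of the comparison true?", and a small scalar model makes that concrete. This is a hypothetical C sketch, not GCC code: the function names are invented, the SVE vector length is an assumed parameter, and only the a[i] > 0 case (f1) is modelled. The Advanced SIMD fallback folds the 128-bit mask with UMAXP over V4SI and tests the low 64 bits against zero. The SVE path runs a zeroing compare under a predicate that is true only for the first 16 bytes (the 16-ones-then-zeros VNx16BI constant), then relies on PTEST and B.ANY.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of UMAXP Vd.4S, Vn.4S, Vn.4S (both sources are the
   same register, as in gen_aarch64_umaxpv4si (res, reduc, reduc)):
   each output lane is the unsigned max of an adjacent input pair.  */
static void umaxp_v4si (const uint32_t in[4], uint32_t out[4])
{
  out[0] = in[0] > in[1] ? in[0] : in[1];
  out[1] = in[2] > in[3] ? in[2] : in[3];
  out[2] = out[0];  /* Second source is the same register.  */
  out[3] = out[1];
}

/* Advanced SIMD fallback: reduce the mask, then compare the low
   64 bits (lanes 0 and 1) against zero.  */
static int any_lane_advsimd (const uint32_t mask[4])
{
  uint32_t res[4];
  uint64_t low;
  umaxp_v4si (mask, res);
  memcpy (&low, res, sizeof low);  /* Like gen_lowpart (DImode, ...).  */
  return low != 0;
}

/* SVE path: only lanes inside the low 128 bits are governed, so with
   .s elements only lanes 0..3 take part.  VL_BYTES is an assumed SVE
   vector length; lanes beyond 128 bits may hold arbitrary values.  */
static int any_lane_sve_gt0 (const int32_t *z, size_t vl_bytes)
{
  assert (vl_bytes >= 16 && vl_bytes % 16 == 0);
  int any = 0;
  for (size_t i = 0; i < vl_bytes / 4; i++)
    {
      int active = i * 4 < 16;    /* Predicate limited to 128 bits.  */
      any |= active && z[i] > 0;  /* Zeroing cmpgt ..., #0.  */
    }
  return any;                     /* What ptest + b.any observes.  */
}

/* Advanced SIMD comparison mask for a[i] > 0 (cmgt ..., #0).  */
static void cmgt_zero_mask (const int32_t a[4], uint32_t mask[4])
{
  for (int i = 0; i < 4; i++)
    mask[i] = a[i] > 0 ? UINT32_MAX : 0;
}
```

Taking only the low 64 bits after UMAXP is enough because lanes 0 and 1 of the result already hold the maxima of both halves of the mask. In the SVE model, the positive values placed above 128 bits in a test vector never affect the result, which is why the governing predicate must be limited to the Advanced SIMD width rather than being a full ptrue.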