From patchwork Wed Jan 24 09:20:36 2024
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 191443
Date: Wed, 24 Jan 2024 09:20:36 +0000
From: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com
Subject: [PATCH]AArch64: Fix expansion of Advanced SIMD div and mul using SVE [PR109636]

Hi All,

As suggested in the ticket, this replaces the expansion that converted the
Advanced SIMD types to SVE types with one that simply prints an SVE register
for these instructions.  This fixes the subreg issues, since there are no
subregs involved anymore.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR target/109636
	* config/aarch64/aarch64-simd.md (div3, mulv2di3): Remove.
	* config/aarch64/iterators.md (VQDIV): Remove.
	(SVE_FULL_SDI_SIMD, SVE_FULL_SDI_SIMD_DI, SVE_FULL_HSDI_SIMD_DI,
	SVE_I_SIMD_DI): New.
	(VPRED, sve_lane_con): Add V4SI and V2DI.
	* config/aarch64/aarch64-sve.md (3, @aarch64_pred_): Support
	Advanced SIMD types.
	(mul3): New, split from 3.
	(@aarch64_pred_, *post_ra_3): New.
	* config/aarch64/aarch64-sve2.md (@aarch64_mul_lane_,
	*aarch64_mul_unpredicated_): Change SVE_FULL_HSDI to
	SVE_FULL_HSDI_SIMD_DI.

gcc/testsuite/ChangeLog:

	PR target/109636
	* gcc.target/aarch64/sve/pr109636_1.c: New test.
	* gcc.target/aarch64/sve/pr109636_2.c: New test.
	* gcc.target/aarch64/sve2/pr109636_1.c: New test.

--- inline copy of patch --

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6f48b4d5f21da9f96a376cd6b34110c2a39deb33..556d0cf359fedf2c28dfe1e0a75e1c12321be68a 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -389,26 +389,6 @@ (define_insn "mul3"
   [(set_attr "type" "neon_mul_")]
 )
 
-;; Advanced SIMD does not support vector DImode MUL, but SVE does.
-;; Make use of the overlap between Z and V registers to implement the V2DI
-;; optab for TARGET_SVE.  The mulvnx2di3 expander can
-;; handle the TARGET_SVE2 case transparently.
-(define_expand "mulv2di3"
-  [(set (match_operand:V2DI 0 "register_operand")
-        (mult:V2DI (match_operand:V2DI 1 "register_operand")
-                   (match_operand:V2DI 2 "aarch64_sve_vsm_operand")))]
-  "TARGET_SVE"
-  {
-    machine_mode sve_mode = VNx2DImode;
-    rtx sve_op0 = simplify_gen_subreg (sve_mode, operands[0], V2DImode, 0);
-    rtx sve_op1 = simplify_gen_subreg (sve_mode, operands[1], V2DImode, 0);
-    rtx sve_op2 = simplify_gen_subreg (sve_mode, operands[2], V2DImode, 0);
-
-    emit_insn (gen_mulvnx2di3 (sve_op0, sve_op1, sve_op2));
-    DONE;
-  }
-)
-
 (define_insn "bswap2"
   [(set (match_operand:VDQHSD 0 "register_operand" "=w")
         (bswap:VDQHSD (match_operand:VDQHSD 1 "register_operand" "w")))]
@@ -2678,27 +2658,6 @@ (define_insn "*div3"
   [(set_attr "type" "neon_fp_div_")]
 )
 
-;; SVE has vector integer divisions, unlike Advanced SIMD.
-;; We can use it with Advanced SIMD modes to expose the V2DI and V4SI
-;; optabs to the midend.
-(define_expand "div3"
-  [(set (match_operand:VQDIV 0 "register_operand")
-        (ANY_DIV:VQDIV
-          (match_operand:VQDIV 1 "register_operand")
-          (match_operand:VQDIV 2 "register_operand")))]
-  "TARGET_SVE"
-  {
-    machine_mode sve_mode
-      = aarch64_full_sve_mode (GET_MODE_INNER (mode)).require ();
-    rtx sve_op0 = simplify_gen_subreg (sve_mode, operands[0], mode, 0);
-    rtx sve_op1 = simplify_gen_subreg (sve_mode, operands[1], mode, 0);
-    rtx sve_op2 = simplify_gen_subreg (sve_mode, operands[2], mode, 0);
-
-    emit_insn (gen_div3 (sve_op0, sve_op1, sve_op2));
-    DONE;
-  }
-)
-
 (define_insn "neg2"
   [(set (match_operand:VHSDF 0 "register_operand" "=w")
         (neg:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index e1e3c1bd0b7d12eefe43dc95a10716c24e3a48de..eca8623e587af944927a9459e29d5f8af170d347 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3789,16 +3789,35 @@ (define_expand "3"
   [(set (match_operand:SVE_I 0 "register_operand")
         (unspec:SVE_I
           [(match_dup 3)
-           (SVE_INT_BINARY_IMM:SVE_I
+           (SVE_INT_BINARY_MULTI:SVE_I
              (match_operand:SVE_I 1 "register_operand")
              (match_operand:SVE_I 2 "aarch64_sve__operand"))]
           UNSPEC_PRED_X))]
   "TARGET_SVE"
+  {
+    operands[3] = aarch64_ptrue_reg (mode);
+  }
+)
+
+;; Unpredicated integer binary operations that have an immediate form.
+;; Advanced SIMD does not support vector DImode MUL, but SVE does.
+;; Make use of the overlap between Z and V registers to implement the V2DI
+;; optab for TARGET_SVE.  The mulvnx2di3 expander can
+;; handle the TARGET_SVE2 case transparently.
+(define_expand "mul3"
+  [(set (match_operand:SVE_I_SIMD_DI 0 "register_operand")
+        (unspec:SVE_I_SIMD_DI
+          [(match_dup 3)
+           (mult:SVE_I_SIMD_DI
+             (match_operand:SVE_I_SIMD_DI 1 "register_operand")
+             (match_operand:SVE_I_SIMD_DI 2 "aarch64_sve_vsm_operand"))]
+          UNSPEC_PRED_X))]
+  "TARGET_SVE"
   {
     /* SVE2 supports the MUL (vectors, unpredicated) form.  Emit the simple
        pattern for it here rather than splitting off the MULT expander
        separately.  */
-    if (TARGET_SVE2 && == MULT)
+    if (TARGET_SVE2)
       {
         emit_move_insn (operands[0], gen_rtx_MULT (mode,
                                                    operands[1], operands[2]));
@@ -3814,26 +3833,26 @@ (define_expand "3"
 ;; and would make the instruction seem less uniform to the register
 ;; allocator.
 (define_insn_and_split "@aarch64_pred_"
-  [(set (match_operand:SVE_I 0 "register_operand")
-        (unspec:SVE_I
+  [(set (match_operand:SVE_I_SIMD_DI 0 "register_operand")
+        (unspec:SVE_I_SIMD_DI
           [(match_operand: 1 "register_operand")
-           (SVE_INT_BINARY_IMM:SVE_I
-             (match_operand:SVE_I 2 "register_operand")
-             (match_operand:SVE_I 3 "aarch64_sve__operand"))]
+           (SVE_INT_BINARY_IMM:SVE_I_SIMD_DI
+             (match_operand:SVE_I_SIMD_DI 2 "register_operand")
+             (match_operand:SVE_I_SIMD_DI 3 "aarch64_sve__operand"))]
           UNSPEC_PRED_X))]
   "TARGET_SVE"
   {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ]
      [ w , Upl , %0 , ; * ] #
-     [ w , Upl , 0 , w ; * ] \t%0., %1/m, %0., %3.
+     [ w , Upl , 0 , w ; * ] \t%Z0., %1/m, %Z0., %Z3.
      [ ?&w , Upl , w , ; yes ] #
-     [ ?&w , Upl , w , w ; yes ] movprfx\t%0, %2\;\t%0., %1/m, %0., %3.
+     [ ?&w , Upl , w , w ; yes ] movprfx\t%Z0, %Z2\;\t%Z0., %1/m, %Z0., %Z3.
   }
   ; Split the unpredicated form after reload, so that we don't have
   ; the unnecessary PTRUE.
   "&& reload_completed
    && !register_operand (operands[3], mode)"
   [(set (match_dup 0)
-        (SVE_INT_BINARY_IMM:SVE_I (match_dup 2) (match_dup 3)))]
+        (SVE_INT_BINARY_IMM:SVE_I_SIMD_DI (match_dup 2) (match_dup 3)))]
   ""
 )
 
@@ -3841,14 +3860,14 @@ (define_insn_and_split "@aarch64_pred_"
 ;; These are generated by splitting a predicated instruction whose
 ;; predicate is unused.
 (define_insn "*post_ra_3"
-  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?&w")
-        (SVE_INT_BINARY_IMM:SVE_I
-          (match_operand:SVE_I 1 "register_operand" "0, w")
-          (match_operand:SVE_I 2 "aarch64_sve__immediate")))]
+  [(set (match_operand:SVE_I_SIMD_DI 0 "register_operand" "=w, ?&w")
+        (SVE_INT_BINARY_IMM:SVE_I_SIMD_DI
+          (match_operand:SVE_I_SIMD_DI 1 "register_operand" "0, w")
+          (match_operand:SVE_I_SIMD_DI 2 "aarch64_sve__immediate")))]
   "TARGET_SVE && reload_completed"
   "@
-   \t%0., %0., #%2
-   movprfx\t%0, %1\;\t%0., %0., #%2"
+   \t%Z0., %Z0., #%2
+   movprfx\t%Z0, %Z1\;\t%Z0., %Z0., #%2"
   [(set_attr "movprfx" "*,yes")]
 )
 
@@ -4458,13 +4477,16 @@ (define_insn "*cond__z"
 ;; -------------------------------------------------------------------------
 
 ;; Unpredicated integer division.
+;; SVE has vector integer divisions, unlike Advanced SIMD.
+;; We can use it with Advanced SIMD modes to expose the V2DI and V4SI
+;; optabs to the midend.
 (define_expand "3"
-  [(set (match_operand:SVE_FULL_SDI 0 "register_operand")
-        (unspec:SVE_FULL_SDI
+  [(set (match_operand:SVE_FULL_SDI_SIMD 0 "register_operand")
+        (unspec:SVE_FULL_SDI_SIMD
           [(match_dup 3)
-           (SVE_INT_BINARY_SD:SVE_FULL_SDI
-             (match_operand:SVE_FULL_SDI 1 "register_operand")
-             (match_operand:SVE_FULL_SDI 2 "register_operand"))]
+           (SVE_INT_BINARY_SD:SVE_FULL_SDI_SIMD
+             (match_operand:SVE_FULL_SDI_SIMD 1 "register_operand")
+             (match_operand:SVE_FULL_SDI_SIMD 2 "register_operand"))]
           UNSPEC_PRED_X))]
   "TARGET_SVE"
   {
@@ -4474,18 +4496,18 @@ (define_expand "3"
 
 ;; Integer division predicated with a PTRUE.
 (define_insn "@aarch64_pred_"
-  [(set (match_operand:SVE_FULL_SDI 0 "register_operand")
-        (unspec:SVE_FULL_SDI
+  [(set (match_operand:SVE_FULL_SDI_SIMD 0 "register_operand")
+        (unspec:SVE_FULL_SDI_SIMD
           [(match_operand: 1 "register_operand")
-           (SVE_INT_BINARY_SD:SVE_FULL_SDI
-             (match_operand:SVE_FULL_SDI 2 "register_operand")
-             (match_operand:SVE_FULL_SDI 3 "register_operand"))]
+           (SVE_INT_BINARY_SD:SVE_FULL_SDI_SIMD
+             (match_operand:SVE_FULL_SDI_SIMD 2 "register_operand")
+             (match_operand:SVE_FULL_SDI_SIMD 3 "register_operand"))]
           UNSPEC_PRED_X))]
   "TARGET_SVE"
   {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ]
-     [ w , Upl , 0 , w ; * ] \t%0., %1/m, %0., %3.
-     [ w , Upl , w , 0 ; * ] r\t%0., %1/m, %0., %2.
-     [ ?&w , Upl , w , w ; yes ] movprfx\t%0, %2\;\t%0., %1/m, %0., %3.
+     [ w , Upl , 0 , w ; * ] \t%Z0., %1/m, %Z0., %Z3.
+     [ w , Upl , w , 0 ; * ] r\t%Z0., %1/m, %Z0., %Z2.
+     [ ?&w , Upl , w , w ; yes ] movprfx\t%Z0, %Z2\;\t%Z0., %1/m, %Z0., %Z3.
   }
 )
 
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index 1d1eb8bfdffe7b502c1ab5dd5a8ecc94b1e0214e..934e57055d3419e5dcc89b473fd110a0d4978b4f 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -615,29 +615,29 @@ (define_insn "@aarch64_sve_clamp_single"
 ;; -------------------------------------------------------------------------
 
 (define_insn "@aarch64_mul_lane_"
-  [(set (match_operand:SVE_FULL_HSDI 0 "register_operand" "=w")
-        (mult:SVE_FULL_HSDI
-          (unspec:SVE_FULL_HSDI
-            [(match_operand:SVE_FULL_HSDI 2 "register_operand" "")
+  [(set (match_operand:SVE_FULL_HSDI_SIMD_DI 0 "register_operand" "=w")
+        (mult:SVE_FULL_HSDI_SIMD_DI
+          (unspec:SVE_FULL_HSDI_SIMD_DI
+            [(match_operand:SVE_FULL_HSDI_SIMD_DI 2 "register_operand" "")
              (match_operand:SI 3 "const_int_operand")]
             UNSPEC_SVE_LANE_SELECT)
-          (match_operand:SVE_FULL_HSDI 1 "register_operand" "w")))]
+          (match_operand:SVE_FULL_HSDI_SIMD_DI 1 "register_operand" "w")))]
   "TARGET_SVE2"
-  "mul\t%0., %1., %2.[%3]"
+  "mul\t%Z0., %Z1., %Z2.[%3]"
 )
 
 ;; The 2nd and 3rd alternatives are valid for just TARGET_SVE as well but
 ;; we include them here to allow matching simpler, unpredicated RTL.
 (define_insn "*aarch64_mul_unpredicated_"
-  [(set (match_operand:SVE_I 0 "register_operand")
-        (mult:SVE_I
-          (match_operand:SVE_I 1 "register_operand")
-          (match_operand:SVE_I 2 "aarch64_sve_vsm_operand")))]
+  [(set (match_operand:SVE_I_SIMD_DI 0 "register_operand")
+        (mult:SVE_I_SIMD_DI
+          (match_operand:SVE_I_SIMD_DI 1 "register_operand")
+          (match_operand:SVE_I_SIMD_DI 2 "aarch64_sve_vsm_operand")))]
   "TARGET_SVE2"
   {@ [ cons: =0 , 1 , 2 ; attrs: movprfx ]
-     [ w , w , w ; * ] mul\t%0., %1., %2.
-     [ w , 0 , vsm ; * ] mul\t%0., %0., #%2
-     [ ?&w , w , vsm ; yes ] movprfx\t%0, %1\;mul\t%0., %0., #%2
+     [ w , w , w ; * ] mul\t%Z0., %Z1., %Z2.
+     [ w , 0 , vsm ; * ] mul\t%Z0., %Z0., #%2
+     [ ?&w , w , vsm ; yes ] movprfx\t%Z0, %Z1\;mul\t%Z0., %Z0., #%2
   }
 )
 
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 942270e99d6d0c6632199c059256f3a902a1b138..6acf58a0e8f1533d7c4c8450c5755d22f0955852 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -108,9 +108,6 @@ (define_mode_iterator DREG [V8QI V4HI V4HF V2SI V2SF DF])
 ;; Copy of the above.
 (define_mode_iterator DREG2 [DREG])
 
-;; Advanced SIMD modes for integer divides.
-(define_mode_iterator VQDIV [V4SI V2DI])
-
 ;; All modes suitable to store/load pair (2 elements) using STP/LDP.
 (define_mode_iterator VP_2E [V2SI V2SF V2DI V2DF])
 
@@ -471,6 +468,10 @@ (define_mode_iterator SVE_FULL_HSD [VNx8HI VNx4SI VNx2DI
 ;; elements.
 (define_mode_iterator SVE_FULL_HSDI [VNx8HI VNx4SI VNx2DI])
 
+;; Fully-packed SVE integer vector modes that have 16-bit, 32-bit or 64-bit
+;; elements and Advanced SIMD Fully-packed 64-bit elements.
+(define_mode_iterator SVE_FULL_HSDI_SIMD_DI [VNx8HI VNx4SI VNx2DI V2DI])
+
 ;; Fully-packed SVE integer vector modes that have 16-bit or 32-bit
 ;; elements.
 (define_mode_iterator SVE_FULL_HSI [VNx8HI VNx4SI])
@@ -488,6 +489,14 @@ (define_mode_iterator SVE_FULL_SD [VNx4SI VNx2DI VNx4SF VNx2DF])
 ;; Fully-packed SVE integer vector modes that have 32-bit or 64-bit elements.
 (define_mode_iterator SVE_FULL_SDI [VNx4SI VNx2DI])
 
+;; Fully-packed SVE and Advanced SIMD integer vector modes that have 32-bit or
+;; 64-bit elements.
+(define_mode_iterator SVE_FULL_SDI_SIMD [VNx4SI VNx2DI V4SI V2DI])
+
+;; Fully-packed SVE integer vector modes that have 32-bit or 64-bit elements
+;; and Advanced SIMD 64-bit elements.
+(define_mode_iterator SVE_FULL_SDI_SIMD_DI [VNx4SI VNx2DI])
+
 ;; 2x and 4x tuples of the above, excluding 2x DI.
 (define_mode_iterator SVE_FULL_SIx2_SDIx4 [VNx8SI VNx16SI VNx8DI])
 
@@ -550,6 +559,13 @@ (define_mode_iterator SVE_I [VNx16QI VNx8QI VNx4QI VNx2QI
                              VNx4SI VNx2SI
                              VNx2DI])
 
+;; All SVE integer vector modes and Advanced SIMD 64-bit vector
+;; element modes
+(define_mode_iterator SVE_I_SIMD_DI [VNx16QI VNx8QI VNx4QI VNx2QI
+                                     VNx8HI VNx4HI VNx2HI
+                                     VNx4SI VNx2SI
+                                     VNx2DI V2DI])
+
 ;; SVE integer vector modes whose elements are 16 bits or wider.
 (define_mode_iterator SVE_HSDI [VNx8HI VNx4HI VNx2HI
                                 VNx4SI VNx2SI
@@ -2268,7 +2284,8 @@ (define_mode_attr VPRED [(VNx16QI "VNx16BI") (VNx8QI "VNx8BI")
                          (VNx32HI "VNx8BI") (VNx32HF "VNx8BI")
                          (VNx32BF "VNx8BI")
                          (VNx16SI "VNx4BI") (VNx16SF "VNx4BI")
-                         (VNx8DI "VNx2BI") (VNx8DF "VNx2BI")])
+                         (VNx8DI "VNx2BI") (VNx8DF "VNx2BI")
+                         (V4SI "VNx4BI") (V2DI "VNx2BI")])
 
 ;; ...and again in lower case.
 (define_mode_attr vpred [(VNx16QI "vnx16bi") (VNx8QI "vnx8bi")
@@ -2370,6 +2387,7 @@ (define_mode_attr narrower_mask [(VNx8HI "0x81") (VNx4HI "0x41")
 
 ;; The constraint to use for an SVE [SU]DOT, FMUL, FMLA or FMLS lane index.
 (define_mode_attr sve_lane_con [(VNx8HI "y") (VNx4SI "y") (VNx2DI "x")
+                                (V2DI "x")
                                 (VNx8HF "y") (VNx4SF "y") (VNx2DF "x")])
 
 ;; The constraint to use for an SVE FCMLA lane index.
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr109636_1.c b/gcc/testsuite/gcc.target/aarch64/sve/pr109636_1.c new file mode 100644 index 0000000000000000000000000000000000000000..5b37ddd2770bcbbec37b9563644da0ba061d3789 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr109636_1.c @@ -0,0 +1,13 @@ +/* { dg-additional-options "-O -mtune=a64fx" } */ + +typedef unsigned long long __attribute__((__vector_size__ (16))) V; +typedef unsigned long long __attribute__((__vector_size__ (32))) W; + +extern void bar (V v); + +void foo (V v, W w) +{ + bar (__builtin_shuffle (v, __builtin_shufflevector ((V){}, w, 4, 5) / v)); +} + +/* { dg-final { scan-assembler {udiv\tz[0-9]+.d, p[0-9]+/m, z[0-9]+.d, z[0-9]+.d} } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr109636_2.c b/gcc/testsuite/gcc.target/aarch64/sve/pr109636_2.c new file mode 100644 index 0000000000000000000000000000000000000000..6d39dc8e590a04a486a300de10c5480d9c33afba --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr109636_2.c @@ -0,0 +1,13 @@ +/* { dg-additional-options "-O -mcpu=a64fx" } */ + +typedef unsigned long long __attribute__((__vector_size__ (16))) V; +typedef unsigned long long __attribute__((__vector_size__ (32))) W; + +extern void bar (V v); + +void foom (V v, W w) +{ + bar (__builtin_shuffle (v, __builtin_shufflevector ((V){}, w, 4, 5) * v)); +} + +/* { dg-final { scan-assembler {mul\tz[0-9]+.d, p[0-9]+/m, z[0-9]+.d, z[0-9]+.d} } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr109636_1.c b/gcc/testsuite/gcc.target/aarch64/sve2/pr109636_1.c new file mode 100644 index 0000000000000000000000000000000000000000..2bea18ad703cb3e1a1ce896bcedc2530e831a192 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr109636_1.c @@ -0,0 +1,13 @@ +/* { dg-additional-options "-O -mtune=a64fx" } */ + +typedef unsigned long long __attribute__((__vector_size__ (16))) V; +typedef unsigned long long __attribute__((__vector_size__ (32))) W; + +extern void bar (V v); + +void foom 
(V v, W w)
+{
+  bar (__builtin_shuffle (v, __builtin_shufflevector ((V){}, w, 4, 5) * v));
+}
+
+/* { dg-final { scan-assembler {mul\tz[0-9]+.d, z[0-9]+.d, z[0-9]+.d} } } */
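
For anyone reading along who wants the operations from the new tests without the shuffle wrappers, below is a minimal sketch in C using the GCC vector extensions. It is not part of the patch: the names v2di, div_v2di and mul_v2di are made up for illustration. Going by the scan-assembler patterns in the tests above, a 128-bit vector of 64-bit lanes (V2DI) built for an SVE target should get a predicated UDIV on Z registers for the division and a predicated MUL for the multiply, or the unpredicated MUL form when SVE2 is available. On targets without SVE the compiler is free to lower both lane by lane, so the source stays portable.

```c
/* Sketch only: illustrative helpers, not code from the patch.  */

/* Two 64-bit unsigned lanes in a 128-bit vector, i.e. V2DImode.  */
typedef unsigned long long v2di __attribute__ ((__vector_size__ (16)));

/* Lane-wise unsigned division.  Advanced SIMD has no vector integer
   divide, so without SVE this is lowered to scalar divides.  */
v2di
div_v2di (v2di a, v2di b)
{
  return a / b;
}

/* Lane-wise 64-bit multiply.  Advanced SIMD has no vector DImode MUL,
   which is why the patch routes V2DI through the SVE patterns.  */
v2di
mul_v2di (v2di a, v2di b)
{
  return a * b;
}
```

Compiling this with the same options as the tests (for example -O -mcpu=a64fx) and inspecting the assembly is one way to check which form the compiler picked.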